zarrs_reencode

Reencode/rechunk a Zarr V2/V3 array to a Zarr V3 array.

Installation

zarrs_reencode is packaged by default with zarrs_tools and requires no extra features.

Prebuilt Binaries

# Requires cargo-binstall https://github.com/cargo-bins/cargo-binstall
cargo binstall zarrs_tools

From Source

cargo install zarrs_tools
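
After either install method, a quick check like the one below can confirm that the binary is reachable. This is a minimal sketch: it relies only on the --version flag listed in the help output, and it assumes Cargo placed the binary in its usual bin directory (typically ~/.cargo/bin), which may differ on your system.

```shell
# Check whether zarrs_reencode is reachable before using it.
if command -v zarrs_reencode >/dev/null 2>&1; then
  # Print the installed version (the -V/--version flag from the help output).
  zarrs_reencode --version
else
  # Assumption: Cargo installs binaries into ~/.cargo/bin by default.
  echo "zarrs_reencode not found on PATH; check that Cargo's bin directory (usually ~/.cargo/bin) is on PATH"
fi
```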

Usage

zarrs_reencode --help
Reencode a Zarr array

Usage: zarrs_reencode [OPTIONS] <PATH_IN> <PATH_OUT>

Arguments:
  <PATH_IN>
          The zarr array input path or URL

  <PATH_OUT>
          The zarr array output directory

Options:
  -d, --data-type <DATA_TYPE>
          The data type as a string
          
          Valid data types:
            - bool
            - int8, int16, int32, int64
            - uint8, uint16, uint32, uint64
            - float16, float32, float64, bfloat16
            - complex64, complex128
            - r* (raw bits, where * is a multiple of 8)

  -f, --fill-value <FILL_VALUE>
          Fill value. See <https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#fill-value>
          
          The fill value must be compatible with the data type.
          
          Examples:
            int/uint: 0 100 -100
            float: 0.0 "NaN" "Infinity" "-Infinity"
            r*: "[0, 255]"

      --separator <SEPARATOR>
          The chunk key encoding separator. Either . or /

  -c, --chunk-shape <CHUNK_SHAPE>
          Chunk shape. A comma separated list of the chunk size along each array dimension.
          
          If any dimension has size zero, it will be set to match the array shape.

  -s, --shard-shape <SHARD_SHAPE>
          Shard shape. A comma separated list of the shard size along each array dimension.
          
          If specified, the array is encoded using the sharding codec.
          If any dimension has size zero, it will be set to match the array shape.

      --array-to-array-codecs <ARRAY_TO_ARRAY_CODECS>
          Array to array codecs.
          
          JSON holding an array of array to array codec metadata.
          
          Examples:
            '[ { "name": "transpose", "configuration": { "order": [0, 2, 1] } } ]'
            '[ { "name": "bitround", "configuration": { "keepbits": 9 } } ]'

      --array-to-bytes-codec <ARRAY_TO_BYTES_CODEC>
          Array to bytes codec.
          
          JSON holding array to bytes codec metadata.
          
          Examples:
            '{ "name": "bytes", "configuration": { "endian": "little" } }'
            '{ "name": "pcodec", "configuration": { "level": 12 } }'
            '{ "name": "zfp", "configuration": { "mode": "fixedprecision", "precision": 19 } }'

      --bytes-to-bytes-codecs <BYTES_TO_BYTES_CODECS>
          Bytes to bytes codecs.
          
          JSON holding an array of bytes to bytes codec configurations.
          
          Examples:
            '[ { "name": "blosc", "configuration": { "cname": "blosclz", "clevel": 9, "shuffle": "bitshuffle", "typesize": 2, "blocksize": 0 } } ]'
            '[ { "name": "bz2", "configuration": { "level": 9 } } ]'
            '[ { "name": "crc32c" } ]'
            '[ { "name": "gzip", "configuration": { "level": 9 } } ]'
            '[ { "name": "zstd", "configuration": { "level": 22, "checksum": false } } ]'

      --dimension-names <DIMENSION_NAMES>
          Dimension names (optional). Comma separated.

      --attributes <ATTRIBUTES>
          Attributes (optional).
          
          JSON holding array attributes.

      --attributes-append <ATTRIBUTES_APPEND>
          Attributes to append (optional).
          
          JSON holding array attributes.

      --concurrent-chunks <CONCURRENT_CHUNKS>
          Number of concurrent chunks

      --ignore-checksums
          Ignore checksums.
          
          If set, checksum validation in codecs (e.g. crc32c) is skipped.

      --validate
          Validate written data

  -v, --verbose
          Print verbose information, such as the array header

      --cache-size <CACHE_SIZE>
          An optional chunk cache size (in bytes)

      --cache-chunks <CACHE_CHUNKS>
          An optional chunk cache size (in chunks)

      --cache-size-thread <CACHE_SIZE_THREAD>
          An optional per-thread chunk cache size (in bytes)

      --cache-chunks-thread <CACHE_CHUNKS_THREAD>
          An optional per-thread chunk cache size (in chunks)

      --write-shape <WRITE_SHAPE>
          Write shape (optional). A comma separated list of the write size along each array dimension.
          
          Use this parameter to incrementally write shards in batches of chunks of the specified write shape.
          The write shape defaults to the shard shape for sharded arrays.
          This parameter is ignored for unsharded arrays (the write shape is the chunk shape).
          
          Prefer to set the write shape to an integer multiple of the chunk shape to avoid unnecessary reads.

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
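
The help text for --write-shape recommends a write shape that is an integer multiple of the chunk shape. The sketch below checks that condition before assembling a command line. The helper name is_chunk_multiple and the example shapes are hypothetical and not part of zarrs_reencode; the command is only printed, not run, so it can be reviewed first.

```shell
# Return 0 if every write dimension is a positive integer multiple of the
# matching chunk dimension, 1 otherwise.
# $1: write shape, $2: chunk shape (both comma separated lists).
is_chunk_multiple() {
  w_rest=$1
  c_rest=$2
  while [ -n "$w_rest" ] || [ -n "$c_rest" ]; do
    # The two shapes must have the same number of dimensions.
    [ -n "$w_rest" ] && [ -n "$c_rest" ] || return 1
    # Take the leading element of each list.
    w=${w_rest%%,*}
    c=${c_rest%%,*}
    # Drop the leading element from each list.
    case $w_rest in *,*) w_rest=${w_rest#*,} ;; *) w_rest= ;; esac
    case $c_rest in *,*) c_rest=${c_rest#*,} ;; *) c_rest= ;; esac
    [ "$c" -gt 0 ] && [ "$w" -gt 0 ] || return 1
    [ $(( w % c )) -eq 0 ] || return 1
  done
  return 0
}

# Hypothetical shapes for illustration.
chunk_shape=32,32,32
shard_shape=128,128,128
write_shape=128,128,32

if is_chunk_multiple "$write_shape" "$chunk_shape"; then
  # Print the command instead of running it.
  echo zarrs_reencode \
    --chunk-shape "$chunk_shape" \
    --shard-shape "$shard_shape" \
    --write-shape "$write_shape" \
    array.zarr array_reencode.zarr
else
  echo "write shape $write_shape is not a multiple of chunk shape $chunk_shape" >&2
fi
```

Dropping the echo runs the command directly. The check covers only the multiple-of-chunk-shape recommendation and says nothing about how the write shape relates to the shard shape.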

Example

Reencode array.zarr (uint16) with:

  • a chunk shape of [32, 32, 32]
  • a shard shape of [128, 128, 0]
    • the 0 means the last dimension of the shard shape will match the array shape, to the nearest multiple of the chunk shape
  • level 9 blosclz compression with bitshuffling
  • an input chunk cache with a size of 1 GB

zarrs_reencode \
--cache-size 1000000000 \
--chunk-shape 32,32,32 \
--shard-shape 128,128,0 \
--bytes-to-bytes-codecs '[ { "name": "blosc", "configuration": { "cname": "blosclz", "clevel": 9, "shuffle": "bitshuffle", "typesize": 2, "blocksize": 0 } } ]' \
array.zarr array_reencode.zarr
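
To illustrate how the 0 in the shard shape above might resolve, the sketch below rounds an array dimension to a multiple of the chunk dimension. It assumes "nearest multiple" means rounding up so that the shard covers the whole dimension; that interpretation, the helper name round_up_to_chunk, and the array size of 1000 are assumptions made for illustration, not behaviour confirmed by the tool.

```shell
# Round an array dimension up to the next multiple of the chunk dimension.
# $1: array size along one dimension, $2: chunk size along that dimension.
round_up_to_chunk() {
  echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

# Hypothetical array with 1000 elements along the last dimension, chunked by 32.
round_up_to_chunk 1000 32   # prints 1024
# A dimension that is already a multiple of the chunk size is unchanged.
round_up_to_chunk 1024 32   # prints 1024
```

Under that assumption, the shard shape 128,128,0 for such an array would resolve to 128,128,1024.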