zarrs_ncvar2zarr

Convert a netCDF variable to a Zarr V3 array. Variables split across multiple files are supported.

Installation

zarrs_ncvar2zarr is installed with the ncvar2zarr feature of zarrs_tools.

Prebuilt Binaries

# Requires cargo-binstall https://github.com/cargo-bins/cargo-binstall
cargo binstall zarrs_tools

From Source

cargo install --features=ncvar2zarr zarrs_tools

Usage

zarrs_ncvar2zarr --help
Convert a netCDF variable to a Zarr V3 array

Usage: zarrs_ncvar2zarr [OPTIONS] --fill-value <FILL_VALUE> --chunk-shape <CHUNK_SHAPE> <INPUT> <VARIABLE> <OUT>

Arguments:
  <INPUT>
          The path to a netCDF file or a directory of netCDF files

  <VARIABLE>
          The name of the netCDF variable

  <OUT>
          The output directory for the zarr array

Options:
  -f, --fill-value <FILL_VALUE>
          Fill value. See https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#fill-value
          
          The fill value must be compatible with the data type.
          
          Examples:
            int/uint: 0 100 -100
            float: 0.0 "NaN" "Infinity" "-Infinity"
            r*: "[0, 255]"

      --separator <SEPARATOR>
          The chunk key encoding separator. Either . or /
          
          [default: /]

  -c, --chunk-shape <CHUNK_SHAPE>
          Chunk shape. A comma separated list of the chunk size along each array dimension.
          
          If any dimension has size zero, it will be set to match the array shape.

  -s, --shard-shape <SHARD_SHAPE>
          Shard shape (optional). A comma separated list of the shard size along each array dimension.
          
          If specified, the array is encoded using the sharding codec.
          If any dimension has size zero, it will be set to match the array shape.

      --array-to-array-codecs <ARRAY_TO_ARRAY_CODECS>
          Array to array codecs (optional).
          
          JSON holding an array of array to array codec metadata.
          
          Examples:
            '[ { "name": "transpose", "configuration": { "order": [0, 2, 1] } } ]'
            '[ { "name": "bitround", "configuration": { "keepbits": 9 } } ]'

      --array-to-bytes-codec <ARRAY_TO_BYTES_CODEC>
          Array to bytes codec (optional).
          
          JSON holding array to bytes codec metadata.
          If unspecified, this defaults to the `bytes` codec.
          
          The sharding codec can be used by setting `--shard-shape`, but it can also be configured explicitly here.
          
          Examples:
            '{ "name": "bytes", "configuration": { "endian": "little" } }'
            '{ "name": "pcodec", "configuration": { "level": 12 } }'
            '{ "name": "zfp", "configuration": { "mode": "fixedprecision", "precision": 19 } }'

      --bytes-to-bytes-codecs <BYTES_TO_BYTES_CODECS>
          Bytes to bytes codecs (optional).
          
          JSON holding an array of bytes to bytes codec configurations.
          
          Examples:
            '[ { "name": "blosc", "configuration": { "cname": "blosclz", "clevel": 9, "shuffle": "bitshuffle", "typesize": 2, "blocksize": 0 } } ]'
            '[ { "name": "bz2", "configuration": { "level": 9 } } ]'
            '[ { "name": "crc32c" } ]'
            '[ { "name": "gzip", "configuration": { "level": 9 } } ]'
            '[ { "name": "zstd", "configuration": { "level": 22, "checksum": false } } ]'

      --attributes <ATTRIBUTES>
          Attributes (optional).
          
          JSON holding array attributes.

      --concurrent-chunks <CONCURRENT_CHUNKS>
          Number of concurrent chunks

      --memory-test
          Write to memory

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
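
The --separator option controls how chunk keys are laid out in the output directory. As a minimal sketch, assuming the default Zarr V3 chunk key encoding (the prefix c followed by the chunk indices, joined by the separator), the hypothetical helper chunk_key below prints the key for a given chunk index. It is not part of zarrs_tools and is only meant to illustrate the naming.

```shell
# Hypothetical helper (not part of zarrs_tools): build the key of a chunk
# from a separator and its indices, assuming the default chunk key encoding.
chunk_key() {
  sep=$1; shift
  key=c
  for i in "$@"; do
    key="$key$sep$i"
  done
  printf '%s\n' "$key"
}

chunk_key . 9 0 0   # prints c.9.0.0
chunk_key / 9 0 0   # prints c/9/0/0
```

With the default separator (/), each chunk or shard is stored in nested directories. With . all chunks or shards sit next to zarr.json in a single directory.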

Example

tomoLoRes_nc is a directory of netCDF files. Each file contains part of a 3D variable named "tomo", which has been split along dimension 0. The full variable has:

  • (depth, height, width) = (1209, 480, 480)
  • data type = uint16

tree --du -h tomoLoRes_nc
[532M]  tomoLoRes_nc
├── [528M]  block00000000.nc
└── [4.0M]  block00000001.nc

With the following command, the image is encoded as a Zarr array using the sharding codec with a shard shape of (128, 480, 480):

  • inner chunks in each shard have a chunk shape of (32, 32, 32)
  • inner chunks are compressed using the blosc codec

zarrs_ncvar2zarr \
--fill-value -32768 \
--separator '.' \
--chunk-shape 32,32,32 \
--shard-shape 128,0,0 \
--bytes-to-bytes-codecs '[ { "name": "blosc", "configuration": { "cname": "blosclz", "clevel": 9, "shuffle": "bitshuffle", "typesize": 2, "blocksize": 0 } } ]' \
tomoLoRes_nc \
tomo \
tomoLoRes_nc.zarr

tree --du -h tomoLoRes_nc.zarr
[329M]  tomoLoRes_nc.zarr
├── [ 30M]  c.0.0.0
├── [ 35M]  c.1.0.0
├── [ 36M]  c.2.0.0
├── [ 36M]  c.3.0.0
├── [ 36M]  c.4.0.0
├── [ 36M]  c.5.0.0
├── [ 36M]  c.6.0.0
├── [ 36M]  c.7.0.0
├── [ 35M]  c.8.0.0
├── [ 14M]  c.9.0.0
└── [1.5K]  zarr.json
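
The ten shard files follow from the shapes used in the command. The sketch below reproduces the arithmetic in plain shell, under two assumptions: a zero in --shard-shape is replaced by the array extent along that dimension (as described in the help text), and the shard grid is the ceiling of the array shape divided by the resolved shard shape. The variable names are illustrative only.

```shell
# Shapes taken from the example: array shape, --shard-shape, --chunk-shape.
array_shape="1209 480 480"
shard_shape="128 0 0"
chunk_shape="32 32 32"

# Resolve zero entries to the array extent and count shards (ceiling division).
n_shards=1
resolved=""
set -- $shard_shape
for extent in $array_shape; do
  size=$1; shift
  if [ "$size" -eq 0 ]; then
    size=$extent
  fi
  resolved="$resolved $size"
  n_shards=$(( n_shards * ( (extent + size - 1) / size ) ))
done
resolved=${resolved# }

# Count the inner chunks held by one full shard.
inner=1
set -- $chunk_shape
for size in $resolved; do
  inner=$(( inner * (size / $1) )); shift
done

echo "resolved shard shape: $resolved"
echo "number of shards: $n_shards"
echo "inner chunks per shard: $inner"
```

This gives a resolved shard shape of (128, 480, 480), 10 shards (c.0.0.0 to c.9.0.0) and 900 inner chunks per shard. The last shard covers only 1209 - 9 × 128 = 57 slices along dimension 0, which is consistent with its smaller size on disk.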