Writing Arrays

Array write methods are split into two groups, based on the storage traits they require:

  • [Async]WritableStorageTraits methods perform write operations exclusively, and
  • [Async]ReadableWritableStorageTraits methods perform write operations and may perform read operations.
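One way to picture this split is as trait bounds on separate impl blocks: write-only methods need only a writable store, while read-write methods need a store that is both readable and writable. The std-only sketch below models that idea. `ReadableStorage`, `WritableStorage`, `ReadableWritableStorage`, `MemStore`, and `ToyArray` are hypothetical names for illustration, not zarrs's actual trait definitions.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical storage traits (zarrs's real traits have many more methods).
trait ReadableStorage {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

trait WritableStorage {
    fn set(&self, key: &str, value: Vec<u8>);
}

// "Readable and writable" is implemented automatically for any store that is both.
trait ReadableWritableStorage: ReadableStorage + WritableStorage {}
impl<T: ReadableStorage + WritableStorage> ReadableWritableStorage for T {}

#[derive(Default)]
struct MemStore {
    map: Mutex<HashMap<String, Vec<u8>>>,
}

impl ReadableStorage for MemStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.lock().unwrap().get(key).cloned()
    }
}

impl WritableStorage for MemStore {
    fn set(&self, key: &str, value: Vec<u8>) {
        self.map.lock().unwrap().insert(key.to_string(), value);
    }
}

struct ToyArray<S> {
    store: S,
}

impl<S: WritableStorage> ToyArray<S> {
    // Write-only: replaces the whole chunk and never reads.
    fn store_chunk(&self, key: &str, bytes: Vec<u8>) {
        self.store.set(key, bytes);
    }
}

impl<S: ReadableWritableStorage> ToyArray<S> {
    // Read-write: reads the existing chunk (or a zero fill), patches it, writes it back.
    fn store_chunk_subset(&self, key: &str, offset: usize, patch: &[u8], chunk_len: usize) {
        let mut chunk = self.store.get(key).unwrap_or_else(|| vec![0u8; chunk_len]);
        chunk[offset..offset + patch.len()].copy_from_slice(patch);
        self.store.set(key, chunk);
    }
}

fn main() {
    let array = ToyArray { store: MemStore::default() };
    array.store_chunk("c/0/0", vec![1, 2, 3, 4]);
    array.store_chunk_subset("c/0/0", 1, &[9, 9], 4);
    assert_eq!(array.store.get("c/0/0"), Some(vec![1, 9, 9, 4]));
}
```

In this sketch, a store type implementing only `WritableStorage` would still get `store_chunk`, but calling `store_chunk_subset` on it would be a compile error.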

Warning

Misusing the [Async]ReadableWritableStorageTraits Array methods can cause data loss: concurrent partial writes to the same chunk can overwrite one another. zarrs does not currently offer a “synchronisation” API for locking chunks or array subsets.
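To see how a partial write can be lost, consider two read-modify-write updates of the same chunk whose steps interleave. The std-only sketch below models a stored chunk as a plain `Vec<f32>` (a simplification that skips encoding) and does not call zarrs; `interleaved_subset_writes` is a hypothetical name.

```rust
/// Simulates two read-modify-write updates of one stored chunk whose steps
/// interleave: A reads, B reads, A writes back, B writes back.
fn interleaved_subset_writes() -> Vec<f32> {
    // The stored chunk, modelled as decoded f32 values.
    let mut stored: Vec<f32> = vec![0.0; 4];

    // Both writers retrieve the whole chunk before either stores it.
    let mut a = stored.clone();
    let mut b = stored.clone();

    // Each writer updates a different element (disjoint chunk subsets).
    a[0] = 1.0;
    b[3] = 2.0;

    // Each writer stores its whole copy; B's store overwrites A's.
    stored.copy_from_slice(&a);
    stored.copy_from_slice(&b);
    stored
}

fn main() {
    // A's update to element 0 has been lost.
    assert_eq!(interleaved_subset_writes(), vec![0.0, 0.0, 0.0, 2.0]);
}
```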

Write-Only Methods

Methods in the [Async]WritableStorageTraits group perform only write operations.

Store a Chunk

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
extern crate zarrs;
extern crate ndarray;
use zarrs::array::{Array, ArrayBuilder, DataType};
use ndarray::ArrayD;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
    .build(store.clone(), "/array")?;
let chunk_indices: Vec<u64> = vec![1, 2];
let chunk_bytes: Vec<u8> = vec![0u8; 4 * 4 * 4]; // 4x4 chunk of f32
array.store_chunk(&chunk_indices, chunk_bytes)?;
let chunk_elements: Vec<f32> = vec![1.0; 4 * 4];
array.store_chunk_elements(&chunk_indices, &chunk_elements)?;
let chunk_array = ArrayD::<f32>::from_shape_vec(
    vec![4, 4], // chunk shape
    chunk_elements
)?;
array.store_chunk_ndarray(&chunk_indices, chunk_array)?;
Ok::<_, Box<dyn std::error::Error>>(())
}

Tip

If a chunk is written more than once, its element values depend on whichever operation wrote to the chunk last.

Store Chunks

store_chunks (and its variants) disassembles the input into chunks, then encodes and stores them in parallel.
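The disassembly step can be sketched with std only: the input covers a row-major region spanning several chunks, so each chunk's rows are copied out of the flat buffer before the chunks are encoded concurrently. This is a minimal sketch of the idea, not zarrs's implementation; `disassemble` is a hypothetical helper, it assumes the region is an exact multiple of the chunk shape, and converting to little-endian bytes stands in for a real codec pipeline.

```rust
use std::thread;

/// Splits a row-major 2-D region of shape `region` into chunks of shape `chunk`,
/// returning (chunk indices within the region, chunk elements) in row-major chunk order.
/// Assumes each region dimension is a multiple of the chunk dimension.
fn disassemble(region: [usize; 2], chunk: [usize; 2], data: &[f32]) -> Vec<([usize; 2], Vec<f32>)> {
    assert_eq!(data.len(), region[0] * region[1]);
    let grid = [region[0] / chunk[0], region[1] / chunk[1]];
    let mut out = Vec::with_capacity(grid[0] * grid[1]);
    for ci in 0..grid[0] {
        for cj in 0..grid[1] {
            let mut elements = Vec::with_capacity(chunk[0] * chunk[1]);
            for r in 0..chunk[0] {
                // Start of this chunk's r-th row inside the flat region buffer.
                let start = (ci * chunk[0] + r) * region[1] + cj * chunk[1];
                elements.extend_from_slice(&data[start..start + chunk[1]]);
            }
            out.push(([ci, cj], elements));
        }
    }
    out
}

fn main() {
    // An 8x8 region (2x2 chunks of 4x4) whose values are their flat indices.
    let data: Vec<f32> = (0..64).map(|i| i as f32).collect();
    let chunks = disassemble([8, 8], [4, 4], &data);

    // "Encode" every chunk on its own thread; little-endian bytes stand in for a codec.
    let encoded: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|(_, elements)| {
                s.spawn(move || elements.iter().flat_map(|v| v.to_le_bytes()).collect::<Vec<u8>>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    assert_eq!(encoded.len(), 4);
    assert!(encoded.iter().all(|c| c.len() == 4 * 4 * 4));
}
```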

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
use zarrs::array_subset::ArraySubset;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
    .build(store.clone(), "/array")?;
let chunks = ArraySubset::new_with_ranges(&[0..2, 0..2]);
let chunks_bytes: Vec<u8> = vec![0u8; 2 * 2 * 4 * 4 * 4]; // 2x2 chunks of 4x4 f32
array.store_chunks(&chunks, chunks_bytes)?;
// store_chunks_elements, store_chunks_ndarray...
Ok::<_, Box<dyn std::error::Error>>(())
}

Store an Encoded Chunk

An encoded chunk can be stored directly with store_encoded_chunk, bypassing the zarrs codec pipeline.
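Because store_encoded_chunk bypasses the codec pipeline, the caller is responsible for producing bytes that match the array's codecs exactly. As a sketch, assuming for illustration that the array's only codec is `bytes` with little-endian byte order, pre-encoding a chunk amounts to serialising its elements in order. `encode_f32_le` and `decode_f32_le` are hypothetical helpers; with a different byte order or an added compressor, these bytes would not be a valid encoding.

```rust
/// Serialises f32 elements as a little-endian `bytes` codec would (assumed codec chain).
fn encode_f32_le(elements: &[f32]) -> Vec<u8> {
    elements.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Inverse of `encode_f32_le`, used here only to check the round trip.
fn decode_f32_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

fn main() {
    let elements = vec![1.5f32; 4 * 4]; // one 4x4 chunk
    let encoded = encode_f32_le(&elements);
    assert_eq!(encoded.len(), 4 * 4 * 4);
    assert_eq!(decode_f32_le(&encoded), elements);
}
```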

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
    .build(store.clone(), "/array")?;
let chunk_indices: Vec<u64> = vec![1, 2];
let encoded_chunk_bytes: Vec<u8> = vec![0u8; 4 * 4 * 4]; // pre-encoded bytes
// SAFETY: the array uses only the default bytes codec (native endianness), so these raw f32 bytes are a valid encoded chunk
unsafe { array.store_encoded_chunk(&chunk_indices, encoded_chunk_bytes.into())? };
Ok::<_, Box<dyn std::error::Error>>(())
}

Tip

Currently, the most performant path for uncompressed writing on Linux is to reuse page-aligned buffers with store_encoded_chunk and enable direct I/O for the FilesystemStore. See zarrs GitHub issue #58 for a discussion of this method.
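A page-aligned buffer that is refilled for each chunk, rather than reallocated, might look like the std-only sketch below. `AlignedBuf` is a hypothetical type and `PAGE_SIZE = 4096` is an assumption (real code should query the OS); how such a buffer is handed to store_encoded_chunk and the FilesystemStore without copying is not shown here and is discussed in the issue above.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

/// Assumed page size; real code should query the OS.
const PAGE_SIZE: usize = 4096;

/// A zero-initialised, page-aligned byte buffer that can be refilled and reused.
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        // Round up to whole pages (at least one) so the size is also a page multiple.
        let pages = ((len + PAGE_SIZE - 1) / PAGE_SIZE).max(1);
        let layout = Layout::from_size_align(pages * PAGE_SIZE, PAGE_SIZE).expect("valid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        Self { ptr, layout }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: `ptr` is valid for `layout.size()` initialised bytes and `&mut self` is unique.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

fn main() {
    // One 4x4 f32 chunk is 64 bytes; the buffer is rounded up to one page.
    let mut buf = AlignedBuf::new(4 * 4 * 4);
    for chunk_value in [1.0f32, 2.0, 3.0] {
        // Refill the same buffer for each chunk instead of allocating a new one.
        let bytes = buf.as_mut_slice();
        for dst in bytes[..64].chunks_exact_mut(4) {
            dst.copy_from_slice(&chunk_value.to_le_bytes());
        }
        assert_eq!(bytes.as_ptr() as usize % PAGE_SIZE, 0);
        // A real writer would pass the aligned bytes to a direct I/O write here.
    }
}
```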

Read-Write Methods

Methods in the [Async]ReadableWritableStorageTraits group perform write operations and may also perform read operations.

These methods perform partial encoding. Codecs that do not support true partial encoding retrieve chunks in their entirety, then decode, update, re-encode, and store them.
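That fallback path (retrieve, decode, update, re-encode, store) can be sketched with std only. `Store`, `encode`, `decode`, and `rmw_chunk_subset` are hypothetical stand-ins, and little-endian f32 bytes stand in for the array's codec chain; zarrs's actual implementation is not shown.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Toy store mapping chunk keys to encoded bytes.
type Store = HashMap<String, Vec<u8>>;

/// Stand-in codec (assumption): little-endian f32 bytes.
fn encode(elements: &[f32]) -> Vec<u8> {
    elements.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn decode(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Hypothetical chunk subset write without true partial encoding:
/// retrieve the whole chunk (or use the fill value if absent), decode, update, re-encode, store.
fn rmw_chunk_subset(
    store: &mut Store,
    key: &str,
    chunk_shape: [usize; 2],
    fill_value: f32,
    rows: Range<usize>,
    cols: Range<usize>,
    values: &[f32],
) {
    assert_eq!(values.len(), rows.len() * cols.len());
    let mut chunk = match store.get(key) {
        Some(bytes) => decode(bytes),
        None => vec![fill_value; chunk_shape[0] * chunk_shape[1]],
    };
    let mut values = values.iter();
    for r in rows {
        for c in cols.clone() {
            chunk[r * chunk_shape[1] + c] = *values.next().unwrap();
        }
    }
    store.insert(key.to_string(), encode(&chunk));
}

fn main() {
    let mut store = Store::new();
    // Set row 1, columns 0..4 of a 4x4 chunk to -4.0; the rest keeps the fill value.
    rmw_chunk_subset(&mut store, "c/3/1", [4, 4], 0.0, 1..2, 0..4, &[-4.0; 4]);
    let chunk = decode(&store["c/3/1"]);
    assert_eq!(&chunk[4..8], &[-4.0f32; 4]);
    assert!(chunk[..4].iter().all(|&v| v == 0.0));
}
```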

It is the responsibility of zarrs consumers to ensure:

  • store_chunk_subset is not called concurrently on the same chunk, and
  • store_array_subset is not called concurrently on array subsets sharing chunks.

Partial writes to a chunk may be lost if these rules are not respected.
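Since zarrs offers no locking API, one way a consumer might uphold these rules is to hold a per-chunk lock for the whole read-modify-write. The std-only sketch below shows that pattern; `Store`, `ChunkLocks`, and `store_element_locked` are hypothetical names, and in a real application the zarrs store_chunk_subset call would sit where the toy get/update/set sequence is.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy store: each get or set is atomic on its own, but get-then-set is not.
#[derive(Default)]
struct Store {
    chunks: Mutex<HashMap<Vec<u64>, Vec<f32>>>,
}

impl Store {
    fn get(&self, chunk: &[u64]) -> Option<Vec<f32>> {
        self.chunks.lock().unwrap().get(chunk).cloned()
    }
    fn set(&self, chunk: &[u64], elements: Vec<f32>) {
        self.chunks.lock().unwrap().insert(chunk.to_vec(), elements);
    }
}

/// Hypothetical helper: one mutex per chunk, created on first use.
#[derive(Default)]
struct ChunkLocks {
    locks: Mutex<HashMap<Vec<u64>, Arc<Mutex<()>>>>,
}

impl ChunkLocks {
    fn lock_for(&self, chunk: &[u64]) -> Arc<Mutex<()>> {
        self.locks.lock().unwrap().entry(chunk.to_vec()).or_default().clone()
    }
}

/// Read-modify-write of one element of an 8-element chunk, holding the chunk's lock throughout.
fn store_element_locked(store: &Store, locks: &ChunkLocks, chunk: &[u64], index: usize, value: f32) {
    let lock = locks.lock_for(chunk);
    let _guard = lock.lock().unwrap();
    let mut elements = store.get(chunk).unwrap_or_else(|| vec![0.0; 8]);
    elements[index] = value;
    store.set(chunk, elements);
}

fn main() {
    let store = Store::default();
    let locks = ChunkLocks::default();
    // Eight threads update different elements of the same chunk concurrently.
    thread::scope(|s| {
        for i in 0..8 {
            let (store, locks) = (&store, &locks);
            s.spawn(move || store_element_locked(store, locks, &[0, 0], i, i as f32 + 1.0));
        }
    });
    // Because each read-modify-write held the chunk lock, no update was lost.
    let expected: Vec<f32> = (1..=8).map(|x| x as f32).collect();
    assert_eq!(store.get(&[0, 0]), Some(expected));
}
```

Different chunks get different locks, so writes to unrelated chunks still proceed in parallel.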

Store a Chunk Subset

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
use zarrs::array_subset::ArraySubset;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![16, 8], vec![4, 4], DataType::Float32, 0.0f32)
    .build(store.clone(), "/array")?;
array.store_chunk_subset_elements::<f32>(
    // chunk indices
    &[3, 1],
    // subset within chunk
    &ArraySubset::new_with_ranges(&[1..2, 0..4]),
    // subset elements
    &[-4.0; 4],
)?;
Ok::<_, Box<dyn std::error::Error>>(())
}
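The subset in this call is expressed relative to the chunk. With a regular 4x4 chunk grid, chunk [3, 1] starts at row 3 * 4 = 12 and column 1 * 4 = 4, so the call writes array elements [13..14, 4..8]. A short sketch of that offset arithmetic, assuming a regular chunk grid; `chunk_subset_to_array_subset` is a hypothetical helper:

```rust
use std::ops::Range;

/// Maps a subset expressed within chunk `chunk_indices` to array coordinates,
/// assuming a regular chunk grid with chunk shape `chunk_shape`.
fn chunk_subset_to_array_subset(
    chunk_indices: &[u64],
    chunk_shape: &[u64],
    chunk_subset: &[Range<u64>],
) -> Vec<Range<u64>> {
    chunk_indices
        .iter()
        .zip(chunk_shape)
        .zip(chunk_subset)
        .map(|((&i, &n), r)| {
            // Origin of this chunk along the current dimension.
            let origin = i * n;
            origin + r.start..origin + r.end
        })
        .collect()
}

fn main() {
    // Chunk [3, 1] of a 16x8 array with 4x4 chunks; rows 1..2 and columns 0..4 within it.
    let subset = chunk_subset_to_array_subset(&[3, 1], &[4, 4], &[1..2, 0..4]);
    assert_eq!(subset, vec![13..14, 4..8]);
}
```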

Store an Array Subset

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
use zarrs::array_subset::ArraySubset;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
    .build(store.clone(), "/array")?;
array.store_array_subset_elements::<f32>(
    &ArraySubset::new_with_ranges(&[0..8, 6..7]),
    &[123.0; 8],
)?;
Ok::<_, Box<dyn std::error::Error>>(())
}
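This subset spans all rows of column 6, so it intersects chunks [0, 1] and [1, 1] without covering either completely; each therefore needs the read-modify-write path rather than a whole-chunk write. The sketch below computes this for a regular 2-D grid. `intersected_chunks` is a hypothetical helper that ignores chunks truncated at the array edge; a helper like it can also tell whether two concurrent store_array_subset calls would share a chunk.

```rust
use std::ops::Range;

/// Lists the chunks of a regular 2-D grid intersected by `subset`, each flagged with
/// whether the subset covers it fully (true) or only partially (false).
/// Assumes no chunk is truncated at the array edge.
fn intersected_chunks(subset: &[Range<u64>; 2], chunk_shape: [u64; 2]) -> Vec<([u64; 2], bool)> {
    // Range of chunk indices touched along one dimension.
    let span = |r: &Range<u64>, n: u64| (r.start / n)..((r.end + n - 1) / n);
    // Whether the subset covers chunk `c` completely along one dimension.
    let covers = |r: &Range<u64>, c: u64, n: u64| r.start <= c * n && r.end >= (c + 1) * n;
    let mut out = Vec::new();
    for ci in span(&subset[0], chunk_shape[0]) {
        for cj in span(&subset[1], chunk_shape[1]) {
            let full = covers(&subset[0], ci, chunk_shape[0]) && covers(&subset[1], cj, chunk_shape[1]);
            out.push(([ci, cj], full));
        }
    }
    out
}

fn main() {
    // The subset from the example: rows 0..8, column 6, with 4x4 chunks.
    let chunks = intersected_chunks(&[0..8, 6..7], [4, 4]);
    assert_eq!(chunks, vec![([0, 1], false), ([1, 1], false)]);
}
```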

Partial Encoding with the Sharding Codec

In zarrs, the sharding_indexed codec is the only codec that supports true partial encoding, and only when the experimental partial encoding option is enabled. If it is disabled (the default), chunks are always fully decoded, updated, and re-encoded before being stored.

To enable partial encoding:

#![allow(unused)]
fn main() {
extern crate zarrs;
use zarrs::array::codec::CodecOptions;
// Enable experimental partial encoding by default (global configuration)
zarrs::config::global_config_mut().set_experimental_partial_encoding(true);

// Enable experimental partial encoding for a single operation only;
// pass `&options` to an `_opt` method variant (e.g. store_array_subset_opt)
let mut options = CodecOptions::default();
options.set_experimental_partial_encoding(true);
}

Warning

The asynchronous API does not yet support partial encoding.

This enables Array::store_array_subset, Array::store_chunk_subset, Array::partial_encoder, and variants to use partial encoding for sharded arrays. Inner chunks can be written in an append-only fashion without reading previously written inner chunks (if their elements do not require updating).
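The append-only behaviour can be pictured with a toy shard: a byte blob plus an index of (offset, length) per inner chunk. Writing an inner chunk appends its encoded bytes and repoints the index entry, so untouched inner chunks are never read, but an overwritten inner chunk's old bytes stay in the shard. `ToyShard` is a hypothetical type; the real sharding_indexed layout (index placement and encoding, checksums) is not modelled.

```rust
/// Toy append-only shard: encoded inner chunks appended to `data`,
/// with `index[i]` = Some((offset, nbytes)) for each written inner chunk.
struct ToyShard {
    data: Vec<u8>,
    index: Vec<Option<(usize, usize)>>,
}

impl ToyShard {
    fn new(num_inner_chunks: usize) -> Self {
        Self { data: Vec::new(), index: vec![None; num_inner_chunks] }
    }

    /// Appends the encoded inner chunk and repoints the index; old bytes are not reclaimed.
    fn write_inner_chunk(&mut self, i: usize, encoded: &[u8]) {
        let offset = self.data.len();
        self.data.extend_from_slice(encoded);
        self.index[i] = Some((offset, encoded.len()));
    }

    fn read_inner_chunk(&self, i: usize) -> Option<&[u8]> {
        self.index[i].map(|(o, n)| &self.data[o..o + n])
    }

    /// Bytes still referenced by the index.
    fn live_bytes(&self) -> usize {
        self.index.iter().flatten().map(|&(_, n)| n).sum()
    }
}

fn main() {
    let mut shard = ToyShard::new(4);
    shard.write_inner_chunk(0, &[1; 16]);
    shard.write_inner_chunk(1, &[2; 16]);
    // Rewriting inner chunk 0 appends 16 new bytes; the old 16 stay in the shard.
    shard.write_inner_chunk(0, &[3; 16]);
    assert_eq!(shard.read_inner_chunk(0), Some(&[3u8; 16][..]));
    assert_eq!(shard.data.len(), 48);
    assert_eq!(shard.live_bytes(), 32);
}
```

In this model, a write that touches only part of an inner chunk still appends a whole re-encoded inner chunk, so unaligned writes grow the shard faster than aligned ones.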

Warning

Since partial encoding is append-only for sharded arrays, updating an inner chunk does not remove its previously encoded data. Align writes to the inner chunks; otherwise, shards can grow much larger than necessary.