Writing Arrays
Array write methods are separated based on two storage traits:
[Async]WritableStorageTraits methods perform write operations exclusively, and [Async]ReadableWritableStorageTraits methods perform write operations and may perform read operations.
Warning
Misuse of [Async]ReadableWritableStorageTraits Array methods can result in data loss due to partial writes being lost. zarrs does not currently offer a “synchronisation” API for locking chunks or array subsets.
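Why can partial writes be lost? A read-write method may read a whole chunk, modify part of it, and write the whole chunk back. The std-only sketch below interleaves two such updates to one chunk by hand. The `ToyStore` type and `interleaved_partial_writes` function are hypothetical illustrations, not zarrs APIs. The second write-back silently discards the first update.

```rust
use std::collections::HashMap;

/// A toy in-memory store mapping chunk keys to decoded f32 elements.
/// Hypothetical: this is not the zarrs storage API.
type ToyStore = HashMap<&'static str, Vec<f32>>;

/// Simulate two partial writes to the same 4-element chunk. Each write reads
/// the whole chunk, modifies one element, and writes the whole chunk back.
/// The steps are interleaved by hand to mimic unsynchronised concurrent calls.
fn interleaved_partial_writes(store: &mut ToyStore) {
    // Both writers read the chunk before either one writes it back.
    let mut copy_a = store["c/0"].clone();
    let mut copy_b = store["c/0"].clone();

    // Writer A updates element 0; writer B updates element 3.
    copy_a[0] = 1.0;
    copy_b[3] = 2.0;

    // Each writer stores its full copy, so B's write overwrites A's.
    store.insert("c/0", copy_a);
    store.insert("c/0", copy_b);
}

fn main() {
    let mut store: ToyStore = HashMap::new();
    store.insert("c/0", vec![0.0; 4]);
    interleaved_partial_writes(&mut store);
    // A's update to element 0 has been lost.
    println!("{:?}", store["c/0"]); // [0.0, 0.0, 0.0, 2.0]
}
```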
Write-Only Methods
The [Async]WritableStorageTraits grouped methods exclusively perform write operations:
Store a Chunk
#![allow(unused)]
fn main() {
extern crate zarrs;
extern crate ndarray;
use zarrs::array::{Array, ArrayBuilder, DataType};
use ndarray::ArrayD;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
.build(store.clone(), "/array")?;
let chunk_indices: Vec<u64> = vec![1, 1]; // an 8x8 array with 4x4 chunks has a 2x2 chunk grid
let chunk_bytes: Vec<u8> = vec![0u8; 4 * 4 * 4]; // 4x4 chunk of f32
array.store_chunk(&chunk_indices, chunk_bytes)?;
let chunk_elements: Vec<f32> = vec![1.0; 4 * 4];
array.store_chunk_elements(&chunk_indices, &chunk_elements)?;
let chunk_array = ArrayD::<f32>::from_shape_vec(
vec![4, 4], // chunk shape
chunk_elements
)?;
array.store_chunk_ndarray(&chunk_indices, chunk_array)?;
Ok::<_, Box<dyn std::error::Error>>(())
}
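For a regular chunk grid, both the valid chunk indices and the expected length of the decoded chunk bytes follow from the array and chunk shapes. The std-only sketch below uses hypothetical helper names that are not part of zarrs. It shows that an 8x8 array with 4x4 chunks has a 2x2 chunk grid, so each chunk index must be 0 or 1, and that a 4x4 Float32 chunk is 64 bytes.

```rust
/// Number of chunks along each dimension of a regular chunk grid
/// (ceiling division of the array shape by the chunk shape).
fn chunk_grid_shape(array_shape: &[u64], chunk_shape: &[u64]) -> Vec<u64> {
    array_shape
        .iter()
        .zip(chunk_shape)
        .map(|(a, c)| a.div_ceil(*c))
        .collect()
}

/// True if every chunk index lies inside the chunk grid.
fn chunk_indices_in_bounds(chunk_indices: &[u64], grid_shape: &[u64]) -> bool {
    chunk_indices.len() == grid_shape.len()
        && chunk_indices.iter().zip(grid_shape).all(|(i, n)| i < n)
}

/// Decoded byte length of one chunk for a fixed-size element type.
fn chunk_byte_len(chunk_shape: &[u64], element_size: u64) -> u64 {
    chunk_shape.iter().product::<u64>() * element_size
}

fn main() {
    let grid = chunk_grid_shape(&[8, 8], &[4, 4]);
    println!("{grid:?}"); // [2, 2]
    println!("{}", chunk_indices_in_bounds(&[1, 1], &grid)); // true
    println!("{}", chunk_indices_in_bounds(&[1, 2], &grid)); // false
    println!("{}", chunk_byte_len(&[4, 4], std::mem::size_of::<f32>() as u64)); // 64
}
```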
Tip
If a chunk is written more than once, its element values depend on whichever operation wrote to the chunk last.
Store Chunks
store_chunks (and variants) will disassemble the input into chunks, then encode and store them in parallel.
#![allow(unused)]
fn main() {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
use zarrs::array_subset::ArraySubset;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
.build(store.clone(), "/array")?;
let chunks = ArraySubset::new_with_ranges(&[0..2, 0..2]);
let chunks_bytes: Vec<u8> = vec![0u8; 2 * 2 * 4 * 4 * 4]; // 2x2 chunks of 4x4 f32
array.store_chunks(&chunks, chunks_bytes)?;
// store_chunks_elements, store_chunks_ndarray...
Ok::<_, Box<dyn std::error::Error>>(())
}
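The disassembly step can be pictured with a std-only sketch. It splits a row-major 8x8 buffer into a 2x2 grid of 4x4 chunks and then "encodes" each chunk on its own scoped thread. The helper names are hypothetical. Encoding here is plain native-endian serialisation standing in for a real codec pipeline. The sketch assumes the array shape is an exact multiple of the chunk shape, and it is not how zarrs implements store_chunks internally.

```rust
/// Split a row-major 2D buffer into row-major chunks of a regular grid.
/// Assumes the array shape is an exact multiple of the chunk shape.
/// Returns (chunk_indices, chunk_elements) pairs in row-major chunk order.
fn disassemble_2d<T: Copy>(
    data: &[T],
    array_shape: [usize; 2],
    chunk_shape: [usize; 2],
) -> Vec<([usize; 2], Vec<T>)> {
    let [rows, cols] = array_shape;
    let [chunk_rows, chunk_cols] = chunk_shape;
    assert_eq!(data.len(), rows * cols);
    assert!(rows % chunk_rows == 0 && cols % chunk_cols == 0);
    let mut chunks = Vec::new();
    for ci in 0..rows / chunk_rows {
        for cj in 0..cols / chunk_cols {
            let mut elements = Vec::with_capacity(chunk_rows * chunk_cols);
            // Copy one contiguous row segment per chunk row.
            for r in 0..chunk_rows {
                let start = (ci * chunk_rows + r) * cols + cj * chunk_cols;
                elements.extend_from_slice(&data[start..start + chunk_cols]);
            }
            chunks.push(([ci, cj], elements));
        }
    }
    chunks
}

/// "Encode" each chunk in parallel with scoped threads. Encoding here is just
/// native-endian byte serialisation, a stand-in for a real codec pipeline.
fn encode_chunks_parallel(chunks: &[([usize; 2], Vec<f32>)]) -> Vec<Vec<u8>> {
    std::thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|(_, elements)| {
                s.spawn(move || elements.iter().flat_map(|x| x.to_ne_bytes()).collect::<Vec<u8>>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // An 8x8 array whose element value is its row-major linear index.
    let data: Vec<f32> = (0..64).map(|i| i as f32).collect();
    let chunks = disassemble_2d(&data, [8, 8], [4, 4]);
    println!("{:?}", chunks[1].0); // [0, 1]
    println!("{:?}", &chunks[1].1[..4]); // [4.0, 5.0, 6.0, 7.0]
    let encoded = encode_chunks_parallel(&chunks);
    println!("{} chunks of {} bytes", encoded.len(), encoded[0].len()); // 4 chunks of 64 bytes
}
```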
Store an Encoded Chunk
An encoded chunk can be stored directly with store_encoded_chunk, bypassing the zarrs codec pipeline.
#![allow(unused)]
fn main() {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
.build(store.clone(), "/array")?;
let chunk_indices: Vec<u64> = vec![1, 1]; // an 8x8 array with 4x4 chunks has a 2x2 chunk grid
let encoded_chunk_bytes: Vec<u8> = vec![0u8; 4 * 4 * 4]; // pre-encoded bytes
// SAFETY: the array's only codec is the default bytes codec (native endianness, no compression), so raw f32 bytes are a valid encoded chunk
unsafe { array.store_encoded_chunk(&chunk_indices, encoded_chunk_bytes.into())? };
Ok::<_, Box<dyn std::error::Error>>(())
}
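The example above stores all-zero bytes as a pre-encoded chunk. This is only valid under its stated assumption that the codec chain is a single bytes codec using native endianness. Under that assumption, an encoded chunk is just the elements' in-memory bytes concatenated in order. A std-only sketch of that encoding and its inverse follows. The helper names are hypothetical, not zarrs functions.

```rust
/// Encode f32 elements as a bytes codec with native endianness would, assuming
/// no other codecs: each element's in-memory bytes, concatenated in order.
fn encode_native_f32(elements: &[f32]) -> Vec<u8> {
    elements.iter().flat_map(|x| x.to_ne_bytes()).collect()
}

/// Decode the same representation back into f32 elements.
fn decode_native_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|b| f32::from_ne_bytes(b.try_into().unwrap()))
        .collect()
}

fn main() {
    let elements = vec![1.5f32; 4 * 4];
    let encoded = encode_native_f32(&elements);
    assert_eq!(encoded.len(), 4 * 4 * 4);
    assert_eq!(decode_native_f32(&encoded), elements);
    // All-zero bytes decode to 0.0 for every element.
    assert!(decode_native_f32(&[0u8; 64]).iter().all(|&x| x == 0.0));
}
```

If a compression or other bytes-to-bytes codec were configured, these raw bytes would no longer be a valid encoded chunk.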
Tip
Currently, the most performant path for uncompressed writing on Linux is to reuse page-aligned buffers via store_encoded_chunk with direct IO enabled for the FilesystemStore. See zarrs GitHub issue #58 for a discussion of this method.
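Here is a sketch of what reusing page-aligned buffers can look like in plain Rust. It allocates one zero-initialised buffer aligned to an assumed 4096-byte page size with std::alloc and refills it for each chunk instead of allocating per write. The `AlignedBuf` type is hypothetical and not part of zarrs. Enabling direct IO and handing the buffer to store_encoded_chunk are not shown.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// Assumed page size: 4096 bytes is common on x86-64 Linux but not universal.
const ALIGN: usize = 4096;

/// A zero-initialised heap buffer aligned to `ALIGN` bytes (hypothetical helper).
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        assert!(len > 0, "zero-sized allocations are not allowed");
        let layout = Layout::from_size_align(len, ALIGN).expect("invalid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Self { ptr, layout }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: ptr is valid for layout.size() bytes and uniquely borrowed here.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr is valid for layout.size() initialised bytes.
        unsafe { std::slice::from_raw_parts(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

fn main() {
    // One buffer, reused for every chunk instead of allocating per write.
    let mut buf = AlignedBuf::new(64 * 1024);
    for chunk_id in 0..3u8 {
        buf.as_mut_slice().fill(chunk_id);
        // A direct IO write of `buf` would happen here.
    }
    assert_eq!(buf.ptr as usize % ALIGN, 0);
    assert!(buf.as_slice().iter().all(|&b| b == 2));
}
```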
Read-Write Methods
The [Async]ReadableWritableStorageTraits grouped methods perform write operations and may perform read operations:
These methods perform partial encoding. Codecs that do not support true partial encoding retrieve each affected chunk in its entirety, decode it, update it, and then re-encode and store it.
It is the responsibility of zarrs consumers to ensure that store_chunk_subset is not called concurrently on the same chunk, and that store_array_subset is not called concurrently on array subsets sharing chunks.
Partial writes to a chunk may be lost if these rules are not respected.
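Since zarrs leaves synchronisation to its consumers, one possible pattern is to serialise read-modify-write updates per chunk with a mutex. The std-only sketch below uses a toy map of decoded chunk elements in place of a real array. `ChunkLockedStore` and its methods are hypothetical, not part of zarrs. Sixteen threads update different elements of one chunk, and none of the updates is lost.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical consumer-side guard: one mutex per chunk key, so partial
/// updates to the same chunk are serialised. Not a zarrs type.
struct ChunkLockedStore {
    chunks: Mutex<HashMap<u64, Arc<Mutex<Vec<f32>>>>>,
    chunk_len: usize,
}

impl ChunkLockedStore {
    fn new(chunk_len: usize) -> Self {
        Self { chunks: Mutex::new(HashMap::new()), chunk_len }
    }

    /// Read the whole chunk, update one element, and write the whole chunk
    /// back, all while holding that chunk's lock.
    fn store_element(&self, chunk: u64, offset: usize, value: f32) {
        let chunk_lock = {
            let mut map = self.chunks.lock().unwrap();
            map.entry(chunk)
                .or_insert_with(|| Arc::new(Mutex::new(vec![0.0; self.chunk_len])))
                .clone()
        };
        let mut elements = chunk_lock.lock().unwrap();
        let mut decoded = elements.clone(); // "retrieve and decode"
        decoded[offset] = value; // "update"
        *elements = decoded; // "encode and store"
    }

    fn chunk(&self, chunk: u64) -> Vec<f32> {
        let map = self.chunks.lock().unwrap();
        let elements = map[&chunk].lock().unwrap().clone();
        elements
    }
}

fn main() {
    let store = Arc::new(ChunkLockedStore::new(16));
    // 16 threads each write a different element of the same chunk.
    let handles: Vec<_> = (0..16)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.store_element(0, i, i as f32))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let expected: Vec<f32> = (0..16).map(|i| i as f32).collect();
    assert_eq!(store.chunk(0), expected); // no partial write was lost
}
```

A per-chunk lock only helps if every writer to the array goes through the same guard, including writers in other processes.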
Store a Chunk Subset
#![allow(unused)]
fn main() {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
use zarrs::array_subset::ArraySubset;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![16, 8], vec![4, 4], DataType::Float32, 0.0f32)
.build(store.clone(), "/array")?;
array.store_chunk_subset_elements::<f32>(
// chunk indices
&[3, 1],
// subset within chunk
&ArraySubset::new_with_ranges(&[1..2, 0..4]),
// subset elements
&[-4.0; 4],
)?;
Ok::<_, Box<dyn std::error::Error>>(())
}
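A chunk subset is expressed relative to the chunk's origin. In a regular grid, each range maps to array coordinates by adding chunk_index * chunk_shape along its dimension. The std-only sketch below uses a hypothetical helper, not part of zarrs. It shows that the subset [1..2, 0..4] of chunk [3, 1] with 4x4 chunks covers array rows 13..14 and columns 4..8.

```rust
use std::ops::Range;

/// Translate a subset given relative to one chunk of a regular grid into the
/// equivalent subset of the whole array.
fn chunk_subset_to_array_subset(
    chunk_indices: &[u64],
    chunk_shape: &[u64],
    chunk_subset: &[Range<u64>],
) -> Vec<Range<u64>> {
    chunk_indices
        .iter()
        .zip(chunk_shape)
        .zip(chunk_subset)
        .map(|((&i, &c), r)| {
            assert!(r.end <= c, "subset exceeds chunk bounds");
            let origin = i * c; // array coordinate of the chunk's first element
            origin + r.start..origin + r.end
        })
        .collect()
}

fn main() {
    // Chunk [3, 1] with 4x4 chunks, subset [1..2, 0..4] within that chunk.
    let array_subset = chunk_subset_to_array_subset(&[3, 1], &[4, 4], &[1..2, 0..4]);
    println!("{array_subset:?}"); // [13..14, 4..8]
}
```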
Store an Array Subset
#![allow(unused)]
fn main() {
extern crate zarrs;
use zarrs::array::{Array, ArrayBuilder, DataType};
use zarrs::array_subset::ArraySubset;
let store = std::sync::Arc::new(zarrs::storage::store::MemoryStore::new());
let array = ArrayBuilder::new(vec![8, 8], vec![4, 4], DataType::Float32, 0.0f32)
.build(store.clone(), "/array")?;
array.store_array_subset_elements::<f32>(
&ArraySubset::new_with_ranges(&[0..8, 6..7]),
&[123.0; 8],
)?;
Ok::<_, Box<dyn std::error::Error>>(())
}
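Whether an array subset write shares chunks with other writes, and whether it needs a read-modify-write, depends on which chunks it touches and whether it covers them fully. The std-only sketch below uses hypothetical helper names, not zarrs APIs, and assumes a regular grid and non-empty ranges. The subset [0..8, 6..7] of an 8x8 array with 4x4 chunks touches chunks [0, 1] and [1, 1] but covers neither completely.

```rust
use std::ops::Range;

/// For each dimension, the range of chunk indices an array subset touches.
fn chunks_touched(subset: &[Range<u64>], chunk_shape: &[u64]) -> Vec<Range<u64>> {
    subset
        .iter()
        .zip(chunk_shape)
        .map(|(r, &c)| r.start / c..(r.end - 1) / c + 1)
        .collect()
}

/// True if the subset covers every chunk it touches completely (or up to the
/// array edge). Otherwise some chunks need a read-modify-write.
fn is_chunk_aligned(subset: &[Range<u64>], chunk_shape: &[u64], array_shape: &[u64]) -> bool {
    subset
        .iter()
        .zip(chunk_shape)
        .zip(array_shape)
        .all(|((r, &c), &n)| r.start % c == 0 && (r.end % c == 0 || r.end == n))
}

fn main() {
    let subset = [0..8, 6..7];
    println!("{:?}", chunks_touched(&subset, &[4, 4])); // [0..2, 1..2]
    println!("{}", is_chunk_aligned(&subset, &[4, 4], &[8, 8])); // false
}
```

Two concurrent writes to columns 6..7 and 7..8 would therefore touch the same chunks, which is the situation the rules above forbid.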
Partial Encoding with the Sharding Codec
In zarrs, the sharding_indexed codec is the only codec that supports true partial encoding, and only when the Experimental Partial Encoding option is enabled.
If the option is disabled (the default), chunks are always fully decoded and updated before being stored.
To enable partial encoding:
#![allow(unused)]
fn main() {
extern crate zarrs;
use zarrs::array::codec::CodecOptions;
// Enable experimental partial encoding by default for all operations
zarrs::config::global_config_mut().set_experimental_partial_encoding(true);
// Or enable it only for operations that are given these codec options (e.g. via the *_opt method variants)
let mut options = CodecOptions::default();
options.set_experimental_partial_encoding(true);
}
Warning
The asynchronous API does not yet support partial encoding.
Enabling partial encoding allows Array::store_array_subset, Array::store_chunk_subset, Array::partial_encoder, and their variants to use partial encoding for sharded arrays.
Inner chunks can be written in an append-only fashion without reading previously written inner chunks (if their elements do not require updating).
Warning
Since partial encoding is append-only for sharded arrays, updating an inner chunk does not remove its previously encoded data from the shard. Align writes to inner chunk boundaries; otherwise, shards can become much larger than necessary.
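Here is a toy model of why append-only updates make shards grow. New versions of inner chunks are appended and only the index is repointed, so superseded bytes stay in the shard. A write that is not aligned to inner chunks re-encodes every inner chunk it straddles. The `AppendOnlyShard` type is hypothetical and does not reproduce the sharding_indexed binary layout or zarrs internals. Inner chunks are assumed to be uncompressed 64-byte blocks (4x4 f32).

```rust
/// Toy append-only shard: encoded inner chunks are appended to `data`, and
/// `index` records the (offset, length) of the latest version of each.
struct AppendOnlyShard {
    data: Vec<u8>,
    index: Vec<Option<(usize, usize)>>,
}

impl AppendOnlyShard {
    fn new(num_inner_chunks: usize) -> Self {
        Self { data: Vec::new(), index: vec![None; num_inner_chunks] }
    }

    /// Append an encoded inner chunk. Any previous version stays in `data`;
    /// only the index entry is repointed.
    fn write_inner_chunk(&mut self, inner: usize, encoded: &[u8]) {
        let offset = self.data.len();
        self.data.extend_from_slice(encoded);
        self.index[inner] = Some((offset, encoded.len()));
    }

    fn read_inner_chunk(&self, inner: usize) -> Option<&[u8]> {
        self.index[inner].map(|(o, l)| &self.data[o..o + l])
    }

    /// Bytes still referenced by the index.
    fn live_bytes(&self) -> usize {
        self.index.iter().flatten().map(|&(_, l)| l).sum()
    }
}

fn main() {
    const INNER_BYTES: usize = 64; // assumed: 4x4 f32 inner chunk, uncompressed
    let mut shard = AppendOnlyShard::new(4);

    // Initial write: each inner chunk is written once.
    for inner in 0..4 {
        shard.write_inner_chunk(inner, &[1u8; INNER_BYTES]);
    }
    println!("{} {}", shard.data.len(), shard.live_bytes()); // 256 256

    // A misaligned write straddling inner chunks 1 and 2 re-encodes both.
    for inner in [1, 2] {
        shard.write_inner_chunk(inner, &[2u8; INNER_BYTES]);
    }
    println!("{} {}", shard.data.len(), shard.live_bytes()); // 384 256
}
```

After the second write the shard holds 128 bytes that nothing references, even though only part of two inner chunks changed.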