Zarr Stores

A Zarr store is a system that can be used to store and retrieve data from a Zarr hierarchy. For example: a filesystem, HTTP server, FTP server, Amazon S3 bucket, etc. A store implements a key/value store interface for storing, retrieving, listing, and erasing keys.

The Zarr V3 storage API is detailed in the Zarr V3 specification.
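To make the key/value interface described above concrete, here is a minimal std-only sketch. The trait `KeyValueStore`, its method names, and `ToyMemoryStore` are hypothetical names chosen for illustration. They are not the traits that zarrs or the Zarr V3 specification define, and the keys used are only examples.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal key/value interface (NOT the zarrs storage traits).
pub trait KeyValueStore {
    /// Retrieve the value stored under `key`, if any.
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    /// Store `value` under `key`, replacing any existing value.
    fn set(&mut self, key: &str, value: &[u8]);
    /// List all keys that start with `prefix`, in sorted order.
    fn list_prefix(&self, prefix: &str) -> Vec<String>;
    /// Erase `key`, returning whether it existed.
    fn erase(&mut self, key: &str) -> bool;
}

/// Hypothetical in-memory implementation backed by a sorted map.
#[derive(Default)]
pub struct ToyMemoryStore {
    entries: BTreeMap<String, Vec<u8>>,
}

impl KeyValueStore for ToyMemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: &[u8]) {
        self.entries.insert(key.to_string(), value.to_vec());
    }

    fn list_prefix(&self, prefix: &str) -> Vec<String> {
        // BTreeMap iterates in key order, so the result is already sorted.
        self.entries
            .keys()
            .filter(|key| key.starts_with(prefix))
            .cloned()
            .collect()
    }

    fn erase(&mut self, key: &str) -> bool {
        self.entries.remove(key).is_some()
    }
}
```

A filesystem, an HTTP server, or an S3 bucket can each sit behind an interface of this shape. Read-only stores would simply omit the writing and erasing methods.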

The Sync and Async API

Zarr Groups and Arrays are the core components of a Zarr hierarchy. In zarrs, each has both a synchronous and an asynchronous API. Which API applies depends on the storage that the group or array is created with.

Async API methods typically have an async_ prefix. In subsequent chapters, async API method calls are shown commented out below their sync equivalents.

warning

The async API is still considered experimental, and it requires the async feature.

Synchronous Stores

Memory

MemoryStore is a synchronous in-memory store available in the zarrs_storage crate (re-exported as zarrs::storage).

use std::sync::Arc;

use zarrs::storage::ReadableWritableListableStorage;
use zarrs::storage::store::MemoryStore;

let store: ReadableWritableListableStorage = Arc::new(MemoryStore::new());

Note that in-memory stores do not persist data, and they are not suited to distributed (i.e. multi-process) usage.

Filesystem

FilesystemStore is a synchronous filesystem store available in the zarrs_filesystem crate (re-exported as zarrs::filesystem with the filesystem feature).

use std::sync::Arc;

use zarrs::storage::ReadableWritableListableStorage;
use zarrs::filesystem::FilesystemStore;

let base_path = "/";
let store: ReadableWritableListableStorage =
    Arc::new(FilesystemStore::new(base_path));

The base path is the root of the filesystem store. Node paths are relative to the base path.
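As a rough illustration of how node paths relate to the base path, the sketch below maps a node path to the key of its `zarr.json` metadata document and then to a location under the base path. Both helper functions are hypothetical and are not part of zarrs. The base path in the usage note is an assumed example, and the real `FilesystemStore` may resolve keys differently.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the store key of the `zarr.json` metadata document
/// for a node path such as "/" or "/group/array".
fn node_metadata_key(node_path: &str) -> String {
    let trimmed = node_path.trim_matches('/');
    if trimmed.is_empty() {
        // The root node's metadata lives directly under the store root.
        "zarr.json".to_string()
    } else {
        format!("{trimmed}/zarr.json")
    }
}

/// Hypothetical helper: resolve a store key against the base path.
fn key_to_fs_path(base_path: &Path, key: &str) -> PathBuf {
    // Join component by component and skip empty components, so a leading
    // '/' in the key cannot replace the base path (`Path::join` replaces
    // the whole path when given an absolute one).
    key.split('/')
        .filter(|part| !part.is_empty())
        .fold(base_path.to_path_buf(), |path, part| path.join(part))
}
```

With an assumed base path of `/data/hierarchy.zarr`, the node `/group/array` would have the metadata key `group/array/zarr.json`, which resolves to `/data/hierarchy.zarr/group/array/zarr.json`.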

The filesystem store also has a new_with_options constructor. Currently the only option available for filesystem stores is whether or not to enable direct I/O on Linux.

HTTP

HTTPStore is a read-only synchronous HTTP store available in the zarrs_http crate.

use std::sync::Arc;

use zarrs::storage::ReadableStorage;
use zarrs_http::HTTPStore;

let http_store: ReadableStorage = Arc::new(HTTPStore::new("http://...")?);

note

The HTTP stores provided by object_store and opendal (see below) provide a more comprehensive feature set.

Asynchronous Stores

object_store

The object_store crate provides an async API for interacting with object stores. Supported object stores include AWS S3, Azure Blob Storage, Google Cloud Storage, local files, in-memory storage, and HTTP/WebDAV storage.

zarrs_object_store::AsyncObjectStore wraps object_store::ObjectStore stores.

use std::sync::Arc;

use zarrs::storage::AsyncReadableStorage;
use zarrs_object_store::AsyncObjectStore;

// Allow plain HTTP (not only HTTPS) endpoints
let options = object_store::ClientOptions::new().with_allow_http(true);
let store = object_store::http::HttpBuilder::new()
    .with_url("http://...")
    .with_client_options(options)
    .build()?;
let store: AsyncReadableStorage = Arc::new(AsyncObjectStore::new(store));

OpenDAL

The opendal crate offers a unified data access layer for retrieving data from a wide range of storage services. It also provides layers that extend the behaviour of those services.

zarrs_opendal::AsyncOpendalStore wraps opendal::Operator.

use std::sync::Arc;

use zarrs::storage::AsyncReadableStorage;
use zarrs_opendal::AsyncOpendalStore;

let builder = opendal::services::Http::default().endpoint("http://...");
let operator = opendal::Operator::new(builder)?.finish();
let store: AsyncReadableStorage =
    Arc::new(AsyncOpendalStore::new(operator));

note

Some opendal stores can also be used in a synchronous context with zarrs_opendal::OpendalStore, which wraps opendal::BlockingOperator.

Icechunk

icechunk is a transactional storage engine for Zarr designed for use on cloud object storage.

use std::sync::Arc;

// Create an icechunk store
let storage = Arc::new(icechunk::ObjectStorage::new_in_memory_store(None));
let icechunk_store = icechunk::Store::new_from_storage(storage).await?;
let store =
    Arc::new(zarrs_icechunk::AsyncIcechunkStore::new(icechunk_store));

// Do some array/metadata manipulation with zarrs, then commit a snapshot
let snapshot0 = store.commit("Initial commit").await?;

// Do some more array/metadata manipulation, then commit another snapshot
let snapshot1 = store.commit("Update data").await?;

// Checkout the first snapshot
store.checkout(icechunk::zarr::VersionInfo::SnapshotId(snapshot0)).await?;

Storage Adapters

Storage adapters can be layered on top of stores to change their functionality.

The storage adapters below are all available in the zarrs::storage submodule (via the zarrs_storage crate).

Async to Sync

Asynchronous stores can be used in a synchronous context with the AsyncToSyncStorageAdapter.

The AsyncToSyncBlockOn trait must be implemented for a runtime or runtime handle in order to block on futures. See the below tokio example:

use zarrs::storage::storage_adapter::async_to_sync::AsyncToSyncBlockOn;

struct TokioBlockOn(tokio::runtime::Runtime); // or handle

impl AsyncToSyncBlockOn for TokioBlockOn {
    fn block_on<F: core::future::Future>(&self, future: F) -> F::Output {
        self.0.block_on(future)
    }
}

use std::sync::Arc;

use zarrs::storage::storage_adapter::async_to_sync::AsyncToSyncStorageAdapter;
use zarrs::storage::{AsyncReadableStorage, ReadableStorage};
use zarrs_opendal::AsyncOpendalStore;

// Create an async store as normal (`path` is the endpoint of the store)
let builder = opendal::services::Http::default().endpoint(path);
let operator = opendal::Operator::new(builder)?.finish();
let storage: AsyncReadableStorage =
    Arc::new(AsyncOpendalStore::new(operator));

// Create a tokio runtime and adapt the store to sync
let block_on = TokioBlockOn(tokio::runtime::Runtime::new()?);
let store: ReadableStorage =
    Arc::new(AsyncToSyncStorageAdapter::new(storage, block_on));

warning

Many async stores are not runtime-agnostic (i.e. only support tokio).
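For intuition about what a block-on implementation has to do, the following std-only sketch drives a future to completion on the calling thread. The `BlockOn` trait is a local stand-in that only mirrors the method signature of the trait in the tokio example. `ParkBlockOn`, `async_get`, and `sync_get` are hypothetical names, and none of this is zarrs code.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Local stand-in trait (an assumption, not the zarrs trait itself).
trait BlockOn {
    fn block_on<F: Future>(&self, future: F) -> F::Output;
}

/// Waker that unparks the thread which is blocked on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical runtime-free implementer: polls on the calling thread.
struct ParkBlockOn;

impl BlockOn for ParkBlockOn {
    fn block_on<F: Future>(&self, future: F) -> F::Output {
        let mut future = pin!(future);
        let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
        let mut cx = Context::from_waker(&waker);
        loop {
            match future.as_mut().poll(&mut cx) {
                Poll::Ready(output) => return output,
                // Sleep until the waker unparks this thread, then poll again.
                Poll::Pending => thread::park(),
            }
        }
    }
}

/// Toy async "store" lookup used to exercise the adapter idea.
async fn async_get(key: &str) -> Option<Vec<u8>> {
    (key == "zarr.json").then(|| b"{}".to_vec())
}

/// Synchronous facade: run the async call to completion via `BlockOn`.
fn sync_get(block_on: &impl BlockOn, key: &str) -> Option<Vec<u8>> {
    block_on.block_on(async_get(key))
}
```

A thread-parking loop like this provides no I/O reactor or timers, so it is only suitable for futures that do not depend on a particular runtime. Stores that rely on tokio would typically need a tokio runtime or handle, as in the example above.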

Usage Log

The UsageLogStorageAdapter logs storage method calls.

It is intended to aid in debugging and optimising performance by revealing storage access patterns.

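use std::sync::{Arc, Mutex};

use zarrs::storage::storage_adapter::usage_log::UsageLogStorageAdapter;
use zarrs::storage::store::MemoryStore;

let store = Arc::new(MemoryStore::new());
let log_writer = Arc::new(Mutex::new(
    // Optionally wrap in std::io::BufWriter::new(...) to buffer log output
    std::io::stdout(),
));
// The closure returns a prefix (here a timestamp) for each log entry
let store = Arc::new(UsageLogStorageAdapter::new(store, log_writer, || {
    chrono::Utc::now().format("[%T%.3f] ").to_string()
}));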
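The adapter pattern behind a usage log can be sketched with the standard library alone: a wrapper implements the same interface as the store it wraps, writes a line for each call, and forwards the call. The `Store` trait, `ToyStore`, and `LoggingAdapter` below are hypothetical, and the log format is invented for illustration. It is not the output format of UsageLogStorageAdapter.

```rust
use std::collections::HashMap;
use std::io::Write;
use std::sync::{Arc, Mutex};

/// Hypothetical minimal store interface (NOT the zarrs storage traits).
trait Store {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&self, key: &str, value: &[u8]);
}

/// Hypothetical in-memory store used as the wrapped store.
#[derive(Default)]
struct ToyStore(Mutex<HashMap<String, Vec<u8>>>);

impl Store for ToyStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.lock().unwrap().get(key).cloned()
    }

    fn set(&self, key: &str, value: &[u8]) {
        self.0.lock().unwrap().insert(key.to_string(), value.to_vec());
    }
}

/// Wraps a store and writes one line per method call to `log`.
struct LoggingAdapter<S, W, P> {
    inner: Arc<S>,
    log: Arc<Mutex<W>>,
    /// Produces the prefix of each log line, e.g. a timestamp.
    prefix: P,
}

impl<S: Store, W: Write, P: Fn() -> String> Store for LoggingAdapter<S, W, P> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let result = self.inner.get(key);
        let len = result.as_ref().map(Vec::len);
        let mut log = self.log.lock().unwrap();
        // Logging failures are ignored so they never break storage access.
        let _ = writeln!(log, "{}get({key}) -> {len:?}", (self.prefix)());
        result
    }

    fn set(&self, key: &str, value: &[u8]) {
        {
            let mut log = self.log.lock().unwrap();
            let _ = writeln!(log, "{}set({key}, len={})", (self.prefix)(), value.len());
        }
        self.inner.set(key, value);
    }
}
```

Because the wrapper implements the same trait as the wrapped store, code that uses the store does not need to change when logging is switched on.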

Performance Metrics

The PerformanceMetricsStorageAdapter accumulates metrics, such as bytes read and written.

It is intended to aid in testing by allowing the application to validate that metrics (e.g., bytes read/written, total read/write operations) match expected values for specific operations.

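use std::sync::Arc;

use zarrs::storage::storage_adapter::performance_metrics::PerformanceMetricsStorageAdapter;
use zarrs::storage::store::MemoryStore;

let store = Arc::new(MemoryStore::new());
let store = Arc::new(PerformanceMetricsStorageAdapter::new(store));

// Perform some storage operations, then compare against the expected value
assert_eq!(store.bytes_read(), ...);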
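As a sketch of how such metrics can be accumulated and checked in a test, the wrapper below counts operations and bytes with atomic counters. The `Store` trait, `ToyStore`, and `CountingAdapter` are hypothetical names. The counting rules (for example, a missed read counting as one operation with zero bytes) are assumptions of this sketch and may differ from PerformanceMetricsStorageAdapter.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical minimal store interface (NOT the zarrs storage traits).
trait Store {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&self, key: &str, value: &[u8]);
}

/// Hypothetical in-memory store used as the wrapped store.
#[derive(Default)]
struct ToyStore(Mutex<HashMap<String, Vec<u8>>>);

impl Store for ToyStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.lock().unwrap().get(key).cloned()
    }

    fn set(&self, key: &str, value: &[u8]) {
        self.0.lock().unwrap().insert(key.to_string(), value.to_vec());
    }
}

/// Wraps a store and counts the operations and bytes passing through it.
struct CountingAdapter<S> {
    inner: S,
    reads: AtomicUsize,
    writes: AtomicUsize,
    bytes_read: AtomicUsize,
    bytes_written: AtomicUsize,
}

impl<S> CountingAdapter<S> {
    fn new(inner: S) -> Self {
        Self {
            inner,
            reads: AtomicUsize::new(0),
            writes: AtomicUsize::new(0),
            bytes_read: AtomicUsize::new(0),
            bytes_written: AtomicUsize::new(0),
        }
    }

    fn reads(&self) -> usize {
        self.reads.load(Ordering::Relaxed)
    }

    fn writes(&self) -> usize {
        self.writes.load(Ordering::Relaxed)
    }

    fn bytes_read(&self) -> usize {
        self.bytes_read.load(Ordering::Relaxed)
    }

    fn bytes_written(&self) -> usize {
        self.bytes_written.load(Ordering::Relaxed)
    }
}

impl<S: Store> Store for CountingAdapter<S> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let value = self.inner.get(key);
        // A miss still counts as a read operation, with zero bytes.
        self.reads.fetch_add(1, Ordering::Relaxed);
        let len = value.as_ref().map_or(0, Vec::len);
        self.bytes_read.fetch_add(len, Ordering::Relaxed);
        value
    }

    fn set(&self, key: &str, value: &[u8]) {
        self.writes.fetch_add(1, Ordering::Relaxed);
        self.bytes_written.fetch_add(value.len(), Ordering::Relaxed);
        self.inner.set(key, value);
    }
}
```

A test can then write a 16-byte value, read it back, and assert that `bytes_written()` and `bytes_read()` both equal 16.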