Zarr Stores
A Zarr store is a system that can be used to store and retrieve data from a Zarr hierarchy, such as a filesystem, HTTP server, FTP server, or Amazon S3 bucket. A store implements a key/value store interface for storing, retrieving, listing, and erasing keys.
The Zarr V3 storage API is detailed in the Zarr V3 specification.
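The key/value interface described above can be pictured with a small standalone sketch. The trait and type names here (`KeyValueStore`, `ToyMemoryStore`) are hypothetical and are not the zarrs storage traits; the sketch only illustrates the four operations (store, retrieve, list, erase) over an in-memory map.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal key/value store interface (NOT the zarrs traits).
pub trait KeyValueStore {
    /// Retrieve the value stored at `key`, if any.
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    /// Store `value` at `key`, replacing any existing value.
    fn set(&mut self, key: &str, value: &[u8]);
    /// List all keys in lexicographic order.
    fn list(&self) -> Vec<String>;
    /// List all keys that start with `prefix`.
    fn list_prefix(&self, prefix: &str) -> Vec<String>;
    /// Erase `key`, returning true if it existed.
    fn erase(&mut self, key: &str) -> bool;
}

/// Toy in-memory implementation backed by a sorted map.
#[derive(Default)]
pub struct ToyMemoryStore {
    data: BTreeMap<String, Vec<u8>>,
}

impl KeyValueStore for ToyMemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: &[u8]) {
        self.data.insert(key.to_string(), value.to_vec());
    }

    fn list(&self) -> Vec<String> {
        self.data.keys().cloned().collect()
    }

    fn list_prefix(&self, prefix: &str) -> Vec<String> {
        self.data
            .keys()
            .filter(|key| key.starts_with(prefix))
            .cloned()
            .collect()
    }

    fn erase(&mut self, key: &str) -> bool {
        self.data.remove(key).is_some()
    }
}
```

In this sketch a hierarchy is just a set of keys such as zarr.json or array/c/0/0, which is why very different backends (filesystems, HTTP servers, object stores) can all present the same interface.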
The Sync and Async API
Zarr Groups and Arrays are the core components of a Zarr hierarchy.
In zarrs, both structures have a synchronous and an asynchronous API.
The applicable API depends on the storage that the group or array is created with.
Async API methods typically have an async_ prefix.
In subsequent chapters, async API method calls are shown commented out below their sync equivalent.
warning
The async API is still considered experimental, and it requires the async feature.
Synchronous Stores
Memory
MemoryStore is a synchronous in-memory store available in the zarrs_storage crate (re-exported as zarrs::storage).
use std::sync::Arc;

use zarrs::storage::ReadableWritableListableStorage;
use zarrs::storage::store::MemoryStore;

let store: ReadableWritableListableStorage = Arc::new(MemoryStore::new());
Note that in-memory stores do not persist data, and they are not suited to distributed (i.e. multi-process) usage.
Filesystem
FilesystemStore is a synchronous filesystem store available in the zarrs_filesystem crate (re-exported as zarrs::filesystem with the filesystem feature).
use std::sync::Arc;

use zarrs::storage::ReadableWritableListableStorage;
use zarrs::filesystem::FilesystemStore;

let base_path = "/";
// Creating the store can fail (e.g. for an invalid base path), so propagate the error
let store: ReadableWritableListableStorage =
    Arc::new(FilesystemStore::new(base_path)?);
The base path is the root of the filesystem store. Node paths are relative to the base path.
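As a rough illustration of how keys relate to the base path, the following sketch joins a store key onto a base directory. The helper `key_to_path` is hypothetical and is not how FilesystemStore is necessarily implemented; it assumes keys use `/` as the separator and it does not guard against components such as `..`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (NOT part of zarrs): map a store key such as
/// "group/array/zarr.json" to a filesystem path under `base_path`.
pub fn key_to_path(base_path: &Path, key: &str) -> PathBuf {
    let mut path = base_path.to_path_buf();
    // Push each non-empty component so the platform separator is used.
    for component in key.split('/').filter(|c| !c.is_empty()) {
        path.push(component);
    }
    path
}
```

Under these assumptions, the key group/array/zarr.json with a base path of /data/hierarchy.zarr resolves to /data/hierarchy.zarr/group/array/zarr.json, and an empty key resolves to the base path itself.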
The filesystem store also has a new_with_options constructor.
Currently the only option available for filesystem stores is whether or not to enable direct I/O on Linux.
HTTP
HTTPStore is a read-only synchronous HTTP store available in the zarrs_http crate.
use std::sync::Arc;

use zarrs::storage::ReadableStorage;
use zarrs_http::HTTPStore;

let http_store: ReadableStorage = Arc::new(HTTPStore::new("http://...")?);
note
The HTTP stores provided by object_store and opendal (see below) provide a more comprehensive feature set.
Asynchronous Stores
object_store
The object_store crate is an async library for interacting with object stores.
Supported object stores include:
- AWS S3
- Azure Blob Storage
- Google Cloud Storage
- Local files
- Memory
- HTTP/WebDAV Storage
- Custom implementations
zarrs_object_store::AsyncObjectStore wraps object_store::ObjectStore stores.
use std::sync::Arc;

use zarrs::storage::AsyncReadableStorage;
use zarrs_object_store::AsyncObjectStore;

let options = object_store::ClientOptions::new().with_allow_http(true);
let store = object_store::http::HttpBuilder::new()
    .with_url("http://...")
    .with_client_options(options)
    .build()?;
let store: AsyncReadableStorage = Arc::new(AsyncObjectStore::new(store));
OpenDAL
The opendal crate offers a unified data access layer for retrieving data from diverse storage services.
It supports a wide range of services, as well as layers that extend their behaviour.
zarrs_opendal::AsyncOpendalStore wraps opendal::Operator.
use std::sync::Arc;

use zarrs::storage::AsyncReadableStorage;
use zarrs_opendal::AsyncOpendalStore;

let builder = opendal::services::Http::default().endpoint("http://...");
let operator = opendal::Operator::new(builder)?.finish();
let store: AsyncReadableStorage = Arc::new(AsyncOpendalStore::new(operator));
note
Some opendal stores can also be used in a synchronous context with zarrs_opendal::OpendalStore, which wraps opendal::BlockingOperator.
Icechunk
icechunk is a transactional storage engine for Zarr designed for use on cloud object storage.
use std::sync::Arc;

// Create an icechunk store
let storage = Arc::new(icechunk::ObjectStorage::new_in_memory_store(None));
let icechunk_store = icechunk::Store::new_from_storage(storage).await?;
let store = Arc::new(zarrs_icechunk::AsyncIcechunkStore::new(icechunk_store));

// Do some array/metadata manipulation with zarrs, then commit a snapshot
let snapshot0 = store.commit("Initial commit").await?;

// Do some more array/metadata manipulation, then commit another snapshot
let snapshot1 = store.commit("Update data").await?;

// Checkout the first snapshot
store.checkout(icechunk::zarr::VersionInfo::SnapshotId(snapshot0)).await?;
Storage Adapters
Storage adapters can be layered on top of stores to change their functionality.
The storage adapters below are all available in the zarrs::storage submodule (via the zarrs_storage crate).
Async to Sync
Asynchronous stores can be used in a synchronous context with the AsyncToSyncStorageAdapter.
The AsyncToSyncBlockOn trait must be implemented for a runtime or runtime handle in order to block on futures.
See the tokio example below:
use zarrs::storage::storage_adapter::async_to_sync::AsyncToSyncBlockOn;

struct TokioBlockOn(tokio::runtime::Runtime); // or handle

impl AsyncToSyncBlockOn for TokioBlockOn {
    fn block_on<F: core::future::Future>(&self, future: F) -> F::Output {
        self.0.block_on(future)
    }
}
use std::sync::Arc;

use zarrs::storage::{AsyncReadableStorage, ReadableStorage};
use zarrs::storage::storage_adapter::async_to_sync::AsyncToSyncStorageAdapter;
use zarrs_opendal::AsyncOpendalStore;

// Create an async store as normal (`path` is the HTTP endpoint, defined elsewhere)
let builder = opendal::services::Http::default().endpoint(path);
let operator = opendal::Operator::new(builder)?.finish();
let storage: AsyncReadableStorage = Arc::new(AsyncOpendalStore::new(operator));

// Create a tokio runtime and adapt the store to sync
// (`TokioBlockOn` is the wrapper defined in the previous example)
let block_on = TokioBlockOn(tokio::runtime::Runtime::new()?);
let store: ReadableStorage = Arc::new(AsyncToSyncStorageAdapter::new(storage, block_on));
warning
Many async stores are not runtime-agnostic (i.e. only support tokio).
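To make the idea of "blocking on a future" concrete, here is a standalone sketch using only the standard library. All names (`ToyBlockOn`, `ParkBlockOn`, `ToyAsyncStore`, `ToyAsyncToSync`) are hypothetical stand-ins and not the zarrs types. The toy store returns an immediately ready future to keep the example small.

```rust
use std::collections::HashMap;
use std::future::{ready, Future, Ready};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for a "block on" trait (NOT the zarrs trait).
pub trait ToyBlockOn {
    fn block_on<F: Future>(&self, future: F) -> F::Output;
}

/// Waker that unparks the thread waiting on the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Tiny executor: polls on the calling thread and parks while pending.
pub struct ParkBlockOn;

impl ToyBlockOn for ParkBlockOn {
    fn block_on<F: Future>(&self, future: F) -> F::Output {
        let mut future = pin!(future);
        let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
        let mut cx = Context::from_waker(&waker);
        loop {
            match future.as_mut().poll(&mut cx) {
                Poll::Ready(output) => return output,
                // Sleep until the waker unparks this thread, then poll again.
                Poll::Pending => thread::park(),
            }
        }
    }
}

/// Toy async store whose `get` future is immediately ready.
pub struct ToyAsyncStore(pub HashMap<String, Vec<u8>>);

impl ToyAsyncStore {
    pub fn get(&self, key: &str) -> Ready<Option<Vec<u8>>> {
        ready(self.0.get(key).cloned())
    }
}

/// Adapter exposing a synchronous `get` by blocking on the async one.
pub struct ToyAsyncToSync<B> {
    inner: Arc<ToyAsyncStore>,
    block_on: B,
}

impl<B: ToyBlockOn> ToyAsyncToSync<B> {
    pub fn new(inner: Arc<ToyAsyncStore>, block_on: B) -> Self {
        Self { inner, block_on }
    }

    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.block_on.block_on(self.inner.get(key))
    }
}
```

A park-based executor like this is only suitable for futures that do not depend on a particular runtime. As the warning above notes, many real async stores need tokio, which is why the tokio-backed wrapper is the practical choice.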
Usage Log
The UsageLogStorageAdapter logs storage method calls.
It is intended to aid in debugging and optimising performance by revealing storage access patterns.
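use std::sync::{Arc, Mutex};

use zarrs::storage::storage_adapter::usage_log::UsageLogStorageAdapter;
use zarrs::storage::store::MemoryStore;

let store = Arc::new(MemoryStore::new());
let log_writer = Arc::new(Mutex::new(
    // std::io::BufWriter::new(
    std::io::stdout(),
    // )
));
// The closure supplies a prefix (here, a timestamp) for each log line
let store = Arc::new(UsageLogStorageAdapter::new(store, log_writer, || {
    chrono::Utc::now().format("[%T%.3f] ").to_string()
}));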
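The layering idea behind a logging adapter can be sketched without zarrs. The names below (`ToyStore`, `ToyMap`, `ToyUsageLog`) are hypothetical, and the log line format is invented for illustration; it is not the format the real adapter emits. The constructor shape (inner store, shared writer, prefix closure) loosely mirrors the example above.

```rust
use std::collections::HashMap;
use std::io::Write;
use std::sync::{Arc, Mutex};

/// Hypothetical read-only store interface (NOT the zarrs traits).
pub trait ToyStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// Toy store backed by a map.
pub struct ToyMap(pub HashMap<String, Vec<u8>>);

impl ToyStore for ToyMap {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

/// Adapter that forwards calls to `inner` and writes one line per call.
pub struct ToyUsageLog<S, W, P> {
    inner: Arc<S>,
    writer: Arc<Mutex<W>>,
    prefix: P,
}

impl<S: ToyStore, W: Write, P: Fn() -> String> ToyUsageLog<S, W, P> {
    pub fn new(inner: Arc<S>, writer: Arc<Mutex<W>>, prefix: P) -> Self {
        Self { inner, writer, prefix }
    }
}

impl<S: ToyStore, W: Write, P: Fn() -> String> ToyStore for ToyUsageLog<S, W, P> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let result = self.inner.get(key);
        let len = result.as_ref().map(|value| value.len());
        let mut writer = self.writer.lock().unwrap();
        // Logging failures are ignored so they never affect the store call.
        let _ = writeln!(writer, "{}get({:?}) -> {:?}", (self.prefix)(), key, len);
        result
    }
}
```

Because the adapter implements the same interface as the store it wraps, code using the store does not need to change when logging is switched on, and a buffered writer can be substituted for stdout if log output becomes a bottleneck.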
Performance Metrics
The PerformanceMetricsStorageAdapter accumulates metrics, such as bytes read and written.
It is intended to aid in testing by allowing the application to validate that metrics (e.g., bytes read/written, total read/write operations) match expected values for specific operations.
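use std::sync::Arc;

use zarrs::storage::storage_adapter::performance_metrics::PerformanceMetricsStorageAdapter;
use zarrs::storage::store::MemoryStore;

let store = Arc::new(MemoryStore::new());
let store = Arc::new(PerformanceMetricsStorageAdapter::new(store));

assert_eq!(store.bytes_read(), ...);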
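The testing pattern described above (perform some operations, then assert on the accumulated counters) can be sketched with a self-contained toy adapter. The types `ToyStore`, `ToyMemory`, and `ToyMetrics` are hypothetical and only approximate the idea; the real adapter's counters and accessors may differ.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical store interface for this sketch (NOT the zarrs traits).
pub trait ToyStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&self, key: &str, value: &[u8]);
}

/// Toy in-memory store with interior mutability.
#[derive(Default)]
pub struct ToyMemory(Mutex<HashMap<String, Vec<u8>>>);

impl ToyStore for ToyMemory {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.lock().unwrap().get(key).cloned()
    }
    fn set(&self, key: &str, value: &[u8]) {
        self.0.lock().unwrap().insert(key.to_string(), value.to_vec());
    }
}

/// Adapter that counts operations and bytes passing through it.
pub struct ToyMetrics<S> {
    inner: S,
    bytes_read: AtomicUsize,
    bytes_written: AtomicUsize,
    reads: AtomicUsize,
    writes: AtomicUsize,
}

impl<S: ToyStore> ToyMetrics<S> {
    pub fn new(inner: S) -> Self {
        Self {
            inner,
            bytes_read: AtomicUsize::new(0),
            bytes_written: AtomicUsize::new(0),
            reads: AtomicUsize::new(0),
            writes: AtomicUsize::new(0),
        }
    }
    pub fn bytes_read(&self) -> usize { self.bytes_read.load(Ordering::Relaxed) }
    pub fn bytes_written(&self) -> usize { self.bytes_written.load(Ordering::Relaxed) }
    pub fn reads(&self) -> usize { self.reads.load(Ordering::Relaxed) }
    pub fn writes(&self) -> usize { self.writes.load(Ordering::Relaxed) }
}

impl<S: ToyStore> ToyStore for ToyMetrics<S> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let value = self.inner.get(key);
        // Every call counts as a read; bytes are counted only on a hit.
        self.reads.fetch_add(1, Ordering::Relaxed);
        if let Some(bytes) = &value {
            self.bytes_read.fetch_add(bytes.len(), Ordering::Relaxed);
        }
        value
    }
    fn set(&self, key: &str, value: &[u8]) {
        self.inner.set(key, value);
        self.writes.fetch_add(1, Ordering::Relaxed);
        self.bytes_written.fetch_add(value.len(), Ordering::Relaxed);
    }
}
```

In a test, one would wrap the store, run the operation under test (for example, writing two chunks of 16 and 8 bytes), and then assert that the counters match the expected totals (24 bytes written across 2 writes in that case).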