FEAT: Implement <dj_zarr> Custom Type Adaptor #1263

@dimitri-yatsenko

Feature Request

Problem

Scientific research, particularly in fields like neuroscience and imaging, generates massive n-dimensional arrays that are too large to be stored efficiently in traditional blob fields. The Zarr format is an industry standard for storing chunked, compressed array data, enabling parallel I/O and efficient partial access (slicing). Without native support for Zarr, users are forced to manage these datasets manually by storing file paths, which disconnects the data from the DataJoint pipeline and forfeits all the benefits of integrated data management and integrity checks.

Requirements

A successful implementation of this improvement should provide a built-in Custom Type Adaptor for handling Zarr arrays, leveraging the object type for external storage. This implementation must adhere to the DataJoint 2.0 Specification.

The core requirements are:

Create a dj.CustomType Adaptor:

  • A new class, implemented as a plugin, must inherit from dj.CustomType.

Implement the Standard Interface:

  • The type_name property must return the string <dj_zarr>.
  • The stored_type property must return the object type.
  • The put method must accept a Zarr-compatible array object (e.g., a NumPy array) and write it to the configured external object store as a Zarr dataset.
  • The get method must read the Zarr dataset from the external store and return it as a lazy-loading Zarr array object.

Default Registration:

  • The <dj_zarr> adaptor should be registered by default with the DataJoint client, making it available out-of-the-box.
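
One way to picture default registration is a lookup table keyed by type name that is populated when the package is imported. The sketch below is only an assumption about how that might work: register_type, resolve_type, _TYPE_REGISTRY, and PlaceholderZarrAdaptor are hypothetical names, and the issue does not specify DataJoint's actual registration mechanism.

```python
# Hypothetical registry mapping type names such as "<dj_zarr>" to adaptor instances.
_TYPE_REGISTRY = {}


def register_type(adaptor):
    """Register an adaptor under its type_name, rejecting duplicates."""
    name = adaptor.type_name
    if name in _TYPE_REGISTRY:
        raise ValueError(f"type {name} is already registered")
    _TYPE_REGISTRY[name] = adaptor
    return adaptor


def resolve_type(name):
    """Look up the adaptor for a type name used in a table definition."""
    try:
        return _TYPE_REGISTRY[name]
    except KeyError:
        raise KeyError(f"no adaptor registered for {name}") from None


class PlaceholderZarrAdaptor:
    """Placeholder exposing only the two identifying properties."""

    type_name = "<dj_zarr>"
    stored_type = "object"


# Registering at import time is what would make the type available out of the box.
register_type(PlaceholderZarrAdaptor())
```

With this arrangement, a table definition that mentions <dj_zarr> could be resolved without any setup on the user's side, while third-party plugins could call the same function to add their own types.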

Justification

Providing native support for Zarr arrays via a type adaptor will be a transformative feature for DataJoint. It will enable the seamless management of petabyte-scale array data in a format that is optimized for high-performance, parallel computing and cloud environments. This directly addresses the needs of modern data-intensive science and solidifies DataJoint's position as a cutting-edge platform for scientific data management.

Labels

feature: Indicates new features
