
FEAT: Custom Types 2.0 #1260

@dimitri-yatsenko


Feature Request

Problem

Scientific workflows often rely on specialized data structures and objects (e.g., Pandas DataFrames, custom analysis objects, specific file format handlers) that do not map directly to DataJoint's core attribute types. The current approach requires developers to write boilerplate code to manually serialize these objects before insertion and deserialize them after fetching. This process is repetitive, clutters the main application logic, is prone to error, and lacks a standardized method, leading to inconsistencies across different parts of a pipeline or between collaborators.

Requirements

Introduce a formal plugin system for "Custom Type Adaptors" that allows seamless, bidirectional conversion between complex Python objects and a supported underlying storage type. This implementation must adhere to the DataJoint 2.0 Specification.

The core requirements are:

Custom Type Declaration:

[ ] Support a new syntax in the table definition string for declaring attributes with a custom type, using the format:

attribute: <adaptor_name> 

The angle brackets are part of the syntax.
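To illustrate how a declaration parser might recognize the new syntax, here is a minimal sketch. It assumes a simplified attribute line of the form "name : type  # comment" (defaults and other options are ignored) and reports the adaptor name when the whole type is written in angle brackets. The names parse_attribute and Attribute are hypothetical and not part of DataJoint.

```python
import re
from typing import NamedTuple, Optional

# Simplified attribute line: "name : type  # optional comment" (no defaults).
ATTRIBUTE_RE = re.compile(
    r"^\s*(?P<name>[a-z_][a-z0-9_]*)\s*:\s*(?P<type_spec>[^#]+?)\s*(?:#\s*(?P<comment>.*))?$"
)
# A custom type is the whole type spec wrapped in angle brackets: <adaptor_name>.
CUSTOM_TYPE_RE = re.compile(r"^<(?P<adaptor>[a-z_][a-z0-9_]*)>$")


class Attribute(NamedTuple):
    name: str
    type_spec: str
    adaptor: Optional[str]  # adaptor name without brackets, or None for core types
    comment: str


def parse_attribute(line: str) -> Attribute:
    """Hypothetical helper: split one attribute line into its parts."""
    match = ATTRIBUTE_RE.match(line)
    if match is None:
        raise ValueError(f"not an attribute declaration: {line!r}")
    type_spec = match.group("type_spec")
    custom = CUSTOM_TYPE_RE.match(type_spec)
    return Attribute(
        name=match.group("name"),
        type_spec=type_spec,
        adaptor=custom.group("adaptor") if custom else None,
        comment=match.group("comment") or "",
    )
```

Lines such as "-> upstream.Analysis" or "---" are not attribute declarations and are rejected by this helper; a real parser would handle them separately.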

For example, the following table uses two custom types, zarr_array and dj_blob, which are translated into the core types object and binary, respectively.

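```python
import datajoint as dj
import upstream  # assumed module that defines the upstream Analysis table


class ProcessedData(dj.Computed):
    definition = """
    -> upstream.Analysis
    ---
    computed_matrix : <zarr_array>    # Uses the <zarr_array> custom type
    result_summary  : <dj_blob>       # Uses the <dj_blob> custom type
    """
```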

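The translation from custom types to core types can be sketched as a text substitution over the definition before the table is declared. This sketch assumes the mapping from the example (zarr_array to object, dj_blob to binary) is supplied as a plain dict and leaves comments untouched, so a comment that mentions <zarr_array> is not rewritten. The function name resolve_definition is hypothetical, and the sketch does not cover how the adaptor name is recorded so that fetches can later call get().

```python
import re

# Matches a custom type reference such as <zarr_array>.
CUSTOM_TYPE_RE = re.compile(r"<([a-z_][a-z0-9_]*)>")


def resolve_definition(definition: str, stored_types: dict) -> str:
    """Hypothetical helper: replace each <adaptor_name> with its stored core type."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        try:
            return stored_types[name]
        except KeyError:
            raise ValueError(f"unknown custom type <{name}>") from None

    resolved_lines = []
    for line in definition.splitlines():
        # Only the declaration part (before '#') is rewritten; the comment is kept as-is.
        declaration, sep, comment = line.partition("#")
        resolved_lines.append(CUSTOM_TYPE_RE.sub(substitute, declaration) + sep + comment)
    return "\n".join(resolved_lines)
```

Raising on an unknown name, rather than passing the bracketed type through, keeps an unregistered adaptor from reaching the database as an invalid type.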
dj.CustomType Interface:

  • The dj.CustomType base class allows users to define their own type adaptors.
  • Any class inheriting from dj.CustomType MUST implement the following standard interface:
    • type_name (property): A unique string identifier for the custom type (e.g., <zarr_array>).
    • stored_type (property): The underlying DataJoint core attribute type used for storage (e.g., object, blob, varchar(255)).
    • put(self, user_object: object) -> object: A method that takes a user-provided Python object and converts it into the format specified by stored_type.
    • get(self, stored_object: object) -> object: A method that takes the data retrieved from the database and converts it back into the user-level Python object.
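A minimal sketch of this interface, using Python's abc module, follows. CustomType here is a hypothetical stand-in for dj.CustomType, not its actual implementation. JsonDict is an illustrative adaptor that stores a small dict as a JSON string in varchar(255). The sketch assumes type_name holds the name without the angle brackets.

```python
import json
from abc import ABC, abstractmethod


class CustomType(ABC):
    """Hypothetical stand-in for dj.CustomType, following the interface above."""

    @property
    @abstractmethod
    def type_name(self) -> str:
        """Unique identifier, used in definitions as <type_name>."""

    @property
    @abstractmethod
    def stored_type(self) -> str:
        """Core DataJoint type used for storage, e.g. 'varchar(255)'."""

    @abstractmethod
    def put(self, user_object: object) -> object:
        """Convert a user object into a value of stored_type (on insert)."""

    @abstractmethod
    def get(self, stored_object: object) -> object:
        """Convert a stored value back into the user object (on fetch)."""


class JsonDict(CustomType):
    """Illustrative adaptor: a small dict stored as a JSON string in varchar(255)."""

    @property
    def type_name(self) -> str:
        return "json_dict"

    @property
    def stored_type(self) -> str:
        return "varchar(255)"

    def put(self, user_object: object) -> object:
        if not isinstance(user_object, dict):
            raise TypeError("<json_dict> expects a dict")
        encoded = json.dumps(user_object, sort_keys=True)
        if len(encoded) > 255:
            raise ValueError("encoded dict does not fit in varchar(255)")
        return encoded

    def get(self, stored_object: object) -> object:
        return json.loads(stored_object)
```

Declaring type_name and stored_type as abstract properties means a subclass that omits either one cannot be instantiated, so an incomplete adaptor fails early rather than at insert or fetch time.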

Registration Mechanism:

  • Provide a mechanism for registering custom type adaptors with the DataJoint client using the modern Python plugin architecture (e.g., package entry points).
  • Registration must occur before any schemas that use the custom types are declared or activated.

Labels: feature (indicates new features)