Skip to content

FEAT: Core attributes types 2.0 #1261

@dimitri-yatsenko

Description

@dimitri-yatsenko

Problem

Currently, DataJoint's type system is closely tied to the underlying SQL backend (e.g., MySQL, PostgreSQL), exposing database-specific type names like TINYINT UNSIGNED or DOUBLE. This approach has several drawbacks:

  • Lack of Portability: A schema defined for MySQL may not work on PostgreSQL without modification, hindering pipeline portability.
  • Inconsistent User Experience: Scientists and developers are forced to learn the nuances of different SQL dialects.
  • Less Intuitive Naming: Scientific programmers are more familiar with terms like uint8 and float64 than their SQL equivalents.

A standardized, abstract layer for attribute types is needed to create a consistent, portable, and user-friendly experience.

Requirements

The following core attribute types SHALL be supported by DataJoint implementations:

Category Type Description
UUID uuid Universally Unique Identifier (RFC 4122). Default values are not supported for uuid attributes.
Integers int8, uint8, int16, uint16,
int32, uint32, int64, uint64
Standard signed and unsigned integer types of varying bit widths.
Floating-Point float32, float64 Single-precision (32-bit) and double-precision (64-bit) floating-point numbers. Note: NaN (Not a Number) behavior MAY vary by backend; e.g., MySQL does not natively support NaN in indexed FLOAT columns.
Decimal decimal(M,N) Fixed-point decimal number with a total precision of M digits and N digits after the decimal point (scale).
Character Strings char(N), varchar(N) Fixed-length (char) or variable-length (varchar) character strings, where N specifies the maximum length.
Enumeration enum('val1', 'val2', ...) A type restricted to a predefined set of allowed string values.
Date date Represents a calendar date (year, month, day) in ISO 8601 format (YYYY-MM-DD). A special default value of NOW MAY be used to set the current date upon insertion.
Time / Timestamp timestamp Represents a point in time, typically with microsecond precision, stored in UTC. Values SHOULD conform to ISO 8601 format. A special default value of NOW MAY be used to set the current timestamp upon insertion.
Binary Large Object blob Stores large binary data directly within the database row (inline storage). Suitable for moderately sized binary objects.
Object Reference object Stores a reference (e.g., a key or path) to an external data object managed by DataJoint but stored outside the primary database (e.g., in an object store or file system). See Object Types.
Custom Type <adaptor_name> A user-defined type managed by a Custom Type Adaptor, allowing for specialized storage and handling of complex data structures.

Distinction: blob vs. object Attribute Types

The blob and object types both handle non-scalar data, but differ in their storage strategy and typical use cases:

Type Intended Use Data Storage Location
blob Raw binary data stored directly within the database table row. Suitable for relatively small to moderately sized binary data where inline storage is acceptable and efficient. Database System (e.g., as BLOB or BYTEA columns in MySQL/PostgreSQL).
object References to data entities stored externally to the primary database. Suitable for large files, datasets, or complex objects where external storage is preferred for scalability or management. External Storage Systems (e.g., file systems, cloud object stores like S3/GCS/Azure Blob, network-attached storage). The database stores metadata and a reference key.

Metadata

Metadata

Labels

breakingNot backward compatible changesfeatureIndicates new features

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions