-
Notifications
You must be signed in to change notification settings - Fork 90
Open
Labels
breakingNot backward compatible changesNot backward compatible changesfeatureIndicates new featuresIndicates new features
Milestone
Description
Problem
Currently, DataJoint's type system is closely tied to the underlying SQL backend (e.g., MySQL, PostgreSQL), exposing database-specific type names like TINYINT UNSIGNED
or DOUBLE
. This approach has several drawbacks:
- Lack of Portability: A schema defined for MySQL may not work on PostgreSQL without modification, hindering pipeline portability.
- Inconsistent User Experience: Scientists and developers are forced to learn the nuances of different SQL dialects.
- Less Intuitive Naming: Scientific programmers are more familiar with terms like uint8 and float64 than their SQL equivalents.
A standardized, abstract layer for attribute types is needed to create a consistent, portable, and user-friendly experience.
Requirements
The following core attribute types SHALL be supported by DataJoint implementations:
Category | Type | Description |
---|---|---|
UUID | uuid |
Universally Unique Identifier (RFC 4122). Default values are not supported for uuid attributes. |
Integers | int8 , uint8 , int16 , uint16 ,int32 , uint32 , int64 , uint64 |
Standard signed and unsigned integer types of varying bit widths. |
Floating-Point | float32 , float64 |
Single-precision (32-bit) and double-precision (64-bit) floating-point numbers. Note: NaN (Not a Number) behavior MAY vary by backend; e.g., MySQL does not natively support NaN in indexed FLOAT columns. |
Decimal | decimal(M,N) |
Fixed-point decimal number with a total precision of M digits and N digits after the decimal point (scale). |
Character Strings | char(N) , varchar(N) |
Fixed-length (char ) or variable-length (varchar ) character strings, where N specifies the maximum length. |
Enumeration | enum('val1', 'val2', ...) |
A type restricted to a predefined set of allowed string values. |
Date | date |
Represents a calendar date (year, month, day) in ISO 8601 format (YYYY-MM-DD). A special default value of NOW MAY be used to set the current date upon insertion. |
Time / Timestamp | timestamp |
Represents a point in time, typically with microsecond precision, stored in UTC. Values SHOULD conform to ISO 8601 format. A special default value of NOW MAY be used to set the current timestamp upon insertion. |
Binary Large Object | blob |
Stores large binary data directly within the database row (inline storage). Suitable for moderately sized binary objects. |
Object Reference | object |
Stores a reference (e.g., a key or path) to an external data object managed by DataJoint but stored outside the primary database (e.g., in an object store or file system). See Object Types. |
Custom Type | <adaptor_name> |
A user-defined type managed by a Custom Type Adaptor, allowing for specialized storage and handling of complex data structures. |
Distinction: blob
vs. object
Attribute Types
The blob
and object
types both handle non-scalar data, but differ in their storage strategy and typical use cases:
Type | Intended Use | Data Storage Location |
---|---|---|
blob |
Raw binary data stored directly within the database table row. Suitable for relatively small to moderately sized binary data where inline storage is acceptable and efficient. | Database System (e.g., as BLOB or BYTEA columns in MySQL/PostgreSQL). |
object |
References to data entities stored externally to the primary database. Suitable for large files, datasets, or complex objects where external storage is preferred for scalability or management. | External Storage Systems (e.g., file systems, cloud object stores like S3/GCS/Azure Blob, network-attached storage). The database stores metadata and a reference key. |
Metadata
Metadata
Assignees
Labels
breakingNot backward compatible changesNot backward compatible changesfeatureIndicates new featuresIndicates new features