Skip to content

refactor: decouple extension ABI from arrow-rs with Arrow C Data Interfaces#6337

Merged
rchowell merged 9 commits intomainfrom
rchowell/ext-decouple
Mar 6, 2026
Merged

refactor: decouple extension ABI from arrow-rs with Arrow C Data Interfaces#6337
rchowell merged 9 commits intomainfrom
rchowell/ext-decouple

Conversation

@rchowell
Copy link
Contributor

@rchowell rchowell commented Mar 4, 2026

Remove arrow-rs dependency from daft-ext-abi by introducing owned Arrow C Data Interface types (ArrowSchema, ArrowArray, ArrowArrayStream) that are layout compatible with the C ABI. This allows extensions to use any Arrow implementation (arrow-rs, arrow2, etc.) without coupling to a specific version.

Changes Made

  • Add ArrowSchema, ArrowArray, ArrowArrayStream C types to daft-ext-abi
  • Add ArrowData (schema + array pair) for safe FFI data transfer
  • Move FFI conversion helpers into daft-ext-abi/src/ffi/arrow.rs
  • Update DaftScalarFunction trait to use ABI types instead of arrow-rs
  • Update DaftSession trait to use ABI types
  • Remove arrow-rs dependency from daft-ext-abi Cargo.toml
  • Add arrow.rs helpers in daft-ext-core for arrow-rs <-> ABI conversions
  • Update hello example to use new ABI types

Related Issues

@rchowell rchowell requested a review from a team as a code owner March 4, 2026 01:02
@github-actions github-actions bot added the feat label Mar 4, 2026
@greptile-apps

This comment was marked as resolved.

daft-ext = {path = "../../src/daft-ext"}
arrow-array = {version = "57.1.0", features = ["chrono-tz"]}
arrow-schema = "57.1.0"
arrow = {version = "58", features = ["ffi"]}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note the different major version now

…e types

Remove arrow-rs dependency from daft-ext-abi by introducing owned Arrow C Data
Interface types (ArrowSchema, ArrowArray, ArrowArrayStream) that are layout-
compatible with the C ABI. This allows extensions to use any Arrow implementation
(arrow-rs, arrow2, etc.) without coupling to a specific version.

Key changes:
- Add ArrowSchema, ArrowArray, ArrowArrayStream C types to daft-ext-abi
- Add ArrowData (schema + array pair) for safe FFI data transfer
- Move FFI conversion helpers into daft-ext-abi/src/ffi/arrow.rs
- Update DaftScalarFunction trait to use ABI types instead of arrow-rs
- Update DaftSession trait to use ABI types
- Remove arrow-rs dependency from daft-ext-abi Cargo.toml
- Add arrow.rs helpers in daft-ext-core for arrow-rs <-> ABI conversions
- Update hello example to use new ABI types
@rchowell rchowell force-pushed the rchowell/ext-decouple branch from 5ce0830 to 38c3856 Compare March 4, 2026 01:26
@rchowell
Copy link
Contributor Author

rchowell commented Mar 4, 2026

@greptileai please provide a critical review, we don't need a diagram.

Copy link
Member

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to share my perspective, though I'm happy to disagree and commit if the team prefers this direction.

I'd generally favor ergonomics over ABI purity here. Someone writing their first Daft extension shouldn't need to understand C Data Interface types and macro-generated helpers just to get started.


The macro generates a bunch of import/export functions in every crate that calls it, and it forces extension authors to add daft_ext::define_arrow_helpers!() as a mandatory invocation before they can really do anything.

Compare the authoring experience:

This PR:

use daft_ext::prelude::*;
daft_ext::define_arrow_helpers!(); 

fn call(&self, args: &[ArrowData]) -> DaftResult<ArrowData> {
    let input = import_array(unsafe { ArrowData::take_arg(args, 0) })?;
    // ... work with input ...
    Ok(export_array(&result)?)
}

My draft (#6322):

use daft_ext::prelude::*;

fn call(&self, args: &[ArrayRef]) -> DaftResult<ArrayRef> {
    let names = args[0].as_string::<i64>();
    // ... just use arrow types directly ...
    Ok(Arc::new(builder.finish()))
}

In #6322, extension authors work with ArrayRef and Field directly via familiar arrow-rs types, no conversion ceremony, no macro.

I understand the argument about "true decoupling" from any Arrow implementation, but I think we should be practical about our actual users. If we intend for these to be primarily rust based, then it naturally makes sense to include arrow-rs as a dependency. The C/C++ extension use case is theoretical at this point (which would still be technically supported by using the arrow abi). Meanwhile, the ergonomic cost is IMO quite noticeable, every extension needs import/export wrapping, unsafe blocks for ArrowData::take_arg, and a magic macro invocation.

My bias would be towards simpler extension authoring experience over theoretical purity at the ABI layer.

But again, happy to go with consensus here.

@rchowell
Copy link
Contributor Author

rchowell commented Mar 5, 2026

@universalmind303 thanks, I don't disagree on ergonomics. We can always improve ergonomics on top of these APIs with higher-level helpers, primarily macro based. The important thing is to have proper decoupling where we actually interface. The pointer cast solution adds ergonomics to the lowest level interface, whereas I am intentionally choosing to go for purity on the lowest interface and we can build ergonomics up from there.

Ideally we are in a future like,

#[daft_scalar_function]
pub fn my_add(x: i64, y: i64) -> i64 {
   x + y
}

And all the arrow details disappear. This is how pgrx handles things, while hiding all the messy details behind the macro. I will continue with the proper decoupling and commit to improving ergonomics on top of this foundation.

@rchowell rchowell changed the title feat: decouple extension ABI from arrow-rs with Arrow C Data Interfaces refactor: decouple extension ABI from arrow-rs with Arrow C Data Interfaces Mar 5, 2026
@github-actions github-actions bot added refactor and removed feat labels Mar 5, 2026
@rchowell
Copy link
Contributor Author

rchowell commented Mar 5, 2026

Ok, I have found some better ergonomics @universalmind303.

I will add helpers for using raw pointers with our stable Arrow C types, and then feature-flag some arrow-rs versions. This removes the customer's boilerplate to get into our stable types, and if they are using arrow-rs then they just add the feature in their Cargo.toml.

Updating this now.

@codecov
Copy link

codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 78.25000% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.80%. Comparing base (69a9285) to head (911fdeb).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-ext-core/src/session.rs 0.00% 38 Missing ⚠️
src/daft-ext-abi/src/arrow.rs 85.78% 30 Missing ⚠️
src/daft-ext-core/src/function.rs 81.18% 19 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6337      +/-   ##
==========================================
- Coverage   74.95%   74.80%   -0.15%     
==========================================
  Files        1021     1020       -1     
  Lines      135873   136113     +240     
==========================================
- Hits       101839   101820      -19     
- Misses      34034    34293     +259     
Files with missing lines Coverage Δ
src/daft-ext-abi/src/lib.rs 100.00% <ø> (ø)
src/daft-ext-core/src/error.rs 100.00% <100.00%> (ø)
src/daft-ext-internal/src/function.rs 88.54% <100.00%> (+0.54%) ⬆️
src/daft-ext-core/src/function.rs 85.71% <81.18%> (-1.96%) ⬇️
src/daft-ext-abi/src/arrow.rs 85.78% <85.78%> (ø)
src/daft-ext-core/src/session.rs 56.31% <0.00%> (-19.02%) ⬇️

... and 53 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rchowell rchowell enabled auto-merge (squash) March 6, 2026 21:34
@rchowell rchowell merged commit fe526ea into main Mar 6, 2026
36 checks passed
@rchowell rchowell deleted the rchowell/ext-decouple branch March 6, 2026 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants