Skip to content

GH-48470: [Python] Construct UuidArray from list of UuidScalars#48746

Merged
rok merged 9 commits intoapache:mainfrom
tadeja:48470-extension-scalar
Mar 18, 2026
Merged

GH-48470: [Python] Construct UuidArray from list of UuidScalars#48746
rok merged 9 commits intoapache:mainfrom
tadeja:48470-extension-scalar

Conversation

@tadeja
Copy link
Contributor

@tadeja tadeja commented Jan 6, 2026

Rationale for this change

Fixes #48470. Also fixes all extension types, not just UUID.

What changes are included in this PR?

An extension scalar is unwrapped to its storage type when building arrays.

Are these changes tested?

Yes, new test_array_from_extension_scalars covers builtin (uuid, bool8, json_, opaque) and custom types across all storage types (int, float, bool, string, binary, large string/binary, decimal, fixed-size binary, struct, timestamp, duration, date).

Are there any user-facing changes?

Now user can run such an example to get the output below instead of ArrowInvalid message.
This now works for any extension type, not just UUID.

import pyarrow as pa
pa.array([pa.scalar(b'1'*16, type=pa.uuid())], type=pa.uuid())
<pyarrow.lib.UuidArray object at 0x128186970>
[
  31313131313131313131313131313131
]

@github-actions
Copy link

github-actions bot commented Jan 6, 2026

⚠️ GitHub issue #48470 has been automatically assigned in GitHub to PR creator.

@tadeja tadeja force-pushed the 48470-extension-scalar branch from 043c8a8 to dbf1194 Compare March 4, 2026 19:25
@tadeja tadeja marked this pull request as ready for review March 4, 2026 19:43
@tadeja tadeja requested review from AlenkaF, raulcd and rok as code owners March 4, 2026 19:43
Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some minor change and more extensive tests suggestions.

Comment on lines +587 to +594
// Helper function to unwrap extension scalar to its storage scalar
const Scalar& GetStorageScalar(const Scalar& scalar) {
if (scalar.type->id() == Type::EXTENSION) {
return *checked_cast<const ExtensionScalar&>(scalar).value;
}
return scalar;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using Result here?

Suggested change
// Helper function to unwrap extension scalar to its storage scalar
const Scalar& GetStorageScalar(const Scalar& scalar) {
if (scalar.type->id() == Type::EXTENSION) {
return *checked_cast<const ExtensionScalar&>(scalar).value;
}
return scalar;
}
// Helper function to unwrap extension scalar to its storage scalar
Result<const Scalar*> GetStorageScalar(const Scalar& scalar) {
if (scalar.type->id() != Type::EXTENSION) {
return &scalar;
}
const auto& extension_scalar = checked_cast<const ExtensionScalar&>(scalar);
return extension_scalar.value.get();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Result is not needed here - if its kind of failure path is unlikely then const Scalar& might be better?
Looking at the line 757 check if (PyValue::IsNull(this->options_, value)) { ...
( For an (unlikely?) non-None extension scalar with internal storage .value of a null shared_ptr both approaches don't avoid it? )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result is probably not needed indeed, since no error can be returned here. As you prefer @tadeja.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Mar 5, 2026
Co-authored-by: Rok Mihevc <rok@mihevc.org>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 11, 2026
@tadeja tadeja force-pushed the 48470-extension-scalar branch from a38cfda to 483e866 Compare March 11, 2026 19:28
@tadeja tadeja requested a review from rok March 17, 2026 17:13
@tadeja
Copy link
Contributor Author

tadeja commented Mar 17, 2026

For the C++ part of changes, @pitrou would you have a chance to take a look?

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Mar 17, 2026


def test_array_from_extension_scalars():
import datetime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move this stdlib import to the top of file.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat @tadeja ! Just minor suggestions below.

@rok rok merged commit d08d5e6 into apache:main Mar 18, 2026
17 checks passed
@rok rok removed the awaiting merge Awaiting merge label Mar 18, 2026
@rok
Copy link
Member

rok commented Mar 18, 2026

Thanks for sanity check @pitrou! Merging.

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit d08d5e6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Cannot construct UuidArray from list of UuidScalars

3 participants