GH-49555: [Python][Packaging] Add riscv64 manylinux wheel builds to release pipeline #49556

Draft
gounthar wants to merge 2 commits into apache:main from gounthar:feat/riscv64-python-wheels

@gounthar commented Mar 19, 2026

Describe the enhancement requested

Add riscv64 to the PyArrow wheel build pipeline using manylinux_2_39 and native RISE riscv64 runners.
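For readers unfamiliar with the tag, manylinux_2_39 follows the PEP 600 naming scheme manylinux_<glibc major>_<glibc minor>_<arch>. The sketch below shows how such a tag breaks down; the helper name parse_manylinux_tag is hypothetical and is not part of Arrow or of this PR.

```python
import re
from typing import NamedTuple


class ManylinuxTag(NamedTuple):
    glibc_major: int
    glibc_minor: int
    arch: str


# PEP 600 layout: manylinux_<glibc major>_<glibc minor>_<arch>
_TAG_RE = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def parse_manylinux_tag(tag: str) -> ManylinuxTag:
    """Split a PEP 600 platform tag into its glibc version and architecture."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a PEP 600 manylinux tag: {tag!r}")
    return ManylinuxTag(int(match.group(1)), int(match.group(2)), match.group(3))


print(parse_manylinux_tag("manylinux_2_39_riscv64"))
```

A wheel carrying this tag declares that it needs glibc 2.39 or newer on a riscv64 host.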

Changes

  • dev/tasks/tasks.yml: Add ("manylinux", "riscv64", "2-39", "manylinux_2_39_riscv64") to the Linux wheel matrix
  • dev/tasks/python-wheels/github.linux.yml: Add ubuntu-24.04-riscv runner selection and riscv64 ARCH mapping
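The two changes above amount to one extra matrix row plus an architecture-to-runner lookup. The Python sketch below is only an illustration of that idea: the real files are templated YAML, the function names are hypothetical, and the "ubuntu-latest" fallback is a placeholder rather than what Arrow actually uses. Only the riscv64 row and the ubuntu-24.04-riscv label are taken from this PR.

```python
# Matrix row quoted in this PR: (flavour, arch, version, platform tag).
RISCV64_ROW = ("manylinux", "riscv64", "2-39", "manylinux_2_39_riscv64")

# Runner label quoted in this PR. Other architectures are intentionally
# omitted because their labels are not stated in this conversation.
RUNNERS = {"riscv64": "ubuntu-24.04-riscv"}


def runner_for(arch, default="ubuntu-latest"):
    """Pick a runner label for an architecture (the default is a placeholder)."""
    return RUNNERS.get(arch, default)


def row_is_consistent(row):
    """Check that the platform tag agrees with the other fields of the row."""
    flavour, arch, version, platform_tag = row
    expected = f"{flavour}_{version.replace('-', '_')}_{arch}"
    return platform_tag == expected


print(runner_for("riscv64"), row_is_consistent(RISCV64_ROW))
```

A consistency check like this can catch a mismatch between the "2-39" version field and the "2_39" spelling inside the platform tag before a job is ever scheduled.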

What's still needed

  • Docker image for riscv64 wheel builds (apache/arrow-dev:riscv64-python-*-wheel-manylinux-2_39-vcpkg-*)
  • vcpkg dependency verification for riscv64
  • Testing via Crossbow

This draft PR enables the CI pipeline. The Docker image creation is a separate effort — happy to work on that as well, or coordinate with the team.

Evidence

  • Arrow C++ native build on BananaPi F3 (SpacemiT K1, rv64gc, GCC 14.2.0): SUCCESS (1h13m)
  • PyArrow install from source (Parquet, CSV, JSON, Compute, Filesystem): SUCCESS
  • import pyarrow; print(pyarrow.__version__) → 24.0.0a1.dev1
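A slightly more defensive version of the import check above could look like the following sketch. The function name smoke_check is hypothetical; on a riscv64 Linux host platform.machine() is expected to report riscv64, and the version is None when the module is not installed.

```python
import importlib
import platform


def smoke_check(module_name="pyarrow"):
    """Return (machine, version), with version None if the import fails."""
    machine = platform.machine()
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return machine, None
    return machine, getattr(module, "__version__", None)


machine, version = smoke_check()
print(f"machine={machine} pyarrow={version or 'not installed'}")
```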

CI Runners

Native riscv64 runners are available for free via RISE RISC-V runners. numpy, llama.cpp, and pytorch already use them.

Fixes #49555

Note: this work is part of the RISE Project effort to improve Python ecosystem support on riscv64 platforms.

Add riscv64 to the Linux wheel build pipeline using manylinux_2_39
(first manylinux with riscv64 support) and RISE native runners
(ubuntu-24.04-riscv).

Changes:
- dev/tasks/tasks.yml: add ("manylinux", "riscv64", "2-39",
  "manylinux_2_39_riscv64") entry to the wheel matrix
- dev/tasks/python-wheels/github.linux.yml: add riscv64 runner
  selection (ubuntu-24.04-riscv) and ARCH mapping

Arrow C++ and PyArrow both build successfully on native riscv64
hardware (BananaPi F3, SpacemiT K1, rv64gc).

Note: Docker image for riscv64 wheel builds still needs to be created
(following the aarch64 pattern). This PR enables the CI pipeline;
Docker image creation is tracked separately.

Signed-off-by: Bruno Verachten <gounthar@gmail.com>

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@github-actions github-actions bot added the awaiting review Awaiting review label Mar 19, 2026
@gounthar gounthar changed the title ci: add riscv64 manylinux wheel builds to release pipeline GH-49555: [Python][Packaging] Add riscv64 manylinux wheel builds to release pipeline Mar 19, 2026
@github-actions

⚠️ GitHub issue #49555 has been automatically assigned in GitHub to PR creator.

@github-actions

⚠️ GitHub issue #49555 has no components, please add labels for components.

- ci/vcpkg/riscv64-linux-static-{release,debug}.cmake: vcpkg triplets
  for riscv64 (following arm64 pattern)
- compose.yaml: add python-wheel-manylinux-2-39 service and ccache
  volume for riscv64 wheel builds (using quay.io/pypa/manylinux_2_39_riscv64)

Signed-off-by: Bruno Verachten <gounthar@gmail.com>

@pitrou (Member) commented Mar 19, 2026

Hi @gounthar, the big question here is what happens with ongoing maintenance. I think none of the currently active Arrow maintainers has a RISC-V box at home, and debugging on CI can be painful. But I might be overstating the risks.

@pitrou (Member) commented Mar 19, 2026

also cc @raulcd

@raulcd (Member) left a comment

This would require the following app to be set up as part of the Apache organization: https://github.com/apps/rise-risc-v-runners

A couple of questions about that: do we know what the required permissions are? Has this been set up on other Apache projects, or are we the first? I'm asking because it might require some ASF INFRA research.

@gounthar (Author) commented Mar 19, 2026

@pitrou That's a fair concern. A few things that mitigate it:

  1. RISE runners are native hardware, not QEMU emulation. Debugging and iteration are much faster (we validated the full Arrow C++ + pyarrow build on native hardware in ~1.5h total).

  2. I maintain two BananaPi F3 boards running 24/7 as riscv64 build/test machines. Happy to help debug any riscv64-specific issues that come up.

  3. The riscv64 job would be additive: if it breaks, it doesn't affect existing x86_64/aarch64 builds. Same as how aarch64 was initially added.

  4. Arrow C++ is already riscv64-compatible (PR ARROW-17440: [C++] Support RISC-V architecture #13902, 2022). The risk of riscv64-specific breakage in the wheel pipeline is low since the code itself works; it's mainly CI plumbing.

  5. RISE is committed to maintaining riscv64 CI for key projects. Ludovic Henry (RISE TSC Co-Chair) is actively providing runners and support; this isn't a fire-and-forget contribution.

I'm also happy to be a point of contact for riscv64 issues in Arrow. I've been doing this across several Python projects this week and can respond quickly.

@gounthar (Author) commented Mar 19, 2026

@raulcd Great question. The RISE runners app requires these permissions:

  • Actions: Read and write (to register/deregister runners)
  • Metadata: Read-only

It's a GitHub App that provides self-hosted runners, similar to how some projects use BuildJet or Actuated for ARM runners.

As for Apache org setup: I'm not aware of other Apache projects using RISE runners yet, so you'd likely be the first. I can check with Ludovic Henry (@luhenry, RISE TSC Co-Chair) about the process for Apache org installation. He's been setting up runners for numpy, pytorch, llama.cpp, and pyca this week.

Alternatively, we could initially run the riscv64 job on a separate fork (like we did with riseproject-dev/numpy) and upstream once the runner setup is confirmed. That way the PR can be reviewed independently of the infra question.

I'll ping Ludovic about Apache org support and report back.

@gounthar (Author) commented Mar 19, 2026

@raulcd @pitrou: Ludovic Henry (@luhenry, RISE TSC Co-Chair) should respond directly on this PR about the Apache org runner setup. He hasn't set up RISE runners on Apache projects before but is very interested in making it happen.

@luhenry commented Mar 19, 2026

> To be set-up as part of the Apache organization. A couple of questions about that. Do we know what are the required permissions? Has this been set-up on other Apache projects or are we the first ones? Asking as might require some ASF INFRA research.

The requested permissions are:

  • At repository level:
  • At organization level:
    • Self-hosted runners: read and write; necessary to add the self-hosted runners and runner group

We are not going to require any more credentials as we only want to be able to dynamically register self-hosted runners, and that's it. You can find all the code of the app at https://github.com/riseproject-dev/riscv-runner-app, and a more descriptive website at https://riseproject-dev.github.io/riscv-runner/

> RISE is committed to maintaining riscv64 CI for key projects

RISE is part of Linux Foundation EU, and is committed to this service. We see it as a critical piece to enable RISC-V Software more broadly, which is our entire raison d'être. We are also working with PyTorch, Llama.cpp, and many other projects to enable CI on RISC-V.

Also happy to drastically increase the number of runners available for the Apache organization, given the overall importance of everything that you're doing!

For direct board access, we are also working on a service of remote, on-demand machines accessible via SSH. It is meant for exactly this kind of situation, where someone needs to debug a sticky issue and going through CI would be a big productivity loss.

Let me know if you have any other questions.

@pitrou (Member) commented Mar 19, 2026

Thanks for the answers, @gounthar. Can we perhaps start by having a regular C++ CI job on RISC-V? The Python wheel CI builds do not run the C++ test suite.

@gounthar (Author) commented Mar 19, 2026

@pitrou That makes a lot of sense: start with the foundation. I'll rework this PR to add a C++ CI job on riscv64 instead of jumping straight to wheel builds.

I've already verified that Arrow C++ builds on native riscv64 and that import pyarrow succeeds there. I'll look at the existing C++ CI jobs (ci/docker/ubuntu-*-cpp.dockerfile and the corresponding workflows) and add a riscv64 variant.

Would you prefer:

  1. A Docker-based build (like the existing CI), or
  2. A native build on the RISE runner directly (simpler, but less isolated)?

@pitrou (Member) commented Mar 19, 2026

Definitely a Docker-based job! You can take a look here for inspiration:

Note that this uses archery docker, which relies on compose.yaml. See https://arrow.apache.org/docs/developers/continuous_integration/docker.html, and feel free to ask any questions!

@gounthar (Author) commented Mar 19, 2026

@pitrou Thanks for the pointer! I'll study the archery docker approach and the compose.yaml service definitions for the C++ CI job. 🙏

I'll rework this PR to add a ubuntu-cpp-riscv64 service in compose.yaml and a corresponding CI workflow, following the pattern from cpp.yml#L74. I will likely need to figure out the Docker image base (manylinux_2_39_riscv64 or an Ubuntu-based image for riscv64). 🤔

Will report back once I have a working prototype. 🤞
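As a rough illustration of the workflow discussed above, archery docker run takes the name of a compose.yaml service, so a riscv64 C++ job would eventually boil down to a command like the one built below. This is only a sketch: the ubuntu-cpp-riscv64 service is the name proposed in this thread and does not exist yet, and the helper archery_docker_run is hypothetical. Nothing is executed; the command is only printed.

```python
import shlex

# Service name proposed in this thread; not yet defined in compose.yaml.
SERVICE = "ubuntu-cpp-riscv64"


def archery_docker_run(service):
    """Build the argv for running a compose.yaml service through archery."""
    return ["archery", "docker", "run", service]


print(shlex.join(archery_docker_run(SERVICE)))
```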

Development

Successfully merging this pull request may close these issues.

Add riscv64 Python wheel builds to release pipeline

4 participants