GH-49555: [Python][Packaging] Add riscv64 manylinux wheel builds to release pipeline #49556
gounthar wants to merge 2 commits into apache:main
Add riscv64 to the Linux wheel build pipeline using manylinux_2_39
(first manylinux with riscv64 support) and RISE native runners
(ubuntu-24.04-riscv).
Changes:
- dev/tasks/tasks.yml: add ("manylinux", "riscv64", "2-39",
"manylinux_2_39_riscv64") entry to the wheel matrix
- dev/tasks/python-wheels/github.linux.yml: add riscv64 runner
selection (ubuntu-24.04-riscv) and ARCH mapping
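A minimal Python sketch of the selection logic these two changes describe, for illustration only. The real logic lives in the templated YAML files named above; the field names, the `job_settings` helper, and the amd64/arm64 entries are assumptions added here for contrast. Only the riscv64 values are taken from this PR.

```python
# Hypothetical model of the wheel matrix entry and runner/ARCH selection.
# Field names are illustrative; only the riscv64 values come from this PR.
WHEEL_MATRIX = [
    # (wheel kind, arch, manylinux version, platform tag)
    ("manylinux", "riscv64", "2-39", "manylinux_2_39_riscv64"),
]

# Assumed runner labels for the existing architectures; riscv64 is from the PR.
RUNNERS = {
    "amd64": "ubuntu-latest",       # assumption
    "arm64": "ubuntu-24.04-arm",    # assumption
    "riscv64": "ubuntu-24.04-riscv",
}

# Assumed ARCH values passed to the Docker-based build; riscv64 maps to itself.
DOCKER_ARCH = {
    "amd64": "amd64",      # assumption
    "arm64": "arm64v8",    # assumption
    "riscv64": "riscv64",
}


def job_settings(arch):
    """Return the runner label and ARCH value for one matrix architecture."""
    if arch not in RUNNERS:
        raise ValueError(f"no runner configured for architecture {arch!r}")
    return {"runs-on": RUNNERS[arch], "ARCH": DOCKER_ARCH[arch]}


for kind, arch, version, platform_tag in WHEEL_MATRIX:
    print(kind, version, platform_tag, job_settings(arch))
```

Keeping the runner label and the ARCH value in separate lookups reflects that the two can differ per architecture, which is why the PR touches both.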
Arrow C++ and PyArrow both build successfully on native riscv64
hardware (BananaPi F3, SpacemiT K1, rv64gc).
Note: Docker image for riscv64 wheel builds still needs to be created
(following the aarch64 pattern). This PR enables the CI pipeline;
Docker image creation is tracked separately.
Signed-off-by: Bruno Verachten <gounthar@gmail.com>
|
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
- ci/vcpkg/riscv64-linux-static-{release,debug}.cmake: vcpkg triplets
for riscv64 (following arm64 pattern)
- compose.yaml: add python-wheel-manylinux-2-39 service and ccache
volume for riscv64 wheel builds (using quay.io/pypa/manylinux_2_39_riscv64)
Signed-off-by: Bruno Verachten <gounthar@gmail.com>
|
Hi @gounthar , the big question here is what happens for ongoing maintenance. I think none of the currently active Arrow maintainers has a RISC-V box at home, and debugging on CI can be painful. But I might be overstating the risks. |
|
also cc @raulcd |
raulcd left a comment:
This would require the following app to be set up as part of the Apache organization: https://github.com/apps/rise-risc-v-runners
A couple of questions about that. Do we know what the required permissions are? Has this been set up on other Apache projects, or are we the first ones? Asking because it might require some ASF INFRA research.
|
@pitrou That's a fair concern. A few things that mitigate it:
I'm also happy to be a point of contact for riscv64 issues in Arrow. I've been doing this across several Python projects this week and can respond quickly. |
|
@raulcd Great question. The RISE runners app requires these permissions:
It's a GitHub App that provides self-hosted runners, similar to how some projects use BuildJet or Actuated for ARM runners.
As for Apache org setup: I'm not aware of other Apache projects using RISE runners yet, so you'd likely be the first. I can check with Ludovic Henry (@luhenry, RISE TSC Co-Chair) about the process for Apache org installation. He's been setting up runners for numpy, pytorch, llama.cpp, and pyca this week.
Alternatively, we could initially run the riscv64 job on a separate fork (like we did with riseproject-dev/numpy) and upstream once the runner setup is confirmed. That way the PR can be reviewed independently of the infra question.
I'll ping Ludovic about Apache org support and report back.
|
The requested permissions are:
We are not going to require any more credentials as we only want to be able to dynamically register self-hosted runners, and that's it. You can find all the code of the app at https://github.com/riseproject-dev/riscv-runner-app, and a more descriptive website at https://riseproject-dev.github.io/riscv-runner/
RISE is part of Linux Foundation EU, and is committed to this service. We see it as a critical piece to enable RISC-V software more broadly, which is our entire raison d'être. We are also working with PyTorch, Llama.cpp, and many other projects to enable CI on RISC-V. We are also happy to drastically increase the number of runners available for the Apache organization, given the overall importance of everything that you're doing!
For direct board access, we are also working on a service of remote, on-demand machines accessible via SSH. It is meant for exactly this kind of case, where someone needs to debug a sticky issue and going through CI would be a great productivity loss.
Let me know if you have any other questions.
|
Thanks for the answers @gounthar . Can we perhaps start by having a regular C++ CI job on RISC-V? The Python wheel CI builds do not run the C++ test suite. |
|
@pitrou That makes a lot of sense, start with the foundation. I'll rework this PR to add a C++ CI job on riscv64 instead of jumping straight to wheel builds. I've already verified that Arrow C++ builds and passes Would you prefer:
|
|
Definitely a Docker-based job! You can take a look here for inspiration: arrow/.github/workflows/cpp.yml (line 74 in d08d5e6). Note that this uses |
|
@pitrou Thanks for the pointer! I'll study the I'll rework this PR to add a Will report back once I have a working prototype. 🤞 |
Describe the enhancement requested
Add riscv64 to the PyArrow wheel build pipeline using manylinux_2_39 and native RISE riscv64 runners.
Changes
- dev/tasks/tasks.yml: Add ("manylinux", "riscv64", "2-39", "manylinux_2_39_riscv64") to the Linux wheel matrix
- dev/tasks/python-wheels/github.linux.yml: Add ubuntu-24.04-riscv runner selection and riscv64 ARCH mapping
What's still needed
- Docker image for riscv64 wheel builds (apache/arrow-dev:riscv64-python-*-wheel-manylinux-2_39-vcpkg-*)
This draft PR enables the CI pipeline. The Docker image creation is a separate effort; I'm happy to work on that as well, or coordinate with the team.
Evidence
import pyarrow; print(pyarrow.__version__) → 24.0.0a1.dev1
CI Runners
Native riscv64 runners are available for free via RISE RISC-V runners. numpy, llama.cpp, and pytorch already use them.
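The import check listed under Evidence could be complemented by a check on the wheel's platform tag. The sketch below is not part of this PR. It assumes the standard wheel filename layout (name-version-python tag-ABI tag-platform tag) and a single, uncompressed manylinux_X_Y_arch platform tag, where X.Y is the minimum glibc version. The helper names and the example filename (including its cp312 tags) are hypothetical.

```python
import platform
import re


def parse_manylinux_tag(wheel_filename):
    """Return (glibc_major, glibc_minor, arch) from a wheel's manylinux tag.

    Assumes a single platform tag of the form manylinux_X_Y_arch.
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    # The platform tag is the last dash-separated field before ".whl".
    platform_tag = wheel_filename[: -len(".whl")].split("-")[-1]
    match = re.fullmatch(r"manylinux_(\d+)_(\d+)_(\w+)", platform_tag)
    if match is None:
        raise ValueError(f"unsupported platform tag: {platform_tag!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def running_on_riscv64():
    """Report whether the current interpreter runs on a riscv64 machine."""
    return platform.machine() == "riscv64"


# Hypothetical filename; the version matches the Evidence line above.
example = "pyarrow-24.0.0a1.dev1-cp312-cp312-manylinux_2_39_riscv64.whl"
print(parse_manylinux_tag(example), running_on_riscv64())
```

On a native riscv64 runner, `running_on_riscv64()` should be true and the parsed architecture should match, which would catch a wheel that was accidentally built or tagged for another platform.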
Fixes #49555
Note: this work is part of the RISE Project effort to improve Python ecosystem support on riscv64 platforms.