@mhaseeb123 mhaseeb123 commented Nov 25, 2025

Description

This PR adds a new example that demonstrates pipelining when reading Parquet sources with the new hybrid scan reader in a multithreaded environment.

copy-pr-bot bot commented Nov 25, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf and CMake labels Nov 25, 2025
@mhaseeb123 mhaseeb123 added the feature request, 2 - In Progress, non-breaking, and cuIO labels Nov 25, 2025
copy-pr-bot bot commented Dec 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rapids-bot bot pushed a commit that referenced this pull request Dec 21, 2025
…d wrappers (#20861)

Contributes to #20722 and #20879

This PR replaces the use of `thrust::copy_if` and `thrust::count_if` in the Parquet and hybrid scan readers with custom CUB-based implementations that use pinned memory to copy the result from the device.

Note: I will create one last PR after this one replacing `thrust` utils with their (CUB based) cudf counterparts in `cudf/detail/utilities/algorithm.cuh` across libcudf.

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - https://github.com/apps/pre-commit-ci
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)
  - Paul Mattione (https://github.com/pmattione-nvidia)
  - MithunR (https://github.com/mythrocks)

URL: #20861
@mhaseeb123 mhaseeb123 changed the title from "Example to demonstrate pipelining with the hybrid scan reader" to "Example to demonstrate inter-parquet-file pipelining using hybrid scan APIs" Jan 5, 2026
rapids-bot bot pushed a commit that referenced this pull request Jan 9, 2026
Contributes to #20722

This PR replaces small host vectors with pinned vectors to avoid pageable copies and improve pipeline performance when reading Parquet files using multiple threads (each using a separate non-blocking stream).

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #20820
@mhaseeb123 mhaseeb123 removed the DO NOT MERGE label Jan 9, 2026
  cudf::detail::cuda_memcpy_async(
-   cudf::host_span<size_t>(h_initial_str_offsets.data(), initial_str_offsets.size()),
-   cudf::device_span<size_t const>(initial_str_offsets.data(), initial_str_offsets.size()),
+   cudf::host_span<size_t>{h_initial_str_offsets.data(), initial_str_offsets.size()},
Member Author

Simply using {} instead of ()

- auto host_null_masks = std::vector<bitmask_type*>{};
- auto host_begin_bits = std::vector<cudf::size_type>{};
- auto host_end_bits = std::vector<cudf::size_type>{};
+ auto null_masks = std::vector<bitmask_type*>{};
Member Author

Simply renamed host_xx to xx; the pinned versions below use a pinned_ prefix instead.

  return bitmask;
  } else {
- auto bitmask = cudf::detail::make_host_vector<bitmask_type>(num_bitmasks, stream);
+ auto bitmask = cudf::detail::make_pinned_vector_async<bitmask_type>(num_bitmasks, stream);
Member Author

Added the missed host-to-pinned conversion.

  auto resource = create_memory_resource(is_pool_used);
  auto default_stream = cudf::get_default_stream();
- auto stream_pool = rmm::cuda_stream_pool(thread_count);
+ auto stream_pool = rmm::cuda_stream_pool(thread_count, rmm::cuda_stream::flags::non_blocking);
Member Author

Create non-blocking streams

-   "input source == thread count\n";
- for (size_t idx = 0; thread_count > static_cast<int>(parquet_files.size()); idx++) {
-   parquet_files.emplace_back(parquet_files[idx % initial_size]);
+ if (parquet_files.size() < thread_count) {
Member Author

Only print that we are appending the sources if we need to
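
The padding behavior discussed in this comment can be sketched in isolation. The following is a minimal, standard-library-only illustration, assuming the intent is to repeat the initial input sources round-robin until there is one per thread, and to print the notice only when padding is actually needed. The helper name `pad_sources_to_thread_count` and the exact message text are hypothetical and are not taken from the example in this PR.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): repeat the initial sources round-robin
// until there are `thread_count` of them. Returns true only if sources were appended.
bool pad_sources_to_thread_count(std::vector<std::string>& sources, std::size_t thread_count)
{
  auto const initial_size = sources.size();

  // Nothing to do (and nothing to print) if the list is empty or already large enough.
  if (initial_size == 0 || initial_size >= thread_count) { return false; }

  // Assumed message text; printed only when padding actually happens.
  std::cout << "Appending input sources round-robin until input source == thread count\n";

  // Reserve up front so the push_back calls below never reallocate while
  // reading from an existing element of the same vector.
  sources.reserve(thread_count);
  for (std::size_t idx = 0; sources.size() < thread_count; ++idx) {
    sources.push_back(sources[idx % initial_size]);
  }
  return true;
}
```

For instance, calling this helper on `{"a.parquet", "b.parquet"}` with a thread count of 5 would leave the list as `a, b, a, b, a`, while calling it with a thread count of 2 or fewer would leave the list unchanged and print nothing.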

@mhaseeb123
Member Author

pre-commit.ci autofix

@mhaseeb123 mhaseeb123 added the 3 - Ready for Review label and removed the 2 - In Progress label Jan 10, 2026
@mhaseeb123 mhaseeb123 marked this pull request as ready for review January 10, 2026 00:59
@mhaseeb123 mhaseeb123 requested review from a team as code owners January 10, 2026 00:59
@mhaseeb123 mhaseeb123 requested review from ttnghia and vyasr January 10, 2026 00:59
@GregoryKimball GregoryKimball moved this to Burndown in libcudf Jan 12, 2026
@mhaseeb123
Member Author

pre-commit.ci autofix

Labels

3 - Ready for Review, CMake, cuIO, feature request, libcudf, non-breaking

Projects

Status: Burndown

1 participant