-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators #132896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
49011aa
to
ba1d5d4
Compare
@llvm/pr-subscribers-libcxx Author: Peng Liu (winner245) ChangesThis patch extends segmented iterator optimizations, previously applied to Addresses a subtask of #102817.
|
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/libcxx/algorithms/ranges_robust_against_copying_comparators.pass.cpp
Outdated
Show resolved
Hide resolved
ba1d5d4
to
c113266
Compare
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
a7041cc
to
a2e451d
Compare
16438be
to
047acfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch! I left some comments but I think this is going to be a nice optimization.
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
0aad396
to
5a7b6eb
Compare
libcxx/test/benchmarks/algorithms/nonmodifying/for_each_join_view.bench.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/benchmarks/algorithms/nonmodifying/for_each_n.bench.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
198fe3b
to
f5d13ab
Compare
d14bde4
to
8a5bcdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the scope of this patch is getting a bit out of hand. The title says that you're optimizing ranges::for_each{,_n}
, but you're also back-porting the std::for_each
optimization to C++03, adding and adding an optimization to std::for_each_n
. Could we split this up to make it clear what changes are required for what optimizations? Also, why do we want to back-port the std::for_each
optimization now? Do we think the extra complexity is worth the improved performance?
Thank you for your feedback! I agree that the scope of the patch has expanded beyond its original intent. Initially, the goal was simple: only to extend the optimization for However, I agree that this made the patch diverge from its original purpose and may complicate the review process. Following your suggestion, I will work on splitting it to make it clear what this patch focuses on. -------------- Update --------------
This separation allows the current PR to focus exclusively on the optimization of the ranges algorithms. I will rebase my current patch on the above split pieces once they are landed. |
8a5bcdc
to
5a225dd
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
5a225dd
to
b366e93
Compare
b366e93
to
216b957
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once comments are addressed. Thanks a lot for this series of refactorings / optimizations!
libcxx/docs/ReleaseNotes/21.rst
Outdated
resulting in performance improvements of up to 21.3x for ``std::deque::iterator`` and 24.9x for ``join_view`` of | ||
``vector<vector<char>>``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should report this optimization on the same line as the std::for_each
optimization above -- I don't think there is much to be gained from having nearly-duplicate release notes since these algorithms are very similar. While we aim for a good level of completeness in our release notes, we also want to make them as useful to users as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rerun the benchmarks multiple times, and I got similar and consistent speedups for the ranges algorithms. It is a bit strange these numbers seem greater than those reported earlier for the classical std
algorithms. Ideally, these numbers should match. I haven't identified a clear reason why this is not the case. My guess is that the numbers reported earlier for the classical std
algorithms were obtained from comparison between std::for_each
with/without segmented iterator optimization, while the numbers in this patch compare the ranges algorithm std::ranges::for_each
with/without optimization. The difference here is that the comparisons for std::for_each
did not have the noise such as the std::invoke
call and projection call, whereas the comparisons for the ranges algorithms do. This noise might account for the difference. This is the only difference I could possibly think of at this moment.
To avoid confusion, I will not report these numbers in this patch. Instead, I will stick to the previously reported and smaller numbers (which suffice to show the performance improvements).
for (; __first != __last; ++__first) | ||
std::invoke(__func, std::invoke(__proj, *__first)); | ||
return {std::move(__first), std::move(__func)}; | ||
if constexpr (!std::assignable_from<_Iter&, _Sent> && sized_sentinel_for<_Sent, _Iter>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if constexpr (!std::assignable_from<_Iter&, _Sent> && sized_sentinel_for<_Sent, _Iter>) { | |
// In the case where we have different iterator and sentinel types, the segmented iterator optimization | |
// in std::for_each will not kick in. Therefore, we prefer std::for_each_n in that case (whenever we can | |
// obtain the `n`). | |
if constexpr (!std::assignable_from<_Iter&, _Sent> && sized_sentinel_for<_Sent, _Iter>) { |
->Arg(1024) | ||
->Arg(4096) | ||
->Arg(8192) | ||
->Arg(1 << 20); | ||
->Arg(1 << 14) | ||
->Arg(1 << 16) | ||
->Arg(1 << 18); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be better to leave the old benchmark values in place. They are less comprehensive but we need to achieve a tradeoff between comprehensiveness and the time it takes to run these benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we were running a test case with a very large n = (1 << 20)
. To save some time, I replaced this large test case with 3 smaller test cases with n = (1 << 14)
, 1 << 16
, 1 << 18
. I think the total execution time of these three test cases is actually lower than running a single test case with n = (1 << 20)
. Please let me know if I misunderstood you.
->Arg(8) | ||
->Arg(32) | ||
->Arg(50) // non power-of-two | ||
->Arg(1024) | ||
->Arg(4096) | ||
->Arg(8192) | ||
->Arg(1 << 14) | ||
->Arg(1 << 16) | ||
->Arg(1 << 18); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here for the benchmark sizes.
bm.operator()<std::list<int>>("std::for_each_n(list<int>)", std_for_each_n); | ||
bm.operator()<std::vector<int>>("rng::for_each_n(vector<int>)", std::ranges::for_each_n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the same numbers as for the std::for_each
benchmarks.
275c254
to
df7ac69
Compare
df7ac69
to
b525b74
Compare
Previously, the segmented iterator optimization was limited to
std::{for_each, for_each_n}
. This patch aims to extend the optimization tostd::ranges::for_each
andstd::ranges::for_each_n
, ensuring consistent optimizations across these algorithms. This patch first generalizes thestd
algorithms by introducing aProjection
parameter, which is set to__identity
for thestd
algorithms. Then we let theranges
algorithms to directly call theirstd
counterparts with a general__proj
argument. Benchmarks demonstrate performance improvements of up to 21.4x forstd::deque::iterator
and 22.3x forjoin_view
ofvector<vector<char>>
.Addresses a subtask of #102817.
Summary of speedups for
deque
iteratorsSummary of speedups for
join_view
iteratorsBenchmarks:
std::ranges::for_each
withdeque
iteratorsstd::ranges::for_each_n
withdeque
iteratorsstd::ranges::for_each
withjoin_view
iteratorsstd::ranges::for_each_n
withjoin_view
iterators