tr: add ASCII range translation fast path #12118
Open
parasol-aser wants to merge 2 commits into uutils:main
added 2 commits
May 1, 2026 22:36
Files: src/uu/tr/src/operation.rs, src/uu/tr/src/simd.rs. Mechanism: detect translation tables that change one contiguous ASCII range by a constant wrapping delta, then process those chunks with an AVX2 range-compare and masked add kernel with scalar fallback. The existing single-byte and table-lookup paths remain for non-affine translations. Predicted delta: tr/tr_lower_to_upper_large_text_stdout_discarded should improve by 10-20% versus the 0.065s baseline on AVX2 hosts.
Covers ASCII range translation for a-z to A-Z at input lengths 0, 1, 31, 32, and 33 around the AVX2 lane width, plus all byte values and a UTF-8 boundary/non-ASCII case. Assertions cover exit code success, empty stderr, byte-exact stdout, and GNU tr parity when a GNU tr binary is available on PATH. Test command used in this repo: cargo test --release --test tests -- --nocapture --test-threads=1 test_tr::test_ascii_range_translate_alignment_boundaries. The requested package-scoped command cargo test --release -p uu_tr --test test_tr -- --nocapture --test-threads=1 test_ascii_range_translate_alignment_boundaries is unavailable because uu_tr has no test_tr target.
You need to add spell-checker: ignore |
What
Adds a narrow fast path for bytewise ASCII range translations such as:
The change detects translation tables that modify one contiguous ASCII range by a constant wrapping delta, then processes that range with an AVX2 range compare plus masked add on x86/x86_64 hosts that support AVX2. Other translations continue to use the existing single-byte or table-lookup paths, and non-AVX2 hosts use the scalar fallback.
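The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/uu/tr/src/operation.rs: `RangeDelta`, `detect_range_delta`, and `translate_scalar` are hypothetical names, and the sketch assumes the translation is available as a plain 256-entry byte table.

```rust
/// Hypothetical summary of a table that shifts one contiguous byte range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RangeDelta {
    lo: u8,    // first byte of the translated range (inclusive)
    hi: u8,    // last byte of the translated range (inclusive)
    delta: u8, // wrapping offset added to every byte inside the range
}

/// Returns `Some(..)` when `table` is the identity everywhere except on one
/// contiguous ASCII range whose bytes all move by the same wrapping delta.
fn detect_range_delta(table: &[u8; 256]) -> Option<RangeDelta> {
    // First and last bytes that the table actually changes.
    let lo = (0..=255u8).find(|&b| table[b as usize] != b)?;
    let hi = (0..=255u8).rev().find(|&b| table[b as usize] != b)?;
    if hi > 0x7f {
        return None; // only ASCII ranges qualify in this sketch
    }
    let delta = table[lo as usize].wrapping_sub(lo);
    // Every byte between lo and hi must move by exactly the same delta.
    let uniform = (lo..=hi).all(|b| table[b as usize] == b.wrapping_add(delta));
    uniform.then_some(RangeDelta { lo, hi, delta })
}

/// Scalar application: range check plus wrapping add, no table lookup.
fn translate_scalar(buf: &mut [u8], r: RangeDelta) {
    for b in buf.iter_mut() {
        if (r.lo..=r.hi).contains(&*b) {
            *b = b.wrapping_add(r.delta);
        }
    }
}

fn main() {
    // Build the table for a-z -> A-Z: identity plus a shift of -32 on a..=z.
    let mut table = [0u8; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = i as u8;
    }
    for b in b'a'..=b'z' {
        table[b as usize] = b - 32;
    }
    let r = detect_range_delta(&table).expect("a-z -> A-Z is a range shift");

    // Non-ASCII bytes (here the UTF-8 encoding of 'ö') pass through untouched.
    let mut buf = *b"Hello, w\xc3\xb6rld!";
    translate_scalar(&mut buf, r);
    assert_eq!(&buf, b"HELLO, W\xc3\xb6RLD!");
}
```

A table that changes two separate ranges, or changes bytes by differing offsets, makes `detect_range_delta` return `None`, which corresponds to falling back to the existing single-byte or table-lookup paths.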
Why
Before this change, tr 'a-z' 'A-Z' mapped every byte through a scalar 256-byte translation table. This is a common case, and it can be handled more directly by checking whether each byte falls within the translated ASCII range and adding the fixed delta.
Measurements
Environment:
rustc 1.92.0
perf/P0024b5a2af7a916910bfeaf46b298a963d8a038565a
hyperfine was not installed, so this used /usr/bin/time, 2 warmups, and 12 measured runs.
Input was corpus/large_text.txt repeated 16 times, 1,342,178,256 bytes total.
The candidate is 2.54x faster than the uutils baseline on this workload, a 60.6% wall-time reduction. The earlier 80 MiB pipeline benchmark also showed a 53.8% reduction, from 0.065 s to 0.030 s.
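For reference, the two percentages follow from the quoted figures as below. This is only the arithmetic that relates a speedup factor to a wall-time reduction, using the numbers stated above; no additional measurements are assumed.

```latex
\text{reduction} = 1 - \frac{t_\text{candidate}}{t_\text{baseline}} = 1 - \frac{1}{\text{speedup}}
\qquad
1 - \frac{1}{2.54} \approx 0.606
\qquad
1 - \frac{0.030\ \text{s}}{0.065\ \text{s}} \approx 0.538
```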
Correctness
For the 1.3 GB input, baseline uutils, candidate uutils, and GNU tr produced the same transformed output SHA256:
Tests
cargo test --release --test tests -- --nocapture --test-threads=1 test_tr::test_ascii_range_translate_alignment_boundaries
cargo clippy --release -p uu_tr -- -D warnings
cargo fmt --check --package uu_tr
The regression test covers inputs of 0, 1, 31, 32, and 33 bytes around the AVX2 lane width, all byte values, and a UTF-8/non-ASCII boundary case, with GNU parity when GNU tr is available.
Caveats
The speedup is from the AVX2 path on this x86_64 host. Non-AVX2 targets use the scalar fallback and should be behavior-preserving, but I did not benchmark those targets here.
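To make the AVX2/scalar split concrete, here is a rough sketch of a range-compare plus masked-add kernel with runtime dispatch. It is not the code in src/uu/tr/src/simd.rs: the function names are hypothetical, and the range test used here (subtract `lo`, then an unsigned `min` against `hi - lo`) is one common way to express an unsigned byte range compare in AVX2, which may differ from what the PR does. It assumes `lo <= hi`.

```rust
/// Scalar fallback: shift bytes in lo..=hi by a wrapping delta.
fn shift_range_scalar(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    for b in buf.iter_mut() {
        if *b >= lo && *b <= hi {
            *b = b.wrapping_add(delta);
        }
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn shift_range_avx2(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let mut chunks = buf.chunks_exact_mut(32);
    unsafe {
        let lo_v = _mm256_set1_epi8(lo as i8);
        let span_v = _mm256_set1_epi8((hi - lo) as i8);
        let delta_v = _mm256_set1_epi8(delta as i8);
        for chunk in &mut chunks {
            let p = chunk.as_mut_ptr() as *mut __m256i;
            let v = _mm256_loadu_si256(p as *const __m256i);
            // off = v - lo (wrapping); v is in range iff off <= hi - lo (unsigned).
            let off = _mm256_sub_epi8(v, lo_v);
            let mask = _mm256_cmpeq_epi8(_mm256_min_epu8(off, span_v), off);
            // Masked add: delta where the mask is set, zero elsewhere.
            let out = _mm256_add_epi8(v, _mm256_and_si256(mask, delta_v));
            _mm256_storeu_si256(p, out);
        }
    }
    // Fewer than 32 trailing bytes are handled by the scalar loop.
    shift_range_scalar(chunks.into_remainder(), lo, hi, delta);
}

/// Runtime dispatch: AVX2 when the host supports it, scalar otherwise.
fn shift_range(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            unsafe { shift_range_avx2(buf, lo, hi, delta) };
            return;
        }
    }
    shift_range_scalar(buf, lo, hi, delta);
}

fn main() {
    let delta = b'A'.wrapping_sub(b'a');
    // Lengths around the 32-byte lane width, plus one that covers every byte value.
    for len in [0usize, 1, 31, 32, 33, 300] {
        let input: Vec<u8> = (0..len).map(|i| (i + 0x60) as u8).collect();
        let mut expected = input.clone();
        shift_range_scalar(&mut expected, b'a', b'z', delta);
        let mut actual = input.clone();
        shift_range(&mut actual, b'a', b'z', delta);
        assert_eq!(actual, expected, "mismatch at length {len}");
    }
}
```

On a host without AVX2, or on a non-x86 target, `shift_range` reduces to the scalar loop, so the output should be identical and only the throughput differs.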