tr: add ASCII range translation fast path #12118

Open
parasol-aser wants to merge 2 commits into uutils:main from parasol-aser:perf/P002

@parasol-aser

What

Adds a narrow fast path for bytewise ASCII range translations such as:

tr 'a-z' 'A-Z'

The change detects translation tables that modify one contiguous ASCII range by a constant wrapping delta, then processes that range with an AVX2 range compare plus masked add on x86/x86_64 hosts that support AVX2. Other translations continue to use the existing single-byte or table-lookup paths, and non-AVX2 hosts use the scalar fallback.
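
The detection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the names `RangeDelta` and `detect_range_delta` are hypothetical, and the sketch assumes the translation is available as a 256-entry byte table.

```rust
/// Hypothetical summary of a table that shifts one contiguous ASCII range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RangeDelta {
    pub lo: u8,    // first byte the table changes
    pub hi: u8,    // last byte the table changes (inclusive)
    pub delta: u8, // wrapping amount added to every byte in lo..=hi
}

/// Returns `Some(..)` only if `table` is the identity everywhere except on
/// one contiguous ASCII range, and every byte in that range moves by the
/// same wrapping delta. Any other table returns `None`, so the caller can
/// keep using its general table-lookup path.
pub fn detect_range_delta(table: &[u8; 256]) -> Option<RangeDelta> {
    // First and last bytes that the table actually modifies.
    let lo = (0..=255u8).find(|&b| table[b as usize] != b)?;
    let hi = (0..=255u8).rev().find(|&b| table[b as usize] != b)?;
    if !hi.is_ascii() {
        return None; // restrict the fast path to ASCII ranges
    }
    let delta = table[lo as usize].wrapping_sub(lo);
    // Every byte in lo..=hi must move by the same delta; this also rejects
    // ranges with unmodified gaps in the middle.
    let uniform = (lo..=hi).all(|b| table[b as usize] == b.wrapping_add(delta));
    uniform.then_some(RangeDelta { lo, hi, delta })
}
```

For the `tr 'a-z' 'A-Z'` table this yields `lo = b'a'`, `hi = b'z'`, and `delta = 224`, which is -32 in wrapping `u8` arithmetic.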

Why

Before this change, tr 'a-z' 'A-Z' mapped every byte through a scalar 256-byte translation table. This is a common case, and it can be handled more directly by checking whether each byte falls within the translated ASCII range and adding the fixed delta.
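
To make the contrast concrete, here is a hedged scalar sketch of the two approaches: a per-byte table lookup versus a range check followed by a fixed add. The function names are hypothetical and are not taken from the uutils sources; the range version assumes `lo <= hi`.

```rust
/// Baseline style: map every byte through a 256-entry translation table.
pub fn translate_with_table(buf: &mut [u8], table: &[u8; 256]) {
    for b in buf.iter_mut() {
        *b = table[*b as usize];
    }
}

/// Range style: add `delta` (wrapping) only to bytes in `lo..=hi`.
/// Precondition (assumed by this sketch): `lo <= hi`.
pub fn translate_range_scalar(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    let width = hi - lo;
    for b in buf.iter_mut() {
        // Subtracting `lo` with wraparound folds the two-sided check
        // `lo <= b && b <= hi` into a single unsigned comparison.
        if (*b).wrapping_sub(lo) <= width {
            *b = (*b).wrapping_add(delta);
        }
    }
}
```

For `a-z` to `A-Z`, calling `translate_range_scalar(buf, b'a', b'z', 224)` leaves every byte outside the range untouched, including the bytes of multi-byte UTF-8 sequences, and so matches the table-based result.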

Measurements

Environment:

  • CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
  • OS: Linux x86_64
  • Rust: rustc 1.92.0
  • Candidate branch: perf/P002
  • Baseline commit: 4b5a2af7a916910bfeaf46b298a963d8a038565a
  • hyperfine was not installed, so this used /usr/bin/time, 2 warmups, and 12 measured runs.

Input was corpus/large_text.txt repeated 16 times, 1,342,178,256 bytes total.

/usr/bin/time -f '%e %M' ./runs/P002-rerun/bin/tr-baseline 'a-z' 'A-Z' < runs/P002-rerun/input/large_text_x16.txt > /dev/null
/usr/bin/time -f '%e %M' ./runs/P002-rerun/bin/tr-p002 'a-z' 'A-Z' < runs/P002-rerun/input/large_text_x16.txt > /dev/null
/usr/bin/time -f '%e %M' /usr/bin/tr 'a-z' 'A-Z' < runs/P002-rerun/input/large_text_x16.txt > /dev/null

implementation      mean wall time   stddev    throughput
uutils baseline     1.068 s          0.129 s   1198.1 MiB/s
uutils candidate    0.421 s          0.029 s   3041.6 MiB/s
GNU tr              1.251 s          0.209 s   1023.3 MiB/s

The candidate is 2.54x faster than the uutils baseline on this workload, a 60.6% wall-time reduction. The earlier 80 MiB pipeline benchmark also showed a 53.8% reduction, from 0.065 s to 0.030 s.

Correctness

For the 1.3 GB input, baseline uutils, candidate uutils, and GNU tr all produced output with the same SHA-256 digest:

6f2d6cb371ca0b423a90a5690ee7f6dac0be6a7d889f308ff5b15f2957e853db

Tests

cargo test --release --test tests -- --nocapture --test-threads=1 test_tr::test_ascii_range_translate_alignment_boundaries
cargo clippy --release -p uu_tr -- -D warnings
cargo fmt --check --package uu_tr

The regression test covers 0, 1, 31, 32, and 33 byte inputs around the AVX2 lane width, all byte values, and a UTF-8/non-ASCII boundary case, with GNU parity when GNU tr is available.
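
The choice of lengths can be illustrated with a small sketch. It is not the regression test from this PR: the helper names are hypothetical, and it assumes a kernel that consumes 32-byte chunks and leaves the remainder to scalar code. Lengths 31, 32, and 33 then exercise "tail only", "one full chunk, no tail", and "one full chunk plus a one-byte tail".

```rust
/// Bytes per 256-bit AVX2 register (assumed chunk size for this sketch).
const CHUNK_BYTES: usize = 32;

/// Splits a length into (bytes covered by full chunks, bytes left for the tail).
pub fn split_at_chunk_boundary(len: usize) -> (usize, usize) {
    let full = len / CHUNK_BYTES * CHUNK_BYTES;
    (full, len - full)
}

/// Builds inputs similar to those the description lists: the boundary
/// lengths, every byte value once, and multi-byte UTF-8 next to ASCII letters.
pub fn boundary_inputs() -> Vec<Vec<u8>> {
    let mut inputs: Vec<Vec<u8>> = [0usize, 1, 31, 32, 33]
        .iter()
        .map(|&len| (0..len).map(|i| b'a' + (i % 26) as u8).collect())
        .collect();
    inputs.push((0..=255u8).collect());
    inputs.push("na\u{ef}ve \u{e0}z".as_bytes().to_vec());
    inputs
}

/// Bytewise reference result for `tr 'a-z' 'A-Z'`: only ASCII lowercase changes.
pub fn expected_upper(input: &[u8]) -> Vec<u8> {
    input.iter().map(|b| b.to_ascii_uppercase()).collect()
}
```

A test built this way would run the fast path on each input and compare the result byte for byte with `expected_upper`.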

Caveats

The speedup is from the AVX2 path on this x86_64 host. Non-AVX2 targets use the scalar fallback and should be behavior-preserving, but I did not benchmark those targets here.
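
The AVX2-or-scalar split can be sketched as below. This is an illustration under stated assumptions rather than the contents of `simd.rs`: all function names are hypothetical, the range test uses an unsigned-min trick that may differ from the PR's compare sequence, and the sketch assumes `lo <= hi`.

```rust
/// Scalar fallback; also handles the tail after the last full 32-byte chunk.
fn add_delta_in_range_scalar(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    let width = hi.wrapping_sub(lo);
    for b in buf.iter_mut() {
        if (*b).wrapping_sub(lo) <= width {
            *b = (*b).wrapping_add(delta);
        }
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_delta_in_range_avx2(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    unsafe {
        let lo_v = _mm256_set1_epi8(lo as i8);
        let width_v = _mm256_set1_epi8(hi.wrapping_sub(lo) as i8);
        let delta_v = _mm256_set1_epi8(delta as i8);

        let mut chunks = buf.chunks_exact_mut(32);
        for chunk in &mut chunks {
            let ptr = chunk.as_mut_ptr() as *mut __m256i;
            let v = _mm256_loadu_si256(ptr as *const __m256i);
            // Range compare: (b - lo) <= width, done as min_u(x, width) == x.
            let off = _mm256_sub_epi8(v, lo_v);
            let in_range = _mm256_cmpeq_epi8(_mm256_min_epu8(off, width_v), off);
            // Masked add: delta where the mask is all ones, zero elsewhere.
            let out = _mm256_add_epi8(v, _mm256_and_si256(in_range, delta_v));
            _mm256_storeu_si256(ptr, out);
        }
        add_delta_in_range_scalar(chunks.into_remainder(), lo, hi, delta);
    }
}

/// Uses the AVX2 kernel when the CPU reports support, else the scalar loop.
pub fn add_delta_in_range(buf: &mut [u8], lo: u8, hi: u8, delta: u8) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the runtime check above confirms AVX2 is available.
            unsafe { add_delta_in_range_avx2(buf, lo, hi, delta) };
            return;
        }
    }
    add_delta_in_range_scalar(buf, lo, hi, delta);
}
```

Because both paths apply the same per-byte rule, output should be identical on AVX2 and non-AVX2 hosts; only throughput is expected to differ.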

jeffhuang added 2 commits May 1, 2026 22:36
Files: src/uu/tr/src/operation.rs, src/uu/tr/src/simd.rs.

Mechanism: detect translation tables that change one contiguous ASCII range by a constant wrapping delta, then process those chunks with an AVX2 range-compare and masked add kernel with scalar fallback. The existing single-byte and table-lookup paths remain for non-affine translations.

Predicted delta: tr/tr_lower_to_upper_large_text_stdout_discarded should improve by 10-20% versus the 0.065s baseline on AVX2 hosts.

Covers ASCII range translation for a-z to A-Z at input lengths 0, 1, 31, 32, and 33 around the AVX2 lane width, plus all byte values and a UTF-8 boundary/non-ASCII case.

Assertions cover exit code success, empty stderr, byte-exact stdout, and GNU tr parity when a GNU tr binary is available on PATH.

Test command used in this repo: cargo test --release --test tests -- --nocapture --test-threads=1 test_tr::test_ascii_range_translate_alignment_boundaries. The requested package-scoped command cargo test --release -p uu_tr --test test_tr -- --nocapture --test-threads=1 test_ascii_range_translate_alignment_boundaries is unavailable because uu_tr has no test_tr target.
@parasol-aser parasol-aser marked this pull request as ready for review May 2, 2026 03:12
@github-actions

github-actions Bot commented May 2, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Note: The gnu test tests/cut/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/printf/printf-surprise is now being skipped but was previously passing.

@oech3
Contributor

oech3 commented May 2, 2026

You need to add spell-checker: ignore
