Simplify `format_integer_with_underscore_sep` #141369

yotamofek · 2025-05-21T21:14:13Z

Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so it can be simplified a lot by not trying to handle hex and octal radixes.
The second commit is completely unrelated, it just simplifies some code I wrote a while back 😁

rustbot · 2025-05-21T21:14:17Z

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

lolbinarycat

a few issues but mainly seems like a decent idea to slim down an overly flexible test helper.

lolbinarycat · 2025-05-21T23:56:56Z

src/librustdoc/clean/utils.rs

+        num.as_bytes().rchunks(3).rev().intersperse(b"_").flatten().copied().map(char::from),
+    )


lolbinarycat · 2025-05-21

using `as_bytes` and `char::from` like this is technically a logic error, as it will turn every byte of a UTF-8 sequence into the Unicode code point with that byte's value.

it does happen to work here since we're never using non-ASCII (if we were, then reversing a string bytewise would be extremely problematic), but `String::from_utf8(...).unwrap()` represents the intent much more cleanly.
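To make the concern concrete, here is a minimal sketch in stable Rust. The two helper names are hypothetical and exist only for illustration; they are not part of rustdoc. The first widens each byte to a `char` on its own, the second decodes the bytes as UTF-8.

```rust
/// Widens every byte to the `char` with the same numeric value.
/// Hypothetical helper that mirrors the `.copied().map(char::from)` pattern.
fn widen_bytes_to_chars(bytes: &[u8]) -> String {
    bytes.iter().copied().map(char::from).collect()
}

/// Decodes the bytes as UTF-8 and panics if they are not valid UTF-8.
/// Hypothetical helper that mirrors the `String::from_utf8(...).unwrap()` suggestion.
fn decode_utf8_checked(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec()).expect("input should be valid UTF-8")
}
```

For ASCII input such as `"123"` both helpers return the same string. For `"é"`, whose UTF-8 encoding is the two bytes `0xC3 0xA9`, the first helper returns the two-character string `"Ã©"` without any error, while the second returns `"é"`.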

yotamofek · 2025-05-22

Not that this function is hot enough to prefer performance over correctness, but I think in this case it's quite obvious that this is OK, since `Display`ing integers will always result in an ASCII-only string. We can also collect into a vector of chars, like the old version.

lolbinarycat · 2025-05-22

If we do want to care about performance here, I would just use `String::from_utf8_unchecked`. This should actually be more performant than `char::from` since it fully erases any non-ASCII code. it will also at least give us a panic on miri instead of just silent corruption if somehow this gets rewritten into something that does produce non-ASCII.

there's also `u8::as_ascii()`, since `String` impls `Extend<ascii::Char>`.

notriddle · 2025-05-22

There's no way unsafe code would be justified here.

`as_ascii` seems like the right way to do this, since unwrapping its result panics if it's given non-ASCII text, but it doesn't make the code much more complex:

    fn format_integer_with_underscore_sep(num: u128, is_negative: bool) -> String {
        let num = num.to_string();
        let mut result = if is_negative { "-" } else { "" }.to_string();
        result.extend(
            num.as_bytes()
                .rchunks(3)
                .rev()
                .intersperse(b"_")
                .flatten()
                .copied()
                .map(|b| b.as_ascii().unwrap()),
        );
        result
    }

lolbinarycat · 2025-05-22

If we wouldn't accept unsafe here, then I think we definitely shouldn't accept the potential of silent logic errors.

One nice thing is that `String::from_utf8` has a concrete signature, so you can use `collect` without type hints, but I'm really fine with any solution which isn't `char::from`. If nothing else, using `char::from` in this way is a bad habit I don't think we should encourage.
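The `String::from_utf8` variant discussed above can be sketched in stable Rust as follows. This is an illustration only, not the code that was merged: the function name `format_with_underscores` is hypothetical, and because `intersperse` is not available on stable, the separator is placed with an index check instead (an assumption made for this sketch).

```rust
/// Formats a decimal integer with `_` between groups of three digits.
/// Hypothetical stable-Rust sketch; the separator logic replaces the
/// nightly-only `intersperse` call used in the thread.
fn format_with_underscores(num: u128, is_negative: bool) -> String {
    let digits = num.to_string();
    let len = digits.len();

    // Emit a `_` before every digit whose distance from the end is a
    // non-zero multiple of three, then the digit itself.
    let bytes = digits.bytes().enumerate().flat_map(|(i, b)| {
        let sep = (i != 0 && (len - i) % 3 == 0).then_some(b'_');
        sep.into_iter().chain(std::iter::once(b))
    });

    // `String::from_utf8` takes a `Vec<u8>`, so `collect` needs no type hint.
    // It returns an error (turned into a panic here) on invalid UTF-8
    // instead of silently producing wrong characters.
    let body = String::from_utf8(bytes.collect()).expect("decimal digits are ASCII");

    let mut result = if is_negative { "-" } else { "" }.to_string();
    result.push_str(&body);
    result
}
```

Calling `format_with_underscores(1234567, false)` returns `"1_234_567"`, and `format_with_underscores(1000, true)` returns `"-1_000"`.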

yotamofek · 2025-05-23

Ended up using `str::as_ascii` so there's only one unwrap. IMHO it looks kinda nice now!

yotamofek · 2025-05-22T10:20:48Z

> a few issues but mainly seems like a decent idea to slim down an overly flexible test helper.

BTW, this is not a test helper, it's used in the generation of docs :)

lolbinarycat · 2025-05-22T17:10:18Z

> BTW, this is not a test helper, it's used in the generation of docs :)

yeah i noticed that after your comment, github put the diff break in a very confusing spot :p

notriddle · 2025-05-23T16:13:19Z

@bors r+ rollup

bors · 2025-05-23T16:13:21Z

📌 Commit 5c735d1 has been approved by notriddle

It is now in the queue for this repository.

Rollup of 7 pull requests

Successful merges:

- #138896 (std: fix aliasing bug in UNIX process implementation)
- #140832 (aarch64-linux: Default to FramePointer::NonLeaf)
- #141065 (Updated std doctests for wasm)
- #141369 (Simplify `format_integer_with_underscore_sep`)
- #141374 (make shared_helpers exe function work for both cygwin and non-cygwin hosts)
- #141398 (chore: fix typos in comment)
- #141457 (Update mdbook to 0.4.50)

Failed merges:

- #141405 (GetUserProfileDirectoryW is now documented to always store the size)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of #141369 - yotamofek:pr/rustdoc/format_integer_with_underscore_sep, r=notriddle

Simplify `format_integer_with_underscore_sep`

Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so can be simplified a lot by not trying to handle hex and octal radixes.
Second commit is completely unrelated, just simplified some code I wrote a while back 😁

rustbot assigned notriddle May 21, 2025

rustbot added the S-waiting-on-review and T-rustdoc labels May 21, 2025

lolbinarycat reviewed May 22, 2025

yotamofek added 2 commits May 23, 2025 12:37

Simplify format_integer_with_underscore_sep

5b47d34

Only ever needs to handle decimal reprs

Small cleanup for qpath_to_string

5c735d1

yotamofek force-pushed the pr/rustdoc/format_integer_with_underscore_sep branch from 2c9305a to 5c735d1 Compare May 23, 2025 12:38

yotamofek mentioned this pull request May 23, 2025

Add FromIterator impls for ascii::Chars to Strings #141445

Open

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label May 23, 2025

notriddle approved these changes May 23, 2025

This was referenced May 23, 2025

Rollup of 2 pull requests #141461

Closed

Rollup of 7 pull requests #141463

Merged

bors merged commit a4836e9 into rust-lang:master May 23, 2025
6 checks passed

rustbot added this to the 1.89.0 milestone May 23, 2025

yotamofek deleted the pr/rustdoc/format_integer_with_underscore_sep branch May 23, 2025 21:48
