Skip to content

Simplify format_integer_with_underscore_sep #141369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yotamofek
Copy link
Contributor

Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so can be simplified a lot by not trying to handle hex and octal radixes.
Second commit is completely unrelated, just simplified some code I wrote a while back 😁

@rustbot
Copy link
Collaborator

rustbot commented May 21, 2025

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels May 21, 2025
Copy link
Contributor

@lolbinarycat lolbinarycat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few issues but mainly seems like a decent idea to slim down an overly flexible test helper.

Comment on lines 391 to 392
num.as_bytes().rchunks(3).rev().intersperse(b"_").flatten().copied().map(char::from),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using as_bytes and char::from like this is technically a logic error, as it will turn every byte of a UTF-8 sequence into the corresponding unicode codepoint.

it does happen to work here since we're never using non-ascii (if we were then reversing a string bytewise would be extremely problematic), but String::from_utf8(...).unwrap() represents the intent much more cleanly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this function is hot enough to prefer performance over correctness,
but I think in this case it's quite obvious that this is ok since Displaying integers will always result in an ASCII-only string. We can also collect into a vector of chars, like the old version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do want to care about performance here, I would just use String::from_utf8_unchecked. This should actually be more performant than char::from since it fully erases any non-ascii code. it will also at least give us a panic on miri instead of just silent corruption if somehow this gets rewritten into something that does produce non-ascii.

there's also u8.as_ascii(), since String impls Extend<ascii::Char>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way unsafe code would be justified here.

as_ascii seems like the right way to do this, since it panics if it's given non-ASCII text, but doesn't make the code much more complex:

fn format_integer_with_underscore_sep(num: u128, is_negative: bool) -> String {
    let num = num.to_string();
    let mut result = if is_negative { "-" } else { "" }.to_string();
    result.extend(
        num.as_bytes().rchunks(3).rev().intersperse(b"_").flatten().copied().map(|b| b.as_ascii().unwrap())
    );
    result
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wouldn't accept unsafe here, then I think we definitely shouldn't accept the potential of silent logic errors.

One nice thing is that String::from_utf8 has a concrete signature, so you can use collect without type hints, but I'm really fine with any solution which isn't char::from. If nothing else using char::from in this way is a bad habit I don't think we should encourage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up using str::as_ascii so there's only one unwrap, IMHO it looks kinda nice now!

@yotamofek
Copy link
Contributor Author

a few issues but mainly seems like a decent idea to slim down an overly flexible test helper.

BTW, this is not a test helper, it's used in the generation of docs :)

@lolbinarycat
Copy link
Contributor

BTW, this is not a test helper, it's used in the generation of docs :)

yeah i noticed that after your comment, github put the diff break in a very confusing spot :p

@yotamofek yotamofek force-pushed the pr/rustdoc/format_integer_with_underscore_sep branch from 2c9305a to 5c735d1 Compare May 23, 2025 12:38
@notriddle
Copy link
Contributor

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented May 23, 2025

📌 Commit 5c735d1 has been approved by notriddle

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 23, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 23, 2025
…r_with_underscore_sep, r=notriddle

Simplify `format_integer_with_underscore_sep`

Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so can be simplified a lot by not trying to handle hex and octal radixes.
Second commit is completely unrelated, just simplified some code I wrote a while back 😁
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 23, 2025
…r_with_underscore_sep, r=notriddle

Simplify `format_integer_with_underscore_sep`

Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so can be simplified a lot by not trying to handle hex and octal radixes.
Second commit is completely unrelated, just simplified some code I wrote a while back 😁
bors added a commit that referenced this pull request May 23, 2025
Rollup of 7 pull requests

Successful merges:

 - #138896 (std: fix aliasing bug in UNIX process implementation)
 - #140832 (aarch64-linux: Default to FramePointer::NonLeaf)
 - #141065 (Updated std doctests for wasm)
 - #141369 (Simplify `format_integer_with_underscore_sep`)
 - #141374 (make shared_helpers exe function work for both cygwin and non-cygwin hosts)
 - #141398 (chore: fix typos in comment)
 - #141457 (Update mdbook to 0.4.50)

Failed merges:

 - #141405 (GetUserProfileDirectoryW is now documented to always store the size)

r? `@ghost`
`@rustbot` modify labels: rollup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants