-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Simplify format_integer_with_underscore_sep
#141369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Simplify format_integer_with_underscore_sep
#141369
Conversation
r? @notriddle rustbot has assigned @notriddle. Use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few issues but mainly seems like a decent idea to slim down an overly flexible test helper.
src/librustdoc/clean/utils.rs
Outdated
num.as_bytes().rchunks(3).rev().intersperse(b"_").flatten().copied().map(char::from), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using as_bytes and char::from
like this is technically a logic error, as it will turn every byte of a UTF-8 sequence into the corresponding unicode codepoint.
it does happen to work here since we're never using non-ascii (if we were then reversing a string bytewise would be extremely problematic), but String::from_utf8(...).unwrap()
represents the intent much more cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that this function is hot enough to prefer performance over correctness,
but I think in this case it's quite obvious that this is ok since Display
ing integers will always result in an ASCII-only string. We can also collect into a vector of chars, like the old version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do want to care about performance here, I would just use String::from_utf8_unchecked
. This should actually be more performant than char::from
since it fully erases any non-ascii code. it will also at least give us a panic on miri instead of just silent corruption if somehow this gets rewritten into something that does produce non-ascii.
there's also u8.as_ascii()
, since String impls Extend<ascii::Char>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way unsafe
code would be justified here.
as_ascii
seems like the right way to do this, since it panics if it's given non-ASCII text, but doesn't make the code much more complex:
fn format_integer_with_underscore_sep(num: u128, is_negative: bool) -> String {
let num = num.to_string();
let mut result = if is_negative { "-" } else { "" }.to_string();
result.extend(
num.as_bytes().rchunks(3).rev().intersperse(b"_").flatten().copied().map(|b| b.as_ascii().unwrap())
);
result
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wouldn't accept unsafe here, then I think we definitely shouldn't accept the potential of silent logic errors.
One nice thing is that String::from_utf8
has a concrete signature, so you can use collect
without type hints, but I'm really fine with any solution which isn't char::from
. If nothing else using char::from
in this way is a bad habit I don't think we should encourage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ended up using str::as_ascii
so there's only one unwrap
, IMHO it looks kinda nice now!
BTW, this is not a test helper, it's used in the generation of docs :) |
yeah i noticed that after your comment, github put the diff break in a very confusing spot :p |
Only ever needs to handle decimal reprs
2c9305a
to
5c735d1
Compare
@bors r+ rollup |
…r_with_underscore_sep, r=notriddle Simplify `format_integer_with_underscore_sep` Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so can be simplified a lot by not trying to handle hex and octal radixes. Second commit is completely unrelated, just simplified some code I wrote a while back 😁
…r_with_underscore_sep, r=notriddle Simplify `format_integer_with_underscore_sep` Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so can be simplified a lot by not trying to handle hex and octal radixes. Second commit is completely unrelated, just simplified some code I wrote a while back 😁
Rollup of 7 pull requests Successful merges: - #138896 (std: fix aliasing bug in UNIX process implementation) - #140832 (aarch64-linux: Default to FramePointer::NonLeaf) - #141065 (Updated std doctests for wasm) - #141369 (Simplify `format_integer_with_underscore_sep`) - #141374 (make shared_helpers exe function work for both cygwin and non-cygwin hosts) - #141398 (chore: fix typos in comment) - #141457 (Update mdbook to 0.4.50) Failed merges: - #141405 (GetUserProfileDirectoryW is now documented to always store the size) r? `@ghost` `@rustbot` modify labels: rollup
Noticed that this helper fn only ever gets called with decimal-base-formatted ints, so can be simplified a lot by not trying to handle hex and octal radixes.
Second commit is completely unrelated, just simplified some code I wrote a while back 😁