-
Notifications
You must be signed in to change notification settings - Fork 158
Improve utf8LengthByLeader #630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
+1 I'm inclined to keep it simple and use only the branching version, with the hope that branch prediction will work well enough even in non-English languages. I'm not qualified to have anything to say about comparing the branchless versions. I get pretty much the same numbers on my i7-10750H, GHC 9.12.1. |
Benchmarks on aarch64:
|
It would be interesting to figure out why. The assembly looks good to me: https://play.haskell.org/saved/ryiSjDbk |
A couple more lookup-based branchless versions that work slightly better: utf8LengthByLeader_lookupInt :: Word8 -> Int
utf8LengthByLeader_lookupInt w8 =
1 + (0x3 .&. unsafeShiftR a (unsafeShiftL (unsafeShiftR (w8ToI w8) 4) 1))
where
a = 0xE5000000 :: Int
{-# INLINE utf8LengthByLeader_lookupInt #-}
utf8LengthByLeader_lookupInt64 :: Word8 -> Int
utf8LengthByLeader_lookupInt64 w8 =
0xF .&. i64ToI (unsafeShiftR a (unsafeShiftL (unsafeShiftR (w8ToI w8) 4) 2))
where
a = 0x4322111111111111 :: Int64
{-# INLINE utf8LengthByLeader_lookupInt64 #-}
i64ToI :: Int64 -> Int
i64ToI = fromIntegral
w8ToI :: Word8 -> Int
w8ToI = fromIntegral
I agree that if you'd rather have just one version, it should be the branching one.
I wish I could explain it, but you'll need someone more knowledgeable in this area. |
I propose to move forward with the branching version. Neat trick to make the lookup-table fit an |
@Bodigrim what do you think? |
I don't have time to think about it right now, I'm happy to defer judgement to @Lysxia. So I guess you are good to go. |
utf8LengthByLeader
is widely used, appearing in most functions that perform iteration as it's used instream
,iter
, etc. The current definition istext/src/Data/Text/Internal/Encoding/Utf8.hs
Lines 83 to 99 in 5e57460
However,
Char
. This is done by branching on the result and checking if it's 1, 2, 3, or something else (4). This means that everythingutf8LengthByLeader
does is a waste! One can simply branch on the leading byte, as shown in the comment above.Here are the implementations:
I benchmarked the implementations on two situations:
sum
: branches on the resultlength
: does not branch on the resultThe results are
So I think it would be good to
utf8LengthByLeader
and use it where one needs to branch on the resultsimple
orlookup
.lookup
is sometimes faster but it means inlining that table everywhere.It would also be good if someone can repeat these on a different CPU or arch to see if anything changes. The results above are from GHC 9.10.1 on an x86 Ryzen 5 3600.
Full benchmark source below.
Show
The text was updated successfully, but these errors were encountered: