Commit a1805b9
Merge branch 'Fix utf8 corner cases' into blead
There are around 20 different functions that take a UTF-8 sequence of
bytes and try to find the ordinal code point represented by them. It was
becoming clear that the existing tests in our suite were inadequate and were
missing glaring bugs. And UTF-8 handling is important, with failures in
it having been exploited by hackers in various products over the years
for various nefarious purposes.
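
To make the kind of corner case concrete, here is a minimal sketch of a strict decoder that maps a UTF-8 byte sequence to its code point. It is not taken from the Perl source: the function name `decode_utf8_strict` and its return convention are hypothetical, and it assumes the strict Unicode rules (no overlongs, no surrogates, nothing above U+10FFFF), which is narrower than what Perl's own functions can be asked to accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a Perl API.
 * Decodes one code point from the bytes s[0..len-1].
 * Returns the number of bytes consumed (1..4) and stores the code point
 * in *cp, or returns 0 (leaving *cp untouched) if the input is malformed. */
size_t decode_utf8_strict(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;                       /* empty input */

    unsigned char b0 = s[0];
    size_t need;                        /* total bytes in this sequence */
    uint32_t value;                     /* code point accumulated so far */
    uint32_t min;                       /* smallest value this length may encode */

    if (b0 < 0x80) { *cp = b0; return 1; }          /* ASCII */
    else if (b0 < 0xC0) return 0;                   /* stray continuation byte */
    else if (b0 < 0xE0) { need = 2; value = b0 & 0x1F; min = 0x80; }
    else if (b0 < 0xF0) { need = 3; value = b0 & 0x0F; min = 0x800; }
    else if (b0 < 0xF8) { need = 4; value = b0 & 0x07; min = 0x10000; }
    else return 0;                                  /* 0xF8..0xFF never start a sequence here */

    if (len < need)
        return 0;                       /* truncated sequence */

    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (value < min)
        return 0;                       /* overlong encoding, e.g. C0 80 for U+0000 */
    if (value >= 0xD800 && value <= 0xDFFF)
        return 0;                       /* UTF-16 surrogate */
    if (value > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */

    *cp = value;
    return need;
}
```

Each early return corresponds to a distinct malformation (truncation, overlong form, surrogate, out-of-range value), and these are the cases a test suite for such functions has to exercise one by one.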
I set out to improve the tests, spending way too much time before
realizing that adding band-aids to the current scheme was not going to
work out. So I undertook rewriting the tests. This turned out to be way
harder and more time-consuming than I expected. And it still isn't ready to
go into blead. But along the way, I discovered that it was finding
corner case bugs that I would never have anticipated. This series of
commits fixes those, while simplifying the code and reducing redundancy.
The new test file needs clean-up, and probably ways to make it faster,
but it is finally far enough along that I believe it has caught most of
the bugs out there. So I'm submitting these now to get into v5.42. The
deadline for the test file is later in the development process.

4 files changed: +569 -558 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 2020 | 2020 | |
| 2021 | 2021 | |
| 2022 | 2022 | |
| | 2023 | + |
| | 2024 | + |
| | 2025 | + |
| 2023 | 2026 | |
| 2024 | 2027 | |
| | 2028 | + |
| 2025 | 2029 | |
| 2026 | 2030 | |
| 2027 | 2031 | |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 3244 | 3244 | |
| 3245 | 3245 | |
| 3246 | 3246 | |
| 3247 | | - |
| 3248 | 3247 | |
| 3249 | 3248 | |
| 3250 | 3249 | |