# Challenge 20: Verify the safety of char-related functions in str::pattern

- **Status:** Open
- **Tracking Issue:** [#277](https://github.com/model-checking/verify-rust-std/issues/277)
- **Start date:** *2025-03-07*
- **End date:** *2025-10-17*
- **Reward:** *25000 USD*

-------------------
## Goal
Verify the safety of char-related `Searcher` methods in `str::pattern`.

## Motivation

`String` and `str` types are widely used in Rust programs, so it is important that their associated functions do not cause undefined behavior.
| 16 | + |
| 17 | +## Description |
| 18 | + |
| 19 | +The following str library functions are generic over the `Pattern` trait (https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html): |
| 20 | +- `contains` |
| 21 | +- `starts_with` |
| 22 | +- `ends_with` |
| 23 | +- `find` |
| 24 | +- `rfind` |
| 25 | +- `split` |
| 26 | +- `split_inclusive` |
| 27 | +- `rsplit` |
| 28 | +- `split_terminator` |
| 29 | +- `rsplit_terminator` |
| 30 | +- `splitn` |
| 31 | +- `rsplitn` |
| 32 | +- `split_once` |
| 33 | +- `rsplit_once` |
| 34 | +- `rmatches` |
| 35 | +- `match_indices` |
| 36 | +- `rmatch_indices` |
| 37 | +- `trim_matches` |
| 38 | +- `trim_start_matches` |
| 39 | +- `strip_prefix` |
| 40 | +- `strip_suffix` |
| 41 | +- `trim_end_matches` |
| 42 | +These functions accept a pattern as input, then call [into_searcher](https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html#tymethod.into_searcher) to create a [Searcher](https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html#associatedtype.Searcher) for the pattern. They use this `Searcher` to perform their desired operations (split, find, etc.). |
| 43 | +Those functions are implemented in (library/core/src/str/mod.rs), but the core of them are the searching algorithms which are implemented in (library/core/src/str/pattern.rs). |
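
To make this searcher-driven design concrete, here is a minimal sketch on stable Rust. The real `Pattern`/`Searcher` API is unstable, so `MiniSearcher`, `MiniCharSearcher`, `mini_find` and `mini_split` are hypothetical stand-ins, not std items. The sketch shows that a consumer such as `split` slices the haystack at whatever indices the searcher hands back, which is why those indices matter for safety.

```rust
/// Hypothetical, heavily simplified stand-in for `core::str::pattern::Searcher`.
trait MiniSearcher<'a> {
    /// The haystack being searched.
    fn haystack(&self) -> &'a str;
    /// Returns the next match as a half-open byte range `(start, end)`.
    fn next_match(&mut self) -> Option<(usize, usize)>;
}

/// Hypothetical searcher for a single `char` needle.
struct MiniCharSearcher<'a> {
    haystack: &'a str,
    position: usize, // byte offset where the next search starts
    needle: char,
}

impl<'a> MiniCharSearcher<'a> {
    fn new(haystack: &'a str, needle: char) -> Self {
        MiniCharSearcher { haystack, position: 0, needle }
    }
}

impl<'a> MiniSearcher<'a> for MiniCharSearcher<'a> {
    fn haystack(&self) -> &'a str {
        self.haystack
    }

    fn next_match(&mut self) -> Option<(usize, usize)> {
        let haystack = self.haystack;
        let base = self.position;
        // `char_indices` only yields offsets that are char boundaries.
        for (offset, c) in haystack[base..].char_indices() {
            let start = base + offset;
            let end = start + c.len_utf8();
            if c == self.needle {
                self.position = end;
                return Some((start, end));
            }
        }
        self.position = haystack.len();
        None
    }
}

/// Like `str::find`: the byte offset of the first match.
fn mini_find<'a, S: MiniSearcher<'a>>(mut searcher: S) -> Option<usize> {
    searcher.next_match().map(|(start, _)| start)
}

/// Like `str::split`: the pieces between matches.
fn mini_split<'a, S: MiniSearcher<'a>>(mut searcher: S) -> Vec<&'a str> {
    let haystack = searcher.haystack();
    let mut pieces = Vec::new();
    let mut last = 0;
    while let Some((start, end)) = searcher.next_match() {
        // Checked slicing here; std's consumers may skip this check and rely
        // on the `Searcher` contract that indices lie on char boundaries.
        pieces.push(&haystack[last..start]);
        last = end;
    }
    pieces.push(&haystack[last..]);
    pieces
}

fn main() {
    assert_eq!(mini_find(MiniCharSearcher::new("héllo", 'l')), Some(3));
    assert_eq!(mini_split(MiniCharSearcher::new("a,b,,c", ',')), vec!["a", "b", "", "c"]);
}
```

Note that `'l'` in `"héllo"` is found at byte offset 3, not 2, because `'é'` occupies two bytes.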

### Assumptions

**Important note:** for this challenge, you can assume:
1. The safety and functional correctness of all functions in the `slice` module.
2. That all functions in `library/core/src/str/validations.rs` are functionally correct (consistent with the UTF-8 encoding description in https://en.wikipedia.org/wiki/UTF-8).
3. That all the `Searcher`s in `library/core/src/str/iter.rs` are created by `into_searcher(_, haystack)` with `haystack` being a valid UTF-8 string (`str`). You can assume any property of `haystack` that follows from it being valid UTF-8.
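
As an example of a property that assumption 3 makes available, consider char boundaries in valid UTF-8. Whether a byte offset is a char boundary can be decided from the single byte at that offset. Continuation bytes have the form `0b10xx_xxxx`, and every other byte in valid UTF-8 starts a char. The helpers `is_utf8_char_start` and `boundary_from_bytes` below are hypothetical, and the test compares them against std's `str::is_char_boundary`.

```rust
/// True if `byte` is not a UTF-8 continuation byte (0x80..=0xBF),
/// i.e. it is ASCII or the lead byte of a multi-byte encoding.
fn is_utf8_char_start(byte: u8) -> bool {
    // Reinterpreted as i8, continuation bytes are exactly -128..=-65.
    (byte as i8) >= -0x40
}

/// For a valid UTF-8 `haystack`, offset `i` is a char boundary iff it is the
/// end of the string, or the byte at `i` is not a continuation byte.
fn boundary_from_bytes(haystack: &str, i: usize) -> bool {
    match haystack.as_bytes().get(i) {
        Some(&b) => is_utf8_char_start(b),
        None => i == haystack.len(),
    }
}

fn main() {
    let s = "aé🦀"; // 1 + 2 + 4 bytes
    for i in 0..=s.len() + 1 {
        assert_eq!(boundary_from_bytes(s, i), s.is_char_boundary(i));
    }
}
```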

Verify the safety of the functions in `library/core/src/str/pattern.rs` listed in the next section.

The safety properties we are targeting are:
1. No undefined behavior occurs after the `Searcher` is created.
2. The implementations of the unsafe traits `Searcher` and `ReverseSearcher` satisfy the safety condition stated in `pattern.rs`:
```
/// The trait is marked unsafe because the indices returned by the
/// [`next()`][Searcher::next] methods are required to lie on valid utf8
/// boundaries in the haystack. This enables consumers of this trait to
/// slice the haystack without additional runtime checks.
```
This property must also hold for the indices returned by `next_back()` of `ReverseSearcher`.
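
One way to make this property testable is to check a complete step sequence produced by a searcher. The sketch below uses `Step`, a local mirror of the unstable `core::str::pattern::SearchStep`. It checks the boundary condition quoted above together with the adjacency and coverage requirements that the `Searcher::next` documentation additionally describes. The generators `char_steps` and `char_steps_back` are hypothetical per-char models, not std's searchers.

```rust
/// Local mirror of `core::str::pattern::SearchStep` (the real enum is unstable).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Match(usize, usize),
    Reject(usize, usize),
    Done,
}

/// Checks a full forward sequence (repeated `next()` calls): ranges are
/// adjacent, on char boundaries, and cover the haystack before `Done`.
fn check_forward(haystack: &str, steps: &[Step]) -> bool {
    let mut expected_start = 0;
    for &step in steps {
        match step {
            Step::Match(a, b) | Step::Reject(a, b) => {
                let ok = a == expected_start
                    && a <= b
                    && haystack.is_char_boundary(a)
                    && haystack.is_char_boundary(b);
                if !ok {
                    return false;
                }
                expected_start = b;
            }
            Step::Done => return expected_start == haystack.len(),
        }
    }
    false // the sequence never reached `Done`
}

/// Same check for a reverse sequence (repeated `next_back()` calls).
fn check_reverse(haystack: &str, steps: &[Step]) -> bool {
    let mut expected_end = haystack.len();
    for &step in steps {
        match step {
            Step::Match(a, b) | Step::Reject(a, b) => {
                let ok = b == expected_end
                    && a <= b
                    && haystack.is_char_boundary(a)
                    && haystack.is_char_boundary(b);
                if !ok {
                    return false;
                }
                expected_end = a;
            }
            Step::Done => return expected_end == 0,
        }
    }
    false
}

/// Classifies one char starting at byte offset `a`.
fn step_for(a: usize, c: char, needle: char) -> Step {
    let b = a + c.len_utf8();
    if c == needle { Step::Match(a, b) } else { Step::Reject(a, b) }
}

/// Hypothetical per-char step model for a `char` needle, front to back.
fn char_steps(haystack: &str, needle: char) -> Vec<Step> {
    let mut steps: Vec<Step> =
        haystack.char_indices().map(|(a, c)| step_for(a, c, needle)).collect();
    steps.push(Step::Done);
    steps
}

/// The same model, back to front.
fn char_steps_back(haystack: &str, needle: char) -> Vec<Step> {
    let mut steps: Vec<Step> =
        haystack.char_indices().rev().map(|(a, c)| step_for(a, c, needle)).collect();
    steps.push(Step::Done);
    steps
}

fn main() {
    let h = "héllo";
    assert!(check_forward(h, &char_steps(h, 'l')));
    assert!(check_reverse(h, &char_steps_back(h, 'l')));
    // Splitting 'é' (bytes 0..2) in the middle violates the contract.
    assert!(!check_forward("é", &[Step::Reject(0, 1), Step::Reject(1, 2), Step::Done]));
}
```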

### Success Criteria

Verify the safety of the following functions in `library/core/src/str/pattern.rs`:
- `next`
- `next_match`
- `next_back`
- `next_match_back`
- `next_reject`
- `next_back_reject`

for the following `Searcher`s:
- `CharSearcher`
- `MultiCharEqSearcher`
- `CharArraySearcher`
- `CharArrayRefSearcher`
- `CharSliceSearcher`
- `CharPredicateSearcher`
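
For orientation, each of these searchers is reached through ordinary stable `str` methods, depending on the type of pattern passed in. The mapping in the comments below follows the `Pattern` impls in `pattern.rs` as we read them: `char` uses `CharSearcher`, `[char; N]` uses `CharArraySearcher`, `&[char; N]` uses `CharArrayRefSearcher`, `&[char]` uses `CharSliceSearcher`, and `FnMut(char) -> bool` uses `CharPredicateSearcher`. The last four wrap `MultiCharEqSearcher`. Treat this as a sketch rather than an authoritative dispatch table.

```rust
fn main() {
    let s = "hello, world 42";

    // `char` pattern (CharSearcher).
    assert_eq!(s.find('o'), Some(4));
    // `[char; N]` pattern (CharArraySearcher).
    assert_eq!(s.find(['w', 'r']), Some(7));
    // `&[char; N]` pattern (CharArrayRefSearcher).
    assert_eq!(s.find(&[',', ' ']), Some(5));
    // `&[char]` pattern (CharSliceSearcher).
    let set: &[char] = &['d', '4'];
    assert_eq!(s.find(set), Some(11));
    // `FnMut(char) -> bool` pattern (CharPredicateSearcher).
    assert_eq!(s.find(|c: char| c.is_ascii_digit()), Some(13));
    // Reverse searches go through the `ReverseSearcher` methods.
    assert_eq!(s.rfind('o'), Some(8));
    // Splitting consumes both matches and rejects from the searcher.
    assert_eq!(s.split([',', ' ']).collect::<Vec<_>>(), ["hello", "", "world", "42"]);
}
```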

The verification is considered successful if, for each `Searcher` above, you can specify a condition (a "type invariant") `C` and prove that:
1. If the `Searcher` is created from any valid UTF-8 haystack, it satisfies `C`.
2. If the `Searcher` satisfies `C`, it ensures the two safety properties mentioned in the previous section.
3. If the `Searcher` satisfies `C`, then after any function above is called on it (possibly modifying it), it still satisfies `C`.
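
To illustrate the three obligations, here is a hypothetical double-ended char searcher model, `ModelCharSearcher`, with a candidate invariant `C`: both cursors are in bounds, ordered, and on char boundaries. The field names `finger` and `finger_back` are loosely borrowed from std's `CharSearcher`, but this is not std's implementation. The real searchers keep different state and use byte-level techniques such as `memchr`, so their invariants will likely be more involved. The runtime `debug_assert!`s only stand in for what a verifier would prove for all inputs.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Step {
    Match(usize, usize),
    Reject(usize, usize),
    Done,
}

/// Hypothetical model searcher for one `char` (not std's `CharSearcher`).
struct ModelCharSearcher<'a> {
    haystack: &'a str,
    needle: char,
    finger: usize,      // front cursor (byte offset)
    finger_back: usize, // back cursor (exclusive byte offset)
}

impl<'a> ModelCharSearcher<'a> {
    /// Obligation 1: construction from any `&str` establishes `C`.
    fn new(haystack: &'a str, needle: char) -> Self {
        let s = ModelCharSearcher { haystack, needle, finger: 0, finger_back: haystack.len() };
        debug_assert!(s.invariant());
        s
    }

    /// Candidate type invariant `C` for this model.
    fn invariant(&self) -> bool {
        self.finger <= self.finger_back
            && self.finger_back <= self.haystack.len()
            && self.haystack.is_char_boundary(self.finger)
            && self.haystack.is_char_boundary(self.finger_back)
    }

    /// Obligations 2 and 3: returned indices are char boundaries, and `C` is preserved.
    fn next(&mut self) -> Step {
        let haystack = self.haystack;
        // Under `C` this slice is in bounds and on char boundaries, so it cannot panic.
        match haystack[self.finger..self.finger_back].chars().next() {
            None => Step::Done,
            Some(c) => {
                let a = self.finger;
                let b = a + c.len_utf8(); // end of a whole char, so still a boundary
                self.finger = b;
                debug_assert!(self.invariant());
                if c == self.needle { Step::Match(a, b) } else { Step::Reject(a, b) }
            }
        }
    }

    fn next_back(&mut self) -> Step {
        let haystack = self.haystack;
        match haystack[self.finger..self.finger_back].chars().next_back() {
            None => Step::Done,
            Some(c) => {
                let b = self.finger_back;
                // `c` lies inside finger..finger_back, so this cannot underflow.
                let a = b - c.len_utf8();
                self.finger_back = a;
                debug_assert!(self.invariant());
                if c == self.needle { Step::Match(a, b) } else { Step::Reject(a, b) }
            }
        }
    }

    /// Same shape as the default `next_match`: skip rejects until a match or `Done`.
    fn next_match(&mut self) -> Option<(usize, usize)> {
        loop {
            match self.next() {
                Step::Match(a, b) => return Some((a, b)),
                Step::Reject(..) => continue,
                Step::Done => return None,
            }
        }
    }
}

fn main() {
    let mut s = ModelCharSearcher::new("aéa", 'a');
    assert_eq!(s.next_match(), Some((0, 1)));
    assert_eq!(s.next_back(), Step::Match(3, 4));
    assert_eq!(s.next(), Step::Reject(1, 3));
    assert_eq!(s.next(), Step::Done);
    assert!(s.invariant());
}
```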

Furthermore, you must prove the absence of the undefined behaviors listed in the next section.

The verification must be unbounded: it must hold for inputs of arbitrary size.

### List of UBs

All proofs must automatically ensure the absence of the following undefined behaviors [ref](https://github.com/rust-lang/reference/blob/142b2ed77d33f37a9973772bd95e6144ed9dce43/src/behavior-considered-undefined.md):

* Accessing (loading from or storing to) a place that is dangling or based on a misaligned pointer.
* Reading from uninitialized memory except for padding or unions.
* Mutating immutable bytes.
* Producing an invalid value.
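
For char searchers, the most likely way to produce an invalid value is converting haystack bytes into a `char`. A `char` must be a Unicode scalar value, meaning at most `0x10FFFF` and not a surrogate. `MultiCharEqSearcher` decodes chars through `CharIndices`, whose correctness rests on the haystack being valid UTF-8 (assumptions 2 and 3 above). The hypothetical `decode_first` below is a checked sketch of such a decoding step. It returns `None` in cases where an unchecked conversion like `char::from_u32_unchecked` would produce an invalid value.

```rust
/// Hypothetical checked decoder for the first char of `bytes`.
/// Returns the char and its encoded length, or `None` instead of an invalid `char`.
/// Overlong encodings are not rejected; that is acceptable here only because
/// the haystack is assumed to be valid UTF-8.
fn decode_first(bytes: &[u8]) -> Option<(char, usize)> {
    let b0 = *bytes.first()?;
    let (len, init) = match b0 {
        0x00..=0x7F => return Some((b0 as char, 1)),
        0xC0..=0xDF => (2, (b0 & 0x1F) as u32),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32),
        0xF0..=0xF7 => (4, (b0 & 0x07) as u32),
        _ => return None, // stray continuation byte or invalid lead byte
    };
    let mut code = init;
    for &b in bytes.get(1..len)? {
        if b & 0xC0 != 0x80 {
            return None; // expected a continuation byte
        }
        code = (code << 6) | (b & 0x3F) as u32;
    }
    // `char::from_u32` rejects surrogates and values above 0x10FFFF.
    char::from_u32(code).map(|c| (c, len))
}

fn main() {
    let s = "aé€🦀";
    let mut rest = s.as_bytes();
    let mut decoded = Vec::new();
    while let Some((c, len)) = decode_first(rest) {
        decoded.push(c);
        rest = &rest[len..];
    }
    assert!(rest.is_empty());
    assert_eq!(decoded, s.chars().collect::<Vec<_>>());
    // A UTF-8-shaped encoding of the surrogate U+D800 is rejected.
    assert_eq!(decode_first(&[0xED, 0xA0, 0x80]), None);
}
```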

Note: All solutions to verification challenges need to satisfy the criteria established in the [challenge book](../general-rules.md)
in addition to the ones listed above.