nucleo-matcher documentation: Please clarify what matching "indices" actually are.

The [documentation](https://docs.rs/nucleo-matcher/0.3.1/nucleo_matcher/struct.Matcher.html) reads as if they were character indices: `All .._indices functions will also compute the indices of the matched characters`.
The example code below shows that this is not the case.

Instead, they are indices into nucleo's `Utf32Str`, which is built by taking the first character of each grapheme cluster:
https://github.com/helix-editor/nucleo/blob/5b74652e482f7c07d827f18c6d21e7540c242c69/matcher/src/chars.rs#L185-L207
This only happens if the `unicode-segmentation` feature is active (it is on by default); otherwise they really are character indices.

Helix, which highlights matches in the picker, for example, also treats them as grapheme indices:

https://github.com/helix-editor/helix/blob/cfb5158cd1b3e5e1962eda66e673c0c35b786046/helix-term/src/ui/picker.rs#L786-L790

Could you please clarify this in the docs, and perhaps provide an example showing how to use those indices with the original haystack string?

A further note about the example below: it was compiled against the newest version from GitHub. The crates.io version doesn't work at all and produces weird grapheme segmentation, so please publish a new release.

```rust
use nucleo_matcher::pattern::{Atom, AtomKind, CaseMatching, Normalization};
use nucleo_matcher::Utf32Str;
use unicode_segmentation::UnicodeSegmentation;

fn test(haystack: &str, needle: &str) {
    let mut matcher = nucleo_matcher::Matcher::new(nucleo_matcher::Config::DEFAULT);

    let atom = Atom::new(
        needle,
        CaseMatching::default(),
        Normalization::default(),
        AtomKind::Substring,
        false,
    );

    let mut buf = Vec::new();
    let nucleo_string = Utf32Str::new(haystack, &mut buf);

    let characters = haystack.chars().collect::<Vec<_>>();
    let graphemes = UnicodeSegmentation::graphemes(haystack, true).collect::<Vec<_>>();
    let nucleo_chars = nucleo_string.chars().collect::<Vec<_>>();

    let matches = {
        let mut m = Vec::new();
        atom.indices(nucleo_string, &mut matcher, &mut m);
        m.into_iter().map(|a| a as usize).collect::<Vec<_>>()
    };

    println!("haystack: {}", haystack);
    println!("needle  : {}", needle);

    println!("characters  : {:?}", characters);
    println!("graphemes   : {:?}", graphemes);
    println!("nucleo chars: {:?}", nucleo_chars);

    println!("matching indices: {:?}", matches);
    println!("matching character: {:?}", characters.get(matches[0]));
    println!("matching grapheme: {:?}", graphemes.get(matches[0]));
    println!("matching nucleo chars: {:?}", nucleo_chars.get(matches[0]));
}

fn main() {
    test("abx", "x");
    println!();
    test("g̈bx", "x");
}
```
```
haystack: abx
needle  : x
characters  : ['a', 'b', 'x']
graphemes   : ["a", "b", "x"]
nucleo chars: ['a', 'b', 'x']
matching indices: [2]
matching character: Some('x')
matching grapheme: Some("x")
matching nucleo chars: Some('x')

haystack: g̈bx
needle  : x
characters  : ['g', '\u{308}', 'b', 'x']
graphemes   : ["g\u{308}", "b", "x"]
nucleo chars: ['g', 'b', 'x']
matching indices: [2]
matching character: Some('b')
matching grapheme: Some("x")
matching nucleo chars: Some('x')
```
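As a starting point for the requested doc example, here is a rough sketch of how the indices could be mapped back onto the original haystack, assuming they are grapheme positions as the output above suggests. To keep it dependency-free, `approx_grapheme_ranges` is a hypothetical stand-in for `unicode_segmentation`'s `grapheme_indices(true)` that only understands combining diacritical marks, and `matched_graphemes` is likewise a made-up helper name; the `u32` index type is my assumption based on the `as usize` cast needed above.

```rust
use std::ops::Range;

/// Hypothetical stand-in for `UnicodeSegmentation::grapheme_indices(true)`.
/// It only treats combining diacritical marks (U+0300..=U+036F) as part of
/// the preceding grapheme; real segmentation (UAX #29) covers far more.
fn approx_grapheme_ranges(text: &str) -> Vec<Range<usize>> {
    let mut ranges: Vec<Range<usize>> = Vec::new();
    for (start, ch) in text.char_indices() {
        let end = start + ch.len_utf8();
        let is_combining = ('\u{300}'..='\u{36F}').contains(&ch);
        if is_combining {
            // Extend the previous grapheme over the combining mark.
            if let Some(last) = ranges.last_mut() {
                last.end = end;
                continue;
            }
        }
        // Otherwise this character starts a new grapheme.
        ranges.push(start..end);
    }
    ranges
}

/// Maps matcher indices (assumed to be grapheme positions) to the
/// corresponding substrings of the original haystack. Out-of-range
/// indices are skipped instead of panicking.
fn matched_graphemes<'a>(haystack: &'a str, indices: &[u32]) -> Vec<&'a str> {
    let ranges = approx_grapheme_ranges(haystack);
    indices
        .iter()
        .filter_map(|&i| ranges.get(i as usize))
        .map(|r| &haystack[r.clone()])
        .collect()
}
```

With the second haystack from above, `matched_graphemes("g\u{308}bx", &[2])` picks out `"x"` (bytes 4..5), whereas indexing `chars()` with the same index yields `'b'`. In real code the byte ranges would come from `grapheme_indices(true)`, and without the `unicode-segmentation` feature the lookup would have to use `char_indices()` instead.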


	/// Nucleo cannot match graphemes as single units. To work around
	/// that we only use the first codepoint of each grapheme. This
	/// iterator returns the first character of each unicode grapheme
	/// in a string and is used for constructing `Utf32Str(ing)`.
	pub fn graphemes(text: &str) -> impl Iterator<Item = char> + '_ {
	    #[cfg(feature = "unicode-segmentation")]
	    let res = text.graphemes(true).map(|grapheme| {
	        // we need to special-case this check since `\r\n` is a single grapheme and is
	        // therefore the exception to the rule that normalization of a grapheme should
	        // map to the first character.
	        if grapheme == "\r\n" {
	            '\n'
	        } else {
	            grapheme
	                .chars()
	                .next()
	                .expect("graphemes must be non-empty")
	        }
	    });
	    #[cfg(not(feature = "unicode-segmentation"))]
	    let res = text.chars();
	    res
	}
