Description
Right now enry uses a regex-based tokeniser (at least until #218).
We have two drop-in replacement regex engines: Go's default, RE2, and Ruby's default, oniguruma.
Recent improvements from the Go module migration in #219 surfaced a new issue: the tokeniser seems to produce slightly different results depending on which regex engine is used :/
More specifically, the token frequencies built from the linguist samples differ, and the high-level code-generator tests catch this by comparing against a fixture (pre-generated with RE2), so they fail on the oniguruma-generated profiles like this: #219 (comment)
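
For illustration, here is a minimal sketch of what building token frequencies from a sample could look like, assuming a single stand-in pattern and Go's stdlib `regexp` (RE2); the pattern and function names below are hypothetical and not enry's actual tokeniser:

```go
// Simplified sketch: split sample content into tokens with a regex and
// count how often each token occurs. The real tokeniser uses a set of
// linguist-derived patterns; this single pattern is only illustrative.
package main

import (
	"fmt"
	"regexp"
)

// tokenRE is a stand-in pattern, not the real one.
var tokenRE = regexp.MustCompile(`[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\S`)

// tokenFrequencies builds a token -> count map for one sample.
func tokenFrequencies(sample []byte) map[string]int {
	freq := make(map[string]int)
	for _, tok := range tokenRE.FindAllString(string(sample), -1) {
		freq[tok]++
	}
	return freq
}

func main() {
	fmt.Println(tokenFrequencies([]byte("x := re.FindAll(src, -1)")))
}
```

The point is that any place where the two engines interpret the same pattern differently shows up as a shifted count in this map, which then no longer matches the RE2-generated fixture.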
We need to find the exact reason and, depending on it, decide whether we want to support two versions of the fixtures or change something so that there is no difference in output.
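
To narrow down the exact reason, one option is to dump the token frequencies produced by each build (RE2 and oniguruma) and diff them to see exactly which tokens disagree. A rough sketch, assuming hypothetical JSON dumps `freq_re2.json` and `freq_oniguruma.json`:

```go
// Hypothetical diff helper: load two token-frequency dumps (one per regex
// engine build) and print every token whose count differs. File names and
// the JSON layout (token -> count) are assumptions for this sketch.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func load(path string) (map[string]int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	freq := make(map[string]int)
	if err := json.Unmarshal(data, &freq); err != nil {
		return nil, err
	}
	return freq, nil
}

func main() {
	re2, err := load("freq_re2.json")
	if err != nil {
		panic(err)
	}
	onig, err := load("freq_oniguruma.json")
	if err != nil {
		panic(err)
	}
	// Tokens present in the RE2 dump with a different (or zero) count.
	for tok, n := range re2 {
		if m := onig[tok]; m != n {
			fmt.Printf("%q: re2=%d oniguruma=%d\n", tok, n, m)
		}
	}
	// Tokens only produced by the oniguruma build.
	for tok, m := range onig {
		if _, ok := re2[tok]; !ok {
			fmt.Printf("%q: re2=0 oniguruma=%d\n", tok, m)
		}
	}
}
```

Whatever tokens show up in that diff should point at the specific regex constructs that the two engines handle differently, which in turn tells us whether a second fixture is justified or the tokeniser patterns should be adjusted.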
This also potentially affects #194