Introduce regex backend shim with opt-in feature flag #19982
Closed
LeonarddeR wants to merge 6 commits into nvaccess:master from
Closing. Awaiting narrower scope and discussion in #19977.
Link to issue number:
Closes #19977.
Summary of the issue:
NVDA relies on Python's stdlib `re` module for all internal regular-expression work, which limits Unicode support (notably for Hebrew with niqqud, Arabic, CJK, and other non-Latin scripts), offers no variable-width lookbehind, does not support full Unicode case folding, and holds the GIL during matching. The third-party `regex` module addresses all of these.
Description of user facing changes:
A new combobox has been added to the Advanced settings panel: "Regular expression backend (requires restart)". Users can choose between the default (stdlib `re`, preserving historical behavior), `regex` (opt-in), or `re` explicitly. Changes require an NVDA restart to take effect.
Description of developer facing changes:
- `_regex` acts as a thin shim over the active regex backend. All `source/` modules that previously did `import re` now do `import _regex as re`.
- The shim uses a module-level `__getattr__` to forward attribute access to whichever backend was selected at startup. The backend is chosen once by `_regex.initialize()` based on the `regexBackend` feature flag and is then frozen for the lifetime of the process.
- When the `regex` backend is selected, `regex.DEFAULT_VERSION` is set to `VERSION1` during `initialize()` so that modern semantics (proper zero-width split handling, scoped inline flags, set operations in character classes, full Unicode case folding under `IGNORECASE`) apply without having to touch every pattern.
- `_regex.initialize()` is called from `core.py` immediately after `config.initialize()`.
- A `regexBackend` option has been added under `[featureFlag]` in `configSpec.py`.
- `from re import error as RegexpError` in `gui/speechDict.py` has been replaced with `except re.error`, so the error class tracks the active backend.
- Adoption of `regex`-only features (e.g. `\p{...}` Unicode properties, variable-width lookbehind, fuzzy matching, set operations) is deferred to follow-up PRs.
- `regex` releases the GIL automatically during matching on immutable `str`/`bytes` inputs, so this benefit applies as soon as a user opts in, with no further code changes required.
Description of development approach:
The shim is deliberately minimal. Rather than hand-writing proxy functions for every `re` API entry point, a module-level `__getattr__` forwards on demand. Flag constants (`IGNORECASE`, `DOTALL`, etc.) are exposed from the active backend, so their bit values match what `shim.compile(pattern, shim.IGNORECASE)` feeds into the backend. `shim.error` likewise tracks the active backend, so `except re.error` works correctly on either side.
A migration to make the `regex` backend the default (rather than opt-in) should be deferred to an API-breaking release such as 2027.1, once the opt-in has received enough real-world exposure.
Testing strategy:
`tests/unit/test_regex.py` covers the shim: that `compile` returns a `Pattern` of the active backend's type, that `shim.error` catches errors from the active backend, that flag constants are exposed, that basic `match`/`search`/`findall`/`sub`/`escape` are callable, and that the API surface is callable before `initialize()` (falling back to stdlib `re`).
Known issues with pull request:
`Pattern` objects are bound to whichever engine compiled them.
Code Review Checklist: