Introduce regex backend shim with opt-in feature flag#19982

Closed
LeonarddeR wants to merge 6 commits into nvaccess:master from LeonarddeR:regex

Conversation

@LeonarddeR
Collaborator

@LeonarddeR LeonarddeR commented Apr 20, 2026

Link to issue number:

Closes #19977.

Summary of the issue:

NVDA relies on Python's stdlib re module for all internal regular-expression work, which limits Unicode support (notably for Hebrew with niqqud, Arabic, CJK, and other non-Latin scripts), offers no variable-width lookbehind, does not support full Unicode case folding, and holds the GIL during matching. The third-party regex module addresses all of these.
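
As a small illustration of one of the limitations listed above, the following sketch probes whether an engine accepts a variable-width lookbehind. The helper name `compiles` is hypothetical and only exists for this example. It is run against stdlib re only; based on the description above, passing the third-party regex module instead would be expected to accept the second pattern, but that is not exercised here.

```python
import re


def compiles(engine, pattern: str) -> bool:
    """Return True if `engine` accepts `pattern`, False if it raises its own error type."""
    try:
        engine.compile(pattern)
    except engine.error:
        return False
    return True


# Fixed-width lookbehind: accepted by stdlib re.
print(compiles(re, r"(?<=a)b"))
# Variable-width lookbehind: rejected by stdlib re with re.error.
print(compiles(re, r"(?<=a+)b"))
```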

Description of user facing changes:

A new combobox has been added to the Advanced settings panel: "Regular expression backend (requires restart)". Users can choose between the default (stdlib re, preserving historical behavior), regex (opt-in), or explicitly re. Changes require an NVDA restart to take effect.

Description of developer facing changes:

  • A new internal module _regex acts as a thin shim over the active regex backend. All modules under source/ that previously used import re now use import _regex as re.
  • The shim uses PEP 562 module-level __getattr__ to forward attribute access to whichever backend was selected at startup. The backend is chosen once by _regex.initialize() based on the regexBackend feature-flag and is then frozen for the lifetime of the process.
  • When the regex backend is selected, regex.DEFAULT_VERSION is set to VERSION1 during initialize() so that modern semantics (proper zero-width split handling, scoped inline flags, set operations in character classes, full Unicode case folding under IGNORECASE) apply without having to touch every pattern.
  • _regex.initialize() is called from core.py immediately after config.initialize().
  • A new regexBackend option has been added under [featureFlag] in configSpec.py.
  • The previously hard-coded from re import error as RegexpError in gui/speechDict.py has been replaced with except re.error so the error class tracks the active backend.
  • No existing regex patterns have been rewritten. The scope of this PR is strictly "allow a drop-in backend swap". Taking advantage of regex-only features (e.g. \p{...} Unicode properties, variable-width lookbehind, fuzzy matching, set operations) is deferred to follow-up PRs.
  • regex releases the GIL during matching on immutable str/bytes inputs. Users get this benefit as soon as they opt in; no further code changes are required.

Description of development approach:

The shim is deliberately minimal. Rather than hand-writing proxy functions for every re API entry point, module-level __getattr__ forwards on demand. Flag constants (IGNORECASE, DOTALL, etc.) are exposed from the active backend so their bit values match what shim.compile(pattern, shim.IGNORECASE) feeds into the backend. shim.error likewise tracks the active backend so except re.error works correctly on either side.
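
From a caller's point of view, the key property is that the flags and the exception class come from the same namespace as `compile`. A minimal sketch, using stdlib re as a stand-in for the shim so it runs anywhere (the `safeCompile` helper is hypothetical and not part of the PR):

```python
import re as shim  # stand-in for `import _regex as re`


def safeCompile(pattern: str, flags: int = 0):
	"""Compile a user-supplied pattern, returning None if the backend rejects it."""
	try:
		return shim.compile(pattern, flags)
	except shim.error:
		# shim.error is the active backend's error class, so this
		# handler keeps working whichever engine compiled the pattern.
		return None


valid = safeCompile("abc", shim.IGNORECASE)
invalid = safeCompile("(")  # unbalanced parenthesis
```

This mirrors the gui/speechDict.py change: catching `re.error` via the shim rather than importing the error class from a specific engine.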

A migration to make the regex backend the default (rather than opt-in) should be deferred to an API-breaking release such as 2027.1, once the opt-in has received enough real-world exposure.

Testing strategy:

  • New tests/unit/test_regex.py covers the shim: that compile returns a Pattern of the active backend's type, that shim.error catches errors from the active backend, that flag constants are exposed, that basic match/search/findall/sub/escape are callable, and that the API surface is callable pre-initialize() (fallback to stdlib re).
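
A sketch of what tests along these lines might look like is shown below. It is not the contents of tests/unit/test_regex.py: stdlib re is imported as a stand-in for the shim so the example is self-contained, and the class, method, and `runSketch` names are hypothetical.

```python
import re as shim  # stand-in for `import _regex as shim`
import unittest


class TestShimSurface(unittest.TestCase):
	def test_compileReturnsBackendPattern(self):
		self.assertIsInstance(shim.compile("a"), shim.Pattern)

	def test_errorCatchesBackendErrors(self):
		with self.assertRaises(shim.error):
			shim.compile("(")

	def test_flagConstantsExposed(self):
		for name in ("IGNORECASE", "DOTALL", "MULTILINE"):
			self.assertTrue(hasattr(shim, name))

	def test_basicFunctionsCallable(self):
		self.assertIsNotNone(shim.match("a", "abc"))
		self.assertIsNotNone(shim.search("b", "abc"))
		self.assertEqual(shim.findall(r"\d", "a1b2"), ["1", "2"])
		self.assertEqual(shim.sub("a", "x", "banana"), "bxnxnx")
		self.assertEqual(shim.escape("a.b"), r"a\.b")


def runSketch() -> unittest.TestResult:
	"""Run the sketch suite and return the result object."""
	suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestShimSurface)
	return unittest.TextTestRunner(verbosity=0).run(suite)
```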

Known issues with pull request:

  • Toggling the setting requires an NVDA restart; attempting to change backends live was explored but proved unsafe because already-compiled Pattern objects are bound to whichever engine compiled them.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@LeonarddeR
Collaborator Author

Closing. Awaiting narrower scope and discussion in #19977

@LeonarddeR LeonarddeR closed this Apr 23, 2026

Development

Successfully merging this pull request may close these issues.

Migrate from Python re to regex module to improve unicode compatibility

1 participant