
refactor(markdown): replace htmlparser2 with regex-based sanitizer #2040

Open
gary149 wants to merge 1 commit into main from fix/regex-sanitizer

gary149 (Collaborator) commented Jan 9, 2026

Summary

  • Replaces htmlparser2 (56KB gzipped) with a zero-dependency regex-based HTML sanitizer
  • Maintains Web Worker compatibility (no DOM required)
  • Keeps the same security guarantees through a fail-closed approach (disallowed content causes the entire input to be escaped)

Changes

  • Remove htmlparser2 dependency from package.json
  • Replace DOM-based sanitization with regex-based sanitizeMediaHtml() function
  • Only allows video, audio, and source tags, with a strict attribute allowlist
  • Blocks javascript:, vbscript:, and data:text/html URIs
  • If ANY disallowed content is detected, escapes the entire input
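
For reviewers, a rough sketch of how a fail-closed, regex-based `sanitizeMediaHtml()` could be structured. It is not the code in this PR. Assumptions: the string-in/string-out signature, the exact attribute allowlist, treating `src` and `poster` as the URL-valued attributes, and dropping unknown attributes (to match the "Event handlers stripped" test item) while escaping the whole input for disallowed tags, malformed markup, or blocked URLs.

```typescript
// Sketch of a fail-closed, regex-based media sanitizer (no DOM, so it can run in a Web Worker).
// ASSUMPTIONS: the signature, ALLOWED_ATTRS, URL_ATTRS and the choice to drop unknown
// attributes are illustrative and not taken from the PR's code.

const ALLOWED_TAGS = new Set(["video", "audio", "source"]);
const ALLOWED_ATTRS = new Set([
  "src", "controls", "type", "autoplay", "loop", "muted", "poster", "preload", "width", "height",
]);
const URL_ATTRS = new Set(["src", "poster"]);

// A start or end tag: name, then attributes that are bare or have a quoted/unquoted value.
const TAG_RE =
  /<(\/?)([a-z][a-z0-9]*)((?:\s+[a-z-]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?)*)\s*\/?>/gi;
// A single attribute inside the attribute string captured by TAG_RE.
const ATTR_RE = /([a-z-]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/gi;

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function isSafeUrl(value: string): boolean {
  // Ignore case and drop spaces/control characters (including tabs and newlines) before checking.
  const normalized = value.replace(/[\u0000-\u0020]/g, "").toLowerCase();
  return !/^(?:javascript:|vbscript:|data:text\/html)/.test(normalized);
}

function sanitizeMediaHtml(input: string): string {
  const reject = (): string => escapeHtml(input); // fail closed: escape the whole input
  let out = "";
  let last = 0;
  for (const m of input.matchAll(TAG_RE)) {
    const start = m.index ?? 0;
    const text = input.slice(last, start);
    if (/[<>]/.test(text)) return reject(); // "<" or ">" that is not part of a well-formed tag
    out += text;
    const [, closing, rawName, rawAttrs] = m;
    const name = rawName.toLowerCase();
    if (!ALLOWED_TAGS.has(name)) return reject();
    if (closing) {
      if (rawAttrs.trim() !== "") return reject(); // end tags must not carry attributes
      out += `</${name}>`;
    } else {
      let attrs = "";
      for (const a of rawAttrs.matchAll(ATTR_RE)) {
        const attr = a[1].toLowerCase();
        const value: string | undefined = a[2] ?? a[3] ?? a[4];
        if (!ALLOWED_ATTRS.has(attr)) continue; // e.g. onerror="..." is dropped
        if (value === undefined) {
          attrs += ` ${attr}`; // boolean attribute such as controls
        } else if (URL_ATTRS.has(attr) && !isSafeUrl(value)) {
          return reject();
        } else {
          // Re-escaping (including "&") keeps character references literal, so
          // "&#106;avascript:" cannot decode into a scheme in the browser.
          attrs += ` ${attr}="${escapeHtml(value)}"`;
        }
      }
      out += `<${name}${attrs}>`;
    }
    last = start + m[0].length;
  }
  const tail = input.slice(last);
  return /[<>]/.test(tail) ? reject() : out + tail;
}
```

Under this sketch, any stray `<` or `>` outside a well-formed tag (for example `1 < 2` after an `<audio>` tag) escapes the whole input, which is the cost of failing closed. Re-escaping attribute values also means an already-escaped `&amp;` in a URL comes out double-escaped.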

Bundle Size Impact

  • Before (htmlparser2): +56KB gzipped
  • After (regex): +0KB (no new dependencies)

Test plan

  • All existing tests pass (14/14)
  • TypeScript check passes
  • Video/audio tags render correctly
  • Event handlers stripped from media tags
  • JavaScript/VBScript URLs blocked
  • Disallowed tags escaped entirely

Replace htmlparser2 (56KB gzipped) with a zero-dependency regex-based
HTML sanitizer for video/audio/source tags.

Security approach: fail-closed
- Only video, audio, source tags allowed
- Strict attribute allowlist (src, controls, type, etc.)
- Block javascript:, vbscript:, and data:text/html URIs
- If ANY disallowed content is detected, escape the entire input
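
Blocking URI schemes with a regex only holds up if the value is normalized roughly the way a browser reads it: case-insensitively, ignoring spaces and control characters such as tabs and newlines, and after expanding character references. A minimal sketch of such a check follows; `isBlockedMediaUrl` and `decodeNumericRefs` are hypothetical names, and the check actually used in this PR may differ. Named references (like `&colon;`) are not decoded; the sketch fails closed when one is present.

```typescript
// Sketch of a normalize-then-check test for src/poster values.
// ASSUMPTION: isBlockedMediaUrl and decodeNumericRefs are hypothetical helpers.

const BLOCKED_PREFIXES = ["javascript:", "vbscript:", "data:text/html"];

// Browsers expand numeric character references inside attribute values
// (&#106; and &#x6A; both become "j"), so decode them before comparing.
function decodeNumericRefs(value: string): string {
  return value.replace(/&#(?:x([0-9a-f]+)|([0-9]+));?/gi, (ref: string, hex?: string, dec?: string) => {
    const code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec ?? "", 10);
    return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : ref;
  });
}

function isBlockedMediaUrl(rawValue: string): boolean {
  const decoded = decodeNumericRefs(rawValue);
  // Named references such as &colon; or &Tab; could also spell out a scheme;
  // rather than decode all of them, fail closed whenever one is present.
  if (/&[a-z][a-z0-9]*;/i.test(decoded)) return true;
  // Strip every C0 control character and space (a stricter superset of what the
  // URL parser ignores) and compare case-insensitively.
  const normalized = decoded.replace(/[\u0000-\u0020]/g, "").toLowerCase();
  return BLOCKED_PREFIXES.some((prefix) => normalized.startsWith(prefix));
}
```

Under the fail-closed rule above, a `true` result for any URL-valued attribute would cause the entire input to be escaped.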
gary149 force-pushed the fix/regex-sanitizer branch from 0f70018 to 310c50a on January 9, 2026 12:06