
Possible bug parsing the DOCTYPE of an HTML 4.01 Transitional document #36

@langdonr621


Hi there,

I'm trying to use your library (html_parser 0.7.0), but I'm running into a problem when parsing some pages, for example https://www.unicode.org/reports/tr29/#Word_Boundaries. The error is similar to:

    running 1 test
    Failed :(  --> 1:23
      |
    1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
      |                       ^---
      |
      = expected attribute key
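Reading `--> 1:23` as line 1, column 23 (both 1-based, which is an assumption about the error format), a quick check with a hypothetical `char_at` helper shows that column 23 is the opening quote of the public identifier. So the parser appears to get through `<!DOCTYPE HTML PUBLIC ` and then reject the quoted string.

```rust
/// Hypothetical helper: the char at a 1-based (line, column) position, counting columns in chars.
fn char_at(src: &str, line: usize, col: usize) -> Option<char> {
    // `checked_sub` rejects 0, which is not a valid 1-based index.
    let line_idx = line.checked_sub(1)?;
    let col_idx = col.checked_sub(1)?;
    src.lines().nth(line_idx)?.chars().nth(col_idx)
}

fn main() {
    // The DOCTYPE from the failing test.
    let doctype = r#"<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">"#;
    if let Some(c) = char_at(doctype, 1, 23) {
        println!("line 1, column 23 is {}", c);
    }
}
```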

Here's a test function that reproduces the problem:

    #[test]
    fn test_html_parser() {
        const HTML: &str = r#"<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html></html>"#;

        // Print the parse error (rather than panicking) so the output shows where parsing stopped.
        if let Err(x) = html_parser::Dom::parse(HTML) {
            println!("Failed :( {}", x);
        }
    }

This Wikipedia page says the declaration's syntax is correct.
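For comparison, here is a minimal hand-written Rust recognizer for the DOCTYPE forms a grammar rule would plausibly need to accept: a root name alone, `PUBLIC` followed by a quoted public identifier and an optional quoted system identifier, or `SYSTEM` followed by one quoted system identifier. It is a sketch, not html_parser's actual grammar and not a complete SGML/HTML implementation; `is_valid_doctype` and `tokenize` are hypothetical names. Keywords match case-insensitively, and any whitespace (including the newline between the two literals in the test above) can separate tokens.

```rust
/// Token kinds inside a DOCTYPE body (hypothetical, for this sketch only).
enum Tok<'a> {
    Word(&'a str), // bare word such as HTML, PUBLIC, SYSTEM
    Quoted,        // a "..." or '...' literal (contents not needed here)
}

/// Splits the text between `<!DOCTYPE` and `>` into words and quoted literals.
fn tokenize(body: &str) -> Option<Vec<Tok<'_>>> {
    let mut toks = Vec::new();
    let mut chars = body.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // whitespace, including newlines, separates tokens
        } else if c == '"' || c == '\'' {
            chars.next();
            // Consume up to the matching quote; an unterminated literal is an error.
            if !chars.by_ref().any(|(_, d)| d == c) {
                return None;
            }
            toks.push(Tok::Quoted);
        } else if c.is_ascii_alphanumeric() {
            let mut end = body.len();
            while let Some(&(j, d)) = chars.peek() {
                if !d.is_ascii_alphanumeric() {
                    end = j;
                    break;
                }
                chars.next();
            }
            toks.push(Tok::Word(&body[start..end]));
        } else {
            return None; // unexpected character such as a stray '>'
        }
    }
    Some(toks)
}

/// Hypothetical recognizer for `<!DOCTYPE name>`, `<!DOCTYPE name PUBLIC "id" ["uri"]>`
/// and `<!DOCTYPE name SYSTEM "uri">`, with case-insensitive keywords.
fn is_valid_doctype(input: &str) -> bool {
    let s = input.trim();
    let has_prefix = s.get(..9).map_or(false, |p| p.eq_ignore_ascii_case("<!doctype"));
    if !has_prefix || s.len() < 10 || !s.ends_with('>') {
        return false;
    }
    // Tokenize the text between "<!DOCTYPE" and the final '>'.
    let toks = match tokenize(&s[9..s.len() - 1]) {
        Some(t) => t,
        None => return false,
    };
    match toks.as_slice() {
        [Tok::Word(_)] => true,
        [Tok::Word(_), Tok::Word(kw), Tok::Quoted]
            if kw.eq_ignore_ascii_case("public") || kw.eq_ignore_ascii_case("system") => true,
        [Tok::Word(_), Tok::Word(kw), Tok::Quoted, Tok::Quoted]
            if kw.eq_ignore_ascii_case("public") => true,
        _ => false,
    }
}

fn main() {
    let doctype = r#"<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">"#;
    println!("accepted: {}", is_valid_doctype(doctype));
}
```

Under this sketch the DOCTYPE from the test is accepted. If the library's rule only allows bare words after `<!DOCTYPE`, it would stop at the first quoted literal, which would be consistent with the reported error position.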

Am I correct in assuming this is a bug in the grammar rules the parser uses?

If not, I'd appreciate it if you could suggest a workaround.
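Until the grammar handles this, one possible workaround (an assumption, not something the library documents) is to strip a leading DOCTYPE before handing the markup to `html_parser::Dom::parse`. The hypothetical `strip_doctype` helper below does that with the standard library only: it recognizes a leading `<!DOCTYPE` case-insensitively, ignores `>` characters inside quoted identifiers, and returns the rest of the input unchanged. The DOCTYPE itself is discarded rather than parsed.

```rust
/// Hypothetical helper: returns `html` with a leading `<!DOCTYPE ...>` removed, if present.
fn strip_doctype(html: &str) -> &str {
    let trimmed = html.trim_start();
    // Only act when the input starts with "<!DOCTYPE" (ASCII, case-insensitive).
    let starts_with_doctype = trimmed
        .get(..9)
        .map_or(false, |p| p.eq_ignore_ascii_case("<!doctype"));
    if !starts_with_doctype {
        return html;
    }
    // Find the closing '>' that is not inside a quoted identifier.
    let mut quote: Option<char> = None;
    for (i, c) in trimmed.char_indices().skip(9) {
        match (quote, c) {
            (Some(q), _) if c == q => quote = None, // closing quote
            (Some(_), _) => {}                      // inside a quoted literal
            (None, '"') | (None, '\'') => quote = Some(c),
            (None, '>') => return &trimmed[i + 1..],
            _ => {}
        }
    }
    // Unterminated declaration: leave the input untouched.
    html
}

fn main() {
    let html = r#"<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html></html>"#;
    // The remainder could then be passed to html_parser::Dom::parse.
    println!("{}", strip_doctype(html).trim());
}
```

This assumes the rest of the page parses once the DOCTYPE is gone, which has not been checked against the full unicode.org page.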

TIA, cheers!
