Position of the parse result offset in case of status_bad_start_element #606

@lo-asys

Hi,
according to the docs, the offset field of a parse result points to the last successfully parsed character in the input data. In the case of status_bad_start_element, however, it seems to be the parser's last scan position at the point where the error was detected.
I'd like to suggest a change here: the offset should point to the position of the opening '<' of the bad tag instead.
The specific use case where this would be helpful is receiving a stream of XML messages over the network, where a single message may be split across multiple network packets like so:

P1: '<a x="y" /><b foo="bar" '
P2: ' baz="blob"></b><c />'

In this case, the receiver wants to store the substring of packet 1 that contains the incomplete element b and prepend it to the content of packet 2 on the next iteration, so that the element can be parsed in full there.
Doing this would be much easier if pugixml reported the offset of the opening '<' here.
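
For illustration, here is a minimal sketch of the workaround this requires today. It is not pugixml code: `find_tag_start` and `take_incomplete_tail` are hypothetical helpers, and `error_offset` stands for the value read from the parse result's `offset` field. It assumes there is no unescaped '<' between the start of the bad tag and the reported offset, which holds for well-formed attribute values but is not guaranteed for arbitrary input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of pugixml): returns the index of the nearest
// '<' at or before error_offset, or std::string::npos if there is none.
std::size_t find_tag_start(const std::string& buffer, std::size_t error_offset)
{
    if (buffer.empty()) return std::string::npos;
    // Clamp, since the reported offset may point one past the last character.
    if (error_offset >= buffer.size()) error_offset = buffer.size() - 1;
    return buffer.rfind('<', error_offset);
}

// Hypothetical helper: cuts the incomplete element off the end of buffer and
// returns it, so that it can be prepended to the next packet. If no '<' is
// found, buffer is left untouched and an empty string is returned.
std::string take_incomplete_tail(std::string& buffer, std::size_t error_offset)
{
    const std::size_t start = find_tag_start(buffer, error_offset);
    if (start == std::string::npos) return std::string();
    assert(start < buffer.size());
    std::string tail = buffer.substr(start);
    buffer.erase(start);
    return tail;
}
```

With the packets above, this would leave `<a x="y" />` in the buffer for packet 1 and carry `<b foo="bar" ` over to packet 2. If the offset already pointed at the opening '<', the backwards search (and the assumption it relies on) would not be needed.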

I'm not currently aware of other common use cases for the offset value in this error scenario (it's my first project using this library 😉), but if other users might find this helpful too, I'd be glad if you considered it.

Greetings, and thanks for your good work!
