Description
Hi,
According to the docs, the offset field of a parse result points to the last successfully parsed character in the input data. In the case of status_bad_start_element, this seems to be the parser's last scan position at the point the error was raised.
I'd like to suggest a change here: The offset should point to the position of the opening '<' of the bad tag instead.
The specific use case where this would be helpful is receiving a stream of XML messages over the network, where a single message may be split across multiple network packets like so:
P1: '<a x="y" /><b foo="bar" '
P2: ' baz="blob"></b><c />'
In this case, the receiver wants to store the substring of packet 1 that contains the incomplete element b and prepend it to the content of packet 2 on the next iteration, so that it can be fully parsed there.
Doing this would be much easier if pugixml reported the offset of the opening '<' here.
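For illustration, here is a minimal sketch of the workaround this currently requires, assuming the reported offset falls somewhere inside the incomplete tag. The helper name incomplete_tail is hypothetical and not part of pugixml; it only searches backwards from the offset for the nearest '<' and returns everything from there on. It would pick the wrong position if another '<' could appear between the start of the tag and the offset.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of pugixml.
// Assumption: error_offset lies somewhere inside the incomplete tag,
// and no other '<' occurs between the tag's opening '<' and that offset.
// Returns the part of the buffer starting at the opening '<' of the
// incomplete tag, or an empty string if no '<' is found.
std::string incomplete_tail(const std::string& buffer, std::size_t error_offset)
{
    if (buffer.empty())
        return std::string();

    // Clamp the offset so that it always refers to a valid character.
    if (error_offset >= buffer.size())
        error_offset = buffer.size() - 1;

    // Search backwards for the nearest '<' at or before the offset.
    const std::size_t start = buffer.rfind('<', error_offset);
    if (start == std::string::npos)
        return std::string();

    return buffer.substr(start);
}
```

With P1 from above and an offset at the end of the buffer, this returns '<b foo="bar" ', which can then be prepended to P2 before parsing again.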
I'm not currently aware of other common use cases for the offset value in this error scenario (it's my first project using this library 😉), but if other users might find this helpful too, I'd be glad if you considered it.
Greetings, and thanks for your good work!