Error-recovering streaming HTML5 and XML parsers for OCaml
Description
Markup.ml is a pair of parsers implementing the HTML5 and XML
specifications, including error recovery. Usage is simple, because each
parser is a function from byte streams to parsing signal streams.
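As a rough illustration of that "function from byte streams to signal streams" shape, the sketch below chains a byte stream through the HTML parser and back out to a string. It assumes the `markup` package is installed and uses `Markup.string`, `Markup.parse_html`, `Markup.signals`, `Markup.write_html` and `Markup.to_string`; the helper name `normalize` is hypothetical and only for this example.

```ocaml
(* Sketch only: assumes the markup package is available,
   e.g. built with: ocamlfind ocamlopt -package markup -linkpkg ... *)

(* Hypothetical helper: byte stream -> parser -> signal stream
   -> serialized byte stream -> string. *)
let normalize (html : string) : string =
  Markup.string html        (* string as a stream of bytes *)
  |> Markup.parse_html      (* error-recovering HTML5 parser *)
  |> Markup.signals         (* stream of parsing signals *)
  |> Markup.write_html      (* serialize signals back to bytes *)
  |> Markup.to_string       (* collect the bytes into a string *)

let () =
  (* The input leaves <p> and <em> unclosed; the parser recovers
     and the serializer writes out balanced tags. *)
  print_endline (normalize "<p><em>Markup.ml<p>rocks!")
```

Each stage is a separate function, so a signal filter or transformer can be slotted in between `Markup.signals` and `Markup.write_html` without changing the rest of the pipeline.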
In addition to being error-recovering, the parsers are:
- **streaming**: parsing partial input and emitting signals while more
input is still being received;
- **lazy**: not parsing input unless you have requested the next parsing
signal, so you can easily stop parsing part-way through a document;
- **non-blocking**: usable with Lwt, while still providing a
straightforward synchronous interface for simple usage; and
- **one-pass**: memory consumption is limited since the parsers don't
build up a document representation, nor buffer input beyond a small
amount of lookahead.
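The lazy, pull-based behaviour can be sketched with the synchronous interface: signals are requested one at a time with `Markup.next`, and the loop simply stops asking once it has what it needs. This is a minimal sketch, assuming the `markup` package is installed; the function name `first_element_name` is hypothetical, and the input is parsed as a body fragment via the `~context` argument so that the first element seen is the one written in the input.

```ocaml
(* Sketch only: assumes the markup package is available. *)

(* Hypothetical helper: return the local name of the first element,
   pulling signals on demand and stopping as soon as one is found. *)
let first_element_name (html : string) : string option =
  let signals =
    Markup.string html
    |> Markup.parse_html ~context:(`Fragment "body")
    |> Markup.signals
  in
  let rec loop () =
    match Markup.next signals with
    | Some (`Start_element ((_namespace, local_name), _attributes)) ->
      (* Found it: no further signals are requested after this point. *)
      Some local_name
    | Some _ -> loop ()   (* text, comments, etc.: keep pulling *)
    | None -> None        (* end of input, no element found *)
  in
  loop ()

let () =
  match first_element_name "<p>hi</p><div>never inspected</div>" with
  | Some name -> print_endline name
  | None -> print_endline "(no element)"
```

Because nothing is parsed until `Markup.next` is called, abandoning the stream after the first match leaves the remainder of the document unparsed, apart from whatever lookahead the parser needed.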
The parsers detect character encodings automatically, and emit everything
in UTF-8. The HTML parser understands SVG and MathML, in addition to
HTML5.