Parser: an algorithm (or a programme based on an algorithm) for analysing token streams into hierarchically arranged token groups, operationalising a function with the following components:
- a grammar, including
  - syntax rules and
  - a lexicon, containing:
    - a definition and list of special tokens (in HTML: tags)
    - a definition of atomic tokens
    - a list of complex elements (text objects)
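As a rough illustration of the lexicon component, the sketch below sorts each token of a stream into a special token (a start tag or an end tag) or an atomic token (text), and checks tags against a list of complex elements. It is purely hypothetical: COMPLEX_ELEMENTS, TAG_PATTERN and classify_token are names invented here, the element list is deliberately incomplete, and tags with attributes are not handled.

```python
import re

# Hypothetical list of complex elements (text objects); illustrative only,
# not a complete inventory of HTML elements.
COMPLEX_ELEMENTS = {"html", "head", "title", "body", "h1", "p", "ul", "li", "b", "i"}

# Definition of special tokens (tags): <name> or </name>.
# Attributes are deliberately not handled in this sketch.
TAG_PATTERN = re.compile(r"^<(/?)([a-zA-Z][a-zA-Z0-9]*)>$")


def classify_token(token: str) -> tuple[str, str]:
    """Return (category, value) for one token from the token stream."""
    match = TAG_PATTERN.match(token)
    if match:
        name = match.group(2).lower()  # tag names are case-insensitive in HTML
        if name not in COMPLEX_ELEMENTS:
            raise ValueError(f"unknown element: {name}")
        kind = "end_tag" if match.group(1) else "start_tag"
        return kind, name
    # Definition of atomic tokens (assumed): anything that is not a tag is text.
    return "atom", token
```

For example, classify_token("<P>") gives ("start_tag", "p"), classify_token("</li>") gives ("end_tag", "li"), and classify_token("Hello") gives ("atom", "Hello").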
The parser defines an interpretation function which maps the
token stream onto an HTML document structure:

    PARSER: TOKENSTREAM → DOCUMENT_STRUCTURE
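One way to sketch this interpretation function is a stack-based parser. It assumes a token stream of bare tags and text strings, and that every start tag has an explicit end tag. Element, parse and the "#document" root node are hypothetical names chosen for this illustration, not part of any standard API.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# Assumed tag syntax: <name> or </name>, without attributes.
TAG = re.compile(r"^<(/?)([a-zA-Z][a-zA-Z0-9]*)>$")


@dataclass
class Element:
    """A node of the document structure: a named element with children."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def parse(tokens: List[str]) -> Element:
    """Map a flat token stream onto a hierarchical document structure.

    Assumes every start tag has a matching end tag (no implied end tags).
    """
    root = Element("#document")
    stack = [root]  # open elements; the last one is the current parent
    for token in tokens:
        match = TAG.match(token)
        if match is None:
            stack[-1].children.append(token)  # atomic token: text
        elif match.group(1) == "":
            element = Element(match.group(2).lower())
            stack[-1].children.append(element)
            stack.append(element)  # start tag opens a new token group
        else:
            name = match.group(2).lower()
            if len(stack) == 1 or stack[-1].name != name:
                raise ValueError(f"unexpected end tag: </{name}>")
            stack.pop()  # end tag closes the current token group
    if len(stack) != 1:
        raise ValueError(f"unclosed element: <{stack[-1].name}>")
    return root
```

For example, parse(["<p>", "Hello", "</p>"]) returns a root whose single child is a p element containing the text "Hello". A stack is enough here because, under the stated assumption, each end tag can only close the most recently opened element.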
For the enthusiasts: in general, HTML can be defined formally as a
context-free (Chomsky Type 2) language, so a TEXT_STRUCTURE can be
represented as a tree (the parse tree of the grammar).
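To make the context-free, tree-shaped view concrete, here is a toy grammar with a recursive-descent parser for it. The grammar, the three element names and the functions are assumptions for illustration only. This is not a grammar of HTML itself, and it ignores attributes and omitted end tags. Each grammar rule corresponds to one function, and the nested tuples returned are the parse tree.

```python
from typing import List, Tuple, Union

# Hypothetical toy grammar (a finite set of element names keeps it context-free):
#   Content -> Item Content | (empty)
#   Item    -> TEXT | Element
#   Element -> "<p>" Content "</p>" | "<b>" Content "</b>" | "<i>" Content "</i>"
NAMES = ("p", "b", "i")

Tree = Union[str, Tuple[str, list]]


def parse_content(tokens: List[str], pos: int) -> Tuple[list, int]:
    """Content -> Item Content | empty. Stops at an end tag or end of input."""
    items = []
    while pos < len(tokens) and not tokens[pos].startswith("</"):
        item, pos = parse_item(tokens, pos)
        items.append(item)
    return items, pos


def parse_item(tokens: List[str], pos: int) -> Tuple[Tree, int]:
    """Item -> TEXT | Element."""
    token = tokens[pos]
    for name in NAMES:
        if token == f"<{name}>":
            # Element -> "<name>" Content "</name>": recursion gives nesting.
            children, pos = parse_content(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != f"</{name}>":
                raise SyntaxError(f"expected </{name}> at position {pos}")
            return (name, children), pos + 1
    if token.startswith("<"):
        raise SyntaxError(f"unknown tag {token!r} at position {pos}")
    return token, pos + 1  # TEXT


def parse_tree(tokens: List[str]) -> list:
    """Parse a whole token stream as Content and return the tree."""
    children, pos = parse_content(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    return children
```

For example, parse_tree(["<p>", "a", "<b>", "b", "</b>", "</p>"]) returns [("p", ["a", ("b", ["b"])])]. Because nesting may be arbitrarily deep, matching each start tag with its end tag needs recursion (or a stack) rather than a finite-state machine, which is what places such a grammar at Chomsky Type 2 rather than Type 3.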
© Dafydd Gibbon
Mon Jul 13 18:34:24 MET DST 1998