- Tokenisation (lexical analysis): the analysis of a stream of characters into minimal interpretable character sequences (symbols, tokens)
- File: stream of characters
- Text: stream of tokens
- Token:
  - enclosed in separators
  - tag
- Separator:
  - White space: SP (space), NL (carriage return and/or linefeed)
  - Special characters (e.g. <, >, =)
  - BOF (beginning of file)
  - EOF (end of file)
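
The definitions above can be illustrated with a minimal sketch in Python. It is not a full HTML tokeniser. It assumes that the special characters are exactly the three examples listed (<, >, =), and that each special character is also reported as a one-character token of its own, which the list does not state. The names `TOKEN_RE` and `tokenise` are hypothetical. BOF and EOF need no explicit handling here, because the start and end of the string bound the first and last token.

```python
import re

# Assumption: the special characters are only the three examples
# given in the list above (<, >, =).
# A token is either a maximal run of characters that are neither
# white space nor special, or (assumption) one special character.
TOKEN_RE = re.compile(r"[^\s<>=]+|[<>=]")


def tokenise(text):
    """Turn a stream of characters into a list of tokens.

    White space only separates tokens and is discarded; BOF and EOF
    are implicit in the start and end of the string.
    """
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenise("<a href=index.html>Next</a>\n"))
```

Whether a tag such as `<a href=index.html>` should count as one token or as a sequence of smaller tokens is a design choice; this sketch takes the second option so that the separator roles stay visible.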
© Dafydd Gibbon
Mon Jul 13 18:34:24 MET DST 1998