- Characters:
- smallest units (cf. phonemes)
- encoded as sequences of ASCII codes in many document description languages (exception: the internal representations used by word processors)
- Tokens, symbols:
- smallest interpretable units (cf. morphemes)
- identified by tokenisation, and
- generally describable with a regular grammar (or a finite state automaton)
- Objects:
- groups of tokens with complex meaning (cf. phrases)
- identified by parsing, and
- generally describable with a context-free grammar (or a push-down automaton)
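The three levels above can be illustrated with a minimal sketch in Python. It is not part of the course material: the token classes (OPEN, CLOSE, TEXT), the function names tokenise and parse, and the tiny HTML-like input language (tags without attributes) are all assumptions chosen for illustration. Tokens are recognised with a regular expression, standing in for the regular grammar, and objects are built with an explicit stack, standing in for the push-down automaton.

```python
import re

# Hypothetical token classes for a tiny HTML-like language (illustration only;
# real HTML tags also carry attributes, comments, entities, and so on).
TOKEN_RE = re.compile(r"""
    (?P<CLOSE></[A-Za-z][A-Za-z0-9]*>)   # closing tag
  | (?P<OPEN><[A-Za-z][A-Za-z0-9]*>)     # opening tag
  | (?P<TEXT>[^<]+)                      # run of characters up to the next tag
""", re.VERBOSE)


def tokenise(chars):
    """Characters -> tokens: each token is matched by a regular expression."""
    tokens = []
    pos = 0
    while pos < len(chars):
        m = TOKEN_RE.match(chars, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens):
    """Tokens -> objects: nesting is tracked with a stack, which a
    finite state automaton alone could not do for arbitrary depth."""
    root = ("root", [])
    stack = [root]
    for kind, text in tokens:
        if kind == "OPEN":
            node = (text[1:-1], [])      # strip "<" and ">"
            stack[-1][1].append(node)
            stack.append(node)
        elif kind == "CLOSE":
            name = text[2:-1]            # strip "</" and ">"
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unbalanced closing tag {text}")
            stack.pop()
        else:                            # TEXT
            stack[-1][1].append(text)
    if len(stack) != 1:
        raise ValueError("unclosed element " + stack[-1][0])
    return root


if __name__ == "__main__":
    toks = tokenise("<p>Hi <b>you</b></p>")
    print(toks)
    print(parse(toks))
```

The tokeniser never needs to remember how many tags are open, whereas the parser must, and that difference is the reason for assigning the two steps to different grammar classes.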
Dafydd Gibbon, Sat Oct 17 18:58:17 CEST 1998