- Single space: ( SP | NL )+
(i.e. a sequence of at least one white-space character, treated as one space)
- Atoms:
- a sequence of non-special alphanumeric characters bounded by separators
- a sequence of any characters except " enclosed in "..." (or of any characters except ' enclosed in '...')
- Tags: Start-tag | End-tag
- Start-tag: < Atom Property* >, e.g.
<P ALIGN=CENTER>
- End-tag: < / Atom >, e.g.
</P>, </TABLE>
- Property: Atom = Atom | Atom, e.g.
ALIGN=CENTER, NOSHADE
- Upper/lower case: atoms in HTML tags are case-insensitive, except for addresses (e.g. URLs given as property values), where case is significant.
- Text: sequence of atoms separated by white space
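
The token classes listed above can be tried out with a small tokeniser. The following is only a minimal sketch in Python, not part of the course material: the names NAME, QUOTED, ATOM, PROPERTY, TOKEN and tokenise are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and text atoms are assumed to be any run of characters other than white space and "<". It is not a full HTML tokeniser (no comments, entities or declarations).

```python
import re

# Assumed patterns for the informal grammar above (all names are hypothetical).
NAME = r"[A-Za-z0-9]+"                      # unquoted atom
QUOTED = r'"[^"]*"' + "|" + r"'[^']*'"      # atom enclosed in "..." or '...'
ATOM = rf"(?:{NAME}|{QUOTED})"

# Property: Atom = Atom | Atom
PROPERTY = re.compile(rf"({NAME})(?:\s*=\s*({ATOM}))?")

TOKEN = re.compile(
    rf"<\s*/\s*(?P<end>{NAME})\s*>"                                   # End-tag
    rf"|<\s*(?P<start>{NAME})"                                        # Start-tag
    rf"(?P<props>(?:\s+{NAME}(?:\s*=\s*{ATOM})?)*)\s*>"
    rf"|(?P<space>\s+)"                                               # white space
    rf"|(?P<text>[^<\s]+|<)"                                          # text atom (or stray "<")
)


def tokenise(source):
    """Return a list of (kind, ...) tuples for the given HTML string."""
    tokens = []
    for m in TOKEN.finditer(source):
        if m.group("end"):
            # Tag names are case-insensitive, so normalise to upper case.
            tokens.append(("END", m.group("end").upper()))
        elif m.group("start"):
            props = {}
            for name, value in PROPERTY.findall(m.group("props")):
                if not value:
                    value = None            # bare property such as NOSHADE
                elif value[0] in "\"'":
                    value = value[1:-1]     # drop the enclosing quotes
                # Values keep their case: addresses may be case-sensitive.
                props[name.upper()] = value
            tokens.append(("START", m.group("start").upper(), props))
        elif m.group("space"):
            # Any run of white space collapses to a single space token.
            tokens.append(("SPACE", " "))
        else:
            tokens.append(("TEXT", m.group("text")))
    return tokens
```

For example, tokenise('<P ALIGN=CENTER>Hello  world</P>') yields a start-tag token for P with the property ALIGN, two text atoms separated by one space token, and an end-tag token for P. Tag and property names are normalised to upper case, while property values are left as written so that addresses keep their case.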
Dafydd Gibbon, Sat Oct 17 18:58:17 CEST 1998