
8 Orthographic transcription

Orthographic transcription is used for large-scale spoken language corpus work.

Computationally oriented projects pose the following requirements:

Orthography is lexical; canonical lexical forms are preferred.

Spelling and standard pronunciation are related (in pronunciation tables or by grapheme-phoneme rules).

Non-standard vocabulary items (noises, hesitation phenomena, fragmentation) need extensions.

Prosody needs extensions.

Non-standard pronunciation needs extensions (e.g. comments).

Uncertain identification needs extensions (e.g. comments, comment marks).

For computation, 'modified orthography' is not suitable for representing non-canonical pronunciation (style, dialect, social class).

A formal mapping to computational conventions is needed for interpretative systems (CHILDES, HIAT, Selting ...).
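The requirements above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the markup symbols (angle brackets for noises and hesitations, a trailing hyphen for fragments, braces for comments, `(?...)` for uncertain identification) are invented here and do not reproduce CHILDES, HIAT or any other named convention, and the pronunciation table `PRON` is a toy table with SAMPA-like strings. The sketch keeps canonical orthographic forms as the lexical key and carries all extensions as separate annotations.

```python
import re

# Hypothetical markup, invented for this sketch (not a published convention):
#   <cough>      non-lexical event (noise, hesitation)
#   do-          word fragment (trailing hyphen)
#   dog{creaky}  comment attached to the preceding word
#   (?sits)      uncertain identification
TOKEN = re.compile(
    r"<(?P<event>[^>]+)>"
    r"|\(\?(?P<uncertain>[^)]+)\)"
    r"|(?P<word>[A-Za-z']+)(?P<frag>-)?(?:\{(?P<comment>[^}]*)\})?"
)

# Toy pronunciation table: canonical spelling -> SAMPA-like string.
PRON = {"the": "D@", "dog": "dQg", "sits": "sIts"}


def parse(line):
    """Split one transcription line into annotated tokens."""
    tokens = []
    for m in TOKEN.finditer(line):
        if m.group("event"):
            # Noises and hesitations have no lexical pronunciation.
            tokens.append({"type": "event", "form": m.group("event")})
        elif m.group("uncertain"):
            form = m.group("uncertain").lower()
            tokens.append(
                {"type": "uncertain", "form": form, "pron": PRON.get(form)}
            )
        else:
            form = m.group("word").lower()
            if m.group("frag"):
                # Fragments are kept as spelled; no table lookup is attempted.
                tok = {"type": "fragment", "form": form}
            else:
                tok = {"type": "word", "form": form, "pron": PRON.get(form)}
            if m.group("comment") is not None:
                tok["comment"] = m.group("comment")
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    for tok in parse("the <cough> do- dog{creaky} (?sits)"):
        print(tok)
```

Non-canonical pronunciation is recorded in the comment field, not by respelling the word, so the table lookup on the canonical form still succeeds. A limitation of this sketch is that ordinary hyphenated compounds would be read as a fragment followed by a word; a real convention would need a distinct fragment mark.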



Dafydd Gibbon
Wed May 22 10:39:25 MET DST 1996