A down-to-earth approach to this problem would list the chain of shared
activities which feed into most R&D in the field, for example:
- Transcription: 'canonical' (orthographic, phonemic, prosodic, paralinguistic) and 'detailed'; often by semi-automatic methods.
- Annotation (labelling): alignment of transcription symbols with points or intervals in the speech signal, using signal processing and display software such as ESPS/waves+, or automatic alignment software.
- Normalisation: the transcription is normalised (parsed, corrected, re-formatted) for further processing.
- Lexicon extraction: the basic units (generally fully-inflected words) in the lexicon are extracted from the corpus to serve as the basis for further processing.
- Statistical training of speech recognisers using the lexicon and the labelled data.
- Construction of statistical language models using the lexicon and the transcription corpus.
- Development of linguistic lexica for parsing.
- ...
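Three of the steps above (normalisation, lexicon extraction, and construction of a statistical language model) can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the transcription is orthographic and held as plain strings, normalisation is reduced to lower-casing and punctuation stripping, and the language model is an unsmoothed bigram model. The function names and the two sample utterances are hypothetical and are not taken from any particular toolkit or corpus.

```python
import re
from collections import Counter


def normalise(line):
    """Lower-case an orthographic transcription line and strip punctuation.

    Assumption: real normalisation also parses and corrects the
    transcription; only re-formatting is shown here.
    """
    cleaned = re.sub(r"[^\w\s'-]", " ", line.lower())
    return cleaned.split()


def extract_lexicon(corpus):
    """Collect fully inflected word forms and their corpus frequencies."""
    return Counter(token for utterance in corpus for token in utterance)


def bigram_counts(corpus):
    """Count adjacent word pairs, with utterance boundary markers."""
    counts = Counter()
    for utterance in corpus:
        padded = ["<s>"] + utterance + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def bigram_prob(counts, w1, w2):
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1)."""
    history = sum(c for (first, _), c in counts.items() if first == w1)
    return counts[(w1, w2)] / history if history else 0.0


# Hypothetical sample transcriptions, for illustration only.
transcripts = ["Yes, I can see the mill.", "Can you see the bridge?"]

corpus = [normalise(line) for line in transcripts]
lexicon = extract_lexicon(corpus)
bigrams = bigram_counts(corpus)

print(sorted(lexicon))
print(bigram_prob(bigrams, "see", "the"))
print(bigram_prob(bigrams, "the", "mill"))
```

The lexicon here is simply the set of word forms observed after normalisation, which is why the normalisation step has to precede both lexicon extraction and language model construction. A practical system would add smoothing for unseen word pairs.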
© Dafydd Gibbon
Sun May 24 11:09:33 MET DST 1998