Lexicon upscaling in speech technology

Another application area in which rapid development is taking place is speech technology, particularly dictation (speech recognition) and readback (text-to-speech) software. Lexicon sizes for such applications have leapt from a few hundred words in the early nineties to tens of thousands. Software technologies are being developed for generating all wordform variants from stem forms ([Bleiching, Drexel & Gibbon 1996]), and for automatically inducing large lexica from text and transcription corpora with statistical and symbolic classification algorithms. The development of lexica for these purposes is a small but growing industry.
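
As an illustration of what generating wordform variants from stems involves, the following minimal sketch expands a stem lexicon into a full-form lexicon. The paradigm table, the function names and the English examples are assumptions made for illustration only; this is not the method of the cited work, and real systems must also handle stem alternations and spelling changes.

```python
# Toy inflection paradigms: suffixes attached to a stem.
# Hypothetical and deliberately tiny; no irregular forms are covered.
PARADIGMS = {
    "noun": ["", "s"],
    "verb": ["", "s", "ed", "ing"],
}


def generate_wordforms(stem, category):
    """Return all wordform variants of a stem under the toy paradigm."""
    return [stem + suffix for suffix in PARADIGMS[category]]


def expand_lexicon(stems):
    """Map each generated wordform to the (stem, category) pairs it realises."""
    lexicon = {}
    for stem, category in stems:
        for form in generate_wordforms(stem, category):
            lexicon.setdefault(form, set()).add((stem, category))
    return lexicon


if __name__ == "__main__":
    full_forms = expand_lexicon([("walk", "verb"), ("walk", "noun")])
    for form in sorted(full_forms):
        print(form, sorted(full_forms[form]))
```

Storing the analyses as a set per wordform keeps ambiguous forms (here "walk" and "walks", which are both noun and verb forms) as a single entry with several readings.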

Morphological and other lexical generalisations are also being used to address the out-of-vocabulary (OOV) problem, that is, the problem of creating lexical entries for words not attested in a given corpus, whether these are morphological variants of known stems or entirely new stems in new semantic domains (see [Adda-Decker & Lamel (this volume)]). Companies which produce dictation software are heavily involved in this extremely labour-intensive area.
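
The OOV problem can be made concrete with a small sketch that measures the proportion of tokens missing from a lexicon and then tries to relate each missing item to a known stem by suffix stripping. The suffix list, the function names and the sample data are hypothetical, and the stripping heuristic merely stands in for the much richer morphological generalisations mentioned above.

```python
# Hypothetical suffix list for a toy suffix-stripping heuristic.
SUFFIXES = ["ing", "ed", "s"]


def oov_items(tokens, lexicon):
    """Return the tokens that are not attested in the lexicon."""
    return [t for t in tokens if t.lower() not in lexicon]


def oov_rate(tokens, lexicon):
    """Return the fraction of tokens that are out of vocabulary."""
    if not tokens:
        return 0.0
    return len(oov_items(tokens, lexicon)) / len(tokens)


def guess_stem(word, lexicon):
    """Return a known stem for word by stripping one suffix, else None."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in lexicon:
            return word[: -len(suffix)]
    return None


if __name__ == "__main__":
    lexicon = {"the", "dog", "walk"}
    tokens = ["the", "dog", "walked", "quickly"]
    print("OOV rate:", oov_rate(tokens, lexicon))
    for item in oov_items(tokens, lexicon):
        print(item, "->", guess_stem(item, lexicon))
```

In this sample, "walked" is a morphological variant that can be linked to the known stem "walk", whereas "quickly" has no known stem under the heuristic and would need a new entry.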



Dafydd Gibbon
Thu Nov 19 10:12:05 MET 1998