The VERBMOBIL demonstrator lexical word list was drawn up in 1993 and 1994 in consultation with all partners, in order to define the vocabulary for the VERBMOBIL demonstrator, due in February 1995.
The present document represents a stable state of the VERBMOBIL lexical word list representation conventions for orthography and phonology, as developed in these negotiations. It remains valid for lexical word lists of fully inflected forms for the VERBMOBIL research prototype (Forschungsprototyp), due in late 1996.
The lexical word list was first distributed in definitive form in May 1994. It was later extended to a full lexical database, released in several versions between November 1994 and February 1995 ([1], [2]), with attributes for spelling, pronunciation, morphology, syntax, semantics, pragmatics, and signal labels.
This Technical Document covers only conventions and encoding for the spelling and pronunciation attributes of fully inflected forms, i.e. the lexical word list or lexical pronunciation table in the traditional sense.
A lexical word list is not to be confused with a pure corpus word list, which is a list of the transcribed units occurring in a corpus. A lexical word list may in the simple case be identical with a corpus word list, but it may differ from a given corpus transcription in two main ways:
However, the lexical word list must be consistent with the related corpus transcription in the sense that there must be a well-defined function which maps items in the corpus transcription to items in the lexical word list (though not necessarily vice versa).
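The consistency requirement can be illustrated with a minimal sketch. It assumes a toy lexical word list and a hypothetical mapping `to_lexical_item` that merely strips punctuation attached to transcribed tokens; neither the entries nor the normalization are actual VERBMOBIL conventions. The check only demands that every corpus item map to some entry. Distinct corpus items may map to the same entry, and entries need not occur in the corpus, so no inverse mapping is required.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical toy lexical word list of fully inflected forms (not real VERBMOBIL data).
LEXICAL_WORD_LIST: Set[str] = {"Montag", "Termin", "treffen"}


def to_lexical_item(corpus_item: str) -> str:
    """Hypothetical mapping: strip trailing punctuation from a transcribed token."""
    return corpus_item.rstrip(".,?!")


def unmapped_items(
    corpus_items: Iterable[str],
    lexicon: Set[str],
    mapping: Callable[[str], str],
) -> List[str]:
    """Return corpus items whose image under the mapping is not in the lexicon."""
    return [item for item in corpus_items if mapping(item) not in lexicon]


def is_consistent(
    corpus_items: Iterable[str],
    lexicon: Set[str],
    mapping: Callable[[str], str],
) -> bool:
    """True if every corpus item maps to some lexical word list entry."""
    return not unmapped_items(corpus_items, lexicon, mapping)


if __name__ == "__main__":
    corpus = ["Montag,", "Termin?", "Montag"]
    # "Montag," and "Montag" share one entry; "treffen" is never used by the corpus.
    print(is_consistent(corpus, LEXICAL_WORD_LIST, to_lexical_item))
    print(unmapped_items(["Dienstag"], LEXICAL_WORD_LIST, to_lexical_item))
```

Running the script prints `True` for the toy corpus and `['Dienstag']` for a token with no lexical entry, which is the kind of gap the consistency requirement is meant to expose.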
Compiling the lexical word list required the explicit development of fairly complex lexicographic criteria for the coverage and representation of spoken language; these criteria are motivated, outlined, and then specified in detail below.