
History and scope

The VERBMOBIL demonstrator lexical word list was compiled in consultation with all project partners during 1993 and 1994, in order to fix the vocabulary for the VERBMOBIL demonstrator, due in February 1995.

The present document describes a stable state of the VERBMOBIL lexical word list conventions for representing orthography and phonology, as developed in these negotiations. It remains valid for lexical word lists of fully inflected forms for the VERBMOBIL research prototype (Forschungsprototyp), due in late 1996.

The lexical word list was first distributed in definitive form in May 1994. It was later extended into a full lexical database, released in a number of versions between November 1994 and February 1995 ([1], [2]), with attributes for spelling, pronunciation, morphology, syntax, semantics, pragmatics, and signal labels.

This Technical Document covers only conventions and encoding for the spelling and pronunciation attributes of fully inflected forms, i.e. the lexical word list or lexical pronunciation table in the traditional sense.

A lexical word list is not to be confused with a pure corpus word list, which is a list of the transcribed units occurring in a corpus. In the simplest case a lexical word list may be identical to a corpus word list, but it may differ from a given corpus transcription in two main ways:

  1. by being less detailed and restricted to canonical orthographic or phonological forms, or
  2. by filling in 'accidental gaps', such as additional inflectional forms or missing items in small word fields (like days of the week).
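The two differences listed above can be illustrated with a minimal Python sketch that derives a lexical word list from a corpus word list. It assumes a hypothetical annotation convention (a leading "#" marks a non-lexical token such as a hesitation, a trailing "*" marks a mispronunciation); these marks, and the function names `canonical_form` and `build_lexical_word_list`, are illustrative only and are not the VERBMOBIL transcription conventions. Only word-field gaps are filled here; filling in missing inflectional forms would need a morphological component and is not shown.

```python
from typing import Iterable, List, Optional


def canonical_form(token: str) -> Optional[str]:
    """Reduce a transcribed corpus token to a canonical orthographic form.

    Hypothetical conventions (illustration only): a leading '#' marks a
    non-lexical token (returns None), a trailing '*' marks a mispronunciation.
    """
    if token.startswith("#"):
        return None
    return token.rstrip("*")


def build_lexical_word_list(
    corpus_word_list: Iterable[str], word_fields: Iterable[List[str]]
) -> List[str]:
    """Derive a lexical word list from a corpus word list.

    1. Restrict entries to canonical forms (less detail than the corpus).
    2. Fill accidental gaps: if any member of a small closed word field
       occurs, add all members of that field.
    """
    lexicon = set()
    for token in corpus_word_list:
        form = canonical_form(token)
        if form is not None:
            lexicon.add(form)
    for field in word_fields:
        if lexicon & set(field):
            lexicon |= set(field)
    return sorted(lexicon)


corpus = ["Montag", "Dienstag*", "#äh", "Termin", "Termin"]
days = ["Montag", "Dienstag", "Mittwoch", "Donnerstag",
        "Freitag", "Samstag", "Sonntag"]
print(build_lexical_word_list(corpus, [days]))
```

Here the corpus attests only two weekday names, but the resulting list contains all seven, together with "Termin"; the hesitation token and the mispronunciation mark do not survive into the lexical word list.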

However, the lexical word list must be consistent with the related corpus transcription in the sense that there must be a well-defined function which maps items in the corpus transcription to items in the lexical word list (though not necessarily vice versa).
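The consistency requirement can be checked mechanically. The sketch below represents the mapping as a Python dictionary from corpus items to lexical entries, so it is single-valued by construction; the check then confirms that the mapping is defined for every corpus item and that every image lies in the lexical word list. Lexical entries that no corpus item maps to are reported but are not errors, reflecting the 'not necessarily vice versa' clause. The names `ConsistencyReport` and `check_consistency`, and the example data, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ConsistencyReport:
    unmapped: List[str] = field(default_factory=list)         # no mapping defined
    outside_lexicon: List[str] = field(default_factory=list)  # maps to a non-entry
    unattested: List[str] = field(default_factory=list)       # allowed: not in image

    @property
    def consistent(self) -> bool:
        # Only the first two kinds of problem violate consistency.
        return not self.unmapped and not self.outside_lexicon


def check_consistency(
    corpus_items: Iterable[str],
    lexical_word_list: Iterable[str],
    mapping: Dict[str, str],
) -> ConsistencyReport:
    """Check that mapping sends every corpus item to a lexical word list entry."""
    lexicon = set(lexical_word_list)
    report = ConsistencyReport()
    image = set()
    for item in corpus_items:
        if item not in mapping:
            report.unmapped.append(item)
        elif mapping[item] not in lexicon:
            report.outside_lexicon.append(item)
        else:
            image.add(mapping[item])
    # Entries never reached from the corpus, e.g. filled-in gaps.
    report.unattested = sorted(lexicon - image)
    return report


corpus_items = ["Montag", "Dienstag*", "Termin"]
lexicon = ["Dienstag", "Mittwoch", "Montag", "Termin"]
mapping = {"Montag": "Montag", "Dienstag*": "Dienstag", "Termin": "Termin"}
report = check_consistency(corpus_items, lexicon, mapping)
print(report.consistent, report.unattested)
```

In this example the check succeeds, and "Mittwoch" is listed as unattested: it is a legitimate lexical entry that no corpus item maps to.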

The lexical word list enterprise required the explicit development of somewhat complex lexicographic criteria for spoken language coverage and representation, which are motivated, outlined, and then specified in detail below.






Dafydd Gibbon
Fri Sep 1 19:40:09 MET DST 1995