Next: Further developments Up: No Title Previous: Lexical spelling conventions

Lexical pronunciation conventions and encoding

A lexical pronunciation transcription was developed, based on the following phonological conventions and SAMPA-oriented encoding principles.

  1. The basic transcription convention is phonemic transcription according to the international SAMPA conventions for German.
    Note: Some local SAMPA variants differ slightly, e.g. in using /Q/ for the glottal stop. However, international SAMPA defines /Q/ as the open back rounded vowel, so this usage was considered unsuitable.
  2. Length marks (:) for long vowels are included because they are widely used, although, strictly speaking, they are redundant; they are not used in the Aussprachewörterbuch (Pronunciation Dictionary) of the Duden Publishing Company.
  3. Word stress information is included: primary and secondary lexical stress are encoded (as distinct from phonetic accent in context).
    Note: The international SAMPA encoding, with a double quote (") for primary and a percent sign (%) for secondary stress, was found to be inconvenient in a number of ASCII-oriented processing environments. It was therefore replaced by a single quote (') for primary stress and two single quotes ('') for secondary stress. In LaTeX, the latter is generally indistinguishable from a double quote. This notation has the advantage of being iconic: tertiary stress (in compound words), for example, can simply be encoded as three single quotes (''').
  4. The following conventions and encodings are used for morphological and phonological boundary markers:
    # (hash, word boundary): Word boundaries of two kinds are included; the hash sign (#) is a standard notation in linguistics.
    In compound words, word boundaries are encoded as a single hash sign (#).
    In phrasal idioms, word boundaries are encoded as a double hash sign (##).
    . (period, point): Syllable boundaries that are not also word boundaries; every word boundary is also a syllable boundary.
    + (plus): Morph boundaries that are not also word boundaries; every word boundary is also a morph boundary.
    Phonemic morphs (in contrast to orthographic morphs in the spelling) are the phonemic representations of morphemes, as distinct from structural or semantic characterisations of morphemes.
    Note: Morphs and syllables are frequently not co-extensive: a morph may contain more or fewer than one syllable, and morphs and syllables may overlap (cf. Verbindung /vE6.+b'In.d+UN/, where the morph /b'Ind/ overlaps the syllable /dUN/).
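Because the encoding is plain ASCII, these conventions can be processed with standard UNIX tools. The POSIX shell functions below are a minimal sketch, not the VERBMOBIL access tools themselves. The function names are hypothetical, and each function assumes its input is a single transcription in the encoding described above. The functions map international SAMPA stress symbols to the quote notation, strip length marks, split a transcription into syllables or morphs using the boundary markers, and read off each syllable's stress level by counting single quotes.

```shell
# Hypothetical helper functions (POSIX sh); not the VERBMOBIL tools.
# Each assumes one transcription in the lexicon encoding.

# Map international SAMPA stress symbols to the lexicon convention:
#   " (primary) -> '    and    % (secondary) -> ''
# Only the symbols are mapped; their position is left unchanged.
sampa_stress_to_quotes() {
    sed -e "s/\"/'/g" -e "s/%/''/g"
}

# Remove length marks (:) from transcriptions read on stdin.
strip_length_marks() {
    tr -d ':'
}

# Print the syllables of one transcription, one per line.
# Morph boundaries (+) are dropped; syllable (.) and word (#, ##)
# boundaries both separate syllables (repeated breaks are squeezed).
syllables() {
    printf '%s\n' "$1" | tr -d '+' | tr -s '.#' '\n\n'
}

# Print the morphs of one transcription, one per line.
# Syllable boundaries (.) are dropped; morph (+) and word (#, ##)
# boundaries both separate morphs.
morphs() {
    printf '%s\n' "$1" | tr -d '.' | tr -s '+#' '\n\n'
}

# Print each syllable preceded by its stress level, i.e. the number
# of single quotes it contains (0 unstressed, 1 primary,
# 2 secondary, 3 tertiary).
syllable_stress() {
    syllables "$1" | awk -v q="'" '{ n = gsub(q, "&"); print n, $0 }'
}
```

Applied to Verbindung, `syllables "vE6.+b'In.d+UN"` prints vE6, b'In and dUN, while `morphs "vE6.+b'In.d+UN"` prints vE6, b'Ind and UN. This makes the overlap between the morph /b'Ind/ and the syllable /dUN/ explicit. The stress converter maps symbols only. Any difference in where the mark is placed within the syllable (the example above places it after the onset, as in /b'In/) is not handled.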

The strictly ASCII-oriented database standard enables simple database access functions to be specified and emulated using UNIX tools; each such function defines a set of lexicon entries.

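For example, 'the set of morphologically simple unstressed monosyllables with short vowels' is defined as the set of entries containing no syllable boundary (.), stress mark ('), morph boundary (+), word boundary (#) or length mark (:):
    grep -v "[.'+#:]" <infile> > <outfile>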
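The same approach extends to a small family of set definitions. The shell functions below are a hypothetical sketch in the style of the grep query, not the VERBMOBIL tools. The function names are invented for illustration. Each function assumes that every input line holds exactly one transcription; if the lexicon file also contained an orthographic column, the patterns would match against it as well. Each function reads the lexicon on standard input and writes the selected entries to standard output.

```shell
# Hypothetical set definitions (POSIX sh); not the VERBMOBIL tools.
# Assumes each input line holds exactly one transcription.

# Morphologically simple, unstressed monosyllables with short vowels
# (the query above): no . ' + # or : anywhere on the line.
simple_unstressed_short_monosyllables() {
    grep -v "[.'+#:]"
}

# Polysyllables: at least one syllable or word boundary.
polysyllables() {
    grep '[.#]'
}

# Morphologically complex entries: at least one morph or word boundary.
morphologically_complex() {
    grep '[+#]'
}

# Phrasal idioms: contain a double hash word boundary.
phrasal_idioms() {
    grep '##'
}

# Prefix each entry with its syllable count; fields are split on
# runs of syllable (.) and word (#) boundaries.
syllable_count() {
    awk -F'[.#]+' '{ print NF "\t" $0 }'
}
```

For example, `polysyllables < lexicon.txt | syllable_count` would list every polysyllabic entry together with its number of syllables (lexicon.txt is a placeholder file name). Since each function is an ordinary filter, set intersection corresponds to piping one function into another.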

A set of UNIX access tools has been provided for the VERBMOBIL database, based on these basic principles.






Dafydd Gibbon
Fri Sep 1 19:40:09 MET DST 1995