Excellent introductions to the practical, everyday considerations involved in
designing the microstructure of lexicographic databases for multilingual
applications are given in [Coward & Grimes 1995].
The following informal summary of a useful selection of
microstructure attributes (implemented as database columns)
covers basic types of lexical information for general linguistic
fieldwork.
It has been modified from the Coward & Grimes overview and adapted
to the present context.
- Main fields: key (lexeme, lemma, headword); homonym (homograph, homophone) number; lexical citation form; pronunciation; root form (lexical morpheme or stem); part of speech (POS); native language part of speech term; sense (reading) number for polysemous entries.
- Metalanguage fields: glosses, reverse search keys, definitions and descriptions in native (vernacular, regional, national) and standard (e.g. English, French, German) languages.
- Headwords: literal gloss; scientific term; etymological status (native, loan, word formation, etc.).
- Examples: contextualised example sentences; translated examples.
- Pragmatics and semantics: usage and meaning (including formality level, social, archaic, ritual, taboo); encyclopaedic background information; restrictions on usage categories (sex, animacy, non-active verbs...); semantic domain (word field) and subdomain; dialect variants.
- Lexical relations and functions: synonyms, antonyms, part-whole, generic-specific, typical actors, undergoers, instruments, material used, ..., with glosses; cross-references.
- Grammatical information: morphology (paradigm, number, class); part of speech details.
- Sources, comments, housekeeping: bibliographical reference, pictures, speech recording; informal notes on lexical and background information, open questions; source of data (informant / linguistic consultant, researcher, village); status for editing, printing (e.g. exclude); date of last modification, lexicographer.
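As a rough illustration of what "implemented as database columns" can mean, the following minimal sketch maps a small selection of the attributes above onto one SQLite table. The table name `entry`, all column names, and the sample entries are assumptions made for this illustration; they are not taken from Coward & Grimes or from any particular lexicographic tool.

```python
import sqlite3

# Hypothetical column names, one per selected microstructure attribute.
COLUMNS = [
    ("headword", "TEXT NOT NULL"),               # key (lexeme, lemma, headword)
    ("homonym_no", "INTEGER NOT NULL DEFAULT 1"),  # homonym number
    ("sense_no", "INTEGER NOT NULL DEFAULT 1"),    # sense (reading) number
    ("citation_form", "TEXT"),
    ("pronunciation", "TEXT"),
    ("root_form", "TEXT"),
    ("pos", "TEXT"),                             # part of speech
    ("gloss_standard", "TEXT"),                  # gloss in a standard language
    ("gloss_vernacular", "TEXT"),                # gloss in the native language
    ("example", "TEXT"),
    ("example_translation", "TEXT"),
    ("semantic_domain", "TEXT"),
    ("source", "TEXT"),                          # informant, researcher, village
    ("last_modified", "TEXT"),
    ("lexicographer", "TEXT"),
]
COLUMN_NAMES = {name for name, _ in COLUMNS}


def create_lexicon(conn):
    """Create the entry table; headword + homonym + sense identify a row."""
    cols = ",\n  ".join(f"{name} {decl}" for name, decl in COLUMNS)
    conn.execute(
        f"CREATE TABLE entry (\n  {cols},\n"
        "  PRIMARY KEY (headword, homonym_no, sense_no)\n)"
    )


def add_entry(conn, **fields):
    """Insert one entry; only the columns declared above are accepted."""
    unknown = set(fields) - COLUMN_NAMES
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    names = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    conn.execute(
        f"INSERT INTO entry ({names}) VALUES ({marks})",
        tuple(fields.values()),
    )


conn = sqlite3.connect(":memory:")
create_lexicon(conn)
# Invented sample data: two homonyms of one headword.
add_entry(conn, headword="bank", homonym_no=1, pos="noun",
          gloss_standard="financial institution")
add_entry(conn, headword="bank", homonym_no=2, pos="noun",
          gloss_standard="edge of a river")
rows = conn.execute(
    "SELECT homonym_no, gloss_standard FROM entry "
    "WHERE headword = ? ORDER BY homonym_no",
    ("bank",),
).fetchall()
```

A single flat table keeps the sketch short. Fields that can hold several values per entry, such as lexical relations or multiple examples, would more plausibly live in separate tables linked to the entry key.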
This linguistic microstructure specification is extremely detailed and is
intended for thorough linguistic fieldwork. Even so, it is not suitable for
all purposes: for system development, for instance, the types of information
are neither complete nor always relevant.
The purpose of summarising the microstructure is to show how much more
comprehensive a lexicographic database can be than many conventional
dictionaries or the training and system lexica used in speech technology.
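The contrast can be made concrete with a small sketch that projects a fieldwork-style record onto a much narrower system lexicon entry. The field names, the sample record, and the assumption that a system lexicon keeps only orthography, pronunciation and part of speech are illustrative choices, not a description of any specific speech technology system.

```python
# Hypothetical fieldwork-style record using a few of the field types
# listed earlier; all values are invented for illustration.
full_record = {
    "headword": "bank",
    "homonym_no": 2,
    "pronunciation": "b{Nk",  # SAMPA-style transcription, illustrative only
    "pos": "noun",
    "gloss_standard": "edge of a river",
    "semantic_domain": "landscape",
    "example": "They sat on the bank.",
    "source": "consultant A",
    "lexicographer": "XY",
}

# Assumed minimal field set for a system lexicon.
SYSTEM_FIELDS = ("headword", "pronunciation", "pos")


def to_system_entry(record, fields=SYSTEM_FIELDS):
    """Keep only the fields the (assumed) system lexicon needs."""
    return {name: record[name] for name in fields}


system_entry = to_system_entry(full_record)
dropped = sorted(set(full_record) - set(system_entry))
```

In this sketch `dropped` lists the information types that the narrower lexicon discards, which is the sense in which the fuller microstructure is more comprehensive.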
Dafydd Gibbon
Thu Nov 19 10:12:05 MET 1998