
Lexicography, lexicology, lexicon theory

Lexicography is the branch of applied linguistics concerned with the design and construction of lexica for practical use. Lexica range from the paper lexica and encyclopaedias designed for human use and shelf storage to the electronic lexica used in a variety of human language technology systems, from palmtop word databases through word processors to software for readback (by speech synthesis in Text-to-Speech, TTS, systems) and dictation (by automatic speech recognition, ASR, systems). At a more abstract level, a lexicon may be a generic lexicographic knowledge base from which lexica of all these different kinds can be derived automatically.
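The idea of deriving application-specific lexica from one generic knowledge base can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the `Entry` record, the function names, the three sample entries and the SAMPA-like phoneme symbols are hypothetical and are not taken from any system described in this volume.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Entry:
    """One record in a (hypothetical) generic lexical knowledge base."""
    orthography: str
    phonemes: Tuple[str, ...]   # illustrative SAMPA-like symbols
    pos: str                    # part of speech


# Toy knowledge base; the entries are invented for this example.
GENERIC_LEXICON = [
    Entry("lexicon", ("l", "e", "k", "s", "I", "k", "@", "n"), "noun"),
    Entry("read", ("r", "i:", "d"), "verb"),
    Entry("read", ("r", "e", "d"), "verb"),
]


def derive_tts_lexicon(entries: Iterable[Entry]) -> Dict[str, List[str]]:
    """Map each spelling to its pronunciations, as a readback system needs."""
    tts: Dict[str, List[str]] = {}
    for entry in entries:
        pron = " ".join(entry.phonemes)
        variants = tts.setdefault(entry.orthography, [])
        if pron not in variants:
            variants.append(pron)
    return tts


def derive_asr_lexicon(entries: Iterable[Entry]) -> Dict[str, List[str]]:
    """Map each pronunciation to its spellings, as a dictation system needs."""
    asr: Dict[str, List[str]] = {}
    for entry in entries:
        pron = " ".join(entry.phonemes)
        spellings = asr.setdefault(pron, [])
        if entry.orthography not in spellings:
            spellings.append(entry.orthography)
    return asr


tts_lexicon = derive_tts_lexicon(GENERIC_LEXICON)
asr_lexicon = derive_asr_lexicon(GENERIC_LEXICON)
```

Both derived lexica are views of the same records, so a correction made once in the generic knowledge base propagates to every lexicon derived from it.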

Lexicographic projects have always been long-term efforts, from the lifetime efforts of early lexicographers of the 17th to 19th centuries (recall Dr. Samuel Johnson's definition, in his own dictionary, of a lexicographer as 'a harmless drudge') to the century-plus publication time of the Oxford English Dictionary and of comparable dictionaries for other languages. Since the advent of computers, lexicographic projects have been greatly accelerated but, by Parkinson's law, lexica have also in general grown in size. The design and construction of a reasonably large-scale lexicon of tens or hundreds of thousands of words is a major task involving many person-years of specification, design, collection of lexical data, information structuring, and user-oriented presentation formatting.

Lexicology, on the other hand, is the branch of descriptive linguistics concerned with the linguistic theory and methodology for describing lexical information, often focussing specifically on issues of meaning. Traditionally, lexicology has been mainly concerned with 'lexis', i.e. lexical collocations and idioms, and with lexical semantics, i.e. the structure of word fields and of meaning components and relations. Until recently, lexical semantics was conducted separately from the study of the syntactic, morphological and phonological properties of words, but linguistic theory in the 1990s has gradually been integrating these dimensions of lexical information.

The twin fields of terminology and terminography are industrially and commercially important disciplines which are related to lexicology and lexicography, and are concerned with the identification and construction of technical terms in relation to the real world of technical artefacts. Historically, these fields have developed and are in general practised separately from lexicology and lexicography, though there is no a priori reason for this.

Lexicon theory, in contrast to both lexicology and lexicography, is the study of the universal, and in particular the formal, properties of lexica, from the points of view of theoretical linguistics, general knowledge representation languages in artificial intelligence, lexicon construction (cf. [Daelemans & Durieux (this volume)]), access algorithms in computational linguistics, or the cognitive conditions on human lexical abilities in empirical psycholinguistics (cf. [Baayen, Schreuder & Sproat (this volume)]).

Lexicon theorists have increasingly made use of extensive lexicological and lexicographic descriptions as models for testing their theories, and lexicographers are increasingly making use of theoretically interesting formalisms such as regular expression calculus in order to drive parsing, tagging and learning algorithms for extracting lexical information from text corpora (cf. [Grefenstette, Schiller & Aït-Mokhtar (this volume)]). Furthermore, the computer has not only accelerated work in practical lexicography, it has also gradually led to a convergence within this trio of lexical sciences. Several papers in this volume, for example [Baayen, Schreuder & Sproat (this volume), Bouma, Van Eynde & Flickinger (this volume), Cahill, Carson-Berndsen & Gazdar (this volume), Grefenstette, Schiller & Aït-Mokhtar (this volume)], and related studies, for instance [Pollard & Sag 1987, Pollard & Sag 1994], manifest this convergence. They combine the lexical semantics slant of lexicology with, on the one hand, the views predominant in lexicon theory on formal syntax, lexicalist morphology and phonology, or on the mental lexicon, and, on the other hand, the treatment of large-scale corpora which is characteristic of lexicography.
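As a toy illustration of how regular expressions can drive the extraction of lexical information from a text corpus, the following sketch builds a word-form frequency list and proposes candidate bases for nouns ending in "-ness". It is not the method of any paper cited above: the patterns, the function names and the sample sentence are assumptions chosen for brevity, and the suffix rule is deliberately naive.

```python
import re
from collections import Counter
from typing import Dict

# Word forms: runs of lower-case letters, optionally joined by hyphens.
WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")

# Candidate "-ness" nominalisations; the captured group is the putative base.
NESS = re.compile(r"^([a-z]{3,})ness$")


def word_frequencies(text: str) -> Counter:
    """Count word forms in a text after lower-casing it."""
    return Counter(WORD.findall(text.lower()))


def ness_candidates(freqs: Counter) -> Dict[str, str]:
    """Pair each '-ness' form with its base obtained by stripping the suffix.

    Spelling changes are ignored, so 'happiness' would yield 'happi';
    a real system would need morphographemic rules here.
    """
    pairs: Dict[str, str] = {}
    for form in freqs:
        match = NESS.match(form)
        if match:
            pairs[form] = match.group(1)
    return pairs


# Invented sample sentence standing in for a corpus.
sample = "The darkness and the kindness of the dark night."
freqs = word_frequencies(sample)
candidates = ness_candidates(freqs)
```

A proposed base that is itself attested in the frequency list (here "dark") is better supported than one that is not, which is one simple way in which corpus evidence can be used to filter pattern-based hypotheses.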

The present overview of central issues in lexicography will concentrate on conditions for lexicon construction. Specific problems of spoken language lexicography are discussed in detail in [Adda-Decker & Lamel (this volume)] and [Quazza & van den Heuvel (this volume)]. The present overview is intended to provide a foundation for understanding and relating other articles in this volume, and is aimed at a general linguistic and engineering readership. The level of presentation will progress from the rather general to the rather technical.


Dafydd Gibbon
Mon Nov 16 17:29:29 MET 1998