
Toward an integrated lexical sign model

The traditional word-level dyadic sign model, which pairs a word form with a meaning, is no longer adequate for the range of dictionaries required today in natural language processing and speech technology, or for the use of paper and electronic lexica. Its inadequacy rests on both descriptive and theoretical considerations. A bilingual or translation dictionary does not fit this model at all, since it generally matches word forms in one language with word forms in another, and does not directly address the issue of meaning. Specialised dictionaries such as pronouncing dictionaries likewise cover only a form-form relation, in this case between the two modalities of written and spoken language within one language, rather than between two languages. Further, it is unclear how the relation between word forms in classical word-based dictionaries and phrasal forms in idiom dictionaries is to be described, and how the distinctions between literal, figurative and frozen meanings are to be captured for lexical units of different ranks (stems, words, phrases, texts, dialogues). Perhaps most importantly, the classical type of dictionary (referred to indiscriminately by the man in the street as `the dictionary') contains many different types of lexical information, from orthography and pronunciation through grammatical word class and internal morphological structure to canonical meaning, special uses, synonyms and antonyms, and etymology (word history); most of these are not covered by the dyadic sign model.

Current work on lexicalist theories in general linguistics and computational linguistics (cf. [Bouma, van Eynde & Flickinger (this volume)]) is based on more complex sign models, which describe lexical items, their properties, and a wide range of compositional and interpretative relations between them. However, even these models do not capture the notion of non-word-level lexical signs (e.g. morphemes or stems below the word level, phrasal signs above it), nor do they integrate the many levels of phonological interpretation (from phonemic through word and phrasal to discourse properties).

The `integrated lexicalist' (ILEX) sign model on which the present overview is based is a more ambitious, generic approach which relates lexical and non-lexical signs at different compositional ranks, each with its own surface and semantic interpretation. The compositional and interpretative dimensions of the ILEX model are outlined in Figure 1; the further dimensions of lexicalisation and generalisation are discussed in the text. The model will be taken up for detailed discussion in later sections.

Figure 1: Sign model with compositional and interpretative dimensions. 

In this generic model, a sign is embedded in the more-or-less well-defined world in which it is used, indicated by the surrounding dotted line. Both non-lexical and lexical signs have two kinds of interpretation with respect to this world, a surface interpretation and a semantic interpretation. In the general case signs are complex, and interpretation is compositional, based on the two main structural properties of signs: category (including subcategory, etc.) and parts.
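As a rough illustration of these structural properties, a sign can be modelled as a recursive record holding a category and a list of parts, with both interpretations computed compositionally from the parts. This is a minimal sketch under stated assumptions, not the ILEX formalism itself: the class name `Sign`, its fields, and the two composition rules (plain concatenation for the surface form, bracketing under the category label for the meaning) are hypothetical, and prosodic association is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sign:
    """Hypothetical sign record: a category plus a (possibly empty) list of parts.

    Atomic signs carry their own surface form and meaning; complex signs
    derive both interpretations compositionally from their parts.
    """
    category: str
    parts: list[Sign] = field(default_factory=list)
    form: Optional[str] = None      # surface interpretation of an atomic sign
    meaning: Optional[str] = None   # semantic interpretation of an atomic sign

    def surface(self) -> str:
        # Atomic sign: its own form. Complex sign: concatenation of its parts
        # (a simplification that ignores prosodic association).
        if not self.parts:
            return self.form or ""
        return "".join(p.surface() for p in self.parts)

    def semantics(self) -> str:
        # Atomic sign: its own meaning. Complex sign: a toy composition rule
        # that brackets the part meanings under the category label.
        if not self.parts:
            return self.meaning or ""
        return f"{self.category}(" + ", ".join(p.semantics() for p in self.parts) + ")"


unkind = Sign("Adj", parts=[
    Sign("Prefix", form="un", meaning="NOT"),
    Sign("Stem", form="kind", meaning="KIND"),
])
print(unkind.surface())    # unkind
print(unkind.semantics())  # Adj(NOT, KIND)
```

The recursion means the same two methods work unchanged for signs nested to any depth, which mirrors the point that interpretation of complex signs is compositional.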

There are two main kinds of composition: on the one hand, parts are ordered in a hierarchy of rank, and on the other, each rank has its own hierarchical constituent structuring principles:

  1. Rank: words are structured differently from sentences, sentences from texts, texts from dialogues. This part-whole relation is visualised in Figure 1 as small boxes (ranks) linked with a vertical dotted line.
  2. Constituent: within a rank, units have a more homogeneous structure. Words have stem, affix and prosodic constituents, which are concatenated and prosodically associated by a `word grammar'; sentences and their parts have phrasal and prosodic constituents, which are concatenated and prosodically associated by different, more complex principles; analogously, specific structural conditions apply to texts and dialogues. Word grammars are typically finite-state, though the semantic interpretation of words requires greater formal complexity; sentence grammars are typically context-free grammars enhanced with cross-reference devices. Constituency hierarchies are visualised in Figure 1 by the small trees within the boxes representing the different ranks.
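The observation that word grammars are typically finite-state can be illustrated with a toy recogniser for the pattern prefix* stem suffix*. The morpheme lexicon, the category names and the three-transition automaton below are hypothetical simplifications chosen for illustration; the sketch covers only the concatenation of forms, not prosodic association or the more complex semantic interpretation of words.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon mapping morpheme strings to morpheme categories.
LEXICON: Dict[str, str] = {
    "un": "prefix",
    "re": "prefix",
    "kind": "stem",
    "do": "stem",
    "ness": "suffix",
    "able": "suffix",
}

# Finite-state word grammar for prefix* stem suffix*.
# State 0: no stem seen yet; state 1: stem seen (the only accepting state).
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "prefix"): 0,
    (0, "stem"): 1,
    (1, "suffix"): 1,
}
ACCEPTING = {1}


def parse(word: str, state: int = 0) -> Optional[List[str]]:
    """Return one morpheme segmentation of `word` licensed by the automaton, or None."""
    if not word:
        return [] if state in ACCEPTING else None
    for morph, cat in LEXICON.items():
        if not word.startswith(morph):
            continue
        nxt = TRANSITIONS.get((state, cat))
        if nxt is None:
            continue  # this category is not allowed in the current state
        rest = parse(word[len(morph):], nxt)  # try the remainder; backtrack on failure
        if rest is not None:
            return [morph] + rest
    return None


print(parse("unkindness"))  # ['un', 'kind', 'ness']
print(parse("undoable"))    # ['un', 'do', 'able']
print(parse("nessun"))      # None (suffix before stem)
```

Backtracking is included because, in a larger lexicon, one morpheme string may be an initial substring of another, so a greedy left-to-right split could fail where an alternative split succeeds.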

In the present context, attention will be limited to the word rank; however, lexical signs of other ranks (e.g. morphemes, phrasal idioms, ritual texts, routine dialogues) also exist.

Figure 1 shows the main types of information in the ILEX sign model, i.e. information about the interpretation and composition of signs. Practical dictionaries may represent further types of lexical information; etymological dictionaries, for example, represent meta-information reconstructing the historical development of these basic types of information.

However, the ILEX model contains two further important dimensions of information about signs, lexicalisation and generalisation, which need to be added to those outlined in Figure 1.

First, lexicography is concerned with lexicalised signs of all ranks and constituent types, not just with words (lexicalisation). Second, these lexicalised signs are not simply bundles of idiosyncratic word properties, but can be grouped into classes on the basis of generalisations about their similarities and differences (generalisation). It is the four dimensions of composition, interpretation, lexicalisation and generalisation which, taken together, characterise the Integrated Lexicalist (ILEX) model of language signs.
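One common way to make such class-based generalisations explicit is default inheritance: a lexical entry states only its idiosyncratic properties and inherits the rest from the class it belongs to, overriding defaults where necessary. The sketch below is an illustration under stated assumptions rather than part of the ILEX model as described here; the class `LexClass`, the attribute names and the English plural example are hypothetical.

```python
from typing import Dict, Optional


class LexClass:
    """Hypothetical node in a generalisation hierarchy: local properties plus an optional parent."""

    def __init__(self, name: str, parent: Optional["LexClass"] = None, **props: str) -> None:
        self.name = name
        self.parent = parent
        self.props: Dict[str, str] = dict(props)

    def get(self, attr: str) -> Optional[str]:
        # Use a locally stated value if there is one; otherwise inherit the
        # default from the nearest ancestor class that defines it.
        node: Optional["LexClass"] = self
        while node is not None:
            if attr in node.props:
                return node.props[attr]
            node = node.parent
        return None


def plural(entry: LexClass) -> str:
    # Hypothetical rule: plural form = stem + plural suffix (either may be inherited).
    return (entry.get("stem") or "") + (entry.get("plural_suffix") or "")


# Class-level generalisation: nouns form their plural with -s by default.
noun = LexClass("Noun", category="noun", plural_suffix="s")

# Lexical entries: `cat' follows the default; `child' overrides the suffix.
cat_entry = LexClass("cat", noun, stem="cat")
child = LexClass("child", noun, stem="child", plural_suffix="ren")

print(plural(cat_entry))      # cats
print(plural(child))          # children
print(child.get("category"))  # noun
```

The design choice here is that regular behaviour is stated once, at the class, so only exceptions such as the irregular plural need to be listed with individual entries.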



Dafydd Gibbon
Thu Nov 19 10:12:05 MET 1998