Much of the terminology used in formal grammar theory was introduced by
Chomsky in the 1950s and 1960s.
A grammar is a set of generalisations over the sentences of a language.
It is important to note that for a given language, a number of different, but
equivalent, grammars can be defined:
- weak equivalence: two grammars define the same set of sentences.
- strong equivalence: two grammars not only define the same set of sentences, but also assign the same structures to them (illustrated below).
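As a small illustration (the grammars here are invented for this example, not taken from the course materials), consider two context-free grammars for the strings a, a+a, a+a+a, ...: a left-recursive one, E -> E + a | a, and a right-recursive one, E -> a + E | a. They generate exactly the same strings and are therefore weakly equivalent, but they assign mirror-image tree structures and are therefore not strongly equivalent. A minimal Python sketch of the comparison:

    # Two weakly equivalent grammars for { a, a+a, a+a+a, ... }:
    #   G1 (left-recursive):  E -> E '+' 'a' | 'a'
    #   G2 (right-recursive): E -> 'a' '+' E | 'a'

    def g1(n):
        """Strings and trees of G1 containing up to n occurrences of 'a'."""
        out = [("a", "a")]                          # (string, tree) pairs
        for _ in range(n - 1):
            s, t = out[-1]
            out.append((s + "+a", (t, "+", "a")))   # tree grows on the left
        return out

    def g2(n):
        """Strings and trees of G2 containing up to n occurrences of 'a'."""
        out = [("a", "a")]
        for _ in range(n - 1):
            s, t = out[-1]
            out.append(("a+" + s, ("a", "+", t)))   # tree grows on the right
        return out

    strings1 = {s for s, _ in g1(4)}
    strings2 = {s for s, _ in g2(4)}
    print(strings1 == strings2)   # True: same sentences (weak equivalence)

    tree1 = g1(3)[-1][1]          # (('a', '+', 'a'), '+', 'a')  left-branching
    tree2 = g2(3)[-1][1]          # ('a', '+', ('a', '+', 'a'))  right-branching
    print(tree1 == tree2)         # False: different structures, so not strongly equivalent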
For a detailed introduction to formal grammars, see the recommended literature
for this course.
Briefly, in formal grammar theory, a formal language is described as a set of
sentences
(sometimes called words in computer science textbooks),
each consisting of items from a vocabulary of minimal units.
In abstract definitions, the minimal units are represented as letters of
the alphabet, possibly with indices to distinguish them.
In computational linguistics, the units are chosen according to
specific descriptive linguistic requirements -
in syntax they are words. In morphology the ``language'' is a set of words,
and the vocabulary is a set of morphemes which compose these words.
In phonology the ``language'' is also a set of words, but the parts are
seen from the perspective of pronunciation rather than meaning, and may be
phonemes or other units of interest for pronunciation.
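A toy example may make the set-of-sentences view concrete (the vocabulary and sentences below are invented for this sketch). The same representation works whether the minimal units are words, as in syntax, or morphemes, as in morphology:

    # A toy "language" as a set of sentences over a small vocabulary.
    vocabulary = {"the", "dog", "cat", "barks", "sleeps"}

    # Each sentence is a sequence (here a tuple) of vocabulary items.
    language = {
        ("the", "dog", "barks"),
        ("the", "cat", "sleeps"),
        ("the", "dog", "sleeps"),
    }

    # Every unit in every sentence is drawn from the vocabulary.
    assert all(unit in vocabulary for sentence in language for unit in sentence)

    # The morphological analogue: the "sentences" are words, the units morphemes.
    morpheme_vocabulary = {"bark", "sleep", "s"}
    word_language = {("bark", "s"), ("sleep", "s")}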
Formal languages can be classified according to the complexity of the
grammars required to define them.
In the 1950s, Chomsky discovered a fairly simple set of principles for
defining these grammars, ordered from the most complex to the simplest, which
are collectively known as the Chomsky hierarchy of formal grammars,
or more simply as the Chomsky hierarchy or the formal grammar hierarchy. Chomsky identified four types:
- Type 0, unrestricted rewriting systems: grammars designed on these principles can describe any language whatsoever. They are now regarded as too general (too powerful) to be used as accurate models of human languages, though practically all formalisms and programming languages in common use are approximations to this kind of grammar.
- Type 1, context-sensitive grammars: grammars designed on these principles can define languages for which the relations within their sentences can be represented by certain types of restricted graph or network. There are a number of sub-types of this class of language, one of which, the indexed languages, is thought to be the kind which best models the sentences of natural languages.
- Type 2, context-free grammars: grammars designed on these principles can describe the typical tree-graphs which are used to represent the hierarchical structures of sentences. Since more complex structures than trees are known to be required for describing natural languages, context-free languages are not quite adequate for describing natural languages. However, they can describe large parts of natural languages (if some simplifications are made), and context-free grammars are the best-known type of grammar, both for natural languages and for programming languages. In view of this slight inadequacy, context-free grammars are often enhanced by other mechanisms, such as variables (indices) which refer across trees in order to capture more complex structures.
- Type 3, regular grammars: grammars designed on these principles describe sequences which can be thought of as being arranged as a flat structure (like a string or list), or with at most unidirectional branching (either right or left, but not mixed). Grammars of this kind are very frequently used in linguistics and in speech and language technology, because they are adequate for many areas of language (phonology, morphology, parts of syntax), and very efficient to process (requiring only a finite memory for processing, unlike the other grammars); a small sketch contrasting regular with context-free recognition follows this list.
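To make the contrast between the two lowest types concrete, the following sketch uses two standard textbook languages that are not discussed above: the regular language (ab)* and the context-free language a^n b^n. The first can be recognised with a fixed, finite set of states; the second needs an unbounded counter, a degenerate case of the stack that context-free recognition in general requires:

    # Finite-state recognition of the regular language (ab)*:
    # a fixed number of states suffices, with no additional memory.
    def accepts_ab_star(s):
        state = 0                          # 0: expecting 'a', 1: expecting 'b'
        for ch in s:
            if state == 0 and ch == "a":
                state = 1
            elif state == 1 and ch == "b":
                state = 0
            else:
                return False
        return state == 0

    # Recognition of the context-free language a^n b^n (n >= 0):
    # the number of unmatched a's is unbounded, so finite memory is not enough.
    def accepts_anbn(s):
        unmatched = 0
        seen_b = False
        for ch in s:
            if ch == "a":
                if seen_b:
                    return False           # no 'a' may follow a 'b'
                unmatched += 1
            elif ch == "b":
                seen_b = True
                unmatched -= 1
                if unmatched < 0:
                    return False           # more b's than a's
            else:
                return False
        return unmatched == 0

    print(accepts_ab_star("ababab"), accepts_ab_star("abba"))   # True False
    print(accepts_anbn("aaabbb"), accepts_anbn("aabbb"))        # True False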
The key property of the Chomsky hierarchy is that the languages it defines are in a relation of inclusion: the regular languages are included in the context-free languages, the context-free languages are included in the context-sensitive languages, and the context-sensitive languages are included in the unrestricted languages.
Correspondingly, regular grammars are more highly constrained than general context-free grammars, context-free grammars are more highly constrained than general context-sensitive grammars, and context-sensitive grammars are more highly constrained than unrestricted rewriting systems.
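In symbols (a standard summary notation, not part of the original text):

    regular (Type 3)  ⊂  context-free (Type 2)  ⊂  context-sensitive (Type 1)  ⊂  unrestricted (Type 0)

where each inclusion is in fact proper: every class contains languages that the more restricted classes cannot define.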