
Documentation: requirements, design, specification, evaluation, maintenance

The systematic development of documentation for a language (and indeed for many other purposes) is not unlike software development: after all, software can be seen as a rather extreme form of text. We can therefore use a standard software development methodology, the so-called `waterfall model', whose phases are repeated until the goals are fulfilled.

The five main development phases are:

Requirements specification:
Who is the documentation for, and what is it supposed to contain?
Design specification:
  1. Project design: What are the tasks to be performed? What human, material, financial and time resources are available? How are the resources to be assigned to the tasks?
  2. System design: What is the structure of the documentation? What parts does it have? How are they related? What is the format of the parts? How is the documentation to be constructed? How is the finished documentation to be accessed?
  3. Implementation design: What representation languages, software and computational platforms are to be used? What coding is to be used so that the actual documentation can be coordinated, constructed, archived and accessed? What is the user interface to be like? Is a Database Management System to be selected?
Implementation:
Development of modules and links/interfaces.
Evaluation:
How does the implemented system match up to
  1. the requirements specification (black box evaluation)?
  2. the design specification (glass box evaluation)?
Maintenance:
How is the documentation to be made generally available, and maintained, debugged, corrected, otherwise modified, extended?
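
The repetition of phases until the goals are fulfilled can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the documentation is a dictionary of modules and that requirements and design decisions can be written as named yes/no checks; the names `failed_checks`, `develop` and `implement_step`, and the two example checks, are hypothetical and not part of any established tool.

```python
from typing import Callable, Dict, List

# Assumption: the documentation is modelled as a dict of module name -> content.
Documentation = Dict[str, object]
Check = Callable[[Documentation], bool]


def failed_checks(doc: Documentation, checks: Dict[str, Check]) -> List[str]:
    """Return the names of all checks that the documentation does not pass."""
    return [name for name, check in checks.items() if not check(doc)]


def develop(doc, implement_step, requirements, design, max_rounds=10):
    """Alternate evaluation and implementation until both evaluations pass.

    Returns the number of implementation rounds that were needed.
    """
    for rounds_done in range(max_rounds + 1):
        black_box = failed_checks(doc, requirements)  # against the requirements
        glass_box = failed_checks(doc, design)        # against the design
        open_issues = black_box + glass_box
        if not open_issues:
            return rounds_done
        if rounds_done < max_rounds:
            implement_step(doc, open_issues)
    raise RuntimeError("goals not fulfilled within max_rounds")


# Hypothetical example checks, one per specification.
requirements = {"has lexicon": lambda d: "lexicon" in d}
design = {"corpus has metadata": lambda d: "metadata" in d.get("corpus", {})}


def implement_step(doc, open_issues):
    """Toy implementation phase: resolve only the first open issue."""
    issue = open_issues[0]
    if issue == "has lexicon":
        doc["lexicon"] = {}
    elif issue == "corpus has metadata":
        doc.setdefault("corpus", {})["metadata"] = {}


doc = {}
rounds = develop(doc, implement_step, requirements, design)
print(rounds, sorted(doc))
```

In this toy run each round resolves one issue, so the loop needs one implementation round per failed check before both evaluations pass.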

For some examples of language documentation, see

Documenting a language: requirements

Requirements for language documentation depend on who is to use the documentation - the speaker, the linguist, librarians, educationalists, ...

An overview is given in a requirements document prepared by Steven Bird and Gary Simons for the Exploration 2000 workshop.

Very detailed practical information is given in Gibbon, Moore & Winski (1997) and Gibbon, Mertins & Moore (2000); see the bibliography.

Documenting a language: design

Document design means specifying the modules in the document archive and their relations to each other.

A minimal set of modules is the following:

  1. Archive metadata
  2. Annotated corpus
    1. Corpus metadata
    2. acoustic recordings
    3. video recordings
    4. photographs
    5. time-stamped transcriptions (annotations)
      • Segmental and syllabic
      • Word tone
      • Intonation
      • Boundaries
      • Orthographic (if present)
      • Comments
      • Note that interlinear annotations of the speech signal can be made automatically using the time-stamped transcriptions
    6. interlinear morphosyntactic annotation
    7. interlinear glosses
    8. field notes, including interview notes
  3. Lexicon
    1. Lexicon metadata
    2. Semasiological lexicon (form based, with semantic information, minimally glosses)
    3. Onomasiological lexicon (meaning based, typically a word-field oriented thesaurus)
    4. Concordance (form based, with occurrences in corpus) - either as a textual concordance (e.g. with transcriptions) or as an audio (video, etc.) concordance (with transcriptions, audio files and phonetic information)
    5. Lexical database with access-neutral specification, and semasiological, onomasiological, etc. `views' on the database
  4. Grammar
    1. Phonetic and phonological description, including phoneme inventory, syllable structure description, toneme inventory, and phonological rules
    2. Morphological description (segmental and tonal), including inflection, derivation and compounding
    3. Syntactic description, including tonal and segmental morphosyntax, word order, sentence types
    4. Discourse descriptions (text and dialogue structure): narratives, greetings, riddles, proverbs, games, dialogues, ...
  5. Ethnographic/sociolinguistic description
    1. Structure of the ethnic group, the village
    2. Age classes
    3. Gender classes
    4. Administration (hereditary, elected; royalty, chiefdom; elders)
    5. Family relationships and inheritance
    6. Accommodation, food
    7. Education
    8. Religion, festivities
    9. Occupations and trade
    10. Geographical factors
    11. Ethnic history
  6. Outline of language family and dialectal and historical relationships
    1. `Family tree' theory of relationships to groups, subgroups within a language family
    2. Characterisation of differences to other dialects and languages within the same group (definition of `language')
    3. Internal history of the language (related to linguistic family tree)
    4. External history of the language (related to internal history, and ethnic and other historical accounts)
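
The module outline above can also be written down as a small data structure, which makes it possible to track which parts of an archive already exist. The following Python sketch is one possible representation under stated assumptions: the class `Module`, the helpers `module`, `missing` and `outline`, and the shortened module names are all illustrative choices, and the submodule lists are abbreviated versions of the outline.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Module:
    """One module of the documentation archive, with optional submodules."""
    name: str
    submodules: List["Module"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Module"]]:
        """Yield (depth, module) for this module and all submodules, top-down."""
        yield depth, self
        for sub in self.submodules:
            yield from sub.walk(depth + 1)


def module(name: str, *parts: str) -> Module:
    """Build a module whose submodules are simple named leaves."""
    return Module(name, [Module(part) for part in parts])


# Abbreviated version of the outline; names are shortened for illustration.
ARCHIVE = Module("Archive", [
    module("Archive metadata"),
    module("Annotated corpus", "Corpus metadata", "Acoustic recordings",
           "Time-stamped transcriptions", "Interlinear glosses", "Field notes"),
    module("Lexicon", "Lexicon metadata", "Semasiological lexicon",
           "Onomasiological lexicon", "Concordance", "Lexical database"),
    module("Grammar", "Phonetics and phonology", "Morphology", "Syntax",
           "Discourse"),
    module("Ethnographic/sociolinguistic description"),
    module("Language family, dialects and history"),
])


def missing(archive: Module, completed: Set[str]) -> List[str]:
    """List leaf modules that are not yet marked as completed."""
    return [mod.name for _, mod in archive.walk()
            if not mod.submodules and mod.name not in completed]


def outline(archive: Module) -> str:
    """Render the module hierarchy as an indented text outline."""
    return "\n".join("  " * depth + mod.name for depth, mod in archive.walk())


completed = {"Archive metadata", "Corpus metadata", "Acoustic recordings"}
todo = missing(ARCHIVE, completed)
print(outline(ARCHIVE))
```

A flat set of completed module names is the simplest bookkeeping that works here; a real project would presumably attach metadata and file locations to each module instead.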

Each of these modules must be specified in detail - e.g. different levels of corpus annotation, different kinds of lexicon, different levels of grammatical description (including phonetics and phonology, morphology, phrasal syntax, narrative and dialogue discourse structure).
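
As one example of such a detailed specification, the time-stamped transcription tiers can be aligned automatically into an interlinear display, as noted in the module list. The Python sketch below shows one simple way of doing this, assuming that every tier is a list of labelled time intervals and that an interval belongs to the word whose time span contains its midpoint. The names `Interval`, `labels_within` and `interlinear` are hypothetical, and the labels and times are invented toy data, not taken from any real language or corpus.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interval:
    """A labelled stretch of the speech signal, with times in seconds."""
    start: float
    end: float
    label: str


def labels_within(unit: Interval, tier: List[Interval]) -> List[str]:
    """Labels of all intervals in the tier whose midpoint lies inside the unit."""
    return [iv.label for iv in tier
            if unit.start <= (iv.start + iv.end) / 2 < unit.end]


def interlinear(words: List[Interval],
                tiers: Dict[str, List[Interval]]) -> List[Dict[str, str]]:
    """Group the labels of every tier under the word they are time-aligned with."""
    rows = []
    for word in words:
        row = {"orthographic": word.label}
        for name, tier in tiers.items():
            row[name] = " ".join(labels_within(word, tier))
        rows.append(row)
    return rows


# Invented toy data: two words, each with two syllables and two tones.
words = [Interval(0.0, 0.4, "bala"), Interval(0.4, 0.9, "toko")]
tiers = {
    "syllabic": [Interval(0.0, 0.2, "ba"), Interval(0.2, 0.4, "la"),
                 Interval(0.4, 0.65, "to"), Interval(0.65, 0.9, "ko")],
    "tone": [Interval(0.0, 0.2, "H"), Interval(0.2, 0.4, "L"),
             Interval(0.4, 0.65, "L"), Interval(0.65, 0.9, "H")],
}

rows = interlinear(words, tiers)
for row in rows:
    print(row)
```

The midpoint rule is a deliberate simplification: it assigns each interval to exactly one word even when annotators' boundaries on different tiers do not coincide exactly.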

Documenting a language: implementation

The production of documentation in practice depends on a variety of factors, above all the availability of data, whether from fieldwork, laboratory experimentation, texts, or other sources.

The Language Technology Group in Edinburgh has provided a useful FAQ list on available corpora and software tools.



Dafydd Gibbon, Thu Feb 15 15:07:15 MET 2001