Next: Fieldwork
Up: 23 01 19 LINGUISTIK
Previous: A practical preview: simple
The systematic development of documentation for a language (and for many other purposes too) is not unlike software development: after all, software can be seen as just a rather extreme form of text.
We can therefore use a standard software development methodology, the so-called `waterfall model', which consists of a sequence of phases that are repeated until the goals are fulfilled.
The five main development phases are:
- Requirements specification:
  - Who is the documentation for, and what is it supposed to contain?
- Design specification:
  - Project design: What are the tasks to be performed? What human, material, financial and time resources are available? How are the resources to be assigned to the tasks?
  - System design: What is the structure of the documentation? What parts does it have? How are they related? What is the format of the parts? How is the documentation to be constructed? How is the finished documentation to be accessed?
  - Implementation design: What representation languages, software and computational platforms are to be used? What coding is to be used for coordinating, constructing, archiving and accessing the actual documentation? What is the user interface to be like? Is a Database Management System to be selected?
- Implementation:
  - Development of modules and of the links/interfaces between them
- Evaluation:
  - How does the implemented system match up to
    - the requirements specification (black box evaluation)?
    - the design specification (glass box evaluation)?
- Maintenance:
  - How is the documentation to be made generally available, and how is it to be maintained, debugged, corrected, extended or otherwise modified?
For some examples of language documentation, see
Requirements for language documentation depend on who is to use the
documentation: speakers, linguists, librarians, educationalists, ...
An overview is given in a requirements document prepared by Steven Bird and Gary Simons for the Exploration 2000 workshop.
Very detailed practical information is given in Gibbon, Moore & Winski (1997) and Gibbon, Mertins & Moore (2000); see the bibliography.
Document design means specifying the modules in the document archive
and their relations to each other.
A minimal set of modules is the following:
- Archive metadata
- Annotated corpus
  - Corpus metadata
  - acoustic recordings
  - video recordings
  - photographs
  - time-stamped transcriptions (annotations)
    - Segmental and syllabic
    - Word tone
    - Intonation
    - Boundaries
    - Orthographic (if present)
    - Comments
  - Note that interlinear annotations of the speech signal can be generated automatically from the time-stamped transcriptions
  - interlinear morphosyntactic annotation
  - interlinear glosses
  - field notes, including interview notes
- Lexicon
  - Lexicon metadata
  - Semasiological lexicon (form based, with semantic information, minimally glosses)
  - Onomasiological lexicon (meaning based, typically a word-field oriented thesaurus)
  - Concordance (form based, with occurrences in corpus), either as a textual concordance (e.g. with transcriptions) or as an audio (video, etc.) concordance (with transcriptions, audio files and phonetic information)
  - Lexical database with access-neutral specification, and semasiological, onomasiological, etc. `views' on the database
- Grammar
  - Phonetic and phonological description, including phoneme inventory, syllable structure description, toneme inventory, and phonological rules
  - Morphological description (segmental and tonal), including inflection, derivation and compounding
  - Syntactic description, including tonal and segmental morphosyntax, word order, sentence types
  - Discourse descriptions (text and dialogue structure): narratives, greetings, riddles, proverbs, games, dialogues, ...
- Ethnographic/sociolinguistic description
  - Structure of the ethnic group, the village
  - Age classes
  - Gender classes
  - Administration (hereditary, elected; royalty, chiefdom; elders)
  - Family relationships and inheritance
  - Accommodation, food
  - Education
  - Religion, festivities
  - Occupations and trade
  - Geographical factors
  - Ethnic history
- Outline of language family and dialectal and historical relationships
  - `Family tree' theory of relationships to groups, subgroups within a language family
  - Characterisation of differences from other dialects and languages within the same group (definition of `language')
  - Internal history of the language (related to linguistic family tree)
  - External history of the language (related to internal history, and ethnic and other historical accounts)
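The time-stamped transcription tiers listed under the annotated corpus, and the automatic derivation of interlinear annotations from them, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each tier is a list of labelled intervals with start and end times in seconds (similar in spirit to the interval tiers of a Praat TextGrid), and that an interlinear display is obtained by collecting, for each interval of a reference tier such as a word tier, the labels of every tier whose intervals fall inside it. The names Interval, contained, interlinear and render, the tier names and the toy data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float  # seconds from the beginning of the recording
    end: float
    label: str


# Invented toy annotation with three hypothetical tiers (not real language data).
TIERS: dict[str, list[Interval]] = {
    "syllables": [Interval(0.00, 0.21, "ba"), Interval(0.21, 0.45, "na"),
                  Interval(0.45, 0.80, "ma")],
    "tone": [Interval(0.00, 0.21, "H"), Interval(0.21, 0.45, "L"),
             Interval(0.45, 0.80, "H")],
    "words": [Interval(0.00, 0.45, "bana"), Interval(0.45, 0.80, "ma")],
}


def contained(inner: Interval, outer: Interval, tol: float = 1e-6) -> bool:
    """True if `inner` lies within `outer` (with a small tolerance for rounding)."""
    return inner.start >= outer.start - tol and inner.end <= outer.end + tol


def interlinear(tiers: dict[str, list[Interval]], reference: str) -> list[dict[str, str]]:
    """One column per interval of the reference tier; each column maps a tier
    name to the space-separated labels of that tier's intervals inside it."""
    columns = []
    for ref in tiers[reference]:
        column = {}
        for name, intervals in tiers.items():
            column[name] = " ".join(iv.label for iv in intervals if contained(iv, ref))
        columns.append(column)
    return columns


def render(columns: list[dict[str, str]], order: list[str]) -> str:
    """Lay the columns out as aligned interlinear text, one line per tier."""
    widths = [max(len(col[name]) for name in order) for col in columns]
    lines = []
    for name in order:
        cells = [col[name].ljust(w) for col, w in zip(columns, widths)]
        lines.append(f"{name:<10}" + "  ".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(interlinear(TIERS, "words"), ["words", "syllables", "tone"]))
    # words     bana   ma
    # syllables ba na  ma
    # tone      H L    H
```

Because every tier shares the same time axis, new tiers (intonation, boundaries, glosses) can be added without changing the alignment logic; only containment in time is used to line them up.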
Each of these modules must be specified in detail - e.g. different levels
of corpus annotation, different kinds of lexicon, different levels of
grammatical description (including phonetics and phonology, morphology,
phrasal syntax, narrative and dialogue discourse structure).
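The idea of an access-neutral lexical database with semasiological, onomasiological and concordance `views' can likewise be sketched in Python, under purely illustrative assumptions: each entry is stored once with a form, a gloss and a hypothetical word-field label, and the different kinds of lexicon are computed from the same entries rather than maintained as separate documents. The Entry record, the function names and the toy lexicon and corpus are invented; a real lexical database would have much richer entries and might be managed with a Database Management System, as raised in the implementation design phase.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    form: str
    gloss: str
    field: str  # hypothetical word-field (semantic domain) label


# Invented toy lexicon and corpus, for illustration only.
LEXICON = [
    Entry("ma", "water", "nature"),
    Entry("bana", "house", "dwelling"),
    Entry("ko", "river", "nature"),
]
CORPUS = ["bana ma", "ko ma", "bana"]


def semasiological(lexicon: list[Entry]) -> list[tuple[str, str]]:
    """Form-based view: (form, gloss) pairs ordered by form."""
    return [(e.form, e.gloss) for e in sorted(lexicon, key=lambda e: e.form)]


def onomasiological(lexicon: list[Entry]) -> dict[str, list[str]]:
    """Meaning-based view: forms grouped by word field, like a simple thesaurus."""
    groups: dict[str, list[str]] = defaultdict(list)
    for e in lexicon:
        groups[e.field].append(e.form)
    return {field: sorted(forms) for field, forms in sorted(groups.items())}


def concordance(lexicon: list[Entry], corpus: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Form-based view with occurrences: (utterance number, utterance) for
    every corpus token that matches a lexicon form."""
    forms = {e.form for e in lexicon}
    hits: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for number, utterance in enumerate(corpus, start=1):
        for token in utterance.split():
            if token in forms:
                hits[token].append((number, utterance))
    return dict(sorted(hits.items()))


if __name__ == "__main__":
    print(semasiological(LEXICON))
    print(onomasiological(LEXICON))
    print(concordance(LEXICON, CORPUS))
```

Adding a further view (for example a reverse-alphabetical listing for studying suffixes) then means adding one function over the same entries, not another copy of the data, which is the practical benefit of an access-neutral specification.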
In practice, the production of documentation depends on a variety of
factors, such as the availability of data, whether from fieldwork, laboratory
experimentation, texts or other sources.
The Language Technology Group in Edinburgh has provided a useful
FAQ list on available corpora and software tools.
Dafydd Gibbon, Thu Feb 15 15:07:15 MET 2001