The CoGesT [kost] Conversational Gesture Transcription System is being developed in
the DFG-funded project ``Theory and design of multimodal lexica'',
which forms part of the
research group ``Text technological modelling of information'' at the
Universities of Bielefeld, Dortmund, Giessen and Tübingen. It is a twofold
system for the transcription of gestures produced in conversational
speech:
- it provides a system of linguistically motivated categories for gestures, and
- it is a practical machine- and human-readable transcription and annotation scheme with simple and complex symbols for simple and complex categories.
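The idea of complex symbols built from simple ones can be sketched as follows. This is a hypothetical illustration only: the symbol names (`rh`, `circle`, …) and the bracketed notation are invented placeholders, not the actual CoGesT inventory or syntax.

```python
# Illustrative sketch: simple symbols stand for simple categories, and a
# complex category is encoded as a complex symbol composed of simple ones,
# so the same representation is machine-readable and can be glossed for
# human readers. All names here are invented, not CoGesT's own.

SIMPLE = {
    "rh": "right hand",
    "lh": "left hand",
    "circle": "circular movement",
}

def encode(parts):
    """Join simple symbols into one machine-readable complex symbol."""
    for p in parts:
        if p not in SIMPLE:
            raise ValueError(f"unknown simple symbol: {p}")
    return "(" + ",".join(parts) + ")"

def gloss(symbol):
    """Expand a complex symbol back into a human-readable gloss."""
    parts = symbol.strip("()").split(",")
    return " + ".join(SIMPLE[p] for p in parts)

print(encode(["rh", "circle"]))   # (rh,circle)
print(gloss("(rh,circle)"))       # right hand + circular movement
```

The point of the sketch is only that one symbol inventory can serve both uses named in the text: a compact machine-readable form and a human-readable expansion derived from it.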
The term ``gesture'' as used here refers to the communicatively
relevant position or
movement of any limbs or parts of the body of a person involved in
face-to-face discourse, most prominently
conversation. ``Conversation'' here refers to any speech situation
involving at least one person other than the speaker. This other
person must either be physically present, or the speaker must know that
she or he is visible to the other person (as in a video conference,
for example).
CoGesT is based on the notion that conversation is
multimodal and comprises at least the acoustic and the visual
modalities (see also Gibbon et al., 2000). Speakers are assumed to express meanings by means of
information in the acoustic and the visual modalities, which are both
inextricably linked to and codependent on each other. Each modality
can be further divided into several submodalities: the acoustic
modality, for example, consists of parallel information streams on
different linguistic levels such as the segmental (speech sounds) and
the prosodic (pitch, speech tempo, voice quality). For example, speakers may
use intonation, fast speech or a breathy voice to convey a specific
meaning. The visual
modality can be divided into several submodalities according to the
various body parts which produce gestures during conversation, e.g. hands
and eyebrows. In face-to-face conversation, visual information is always
present, and a parallel use of the acoustic information is possible.
The purpose of CoGesT is the description of gestures within this
multimodal conversational context. The focus does not lie on an
exhaustively detailed description of every aspect of gesture, as
provided for example by HamNoSys (see Prillwitz et al., 1989), a phonetic
notation for signs in German Sign Language, or by FORM (see Martell, 2002),
which was developed for conversational gestures and general body
motions. Rather, CoGesT focuses on linguistically relevant gestural forms that are motivated by the functions of gestures
within multimodal conversations and are suitable for collation in a multimodal lexicon.
The theoretical assumptions underlying the CoGesT system differ from
other descriptions of gestures in three important ways.
- Linguistically motivated categories.
All categories for gestural transcription are linguistically motivated. That is, we assume as a null hypothesis that the patterning of visual gesture is semiotically organised in much the same way as the acoustically transformed articulatory gestures of speech. We distinguish between form and function, and we assume that gestures obey both morphological and syntactic rules for structural and sequential combinations.
- Clear distinction between form and function.
The CoGesT system is being developed for the purpose of representing
gestures in both a corpus and a multimodal lexicon. The classification
of gestures proceeds according to their functional relation with other
modalities of the conversation. Being systematic in nature, the
classification is intended to serve as the basis for formalization and
implementation. In contrast to McNeill (1992), who claims that
there are no separately structured systems of form and meaning in
gestures (p. 23), the CoGesT system is based on the theoretical
assumption that a clear distinction between gestural forms and
assumption that a clear distinction between gestural forms and
functions is possible. When a speaker gestures whilst saying ``And there
was this circle in the sand.'', it does not matter whether one uses
the left hand or the right, the index finger only, the thumb,
index finger and middle finger, or all fingers of the hand in a
circling movement; the function of the gesture, i.e. the illustration
of the meaning of the word ``circle'', is the same. This is a case of gestural
paraphrase. Likewise, a single gesture may be ambiguous in or out of
context, i.e. have more than one function and therefore potentially be
a source of misunderstanding. Take, for example, a circle
formed with thumb and forefinger, which can be interpreted as an icon
for a circle, as meaning ``okay'', or as an insult, depending on the surrounding physical, social
and cultural context. This means that although the form of gestures
may allow certain variations, their functions in communication can be
described as a separate set of categories. This implies that despite
the fact that gestures are ``spontaneous, unique and personal''
(McNeill, 1992), they are instances of a system and can be
classified into functionally relevant categories. This, in principle,
also applies to prosody. Whether gestural functions are discrete or gradient has not yet been clarified.
- Notational system.
CoGesT provides a notational system which rests on a clear distinction between categories of gestural form and function, and which separates compulsory basic categories of description from additional optional ones. The categories are mapped onto a notation which can be adapted to various scientific requirements and should not be confused with the category set itself. Possible uses are the description and
comparison of gestures by speakers of different competence (child
vs. adult, native speaker vs. non-native speaker), of different
languages and personalities and in different types of conversations.
The CoGesT
system currently distinguishes between the function of gestures and a number of dimensions of form (phase, location, directionality, and shape), and allows
further extensions as required.
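One way to picture the distinction between function and the dimensions of form is as a record with separate slots. The sketch below is purely illustrative: the field names follow the dimensions listed above, but all example values are invented placeholders, not actual CoGesT categories.

```python
# Hypothetical sketch of an annotation record: form dimensions (phase,
# location, directionality, shape) are kept separate from the gesture's
# function, mirroring the form/function distinction in the text.
# The example values are invented, not CoGesT's real category labels.
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class GestureRecord:
    phase: str                      # e.g. preparation, stroke, retraction
    location: str                   # where the gesture is performed
    directionality: str             # direction of the movement
    shape: str                      # hand/limb configuration
    function: Optional[str] = None  # annotated separately from form

record = GestureRecord(
    phase="stroke",
    location="upper torso",
    directionality="clockwise",
    shape="index finger extended",
    function="iconic: 'circle'",
)
print(asdict(record))
```

Keeping `function` as a separate, optional field reflects the claim that the same form can carry different functions (ambiguity) and different forms the same function (paraphrase), and that the system allows further dimensions to be added as required.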
The paper is structured as follows: Section 2 describes the
classification of gestures according to form. Section 3 describes ways in which gestures can be combined with respect to precedence and overlap, while Section 4
outlines the functional relationship of gestures with other modalities. The notational
scheme for the gesture categories is introduced in Section
5, and the operationalization of the
CoGesT system in annotation is exemplified in Section
6. Results of the evaluation of the system in terms of
an inter-rater reliability study are presented in Section 7.
Thorsten Trippel
2003-08-12