4.1 The relationship between speech and gesture

Figure 2: Contribution of different submodalities to the total meaning.
\includegraphics[scale=0.65]{Meaning_CoGeT.eps}

In addition to the components listed in Figure 2, the contribution of prior and current knowledge of the situation needs to be included. Grouping these components in very general terms, two major modalities contribute to the meaning of a communicative event: speech (s), composed of speech sounds, prosody and paralinguistics on the acoustic level, and gesture (g), comprising limb and head movements, body posture and position, as well as facial expression and gaze on the visual level. Such a composite meaning can thus be modelled as follows:

$meaning(s+g) = meaning(s) \circ meaning(g)$
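The composition operator $\circ$ is deliberately left abstract at this stage. As one possible reading, the following sketch (in Python) models a meaning as a set of semantic features and takes $\circ$ to be set union; both the feature representation and the choice of union are simplifying assumptions made here for illustration, not part of the model itself.

\begin{verbatim}
# A minimal sketch of the composite-meaning model. The composition
# operator is left abstract in the text; here we assume (for
# illustration only) that a meaning is a set of semantic features
# and that composition is set union.

def compose(meaning_s: set, meaning_g: set) -> set:
    """meaning(s+g) = meaning(s) o meaning(g), with 'o' read as union."""
    return meaning_s | meaning_g

# Hypothetical feature sets for an utterance and an accompanying
# gesture; the feature names are invented.
meaning_s = {"walk", "upward", "stairs"}
meaning_g = {"upward", "stairs"}
print(compose(meaning_s, meaning_g))
# {'walk', 'upward', 'stairs'} (set order may vary)
\end{verbatim}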

At this stage, the role of long-term knowledge and of the situation is not explicitly included. This basic model yields four possible types of gesture-speech interaction (see Table 1), which are explained below.


Table 1: CoGesT classification of gestures in relation to the modalities involved.

\begin{tabular}{cccll}
type & s & g & modalities involved & CoGesT classification \\
\hline
1 & $-$ & $-$ & neither communicative modality involved & of no relevance \\
2 & $+$ & $-$ & only speech involved & held postures, transpositions \\
3 & $-$ & $+$ & only gesture involved & gestural idiom, non-conventionalised gesture \\
4 & $+$ & $+$ & both modalities involved & gestural idiom, non-conventionalised gesture \\
\end{tabular}

Whereas the first type carries no communicative relevance at all, types two and three are relevant without regard to the other modality: the former consists of speech alone, the latter involves only meaningful gesture (e.g. signs, icons). While the first three types are largely self-explanatory, the fourth requires additional specification, since it can be further divided into three subtypes, all of which combine significant speech and gesture. Our working assumption is that the degree of overlap between gestural meaning and the locutionary meaning of speech varies across these subtypes, as illustrated in stylised form in Figure 3.
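Under the same illustrative assumptions, the four types of Table 1 can be read as a simple decision over two presence flags. The function below is a sketch of this mapping; its name and return values are ours, not CoGesT notation.

\begin{verbatim}
# A sketch of the four interaction types of Table 1, derived from the
# presence (+) or absence (-) of communicatively relevant speech and
# gesture. Function name and comments are ours, not CoGesT notation.

def interaction_type(speech: bool, gesture: bool) -> int:
    """Map (s, g) presence flags to the type numbers of Table 1."""
    if not speech and not gesture:
        return 1  # neither modality communicative: of no relevance
    if speech and not gesture:
        return 2  # speech only: held postures, transpositions
    if gesture and not speech:
        return 3  # gesture only: gestural idiom, non-conventionalised gesture
    return 4      # both: further divided into the subtypes of Figure 3

for s in (False, True):
    for g in (False, True):
        print(f"s={'+' if s else '-'}, g={'+' if g else '-'}:",
              f"type {interaction_type(s, g)}")
\end{verbatim}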

Figure 3: Various degrees of overlap between gestural and locutionary meaning.
\includegraphics[scale=0.4]{Overlap_CoGeT.eps}

In case A, the communicative meanings of the speech and the gesture produced in parallel are identical, i.e. $meaning(s) = meaning(g)$. This is the case for the ``thumbs-up'' sign accompanied by an utterance such as ``great'' or ``well done''. In case $B_1$, the meaning of the gesture contributes modestly to the total meaning, forming a subset of the meaning of the superordinate modality, speech: $meaning(g) \subset meaning(s)$. An example is the utterance ``walking up the stairs'' accompanied by a hand gesture indicating a special type of stairs, such as a spiral staircase. Case $B_2$ represents the opposite scenario: the meaning of speech occurs as a subset of the meaning of gesture, thus $meaning(s) \subset meaning(g)$. An example of this would be a highly emotive gesture accompanied by relative speechlessness. Cases $C_1$ and $C_2$ illustrate small or no overlap between the meaning of speech and the meaning of gesture; hence, in case $C_1$, we get $meaning(s) \cap meaning(g) \neq \emptyset$, and for $C_2$ we have $meaning(s) \cap meaning(g) = \emptyset$. The latter could be a vague waving of the hands during speech, unconnected to the semantics of what is being said. For a transcription system, such a categorical division into several functional types of gestures is indispensable. However, the distinction between these types is fine-grained and involves at least two parameters, which the sketch below makes concrete:

  1. size of contribution,
  2. degree of overlap.
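Staying with the feature-set reading introduced above, the subcases of Figure 3 and the two parameters can be operationalised as follows. Classifying by subset relations and measuring the degree of overlap with the Jaccard index are our assumptions, not part of the CoGesT specification.

\begin{verbatim}
# A sketch of the type-4 subcases of Figure 3, again assuming meanings
# are feature sets. Subset tests and the Jaccard index are our
# operationalisations of the two parameters, not part of CoGesT.

def subcase(ms: set, mg: set) -> str:
    """Classify the relation of meaning(s) and meaning(g)."""
    if ms == mg:
        return "A"   # identical meanings
    if mg < ms:
        return "B1"  # gesture meaning a proper subset of speech meaning
    if ms < mg:
        return "B2"  # speech meaning a proper subset of gesture meaning
    return "C1" if ms & mg else "C2"  # partial vs. no overlap

def degree_of_overlap(ms: set, mg: set) -> float:
    """Jaccard index: |intersection| / |union|; 0 = disjoint, 1 = identical."""
    union = ms | mg
    return len(ms & mg) / len(union) if union else 1.0

# The spiral-staircase example of case B1, with invented features;
# len(ms) and len(mg) serve as crude proxies for size of contribution.
ms = {"walk", "upward", "stairs"}
mg = {"upward", "stairs"}
print(subcase(ms, mg), round(degree_of_overlap(ms, mg), 2))  # B1 0.67
\end{verbatim}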
