3 What are spoken language data?

Two epistemological domains:

SIGNALS: recorded spoken data

SYMBOLS: interpreted spoken data

[Figure omitted]

The SIGNAL-SYMBOL barrier:

HUMAN: categorial, interpretative perception, world knowledge
MACHINE: stochastic segmentation, classification, top-down prediction

Pre-recording, recording and post-recording requirements:

  1. Corpus design: specification of speakers, scenario
  2. Corpus recording: studio quality
  3. Corpus processing: physical and linguistic characterisation:
    - Physical: signal properties, speaker characteristics
    - Linguistic: transcription, annotation, lexicon ...
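
The relation between the two domains can be sketched in code: a recording carries its physical characterisation (signal properties, speaker) together with time-aligned symbolic labels, and each label can be mapped back to a stretch of the signal. This is only a minimal sketch; the class names, field names and example values below are hypothetical and do not come from any particular corpus standard.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One time-aligned symbolic label (SYMBOL domain)."""
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds
    label: str      # e.g. a word or phone label


@dataclass
class Recording:
    """One corpus item: a signal plus its characterisation (hypothetical schema)."""
    signal_file: str     # path to the recorded signal (SIGNAL domain)
    sample_rate_hz: int  # physical: signal property
    speaker_id: str      # physical: speaker characteristic
    scenario: str        # corpus design: recording scenario
    segments: List[Segment] = field(default_factory=list)  # linguistic annotation

    def sample_range(self, seg: Segment) -> Tuple[int, int]:
        """Map a symbolic segment back to sample indices in the signal."""
        return (round(seg.start_s * self.sample_rate_hz),
                round(seg.end_s * self.sample_rate_hz))

    def transcription(self) -> str:
        """Transcription obtained by joining the labels in time order."""
        ordered = sorted(self.segments, key=lambda s: s.start_s)
        return " ".join(s.label for s in ordered)


# Invented example data, for illustration only.
rec = Recording("rec001.wav", 16000, "spk01", "read speech")
rec.segments.append(Segment(0.50, 0.90, "world"))
rec.segments.append(Segment(0.10, 0.45, "hello"))
```

Keeping the time stamps with each label preserves the link from SYMBOLS back to SIGNALS, so the transcription alone is never the only record of what was said.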



Dafydd Gibbon
Wed May 22 10:39:25 MET DST 1996