
Design specification of the concatenative digit synthesiser

System specification:

  1. Goal: A simple concatenative audio digit synthesiser for teaching purposes
  2. Vocabulary: English digits, from 0 (zero) to 9 (nine).
  3. Prosody:
    1. Non-final intonation function (fall-rise pitch form)
    2. Final intonation function (fall pitch form)
  4. Naturalness and comprehensibility, supported by:
    1. Good audio quality
    2. Native speaker
    3. Isochronous rhythm with some temporal variation
    4. Click-free concatenation

Corpus specification (`pre-recording phase'):

  1. Homogeneity: Maximal homogeneity is required, entailing use of a single speaker, in one session, and if possible in one take.
  2. Digits are spoken as a sequence of pairs, each digit first with fall-rise and then with falling pitch, in a single recording take stored as a single audio file; this is feasible because the corpus is tiny.

Corpus recording (`recording phase'):

  1. Recording: digital, 8 kHz sampling rate
  2. Format: WWW compatible audio format, mime type audio/basic (*.au)
  3. Microphone: Sony medium-quality cardioid magnetic
  4. Acquisition: Solaris 2.4 audiotool on SUN Sparc Classic
  5. Pacing: The audiotool time display was used as an informal pacer
  6. Instruction: The digit pair sequence (1-fallrise, 1-fall, ... , zero-fallrise, zero-fall) is to be spoken from memory

Corpus processing (`post-recording phase'):

  1. Transcription of digits and digit sequences with a lexicon of ten Arabic numerals
  2. Definition of the sequencing and digit production machine as a non-deterministic Finite State Transducer (FST); see the FST formalisation section (11.2.3)
  3. Manual extraction of the digit signals in 600 ms windows using audiotool, and storage in individual files. The equal-sized windows emulate the approximate isochrony of English speech; the position of the signal within each window was varied slightly to produce slight syncopation, in order to fulfil the naturalness specification.
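
The window arithmetic implied by the recording and processing specifications can be sketched as follows. This is only an illustration, not the procedure actually used: the extraction was done by hand in audiotool, whereas the hypothetical helper `place_in_window` below pads a signal with silence, and it assumes samples held as plain integers with 0 standing for silence.

```python
SAMPLE_RATE_HZ = 8000  # from the recording specification
WINDOW_MS = 600        # from the processing specification
TOKENS = 10 * 2        # ten digits, each with fall-rise and fall pitch

# Number of samples in one 600 ms window at 8 kHz.
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_MS // 1000

# Total number of samples if every token occupies exactly one window.
CORPUS_SAMPLES = TOKENS * WINDOW_SAMPLES


def place_in_window(signal, offset, window=WINDOW_SAMPLES, silence=0):
    """Return a list of length `window` with `signal` starting at `offset`.

    Varying `offset` slightly from token to token shifts the signal
    inside its equal-sized window (hypothetical stand-in for the manual
    positioning described above).
    """
    if offset < 0 or offset + len(signal) > window:
        raise ValueError("signal does not fit in the window at this offset")
    padded = [silence] * window
    padded[offset:offset + len(signal)] = signal
    return padded


if __name__ == "__main__":
    print(WINDOW_SAMPLES, CORPUS_SAMPLES)
```

Under these assumptions each window holds 4800 samples, and the whole corpus of 20 tokens amounts to 96000 samples, i.e. 12 seconds of audio.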

System implementation:

  1. Design of HTML form interface with the following functionality:
    1. Input field with instruction
    2. Audio synthesis and Reset buttons
  2. Design and implementation of CGI synthesiser script with the following functionality:
    1. Limitation of input length to 10 digits
    2. Correct implementation of FST:
      1. Input: string of Arabic digits
      2. Output: string of digit signal file names
    3. Concatenation of files
    4. Output file transferred to CGI output location
  3. Prototype test evaluation
  4. Integration into teaching materials (table and colour presentation design)
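
The core of the synthesiser script (input check, mapping to file names, concatenation) might look roughly like the sketch below. It is not the original CGI script. Several points are assumptions: the file naming scheme (`1_fallrise.au` and so on) is hypothetical; the mapping simply gives every non-final digit the fall-rise form and the final digit the fall form, following the prosody specification, and resolves the FST's non-determinism by looking at the position in the string; and the concatenation assumes standard `.au` files with 8-bit mu-law data at 8 kHz, mono, as is usual for `audio/basic`.

```python
import struct

MAX_DIGITS = 10          # input length limit from the specification
AU_MAGIC = 0x2E736E64    # ".snd", the magic number of the .au format
AU_HEADER_SIZE = 24      # six big-endian 32-bit fields, no annotation
AU_ENCODING_MULAW = 1    # 8-bit mu-law (assumed for audio/basic)
SAMPLE_RATE_HZ = 8000


def digits_to_files(digits):
    """Map a string of Arabic digits to a list of signal file names.

    Non-final digits get the fall-rise recording, the final digit the
    fall recording. The file names are hypothetical.
    """
    if not digits or len(digits) > MAX_DIGITS:
        raise ValueError("expected between 1 and 10 digits")
    if any(ch not in "0123456789" for ch in digits):
        raise ValueError("input must consist of the digits 0-9 only")
    last = len(digits) - 1
    return [
        f"{ch}_{'fall' if i == last else 'fallrise'}.au"
        for i, ch in enumerate(digits)
    ]


def au_payload(data):
    """Return the audio data of one .au file, without its header."""
    magic, offset = struct.unpack(">II", data[:8])
    if magic != AU_MAGIC:
        raise ValueError("not an .au file")
    return data[offset:]


def concatenate_au(files):
    """Join the payloads of several .au files under one new header."""
    payload = b"".join(au_payload(f) for f in files)
    header = struct.pack(
        ">6I", AU_MAGIC, AU_HEADER_SIZE, len(payload),
        AU_ENCODING_MULAW, SAMPLE_RATE_HZ, 1,
    )
    return header + payload


if __name__ == "__main__":
    print(digits_to_files("120"))
```

Stripping the individual headers before joining keeps header bytes from being played back as audio between digits, which would otherwise work against the click-free concatenation requirement. In a CGI setting, the byte strings passed to `concatenate_au` would be read from the files named by `digits_to_files`.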

Dafydd Gibbon
Mon Sep 14 14:35:18 MET DST 1998