System specification:
- Goal: A simple concatenative audio digit synthesiser for teaching purposes
- Vocabulary: English digits, from 0 (zero) to 9 (nine).
- Prosody:
  - Non-final intonation function (fall-rise pitch form)
  - Final intonation function (fall pitch form)
- Naturalness and comprehensibility, to be achieved through:
  - Good audio quality
  - Native speaker
  - Isochronous rhythm with some temporal variation
  - Click-free concatenation
Corpus specification ('pre-recording phase'):
- Homogeneity: Maximal homogeneity is required, entailing use of a single speaker, in one session, and if possible in one take.
- Recording plan: Each digit is spoken twice in succession, first with fall-rise and then with falling pitch; the whole sequence of pairs is recorded in a single take and stored as a single audio file, which is feasible because the corpus is so small
Corpus recording ('recording phase'):
- Recording: digital, 8 kHz sampling rate
- Format: WWW-compatible audio format, MIME type audio/basic (*.au)
- Microphone: Sony medium quality cardioid magnetic
- Acquisition: Solaris 2.4 audiotool on Sun SPARCclassic
- Pacing: The audiotool time display was used as an informal pacer
- Instruction: The digit pair sequence (1-fallrise, 1-fall, ... , 0-fallrise, 0-fall) is to be spoken from memory
Corpus processing ('post-recording phase'):
- Transcription of digits and digit sequences with a lexicon of ten Arabic numerals
- Definition of the sequencing and digit production machine as a non-deterministic Finite State Transducer (FST); see the FST formalisation section (11.2.3)
- Manual extraction of the digit signals in 600 ms windows using audiotool, and storage of each signal in an individual file; the equal-sized windows emulate the approximate isochrony of English speech, and the position of the signal within its window was varied slightly to produce slight syncopation, in order to fulfil the naturalness specification
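
The extraction step was done by hand in audiotool; the following Python sketch only illustrates the arithmetic behind it. At the 8 kHz sampling rate given above, a 600 ms window holds 4800 samples, and since audio/basic data is 8-bit mu-law (one byte per sample) that is also 4800 bytes. The function name `cut_window`, the `onset` and `lead_in` parameters, and the padding policy are assumptions made for illustration, not part of the original procedure.

```python
SAMPLE_RATE_HZ = 8000   # from the recording specification
WINDOW_MS = 600         # from the extraction step
MULAW_SILENCE = b"\xff" # mu-law code for zero amplitude

# 8000 samples/s * 0.6 s = 4800 samples per window
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_MS // 1000


def cut_window(samples: bytes, onset: int, lead_in: int,
               length: int = WINDOW_SAMPLES) -> bytes:
    """Cut a fixed-length window from raw mu-law data (hypothetical helper).

    `onset` is the sample index at which the digit starts in the recording;
    `lead_in` is how many samples into the window the digit should start.
    Varying `lead_in` slightly from digit to digit shifts the signal inside
    its window. Missing samples at either edge are padded with silence.
    """
    start = onset - lead_in
    front_pad = max(0, -start)
    body = samples[max(0, start):max(0, start + length)]
    window = (MULAW_SILENCE * front_pad + body)[:length]
    return window + MULAW_SILENCE * (length - len(window))
```

Because every window has the same length, concatenating windows places digit onsets at roughly regular 600 ms intervals, and the per-digit `lead_in` offsets supply the small timing deviations.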
System implementation:
- Design of HTML form interface with the following functionality:
  - Input field with instruction
  - Audio synthesis and Reset buttons
- Design and implementation of CGI synthesiser script with the following functionality:
  - Limitation of input length to 10 digits
  - Correct implementation of FST:
    - Input: string of Arabic digits
    - Output: string of digit signal file names
  - Concatenation of files
  - Output file transferred to CGI output location
- Prototype test evaluation
- Integration into teaching materials (table and colour presentation design)
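
The core of the synthesiser script described above can be sketched in Python as follows. This is not the original CGI script: the file naming scheme (`1-fallrise.au`, `1-fall.au`), the choice to reject rather than truncate over-long input, and all function names are assumptions for illustration. The digit-to-file mapping is a deterministic stand-in that knows which digit is final; it does not reproduce the non-deterministic FST formalisation. The header layout follows the standard .au format (big-endian fields, encoding 1 for 8-bit mu-law).

```python
import struct

MAX_DIGITS = 10          # input length limit from the specification
AU_MAGIC = b".snd"
AU_HEADER_BYTES = 24     # six 32-bit big-endian header fields
AU_MULAW_8BIT = 1        # .au encoding code for 8-bit mu-law


def digits_to_filenames(digits: str) -> list:
    """Map a digit string to signal file names (hypothetical naming scheme).

    Non-final digits get the fall-rise variant, the final digit the fall.
    """
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("input must be a non-empty string of digits 0-9")
    if len(digits) > MAX_DIGITS:
        raise ValueError("input is limited to %d digits" % MAX_DIGITS)
    names = [d + "-fallrise.au" for d in digits[:-1]]
    names.append(digits[-1] + "-fall.au")
    return names


def au_payload(blob: bytes) -> bytes:
    """Return the sample data of one .au file, without its header."""
    magic, offset = struct.unpack(">4sI", blob[:8])
    if magic != AU_MAGIC:
        raise ValueError("not an .au file")
    return blob[offset:]


def concatenate_au(blobs) -> bytes:
    """Join the payloads of several .au files under one new header.

    Assumes all inputs are 8 kHz, mono, 8-bit mu-law, as in the corpus.
    """
    payload = b"".join(au_payload(b) for b in blobs)
    header = struct.pack(">4s5I", AU_MAGIC, AU_HEADER_BYTES,
                         len(payload), AU_MULAW_8BIT, 8000, 1)
    return header + payload
```

Under these assumptions, `digits_to_filenames("120")` yields `["1-fallrise.au", "2-fallrise.au", "0-fall.au"]`; a CGI wrapper would read those files, pass their contents to `concatenate_au`, and write the result to the output location. Headers are stripped before joining because header bytes left in mid-stream would be played back as audio data.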
Dafydd Gibbon
Mon Sep 14 14:35:18 MET DST 1998