The handbook ([4]) is organised as a series of
necessarily inter-related chapters. Each chapter
provides some introductory background, including
definitions of basic terminology, followed by
concise summaries of common approaches and, where
these exist, alternatives. Factors pertaining to
recommended approaches are outlined, and preferred
methods and recommendations are identified wherever
possible. A chapter on tools catalogues the software and
hardware tools that are available to support resource
creation. A selected bibliography, tutorial reading
lists and an index are also included.
The current chapter plan addresses:
1. Introduction
2. System Design and Specification
3. Corpus Design
4. Corpus Collection
5. Corpus Representation
6. Lexica
7. Language Modelling
8. Dialogue
9. Physical Characterisation and Description
10. Assessment Methodologies and Experimental Design
11. Assessment of Recognition Systems
12. Assessment of Synthesis Systems
13. Assessment of Speaker Verification Systems
14. Assessment of Interactive Systems
15. Tools
16. Terminology & Glossary
A number of appendices provide valuable reference
material on various topics, including:
A. Computer readable phonetic alphabets
B. SAMPA description
C. Speech file formats (SAM, NIST, Verbmobil)
D. Recording protocols (studio, telephone)
E. Compendium of public domain SL Corpora
F. EUROM databases overview
G. SPEECHDAT and Polyphone databases overview
H. Current list of document servers
I. Directory of speech agencies (ESCA, ELRA, ELSNET, LDC)
The current handbook cannot be considered a final or complete statement of guidelines and recommendations as agreed by the EU SL community. Nevertheless, it is expected that the present work substantially reflects the community position on a wide range of relevant topics and will prove to be an important interim working document for establishing commonly agreed working standards. Ultimately, where appropriate, it may also support the progression of these de facto conventions and practices towards formal representation.
In its present form the handbook already reflects the results of fruitful co-operation between the EAGLES project and the LRE project SPEECHDAT, which is itself concerned with creating an infrastructure and implementation model for the creation of commonly required spoken language resources. This co-operative basis for resource specification and description, together with close identification with previous speech technology projects, seeks to ensure that the current set of EAGLES recommendations is relevant and closely related to present-day requirements in both industry and research.
One of the most important achievements of the SPEECHDAT project to date has been initiating the creation of an association, the European Language Resources Association (ELRA), to oversee the creation, validation, marketing and distribution of the growing body of specifically European language resources, both text and speech. ELRA will provide the executive structures required to implement the strategies for speech resource creation, validation and distribution initially formulated within the SPEECHDAT project. It is foreseen that the co-operation fostered between EAGLES and ELRA will continue to develop as a closely interlinked relationship, much like that between a legislature and an executive body.