Using the diphone boundary markings, diphones are extracted from the speech signal. They are often stored in separate files; alternatively, they are represented by a file of time stamps which define the position of each diphone within a larger file. For practical purposes (such as reliability of pitch extraction), neighbouring stretches of speech are generally included with the diphone. The length of a diphone in normal speech, together with its immediate context, is less than 500 ms. The set of diphones, together with the set of diphonemes and the time stamp triples marking the beginning, the segment boundary and the end of each diphone, constitutes the raw diphone inventory.
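The extraction step can be illustrated with a minimal sketch in Python. It assumes that the speech signal is already available as a sequence of samples and that each diphone is described by a time stamp triple in milliseconds. The names `DiphoneEntry`, `ms_to_sample` and `extract_with_context`, the label `"a-t"`, and the default of 50 ms of context on each side are hypothetical choices made for illustration, not part of any particular toolkit.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DiphoneEntry:
    """One inventory record: a label plus its time stamp triple (in ms).

    The field names are illustrative assumptions, not a standard format.
    """
    label: str          # hypothetical diphone label, e.g. "a-t"
    start_ms: float     # beginning of the diphone
    boundary_ms: float  # boundary between the two segments
    end_ms: float       # end of the diphone


def ms_to_sample(t_ms: float, sample_rate: int) -> int:
    """Convert a time in milliseconds to the nearest sample index."""
    return int(round(t_ms * sample_rate / 1000.0))


def extract_with_context(
    samples: Sequence[float],
    sample_rate: int,
    entry: DiphoneEntry,
    context_ms: float = 50.0,  # assumed amount of neighbouring speech
) -> Tuple[List[float], DiphoneEntry]:
    """Cut a diphone out of a larger signal, keeping some context.

    Returns the excerpt and a copy of the entry whose time stamps are
    expressed relative to the start of the excerpt.
    """
    if not (entry.start_ms <= entry.boundary_ms <= entry.end_ms):
        raise ValueError("time stamps must satisfy start <= boundary <= end")

    # Widen the cut by the context, clamped to the limits of the signal.
    first = max(0, ms_to_sample(entry.start_ms - context_ms, sample_rate))
    last = min(len(samples), ms_to_sample(entry.end_ms + context_ms, sample_rate))
    excerpt = list(samples[first:last])

    # Shift the triple so that it indexes into the excerpt.
    offset_ms = first * 1000.0 / sample_rate
    local = DiphoneEntry(
        entry.label,
        entry.start_ms - offset_ms,
        entry.boundary_ms - offset_ms,
        entry.end_ms - offset_ms,
    )
    return excerpt, local


if __name__ == "__main__":
    signal = [0.0] * 16000  # one second of silence at 16 kHz, as a stand-in
    entry = DiphoneEntry("a-t", 200.0, 260.0, 320.0)
    excerpt, local = extract_with_context(signal, 16000, entry)
    print(len(excerpt), local)
```

Returning the shifted triple alongside the excerpt keeps the segment boundary usable whether the diphone is written to its own file or left in place and referenced by time stamps only.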