Bianca Angelini, Fabio Brugnara, Daniele Falavigna, Diego Giuliani, Roberto Gretter and Maurizio Omologo
A system for automatic segmentation and labeling of speech has been developed that provides phone boundaries, given the linguistic content of a speech utterance.
The technique is based on a Hidden Markov Model (HMM) recognizer of acoustic-phonetic units. Starting from a set of phonological rules, a network is derived that represents a wide variety of possible phonetic realizations of a given text, in order to cope with pronunciation variability. Given this network, the Viterbi algorithm determines both the most likely phone sequence and the phone boundaries.
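The alignment step can be illustrated with a minimal sketch, given here only as an illustration and not as the implementation used in the system. It assumes that frame-level log-likelihoods are already available for each network node, that each node stands for one phone with an implicit self-loop, that transition probabilities are ignored, and that frames are 10 ms apart. The names `viterbi_align`, `segments`, and `frame_ms` are hypothetical.

```python
import math

def viterbi_align(log_likes, successors, starts, finals):
    """Best node path through a pronunciation network, one node per frame.

    log_likes[t][s]: log-likelihood of frame t under node s (assumed given).
    successors[s]:   nodes reachable from s; a self-loop is implicit.
    starts, finals:  allowed first and last nodes.
    """
    n_nodes = len(successors)
    score = [-math.inf] * n_nodes
    for s in starts:
        score[s] = log_likes[0][s]
    backpointers = []
    for t in range(1, len(log_likes)):
        new_score = [-math.inf] * n_nodes
        pointer = [-1] * n_nodes
        for s in range(n_nodes):
            if score[s] == -math.inf:
                continue
            # Either stay in the same phone or move to a successor phone.
            for nxt in [s] + list(successors[s]):
                candidate = score[s] + log_likes[t][nxt]
                if candidate > new_score[nxt]:
                    new_score[nxt] = candidate
                    pointer[nxt] = s
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best-scoring final node.
    path = [max(finals, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path

def segments(path, labels, frame_ms=10):
    """Collapse a frame-level path into (phone, start_ms, end_ms) tuples."""
    result = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            result.append((labels[path[start]], start * frame_ms, t * frame_ms))
            start = t
    return result

# Toy network with two alternative realizations of the middle phone.
labels = ["s", "i", "I", "t"]
successors = [[1, 2], [3], [3], []]
log_likes = [
    [-1, -5, -5, -5],
    [-1, -5, -5, -5],
    [-5, -4, -1, -5],
    [-5, -4, -1, -5],
    [-5, -5, -5, -1],
    [-5, -5, -5, -1],
]
path = viterbi_align(log_likes, successors, starts=[0], finals=[3])
print(segments(path, labels))
```

In this toy case the search selects the second pronunciation variant and returns the phone boundaries as a by-product of the same backtrace, which is the property the technique relies on.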
The system has been developed and tested both on the DARPA-TIMIT acoustic-phonetic continuous speech database of American English and on a similar Italian database. Given a tolerance of 20 ms, the system locates 86.2% of the boundaries correctly for American English (90.9% for Italian), when trained with 256 phonetically rich sentences (158 for Italian).
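A tolerance-based boundary score of this kind could be computed as in the following sketch. It assumes that the automatic and reference segmentations contain the same phone sequence, so that boundaries pair one-to-one, and that a boundary lying exactly at the tolerance counts as correct; neither detail is specified above. The function name `boundary_accuracy` and the sample values are hypothetical.

```python
def boundary_accuracy(reference_ms, automatic_ms, tolerance_ms=20.0):
    """Percentage of automatic boundaries within tolerance_ms of the reference.

    Assumes both lists hold boundary times in milliseconds for the same
    phone sequence, so the i-th entries refer to the same boundary.
    """
    if len(reference_ms) != len(automatic_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        abs(ref - auto) <= tolerance_ms
        for ref, auto in zip(reference_ms, automatic_ms)
    )
    return 100.0 * hits / len(reference_ms)

# Made-up boundary times, for illustration only.
reference = [100, 250, 400, 620]
automatic = [105, 275, 395, 640]
print(boundary_accuracy(reference, automatic))
```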