
Automatic Speech Segmentation and Labeling

Automatic speech segmentation and labeling is fundamental for producing the data needed to train speech recognizers. The technology developed by ITC-irst can segment speech that has previously been labeled at either the phonetic or the orthographic level. The method is based on acoustic-phonetic units represented by Hidden Markov Models (HMMs). Experiments have been carried out both on the DARPA-TIMIT speech database, which consists of phonetically balanced sentences in American English, and on the APASCI database, which consists of phonetically balanced sentences in Italian (see the paper "Automatic Segmentation and Labeling of English and Italian Speech Databases" for details).
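HMM-based segmentation of already-labeled speech is typically done with Viterbi forced alignment: the order of the units is fixed by the transcription, and the search only decides where each unit starts and ends. The following is a minimal sketch, not the ITC-irst implementation. It assumes each unit is a single self-looping HMM state (real systems use multi-state left-to-right models), ignores transition probabilities, and takes as input a hypothetical precomputed matrix `loglik` of per-frame log-likelihoods, one column per unit of the transcription.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def forced_align(loglik: Sequence[Sequence[float]], n_units: int) -> List[Tuple[int, int, int]]:
    """Viterbi forced alignment of T frames to n_units units in a fixed order.

    loglik[t][j] is the log-likelihood of frame t under the j-th unit of the
    transcription. Each unit is one emitting state with a self-loop; at every
    frame the path either stays in its unit or advances to the next one.
    Returns (unit_index, start_frame, end_frame_exclusive) segments.
    """
    T = len(loglik)
    if T < n_units:
        raise ValueError("need at least one frame per unit")
    # score[t][j]: best log score of frames 0..t with frame t in unit j
    score = [[NEG_INF] * n_units for _ in range(T)]
    back = [[0] * n_units for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start in the first unit
    for t in range(1, T):
        for j in range(n_units):
            stay = score[t - 1][j]
            enter = score[t - 1][j - 1] if j > 0 else NEG_INF
            if stay >= enter:
                score[t][j], back[t][j] = stay, j
            else:
                score[t][j], back[t][j] = enter, j - 1
            score[t][j] += loglik[t][j]
    # Trace back from the last frame, which must lie in the last unit.
    path = [0] * T
    j = n_units - 1
    for t in range(T - 1, -1, -1):
        path[t] = j
        j = back[t][j]
    # Collapse the frame-level path into contiguous segments.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[start]:
            segments.append((path[start], start, t))
            start = t
    return segments


if __name__ == "__main__":
    # Toy scores: frames 0-1 match unit 0, frames 2-4 unit 1, frame 5 unit 2.
    ll = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
          [-5, 0, -5], [-5, 0, -5], [-5, -5, 0]]
    print(forced_align(ll, 3))  # [(0, 0, 2), (1, 2, 5), (2, 5, 6)]
```

Multiplying the segment boundaries by the frame shift (for example 10 ms, an assumed value) turns them into time stamps for the phonetic labels.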
Finite State Networks (FSNs) are used to account for multiple phonetic transcriptions of the input utterance. The networks model insertions, deletions, and substitutions of phonemes. Tools for designing and compiling FSNs have also been developed.
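One simple way to picture such a network is as a chain of "slots", each listing the phones allowed at that position, with `None` standing for an empty (epsilon) arc. A substitution is a slot with several phones, a deletion is a canonical phone paired with `None`, and an insertion is an extra slot containing `None` plus the optional phone. The sketch below is only an illustration under these assumptions; the slot format, the function names, and the SAMPA-like phone symbols are hypothetical and do not describe the ITC-irst tools.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# (source state, target state, phone label or None for an epsilon arc)
Arc = Tuple[int, int, Optional[str]]


def build_fsn(slots: Sequence[Sequence[Optional[str]]]) -> Tuple[List[Arc], int]:
    """Compile a sequence of phone slots into a linear finite state network.

    State i precedes slot i; every alternative in slot i becomes an arc
    from state i to state i + 1. The initial state is 0.
    Returns the arcs and the final state.
    """
    arcs: List[Arc] = []
    for i, alternatives in enumerate(slots):
        for label in alternatives:
            arcs.append((i, i + 1, label))
    return arcs, len(slots)


def accepted_strings(arcs: Sequence[Arc], final: int) -> Set[Tuple[str, ...]]:
    """Enumerate the phone strings accepted by an acyclic FSN (epsilons dropped)."""
    outgoing: Dict[int, List[Tuple[int, Optional[str]]]] = {}
    for src, dst, label in arcs:
        outgoing.setdefault(src, []).append((dst, label))
    results: Set[Tuple[str, ...]] = set()

    def walk(state: int, prefix: Tuple[str, ...]) -> None:
        if state == final:
            results.add(prefix)
            return
        for dst, label in outgoing.get(state, []):
            walk(dst, prefix if label is None else prefix + (label,))

    walk(0, ())
    return results


if __name__ == "__main__":
    # Canonical /O f @ n/ with: O->o substitution, optional inserted t,
    # optional deletion of @ (illustrative symbols only).
    arcs, final = build_fsn([["O", "o"], ["f"], [None, "t"], ["@", None], ["n"]])
    for phones in sorted(accepted_strings(arcs, final)):
        print(" ".join(phones))
```

Enumerating paths is only for inspection; an aligner would instead expand each arc into its phone HMM and let the Viterbi search choose the path that best matches the audio, so the alternatives never need to be listed explicitly.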


 

This page is maintained by Daniele Falavigna.