Automatic Speech Segmentation and Labeling
Automatic speech segmentation and labeling is fundamental for
producing data to train speech recognizers. The technology developed
by ITC-irst segments speech that has previously been labeled at either
the phonetic or the orthographic level. The method is based on
acoustic-phonetic units represented by Hidden Markov Models (HMMs).
Experiments have been carried out on both the DARPA-TIMIT speech
database, which consists of phonetically balanced sentences in
American English, and the APASCI database, which consists of
phonetically balanced sentences in Italian (see the paper "Automatic
Segmentation and Labeling of English and Italian Speech Databases"
for details).
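To make the idea of HMM-based segmentation concrete, here is a minimal sketch of forced alignment with the Viterbi algorithm. It is an illustration only, not the ITC-irst implementation: it assumes each acoustic-phonetic unit is a single state with a self-loop, that transitions carry no cost, and that per-frame log-likelihoods have already been computed. The function name `force_align` and the `loglik` matrix are hypothetical.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def force_align(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Align T frames to N units taken in a fixed left-to-right order.

    loglik[t][n] is the (assumed precomputed) log-likelihood of frame t
    under unit n. Returns one inclusive (start_frame, end_frame) per unit.
    """
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per unit")

    score = [[NEG_INF] * N for _ in range(T)]
    # back[t][n] is 1 if the best path entered unit n at frame t, else 0.
    back = [[0] * N for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start in the first unit

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else NEG_INF
            if move > stay:
                score[t][n] = move + loglik[t][n]
                back[t][n] = 1
            else:
                score[t][n] = stay + loglik[t][n]

    # Trace back from the last frame of the last unit to find boundaries.
    starts, ends = [0] * N, [0] * N
    n = N - 1
    ends[n] = T - 1
    for t in range(T - 1, 0, -1):
        if back[t][n] == 1:
            starts[n] = t
            n -= 1
            ends[n] = t - 1
    return list(zip(starts, ends))
```

Given a matrix in which the first frames score best under the first unit, the middle frames under the second, and so on, the returned frame ranges are the segment boundaries. A real system would use multi-state HMMs with trained transition probabilities.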
Finite State Networks (FSNs) are used to consider multiple phonetic
transcriptions of the input utterance. The networks account for
insertions, deletions, or substitutions of phonemes. Tools for
designing and compiling FSNs have also been developed.
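As a rough illustration of how a network can encode several transcriptions of one utterance, the sketch below builds a small acyclic network and enumerates the phone sequences it accepts. The representation (a list of "slots", with `None` marking a skip arc) and the names `build_fsn` and `transcriptions` are assumptions made for this example; they do not describe the FSN tools mentioned above. The phone labels are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# An arc is (phone label, target state); a label of None is a skip arc.
Arc = Tuple[Optional[str], int]


def build_fsn(slots: List[List[Optional[str]]]) -> Dict[int, List[Arc]]:
    """Build a linear network: each alternative in slot i is an arc i -> i+1.

    Two labels in one slot model a substitution, a None next to a label
    models a deletion, and a slot that is skipped by default but has an
    extra label models an insertion.
    """
    return {i: [(label, i + 1) for label in alts] for i, alts in enumerate(slots)}


def transcriptions(
    fsn: Dict[int, List[Arc]], start: int, final: int
) -> List[List[str]]:
    """Enumerate every phone sequence on a path from start to final.

    Assumes the network is acyclic, which holds for build_fsn output.
    """
    if start == final:
        return [[]]
    results: List[List[str]] = []
    for label, nxt in fsn.get(start, []):
        for tail in transcriptions(fsn, nxt, final):
            head = [label] if label is not None else []
            results.append(head + tail)
    return results


# Hypothetical network for a canonical transcription "ae n d".
slots = [
    ["ae", "ax"],   # substitution: "ae" may be realized as "ax"
    ["n"],          # no variation
    ["d", None],    # deletion: "d" may be dropped
    [None, "ax"],   # insertion: an extra "ax" may follow
]
fsn = build_fsn(slots)
variants = transcriptions(fsn, 0, len(slots))
```

In a segmentation setting, each enumerated variant (or, more efficiently, the network itself) would be aligned against the speech signal, and the best-scoring path would give both the chosen transcription and its boundaries.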
This page is maintained by Daniele Falavigna.