Phone-Based Prefiltering for Continuous Speech Recognition

Renato De Mori, Diego Giuliani and Roberto Gretter

Proceedings of ICSLP 94, Yokohama, Japan, 1994


An architecture for speech recognition is proposed, based on four stages: (1) recognition of the most likely phone sequence using centisecond Hidden Markov Models (HMMs); (2) phone-based lexical and syntactical forward decoding; (3) A* phone-based backward pass, producing a Word Hypothesis Structure (WHS); (4) accurate rescoring of the search sub-space represented by the WHS using centisecond HMMs.

Experiments carried out on two different tasks show that a recognizer based on the proposed four-stage architecture achieves performance comparable to that of a classic one-stage recognizer.

Experimental results also show that WHSs built with this approach yield the same recognition performance as WHSs built using centisecond HMMs, with a potential speed-up in WHS generation proportional to the average phoneme duration in centiseconds.
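
The proportionality in the speed-up claim can be illustrated with a small arithmetic sketch. It assumes one acoustic frame per centisecond and one symbol per phone in the phone-based passes, so the ratio of observations handled by the two approaches equals the mean phone duration. The function name and the durations are hypothetical and are not taken from the experiments reported here.

```python
def potential_speedup(phone_durations_cs):
    """Ratio of frame-level to phone-level observation counts.

    phone_durations_cs: phone durations in centiseconds (hypothetical values).
    Assumes one frame per centisecond and one symbol per phone.
    """
    if not phone_durations_cs:
        raise ValueError("need at least one phone")
    n_frames = sum(phone_durations_cs)  # observations seen by centisecond HMMs
    n_phones = len(phone_durations_cs)  # observations seen by phone-based decoding
    return n_frames / n_phones          # equals the average phone duration


# Illustrative durations only, not measured data.
durations = [6, 9, 12, 5, 8]
ratio = potential_speedup(durations)
print(f"potential speed-up factor: {ratio:.1f}")
```

This is only an upper-bound style estimate under the stated assumptions: it ignores the cost of the first-stage phone recognition and of the final rescoring.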

