Maximum Likelihood Endpoint Detection with Time-Domain Features

Marco Orlandi, Alfiero Santarelli, Daniele Falavigna

Eurospeech 2003, European Conference on Speech Communication and Technology

Geneva, Switzerland, September 1-4, 2003.

IRST Tech. Rep. No. 0307-01


In this paper we propose an effective, robust and computationally inexpensive HMM-based start/endpoint detector for speech recognisers. Our first attempts follow the classical scheme of a feature extractor and a Viterbi classifier (used for voice activity detection) followed by a post-processing stage, but our ultimate goal is a pure HMM-based architecture capable of performing the endpointing task. The features used for voice activity detection are energy and zero crossing rate, together with the AMDF (Average Magnitude Difference Function), which proves to be a valid alternative to energy. We also study the impact of grammar structures and training conditions on performance. Finally, we lay the groundwork for the investigation of pure HMM-based architectures.
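
As an illustration of the three time-domain features named in the abstract, the following minimal sketch computes them for a single frame of samples. It is not the authors' implementation: the function names `frame_features` and `split_frames`, the use of log energy, the single AMDF lag, and the frame length and hop are all assumptions chosen for the example.

```python
import math


def frame_features(frame, lag=1):
    """Return (log_energy, zero_crossing_rate, amdf) for one frame of samples.

    Assumptions for illustration only: log energy with a small floor, and the
    AMDF evaluated at one fixed lag rather than over a range of lags.
    """
    n = len(frame)
    # Short-time energy: mean squared amplitude; the floor avoids log(0).
    energy = sum(x * x for x in frame) / n
    log_energy = math.log(energy + 1e-12)
    # Zero crossing rate: fraction of adjacent sample pairs with differing sign.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    # AMDF at the given lag: mean absolute difference between the frame
    # and a copy of itself shifted by `lag` samples.
    amdf = sum(abs(frame[i] - frame[i - lag]) for i in range(lag, n)) / (n - lag)
    return log_energy, zcr, amdf


def split_frames(samples, frame_len=200, hop=80):
    """Cut a sample sequence into overlapping frames (sizes are hypothetical)."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


if __name__ == "__main__":
    silence = [0.0] * 200
    tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(200)]
    print("silence:", frame_features(silence))
    print("tone:   ", frame_features(tone))
```

Like energy, the AMDF is zero on a silent frame and grows with signal amplitude, but it needs only subtractions and absolute values, which suggests why it can stand in for energy in a low-cost detector. The per-frame feature vectors would then be passed to the HMM-based classifier.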
