Giuliano Carli and Roberto Gretter
In this work we describe the implementation of a Start-End Point Detection (SEPD) algorithm operating in a real-time acoustic front-end for Automatic Speech Recognition (ASR) systems. The SEPD algorithm separates voice segments from the background noise and dynamically adapts its energy-based threshold to the background noise level, without being influenced by the voice signal. It is divided into two parts: the first computes an energy-based threshold and makes a local binary decision on the current frame; the second takes a larger context into account in order to either remove isolated spikes or merge short voice segments. We also briefly describe the Acoustic Front-End (AFE) into which our SEPD has been integrated. The whole AFE system runs on the AT\&T DSP32C VME board, and includes A/D conversion, downsampling filtering, start-end point detection, MEL cepstrum computation and Vector Quantization (VQ). This system is interfaced with a Hidden Markov Model (HMM) based speech recognizer running on a UNIX workstation. Examples are also given that show how the SEPD algorithm performs under different time-varying noise conditions.
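The two-part structure described above can be illustrated with a minimal Python sketch. It is not the authors' DSP32C implementation: the abstract does not give the threshold update rule or any parameter values, so the exponential noise tracking on non-voice frames, the order of the two context operations, and every name and default here (`margin_db`, `alpha`, `init_frames`, `min_voice`, `max_gap`) are assumptions chosen only for illustration.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # avoid log(0)
    return energies


def local_decisions(energies_db, margin_db=6.0, alpha=0.9, init_frames=5):
    """Part 1 (assumed rule): per-frame binary decision against a
    threshold placed margin_db above a running noise estimate."""
    head = energies_db[:init_frames]
    noise_db = sum(head) / max(1, len(head))  # assume the start is noise only
    flags = []
    for e in energies_db:
        is_voice = e > noise_db + margin_db
        if not is_voice:
            # Update the noise estimate only on non-voice frames, so the
            # voice signal does not influence the threshold.
            noise_db = alpha * noise_db + (1.0 - alpha) * e
        flags.append(is_voice)
    return flags


def _runs(flags):
    """Yield (value, start, end_exclusive) for each run of equal flags."""
    start = 0
    for i in range(1, len(flags) + 1):
        if i == len(flags) or flags[i] != flags[start]:
            yield flags[start], start, i
            start = i


def smooth_decisions(flags, min_voice=3, max_gap=2):
    """Part 2 (assumed rule): merge voice segments separated by short
    gaps, then remove voice runs too short to be speech (spikes)."""
    out = list(flags)
    for value, s, e in list(_runs(out)):
        interior = s > 0 and e < len(out)  # gap has voice on both sides
        if not value and interior and e - s <= max_gap:
            out[s:e] = [True] * (e - s)
    for value, s, e in list(_runs(out)):
        if value and e - s < min_voice:
            out[s:e] = [False] * (e - s)
    return out


def segments(flags):
    """Return (start_frame, end_frame_exclusive) for each voice segment."""
    return [(s, e) for value, s, e in _runs(flags) if value]
```

A typical call chain would be `segments(smooth_decisions(local_decisions(frame_energies_db(samples, frame_len))))`. Note that a noise estimate updated only on non-voice frames cannot follow a sudden noise increase larger than `margin_db`, since such frames are themselves classified as voice; a real-time detector needs some additional mechanism for that case, which this sketch does not attempt to reproduce.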