
Speaker Identification and Verification

Automatic person recognition can be introduced in systems and services to restrict their use to authorized people. Possible applications include retrieval of private information, control of financial transactions, and control of access to safe or reserved areas and buildings. Person recognition can be based on one or more biometric features, used separately or in combination (voice, face, fingerprints, iris, etc.).

A common approach to the speaker recognition problem is to classify acoustic parameters derived from the input speech signal by short-time spectral analysis. These parameters carry both phonetic information, related to the uttered text, and individual information, related to the speaker. Since the problem of separating the phonetic information from the individual information is not yet solved, many speaker recognition systems operate in a text-dependent way (i.e. the user must utter a predefined key sentence). However, this is not always possible, especially when the speaker cannot be expected to cooperate during the recognition process (think, for example, of criminal investigation applications). In these cases speaker recognition must be performed in a text-independent way.

According to the application area, speaker recognition systems can be divided into speaker identification systems and speaker verification systems. Speaker identification consists of assigning the input speech signal to one person of a known group, while speaker verification consists of accepting or rejecting the identity claimed by the user of the system.
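As a rough illustration of the short-time spectral analysis step mentioned above, the fragment below splits a speech signal into overlapping frames and computes a log-magnitude spectrum for each frame. It is only a minimal sketch in Python/NumPy: the frame length, frame shift and the choice of log-spectra (rather than, for example, cepstral coefficients) are illustrative assumptions, not the parameters of the systems described on this page.

    import numpy as np

    def short_time_log_spectra(signal, sample_rate, frame_ms=25, shift_ms=10):
        # Split the signal into overlapping, Hamming-windowed frames and
        # return the log-magnitude spectrum of each frame.
        frame_len = int(sample_rate * frame_ms / 1000)
        shift = int(sample_rate * shift_ms / 1000)
        window = np.hamming(frame_len)
        frames = []
        for start in range(0, len(signal) - frame_len + 1, shift):
            frame = signal[start:start + frame_len] * window
            spectrum = np.abs(np.fft.rfft(frame))
            frames.append(np.log(spectrum + 1e-10))
        return np.array(frames)   # shape: (num_frames, frame_len // 2 + 1)

    # Example on one second of synthetic signal sampled at 16 kHz
    features = short_time_log_spectra(np.random.randn(16000), 16000)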

Different approaches to speaker identification and verification have been studied at our Institute.

Text-independent speaker identification based on Vector Quantization. This method provided good results for relatively small sets of speakers (fewer than 100). It has also been successfully used to combine acoustic features with visual ones (derived from the analysis of the face).
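A minimal sketch of the Vector Quantization idea is shown below, assuming SciPy's k-means routines: one codebook is trained per speaker, and an unknown utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion. The codebook size and the Euclidean distortion measure are illustrative choices, not those of the actual system.

    import numpy as np
    from scipy.cluster.vq import kmeans, vq

    def train_codebook(training_frames, codebook_size=64):
        # Cluster a speaker's training frames into a codebook of centroids.
        codebook, _ = kmeans(training_frames, codebook_size)
        return codebook

    def identify(test_frames, codebooks):
        # Return the speaker whose codebook gives the lowest average
        # quantization distortion on the test frames.
        def avg_distortion(codebook):
            _, distances = vq(test_frames, codebook)
            return distances.mean()
        return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk]))

    # codebooks = {"speaker_1": train_codebook(frames_1), "speaker_2": ...}
    # best_match = identify(test_frames, codebooks)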

Text-independent speaker identification based on Neural Networks. This approach integrates Competitive Neural Networks with Radial Basis Function Networks to perform the task.
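The exact architecture of that work is not reproduced here; the toy classifier below only illustrates the general idea of coupling a competitive (k-means-like) stage, which places the radial basis centres, with an RBF output layer trained by least squares. All sizes and the Gaussian width are assumed values for illustration.

    import numpy as np
    from scipy.cluster.vq import kmeans

    def train_rbf_classifier(frames, speaker_labels, n_speakers,
                             n_centres=32, width=1.0):
        # Competitive stage: place RBF centres by k-means clustering.
        centres, _ = kmeans(frames, n_centres)
        # RBF stage: Gaussian activations, then least-squares output weights.
        hidden = _activations(frames, centres, width)
        targets = np.eye(n_speakers)[speaker_labels]     # one-hot speaker targets
        weights, *_ = np.linalg.lstsq(hidden, targets, rcond=None)
        return centres, weights

    def classify(frames, centres, weights, width=1.0):
        # Accumulate the network outputs over all frames of the utterance.
        outputs = _activations(frames, centres, width).dot(weights)
        return int(np.argmax(outputs.sum(axis=0)))

    def _activations(frames, centres, width):
        # Gaussian RBF activation of every frame with respect to every centre.
        d2 = ((frames[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * width ** 2))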

Text-independent speaker identification and verification based on Continuous Density Hidden Markov Models. For identification, this method provides better results than the previous ones.
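For text-independent use, the simplest single-state form of a continuous-density model with Gaussian mixture output densities amounts to one Gaussian mixture model per speaker, scored by log-likelihood. The sketch below uses scikit-learn's GaussianMixture purely to illustrate likelihood-based identification; the number of mixture components and the diagonal covariances are assumptions, and this is not the actual model of the work mentioned above.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_speaker_model(training_frames, n_components=16):
        # One Gaussian mixture per speaker, trained on that speaker's frames.
        return GaussianMixture(n_components=n_components,
                               covariance_type="diag").fit(training_frames)

    def identify(test_frames, models):
        # Pick the speaker whose model gives the highest average
        # log-likelihood over the test frames.
        return max(models, key=lambda spk: models[spk].score(test_frames))

    # models = {"speaker_1": train_speaker_model(frames_1), ...}
    # best_match = identify(test_frames, models)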

Text-dependent speaker identification and verification based on Semi-Continuous Hidden Markov Models. The method is very promising for speaker verification because it allows verifying (accepting or rejecting) both the speaker identity and the content of the input utterance. It requires defining and recording a selected set of training utterances for each reference speaker, in order to build a corresponding set of phoneme models. Experiments carried out on the APASCI database (138 speakers) gave an identification error of 1.3% and an equal error rate of 1.0%.
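The equal error rate quoted above is the operating point at which the false acceptance and false rejection rates coincide. The small function below shows one common way to estimate it from lists of verification scores; it is a generic sketch, not the evaluation code used for the APASCI experiments.

    import numpy as np

    def equal_error_rate(genuine_scores, impostor_scores):
        # Sweep a decision threshold over all observed scores and return the
        # point where false rejection and false acceptance rates are closest.
        genuine = np.asarray(genuine_scores, dtype=float)
        impostor = np.asarray(impostor_scores, dtype=float)
        best_gap, eer = np.inf, None
        for threshold in np.sort(np.concatenate([genuine, impostor])):
            frr = np.mean(genuine < threshold)     # true speakers rejected
            far = np.mean(impostor >= threshold)   # impostors accepted
            if abs(far - frr) < best_gap:
                best_gap, eer = abs(far - frr), (far + frr) / 2
        return eer

    # eer = equal_error_rate(scores_of_true_claims, scores_of_impostor_claims)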

Finally, a prototype system that integrates both acoustic and visual features has been developed. This integration method could be efficiently used to combine other types of biometric features, such as fingerprints or iris scans.
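As a generic illustration of score-level integration, the fragment below combines per-speaker scores from two modalities after normalizing each score set; the fusion weight is an assumed, tunable parameter, and the prototype's actual integration method is not reproduced here.

    import numpy as np

    def fuse_scores(acoustic_scores, visual_scores, acoustic_weight=0.6):
        # Normalize each modality's per-speaker scores to zero mean and unit
        # variance, then combine them with a weighted sum.
        a = (acoustic_scores - acoustic_scores.mean()) / (acoustic_scores.std() + 1e-10)
        v = (visual_scores - visual_scores.mean()) / (visual_scores.std() + 1e-10)
        return acoustic_weight * a + (1.0 - acoustic_weight) * v

    # combined = fuse_scores(np.array([...]), np.array([...]))
    # identified_speaker = int(np.argmax(combined))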

This page is maintained by Daniele Falavigna.