Automatic person recognition can be built into systems and services to restrict
their use to authorized people. Possible applications include retrieval
of private information, control of financial transactions, and control
of access to secure or restricted areas and buildings. Person recognition
can rely on a single biometric feature or on a combination of several
(voice, face, fingerprints, iris, etc.). A common approach to the
speaker recognition problem is to classify acoustic parameters derived
from the input speech signal by short-time spectral analysis. These
parameters carry both phonetic information, related to the uttered text,
and individual information, related to the speaker. Since the task of
separating the phonetic information from the individual information is
not yet solved, many speaker recognition systems operate in a
text-dependent way (i.e. the user must utter a predefined key sentence).
However, this is not always possible, especially when the user cannot be
expected to cooperate during the recognition process (think, for example,
of criminal investigation applications). In these cases speaker recognition
must be performed in a text-independent way.
According to the application area, speaker recognition systems can be divided
into speaker identification systems and speaker verification systems.
Speaker identification consists in assigning the input speech signal to
one person of a known group, while speaker verification consists in
confirming or rejecting the identity claimed by the user of the system.
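The difference between the two tasks can be sketched in a few lines. This is only an illustration, assuming that some recognizer has already produced a similarity score between the input utterance and each enrolled speaker model (higher means a better match). The function names, speaker names, score values and threshold are all hypothetical and do not come from the systems described here.

```python
# Hypothetical similarity scores between one input utterance and each
# enrolled speaker model (higher = better match). Illustrative values only.
scores = {"anna": 0.62, "bruno": 0.81, "carla": 0.40}


def identify(scores):
    """Identification: pick the best-matching speaker of the known group."""
    return max(scores, key=scores.get)


def verify(scores, claimed_id, threshold):
    """Verification: accept the claimed identity only if its score
    reaches the decision threshold."""
    return scores[claimed_id] >= threshold


best = identify(scores)                           # "bruno"
accepted = verify(scores, "anna", threshold=0.7)  # False: 0.62 < 0.7
print(best, accepted)
```

Identification always returns one of the enrolled speakers, whereas verification is a yes/no decision whose strictness depends on the chosen threshold.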
Several approaches to speaker identification and verification have been studied at our Institute:
Text-independent speaker identification based on Vector Quantization. This
method gave good results for fairly small sets of speakers (fewer
than 100). It has also been used successfully to combine acoustic
features with visual ones (derived from analysis of the face).
Text-independent speaker identification based on Neural Networks. This approach
integrates Competitive Neural Networks with Radial Basis Function Networks
to perform the task.
Text-independent speaker identification and verification based on Continuous
Density Hidden Markov Models. For identification, this method gives
better results than the previous ones.
Text-dependent speaker identification and verification based on
Semi-Continuous Hidden Markov Models. This method is very promising
for speaker verification because it allows both the speaker identity and
the content of the input utterance to be verified (accepted or rejected).
It requires a selected set of training utterances to be defined and
recorded for each reference speaker, from which a corresponding set of
phoneme models is built. Experiments conducted on the APASCI database
(138 speakers) gave an identification error of 1.3% and
an equal error rate of 1.0%.
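For readers unfamiliar with the equal error rate quoted above: it is the operating point at which the false rejection rate (genuine speakers rejected) equals the false acceptance rate (impostors accepted). The sketch below approximates it by scanning candidate thresholds over two lists of verification scores. The function names and the score values are made up for illustration; they are not the APASCI results.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false rejection rate, false acceptance rate).

    A trial is accepted when its score is >= threshold.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def equal_error_rate(genuine, impostor):
    """Approximate the EER: try every observed score as a threshold and
    keep the one where FRR and FAR are closest to each other."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]  # (approximate EER, threshold)


# Illustrative scores: true-speaker trials and impostor trials.
genuine = [0.9, 0.8, 0.75, 0.6, 0.55]
impostor = [0.7, 0.5, 0.4, 0.3, 0.2]
eer, thr = equal_error_rate(genuine, impostor)
print(f"EER ~ {eer:.1%} at threshold {thr}")
```

With real data the two error curves rarely cross exactly at an observed score, which is why the sketch averages FRR and FAR at the closest point rather than requiring equality.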
Finally, a prototype system that integrates both acoustic and visual features has been developed. This integration method could be efficiently used for combining other types of biometric features, such as fingerprints or iris scanning.
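The page does not describe how the acoustic and visual features are integrated. As one possible illustration only, the sketch below shows a simple score-level fusion: each modality's scores are rescaled to a common range and combined with a weighted sum. The function names, the weight and all score values are assumptions made for this example and should not be read as the prototype's actual method.

```python
def min_max(scores):
    """Rescale a {speaker: score} map to [0, 1] so that scores from
    different modalities become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 0.0) for k, v in scores.items()}


def fuse(acoustic, visual, w_acoustic=0.5):
    """Weighted sum of normalized acoustic and visual scores.

    w_acoustic is a hypothetical tuning weight in [0, 1].
    """
    a, v = min_max(acoustic), min_max(visual)
    return {k: w_acoustic * a[k] + (1 - w_acoustic) * v[k] for k in a}


# Illustrative per-speaker scores from two independent classifiers.
acoustic = {"anna": 2.0, "bruno": 5.0, "carla": 3.0}
visual = {"anna": 0.9, "bruno": 0.6, "carla": 0.3}

fused = fuse(acoustic, visual, w_acoustic=0.5)
print(max(fused, key=fused.get))
```

Because the fusion only sees per-modality scores, the same scheme could in principle take scores from any other biometric classifier as an additional input.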
This page is maintained by Daniele Falavigna.