
Speaker recognition encompasses all activities that involve identifying a speaker from his or her voice and clustering speakers by the similarity of their voices.  This technology has been popular in science fiction movies for years, but with the research we are doing it is rapidly becoming a reality.  Typically, speaker recognition is text-dependent or text-prompted: the user is expected to repeat a pre-arranged or prompted text instead of being able to speak freely.  We are developing technology for text-independent speaker recognition.  As a result, users are allowed to speak freely, even in a different language.

Speaker verification

Speaker verification lets the computer verify the identity of a person based on his or her voice.  Applications are numerous.  For example, speaker verification can secure your personal computer each time the screen saver switches on, so that it unlocks only when you return and say a few words into the microphone.  Over the telephone, speaker verification will allow secure access to information, for example in banking.  On portable devices such as hand-held memo recorders, it can ensure that only you can listen to the messages you have recorded.  Speaker verification not only provides security and convenience, but also eliminates the need for easily forgotten passwords and PINs.

Speaker identification

Speaker verification accepts or rejects a claimed identity based on the speaker's voice.  Our research work on identification goes further: it lets the computer determine who is talking, out of a large number of enrolled speakers, based on a small sample of his or her voice.  Again, the applications are numerous.  For example, on a personal computer, speaker identification can help you share your computer with other people and still always have the screen and the icons configured the way you prefer.
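
The difference between the two tasks can be illustrated with a minimal sketch.  It assumes each voice sample has already been reduced to a fixed-length feature vector and compares vectors by cosine similarity; the function names, the toy vectors, and the 0.8 threshold are all hypothetical choices made for illustration, not a description of our system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enrolled, claimed_id, sample, threshold=0.8):
    """Verification: accept or reject ONE claimed identity (one-to-one)."""
    return cosine(enrolled[claimed_id], sample) >= threshold

def identify(enrolled, sample):
    """Identification: pick the best match among ALL enrolled speakers (one-to-many)."""
    return max(enrolled, key=lambda name: cosine(enrolled[name], sample))

# Toy enrollment database (hypothetical three-dimensional voice vectors).
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
sample = [0.85, 0.15, 0.25]

print(verify(enrolled, "alice", sample))  # claim "I am alice" is accepted
print(verify(enrolled, "bob", sample))    # claim "I am bob" is rejected
print(identify(enrolled, sample))         # no claim needed; best match is returned
```

Verification makes a single comparison against the claimed speaker, while identification must score the sample against every enrolled speaker, which is why its cost grows with the size of the population.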

Speaker classification

Speaker classification is the ability to handle a population containing an unknown number of unknown speakers, in order to:

  • detect whenever a speaker changes;
  • regroup speech segments spoken by the same person;
  • cluster speakers who speak similarly (e.g. same accent).
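
The first two tasks in the list above can be sketched as follows.  This is a simplified illustration under stated assumptions, not our method: each short stretch of audio is assumed to be represented by a hypothetical feature vector, a speaker change is flagged when consecutive vectors are dissimilar, and segments are regrouped greedily without knowing the number of speakers in advance.  The function names and the 0.7 threshold are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def change_points(frames, threshold=0.7):
    """Return indices i where frame i is dissimilar to frame i - 1."""
    return [i for i in range(1, len(frames))
            if cosine(frames[i - 1], frames[i]) < threshold]

def group_segments(segments, threshold=0.7):
    """Greedy grouping: join the first similar cluster, else start a new one.

    The number of clusters is not fixed in advance; it grows as new
    voices appear.  Returns one cluster label per segment.
    """
    representatives = []  # first segment seen for each cluster
    labels = []
    for seg in segments:
        for label, rep in enumerate(representatives):
            if cosine(rep, seg) >= threshold:
                labels.append(label)
                break
        else:
            representatives.append(seg)
            labels.append(len(representatives) - 1)
    return labels

# Toy recording: speaker A, A, then B, B, then A again.
frames = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [1.0, 0.1]]

print(change_points(frames))   # [2, 4]
print(group_segments(frames))  # [0, 0, 1, 1, 0]
```

The same grouping idea applies to the third task: with one vector per speaker instead of one per segment, similar-sounding speakers (for example, those sharing an accent) fall into the same cluster.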

Research direction

Our focus is on text-independent technology, which is designed to work regardless of what the speaker is saying, and even regardless of the language.  One benefit of this approach is that the user doesn't have to say anything special in order to be registered in the system's voice database: a few seconds of speech will do.  A second benefit is that the user doesn't have to say anything special in order to be recognized.

The applications described in this section are just now becoming a reality in our laboratories.  We are currently researching ways to extend our capabilities to very large populations.  What is possible today for a few thousand speakers will one day be possible on a scale of millions of speakers.


