Conversational Biometrics Group - Projects
Conversational Biometrics Server (CBS)

The CBS is a scalable, high-performance implementation of Conversational Biometrics. It verifies user identities based on their speech data in combination with any other verification capability, such as knowledge verification. The design is robust enough to allow future extension to behavioral biometric modeling based on accumulated user verification data. The server runs as a UNIX/Linux daemon or a Windows service. The client interface is exposed as a set of C/C++ or Java functions. Prospective developers can implement a client on a platform and in an environment of their choice simply by complying with the XML message set and certain call sequences. (Read more.)

High-Performance Text-Independent Speaker Verification

A major focus area of our research is text-independent voiceprint verification. We develop algorithms for processing speech signals, extracting features, modeling speaker characteristics and performing acoustic verification that allow speakers to enroll and test with arbitrary speech content. Text independence provides the flexibility needed in the Conversational Biometrics framework, but it also poses a significant technical challenge, especially in telephony applications. IBM's leading-edge text-independent technology was ranked first in the 2002 NIST Speaker Recognition Evaluation. (Read more)

Very Large Population Text-Independent Speaker Identification

The IBM speaker identification effort covers the largest population to date, with over 10,000 speakers. A fundamental goal of the project is to characterize system behavior along the dimensions of population size and system complexity. Speaker modeling is accomplished via our Transformation Enhanced Multi-Grained Models.
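A log-linear relationship between identification error rate and population size, of the kind this project reports, can be checked with an ordinary least-squares fit of error rate against the logarithm of the population size. The sketch below is illustrative only: the function names are hypothetical, the data points are synthetic placeholders rather than measured results, and it is not the analysis procedure used in the cited work.

```python
import math


def fit_log_linear(populations, error_rates):
    """Least-squares fit of error_rate ~ a + b * log10(population).

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    xs = [math.log10(n) for n in populations]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(error_rates) / count
    # Slope is covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, error_rates))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_error(a, b, population):
    """Extrapolate the fitted line to another population size."""
    return a + b * math.log10(population)


# Synthetic placeholder data (NOT results from the IBM system).
populations = [10, 100, 1000, 10000]
error_rates = [0.01, 0.02, 0.03, 0.04]

a, b = fit_log_linear(populations, error_rates)
print("intercept:", a, "slope per decade:", b)
print("extrapolated error at 100000 speakers:", predict_error(a, b, 100000))
```

Under such a model, each tenfold increase in population adds a fixed increment `b` to the error rate, which is why the slope per decade is the quantity of interest.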
We show that the most complex models within the framework perform best, and we demonstrate that the identification error rate scales approximately linearly with the logarithm of the population size for the described system. Confidence measures and N-best list analyses are developed that increase the useful identification rate. Our research yields a distributed, multi-stage search strategy, based on aspects of the system behavior, in which both speed and accuracy are achievable by increasing the system and model complexity as the search progresses. (Read more: [1], [2])

Utilizing High-Level Information for Speaker Verification

Identifying individuals based on their speech is an important component technology in many applications, be it automatically tagging speakers in the transcription of a board-room meeting (to track who said what), verifying users for computer security, or picking out a known terrorist or narcotics trader among millions of ongoing satellite telephone calls. How do we recognize the voices of the people we know? Generally, we use multiple levels of speaker information conveyed in the speech signal. At the lowest level, we recognize a person based on the sound of his or her voice (e.g., low or high pitch, bass, nasality). But we also use other types of information in the speech signal to recognize a speaker, such as a unique laugh, particular phrase usage, or speed of speech. Most current state-of-the-art automatic speaker recognition systems, however, use only the low-level sound information (specifically, very short-term features based on purely acoustic signals computed on 10-20 ms intervals of speech) and ignore higher-level information. While these systems have shown reasonably good performance, speech carries much more information that could be used to improve accuracy and robustness, potentially by a large margin.
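To make "very short-term features computed on 10-20 ms intervals" concrete, the following minimal sketch slices a sampled signal into overlapping 20 ms frames and computes one value per frame. Everything here is an assumption for illustration: the 8 kHz sample rate (a typical telephony rate), the 10 ms hop, the function names, and the use of mean energy as a stand-in feature. Real systems typically compute spectral or cepstral features per frame, and nothing below is taken from IBM's implementation.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=20, hop_ms=10):
    """Split a sequence of samples into overlapping short-term frames.

    Hypothetical helper: frame_ms and hop_ms are illustrative defaults.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    frames = []
    # Only full-length frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def frame_energy(frame):
    """Mean squared amplitude of one frame (a stand-in low-level feature)."""
    return sum(s * s for s in frame) / len(frame)


# One second of a synthetic 440 Hz tone at an assumed 8 kHz sample rate.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

frames = frame_signal(signal, sample_rate)
energies = [frame_energy(f) for f in frames]
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame here covers only 160 samples, which illustrates why such features capture the sound of a voice but not longer-range cues such as phrase usage or speaking rate.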
IBM participated in a 2002 Johns Hopkins University Summer Workshop project (SuperSID) dedicated to augmenting traditional signal-processing-based speaker recognition systems with such higher-level knowledge sources. The project explored ways to define speaker-distinctive markers and to create new classifiers that make use of these multi-layered knowledge sources. The team worked on a corpus of recorded telephone conversations (the Switchboard I and II corpora) that have been transcribed both by humans and by machine and augmented with a rich database of phonetic and prosodic features. A well-defined performance evaluation procedure was used to measure the progress and utility of newly developed techniques. (Read more: Final report)
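One simple way that scores from several knowledge sources could be combined is weighted linear fusion followed by a threshold decision. The sketch below is a generic illustration under stated assumptions, not the method used in the workshop: the source names (`acoustic`, `prosodic`, `lexical`), the weights, the scores, and the threshold are all hypothetical.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-source verification scores.

    Both arguments are dicts keyed by a (hypothetical) source name.
    """
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


def verify(scores, weights, threshold):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Illustrative values only; not measured on any corpus.
scores = {"acoustic": 0.8, "prosodic": 0.6, "lexical": 0.4}
weights = {"acoustic": 0.6, "prosodic": 0.2, "lexical": 0.2}

fused = fuse_scores(scores, weights)
print("fused score:", fused)
print("accepted:", verify(scores, weights, threshold=0.5))
```

In practice the weights would be tuned on held-out data, and more elaborate combiners are possible; the point is only that low-level and high-level scores can feed a single decision.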