IBM Virtual Voice Creator
Text-to-Speech (TTS) synthesis technologies have become increasingly natural-sounding and expressive, opening up new opportunities in domains such as entertainment and education. The IBM Virtual Voice Creator vision of customizable voice generation for game characters, cartoon heroes, and engaging conversational agents becomes a reality through high-quality, expressive TTS with customizable on-line voice transformations and an interactive web studio for voice design. We’re developing an inexpensive, fast, repeatable, and flexible voice-over process that covers static as well as dynamic and AI-generated textual content.
Mobile Multi-Factor Authentication (MMFA)
The mobile environment presents many security and usability challenges. MMFA utilizes the multitude of sensors and information channels on mobile devices together with our multimodal biometric authentication technology. We’re working to maximize both security and usability levels according to the situation, risk, and environment. Video authentication combines speaker and face verification for high accuracy (0.1% equal error rate), audiovisual liveness detection against replay attacks, and high usability in a short authentication session: several seconds of selfie video.
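For readers unfamiliar with the accuracy metric quoted above, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from verification scores. The function name and the score lists are hypothetical and chosen for illustration only; they do not come from the MMFA system.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine:  similarity scores from true-user attempts (higher = more similar)
    impostor: similarity scores from impostor attempts
    Returns (eer, threshold) at the threshold where |FAR - FRR| is smallest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.65]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER ~ {eer:.2f} at threshold {threshold}")
```

With real data the sweep would run over many thousands of trials, and the two error curves are often interpolated rather than sampled at observed scores only.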
State-of-the-art expressive Text-to-Speech (TTS) technology for delivering information and interacting with enterprise customers, as well as for education and entertainment purposes. In addition to actively participating in the development of the TTS service on the IBM Watson Developer Cloud, we’re working on other facets of TTS. Other current research topics include customizing the TTS voice, either by applying on-line voice transformation with user-controlled parameters or by learning the characteristics and speaking style from uploaded samples of a target speaker and automatically generating an appropriate voice transformation. We are also working on controlling the expression, emphasis, and emotion of the synthesized speech while preserving high quality.
Advanced multimodal biometrics technology for mobile multi-factor authentication solutions. Our focus is on voice, face, and video authentication with biometric fusion, as well as vocal, visual, and audiovisual liveness detection to protect against spoofing/replay attacks.
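One common way to combine modalities is score-level fusion gated by a liveness check. The sketch below illustrates that general idea only. The weights, thresholds, and function name are assumptions made for the example and do not describe the actual fusion method used in this work.

```python
def fuse_scores(voice, face, liveness,
                w_voice=0.5, accept_threshold=0.7, liveness_min=0.5):
    """Weighted-sum fusion of voice and face scores with a liveness gate.

    All scores are assumed to be normalized to [0, 1].
    All parameter defaults are hypothetical illustration values.
    Returns (accepted, fused_score).
    """
    fused = w_voice * voice + (1.0 - w_voice) * face
    # A failed liveness check rejects the attempt regardless of the match score,
    # which is what protects against replayed audio or video.
    if liveness < liveness_min:
        return False, fused
    return fused >= accept_threshold, fused


# A strong match with a live subject is accepted ...
print(fuse_scores(voice=0.9, face=0.8, liveness=0.9))
# ... while the same match scores from a suspected replay are rejected.
print(fuse_scores(voice=0.9, face=0.8, liveness=0.2))
```

In a risk-adaptive setting, the acceptance threshold could be raised or lowered depending on the situation and environment.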
Speech-based Emotion Recognition
Technologies and solutions for speech-based emotion recognition that enable analytics of spoken data as well as affect-aware human-computer interaction. Speech-based emotion recognition combines predictions from the verbal content (the textual transcript) and the non-verbal content (obtained by direct signal analysis).
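Combining a transcript-based prediction with a signal-based prediction can be illustrated with simple late fusion, in which each model outputs a probability per emotion and the two distributions are averaged. This is a minimal sketch under that assumption. The labels, probabilities, and weighting are hypothetical, and the source does not state which fusion scheme is actually used.

```python
def fuse_emotions(text_probs, audio_probs, text_weight=0.5):
    """Late fusion of two per-emotion probability dictionaries.

    text_probs:  probabilities predicted from the transcript (verbal content)
    audio_probs: probabilities predicted from the signal (non-verbal content)
    Returns (top_label, fused_distribution).
    """
    labels = set(text_probs) | set(audio_probs)
    fused = {
        label: text_weight * text_probs.get(label, 0.0)
        + (1.0 - text_weight) * audio_probs.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("both inputs carry zero probability mass")
    fused = {label: p / total for label, p in fused.items()}  # renormalize
    return max(fused, key=fused.get), fused


# Hypothetical model outputs: neutral wording, but an angry tone of voice.
text_probs = {"neutral": 0.6, "angry": 0.3, "happy": 0.1}
audio_probs = {"neutral": 0.2, "angry": 0.7, "happy": 0.1}
label, fused = fuse_emotions(text_probs, audio_probs)
print(label, fused)
```

The example shows why both channels matter: the transcript alone would suggest a neutral utterance, while the acoustic evidence shifts the fused decision.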