Research
on User Interface Technologies (UIT)
at IBM
is dedicated to developing innovative technologies, algorithms and
tools for next-generation human-computer interfaces.
For decades,
IBM has been among the leaders in all aspects of speech recognition
technology. Most recently, ViaVoice for Embedded Multiplatforms, the IBM
Embedded Speech Recognition Technology that represents the
state of the art in using the voice modality for command and
control, was made available on multiple platforms and systems,
such as Palm and iPaq handheld computers and cars.
The
Telephony Speech Recognition project is aimed at developing
algorithms for improving the accuracy and robustness of speech recognition
in conversational telephony applications such as bank transactions and
directory assistance. The project has a significant product
impact by directly contributing algorithms and data files for WebSphere
Voice Server and related
voice technology offerings from IBM.
The multi-year
goal of the Superhuman Speech Recognition project is to develop
speech recognition technology that surpasses the abilities of humans
to recognize speech. We are addressing this goal by attacking a
graded series of speech recognition tasks, ranging from simple read
speech to complex spontaneous conversations in difficult channels
and environments. The focal point of this work is the creation of
more robust and sophisticated acoustic and language modeling techniques
to improve performance while simultaneously reducing the labor required
to install and tune new applications.
The Audio-Visual
Speech Recognition project, which has been selected as an
"IBM Research Science and Technology accomplishment" for 2002, explores
the use of visual information in speech recognition systems. It
aims at combining visual cues with audio signals for the purpose
of improved automatic machine recognition of large-vocabulary continuous
speech. This combination proves particularly important in acoustically
challenging conditions with significant background noise levels,
such as multi-speaker environments, production lines, airport halls,
or cars. Audio-Visual Speech Recognition can also provide help with
speech-reading for the hearing/speech impaired.
IBM is
utilizing two key UIT technologies - speaker verification and conversational
systems - to provide enhanced security for voice-based transactions.
Our leading speaker verification technology (ranked first among
25 worldwide participants in the 2002 Speaker Recognition Evaluation
organized by the National
Institute of Standards and Technology) is combined with user
knowledge (i.e., passwords and personal information) elicited through
a brief conversation. The combination of the two information sources
- called a "Conversational Biometric" - greatly increases
the security and reliability of voice-based transactions in a non-intrusive
manner and provides a flexible framework for various authentication
scenarios so as to maximize user convenience. A demonstration of
"Conversational Biometrics" can be seen at http://www.research.ibm.com/VIVA_Demo/.
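As an illustration only, the idea of combining a voice score with knowledge elicited in conversation can be sketched as a simple score fusion. The linear weighting, the thresholds, and every name below (`Evidence`, `fused_score`, `decide`) are assumptions made for this sketch; the text above does not describe how the two information sources are actually combined.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Evidence gathered so far in one authentication dialogue (hypothetical)."""
    voice_score: float     # assumed speaker-verification score in [0, 1]
    answers_correct: int   # knowledge questions answered correctly
    answers_asked: int     # knowledge questions asked so far


def knowledge_score(ev: Evidence) -> float:
    """Fraction of knowledge questions answered correctly (0.0 if none asked)."""
    if ev.answers_asked == 0:
        return 0.0
    return ev.answers_correct / ev.answers_asked


def fused_score(ev: Evidence, voice_weight: float = 0.6) -> float:
    """Weighted average of the voice score and the knowledge score."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must lie in [0, 1]")
    return voice_weight * ev.voice_score + (1.0 - voice_weight) * knowledge_score(ev)


def decide(ev: Evidence, accept: float = 0.8, reject: float = 0.4) -> str:
    """Accept, reject, or continue the conversation to gather more evidence."""
    score = fused_score(ev)
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "ask_another_question"
```

Under these assumed thresholds, a caller with a strong voice match and correct answers is accepted at once, while a borderline caller is asked a further question instead of being rejected outright, which is one way to read the "flexible framework" mentioned above.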
Exciting
advancements are taking place in the area of text-to-speech (TTS).
The goal of the project is to produce computer-generated speech
that is indistinguishable from natural speech; our newest system
takes a big step in that direction. The TTS system relies on a large
database of natural speech, which is automatically divided into
small building blocks that are then reassembled to form arbitrary
word sequences. During synthesis, the blocks are chosen to minimize a
cost function that considers various important aspects of naturalness
in speech. Systems have been built in US and UK English as well
as Chinese, Japanese, French, German, and Spanish. Here
is an audio sample generated by our system.
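The block-selection step described above can be sketched as a shortest-path search over candidate blocks. This is a minimal sketch, not the system's actual algorithm: splitting the cost into a per-block "target" cost and a "join" cost between neighbouring blocks is an assumption borrowed from the common unit-selection formulation, and `select_units`, `target_cost`, and `join_cost` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Pick one candidate block per position so that the summed cost is minimal.

    candidates[i] holds the database blocks that could fill position i.
    Dynamic programming keeps, for every candidate, the cheapest path ending there.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[i][j] = (cheapest cost of a path ending at candidates[i][j], backpointer)
    best = [[(target_cost(0, u), -1) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            costs = [
                best[i - 1][k][0] + join_cost(prev, u)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_min] + target_cost(i, u), k_min))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path: List[U] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In a toy setting, each block could be a `(label, pitch)` pair, with `target_cost` measuring the distance from a desired pitch and `join_cost` measuring the pitch jump between adjacent blocks; a real system would use far richer acoustic and prosodic features.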
An example
of how UIT can enable human-to-human communication is the IBM Speech-To-Speech
(Speech Translation) Technology. A speech translation system
is capable of processing spoken input to translate the content into
another language, for example from English
to Mandarin Chinese and from Mandarin
Chinese to English.
Another
important example of translation technology is InfoScope,
a handheld device equipped with a digital camera that can take snapshots
of text in English, French, German, Spanish, Italian and Chinese
and translate the text into another language in a matter of seconds.
The device displays characteristics of augmented reality: it presents
the real world in the form of a captured image, such as a restaurant
sign, and merges it with virtual data by overlaying a translation
of the image text on the PDA's screen.
The use
of UIT to increase productivity is demonstrated in BlueSpace,
a next-generation workspace solution encompassing multiple software
and hardware components that integrate sensors, actuators, displays
and wireless networks into architectural elements. The goal of the
space is to increase knowledge workers’ productivity by deterring
unwanted interruptions, improving awareness and fluid communication
among team members, and providing greater individual comfort through
personalized environmental settings.
One project
that combines many aspects of UIT in a single device deserves mention: the Cross-Industry
Dashboard for Retail and Healthcare. This project
represents a combination of groundbreaking research in new form-factor
multi-modal devices (MetaPad), digital ink using new InkXML standards,
speech, and analytic methods. We are developing a mobile wireless
device and middleware to enable retail store employees to access
store-specific information anywhere utilizing various input modalities
such as traditional keyboard, bar code scanners, magnetic stripe
readers, and handwritten information in the form of digital ink.
This technology has applications in various domains, including healthcare.
As part
of the IBM Corporate Community Relations (CCR) program, the Web
Adaptation project has received considerable press attention this
past year and is currently in use by several organizations
serving populations of elderly users and users with disabilities.
For 2003, CCR plans to internationalize this project to deploy the
software to CCR partners worldwide.
IBM researchers
have been, and continue to be, among the worldwide leaders in developing
User-Interface Technologies. Being part of IBM gives us rare opportunities
to have our research affect both the state of the art and the state of the practice.