
Creating a more dynamic computer interface has been a major goal of the IBM speech recognition research team since the late 1950s. The following is a glimpse into the evolution of speech recognition research, development and product introduction at IBM.

Late 1950s--Speech recognition research begins at IBM.  Computers are designed to detect specific linguistic patterns and arrive at statistical correlations between sounds and the words they represent.


 o IBM demonstrates recognition of spoken digits at the 1964 World's Fair with its "shoe box recognizer."

 o Department of Defense funds a new research initiative, allowing for faster processing methods.

 o "Automatic Prototyping" allows computers to search out specific sounds, and store them for comparison and analysis.

 o "Dynamic Programming" is used to recognize conversational speech despite variations in rates of speech.

 o IBM pioneers a statistical approach to speech recognition that allows the system to improve its performance automatically, using powerful statistical algorithms adapted from Information Theory.

1984--IBM demonstrates the world's first 5,000-word vocabulary speech recognition system, achieving 95% accuracy.  Running on three six-foot-tall array processors and a 4341 mainframe, with a user interface running on an Apollo computer, this system could take discrete (word-at-a-time) dictation from a speaker "trained" to the system.

1986--Statistical speech recognition is accepted as the de facto standard for speech engine development.

1987--Vocabulary is increased to 20,000 words and the required hardware is reduced to a single auxiliary card.

1989--Customers begin to test the technology and IBM further adapts the system to professional business environments.  New processors allow faster computation, and users can now add new words to the system.

1992--IBM introduces its first dictation system, called IBM Speech Server Series (ISSS).

1993--IBM launches the IBM Personal Dictation System (IPDS) for OS/2*, running on an IBM PC using a custom audio adapter card and a conventional microphone.  The system takes dictation at about 80 words a minute with 95 percent accuracy, and allows users to "play back" their recording for editing purposes.  IPDS supports US English as well as UK English, German, French, Italian and Spanish.

1994--IBM creates a Speech Business Unit to shorten the time to market of speech recognition technology.

1996--IBM introduces a new release of its dictation system, called VoiceType 3.0, which requires no adapter card; it takes discrete (word-at-a-time) speech dictation and recognizes continuous commands without the need for training.  VoiceType 3.0 is offered for Windows 95** and the technology is integrated into OS/2 WARP.  IBM also introduces the world's first continuous dictation product, MedSpeak Radiology.  Finally, IBM introduces VoiceType Simply Speaking, the world's first consumer dictation product, in time for the holiday shopping season.

1997--IBM delivers an avalanche of new products, including VoiceType Connection for Netscape Navigator 3.0, VoiceType Simply Speaking Gold, ViaVoice (the world's first continuous dictation retail product), and ViaVoice Gold.  ViaVoice is also the first-ever continuous dictation product offered in Chinese and Japanese.  VoiceType is now also offered in Arabic.  The MedSpeak family of products is extended to Pathology.  The mass market appeal of dictation is now evident: dictation is preloaded on selected Aptiva (TM) models and included in the Lotus SmartSuite(TM) 97 upgrade.