Overview
Search in spoken data is an emerging research area that is attracting considerable attention from the natural language processing community. The IBM Haifa Research Lab (HRL) has developed skills and technology assets in this area that build on state-of-the-art Automatic Speech Recognition (ASR) assets developed by IBM Research.
Information retrieval from noisy transcripts
Although ASR technology can transcribe speech to text, it suffers from recognition errors and a limited vocabulary. For example, noisy spontaneous speech such as telephone calls is typically transcribed with an accuracy of only 60% to 70%. Under adverse conditions, such as noisy channels, foreign accents, or under-trained engines, accuracy may fall to 50% or lower. The team in Haifa has developed a novel scheme for information retrieval from such noisy transcripts. The scheme uses additional output from the transcription system, beyond the word transcripts themselves, to reduce the effect of recognition errors, and yields a dramatic improvement in the quality of searches over transcribed speech.
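The additional transcription output used by the scheme is not detailed here. One common form of such output is a word confusion network: a sequence of time slots, each listing competing word hypotheses with their posterior probabilities. The minimal Python sketch below assumes that form and indexes every hypothesis weighted by its posterior, so a term that lost to a competitor in the single best transcript can still be retrieved. The toy data, the function names (`one_best`, `build_index`, `search`), and the expected-count scoring are illustrative assumptions, not the HRL scheme itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: each transcript is a word confusion network, i.e. a list of
# time slots, each mapping competing word hypotheses to posterior probabilities.
ConfusionNetwork = List[Dict[str, float]]


def one_best(cn: ConfusionNetwork) -> List[str]:
    """Return the highest-posterior word in each slot (the usual word transcript)."""
    return [max(slot, key=slot.get) for slot in cn]


def build_index(docs: Dict[str, ConfusionNetwork]) -> Dict[str, Dict[str, float]]:
    """Map each term to {doc_id: expected count}, summing its posteriors over all slots."""
    index: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for doc_id, cn in docs.items():
        for slot in cn:
            for word, posterior in slot.items():
                index[word][doc_id] += posterior
    return index


def search(index: Dict[str, Dict[str, float]], query: List[str]) -> List[Tuple[str, float]]:
    """Score documents by the summed expected counts of the query terms, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for term in query:
        # dict.get does not create missing keys, even on a defaultdict
        for doc_id, expected_count in index.get(term, {}).items():
            scores[doc_id] += expected_count
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: in call1 the engine preferred "counsel" over "cancel".
docs = {
    "call1": [{"counsel": 0.6, "cancel": 0.4},
              {"my": 0.9, "by": 0.1},
              {"account": 0.8, "count": 0.2}],
    "call2": [{"check": 0.7, "cheque": 0.3},
              {"my": 1.0},
              {"balance": 0.95, "ballots": 0.05}],
}
index = build_index(docs)
print(one_best(docs["call1"]))    # ['counsel', 'my', 'account']: "cancel" is missing
print(search(index, ["cancel"]))  # [('call1', 0.4)]
```

Searching only the single best words would miss call1 entirely for the query "cancel". A practical system could additionally prune low-posterior hypotheses and feed these expected counts into a standard ranking function.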
Overcoming the limitations of OOV search and phonetic transcripts
Out-of-vocabulary (OOV) terms are words missing from the vocabulary of the ASR system. When people search speech transcripts for terms outside the vocabulary domain on which the engine was trained, the search may return no results at all, because the engine cannot output words it does not know. Phonetic transcripts offer an alternative to word transcripts for OOV search, but they suffer from a high error rate, which limits their usefulness on their own. To overcome these limitations, the HRL team developed a new technique that combines phone-based and word-based search, together with algorithms designed specifically for fuzzy search on phonetic transcripts.
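How the HRL technique combines the two kinds of search, and which fuzzy-matching algorithm it uses, are not described here. The minimal Python sketch below illustrates the general idea under stated assumptions: in-vocabulary terms are looked up in word transcripts, while OOV terms are converted to phones with a hypothetical lexicon (`PHONE_LEXICON`) and matched approximately against phonetic transcripts using edit distance over substrings, a standard approximate string-matching technique substituted here for illustration. All names, data, and the `max_error_rate` threshold are hypothetical.

```python
from typing import Dict, List, Set


def fuzzy_phone_match(query: List[str], transcript: List[str]) -> int:
    """Smallest edit distance between the query phones and any substring of the transcript."""
    prev = [0] * (len(transcript) + 1)  # row 0 is all zeros: a match may start anywhere
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * len(transcript)
        for j in range(1, len(transcript) + 1):
            cost = 0 if query[i - 1] == transcript[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # query phone missing from transcript
                         cur[j - 1] + 1,      # extra phone in transcript
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return min(prev)  # a match may also end anywhere


# Hypothetical vocabulary and pronunciation lexicon, for illustration only.
VOCABULARY: Set[str] = {"check", "my", "balance", "flight", "to"}
PHONE_LEXICON: Dict[str, List[str]] = {"heathrow": ["hh", "iy", "th", "r", "ow"]}


def hybrid_search(term: str,
                  word_docs: Dict[str, Set[str]],
                  phone_docs: Dict[str, List[str]],
                  max_error_rate: float = 0.25) -> List[str]:
    """Word search for in-vocabulary terms; fuzzy phone search for OOV terms."""
    if term in VOCABULARY:
        return sorted(d for d, words in word_docs.items() if term in words)
    phones = PHONE_LEXICON.get(term)
    if phones is None:  # no pronunciation available for this term
        return []
    # Accept a document if the normalized phone edit distance is small enough.
    return sorted(d for d, transcript in phone_docs.items()
                  if fuzzy_phone_match(phones, transcript) / len(phones) <= max_error_rate)


word_docs = {"call1": {"flight", "to"}, "call2": {"check", "my", "balance"}}
phone_docs = {
    "call1": "f l ay t t uw hh iy th er ow".split(),  # "r" misrecognized as "er"
    "call2": "ch eh k m ay b ae l ax n s".split(),
}
print(hybrid_search("balance", word_docs, phone_docs))   # ['call2']
print(hybrid_search("heathrow", word_docs, phone_docs))  # ['call1']
```

The OOV term "heathrow" is found in call1 despite a phone recognition error, because one substitution out of five phones stays under the threshold. Routing by a vocabulary check is only one possible design; a system could also run both searches and merge their results.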
Ranking high on the list
IBM Research received the highest overall ranking for US English speech data in the 2006 NIST Spoken Term Detection (STD) evaluation. This work was carried out jointly by researchers from the IBM T.J. Watson Research Center and from HRL, and the collaboration between the two labs in this area is ongoing.
Related projects
IBM HRL is also a major contributor of speech processing and retrieval technologies to the European FP6 project SAPIR (Search in Audiovisual content using P2P IR) and the FP7 project HERMES (Cognitive Care and Guidance for Active Aging).
