Distributed Speech Recognition
Project Summary: Distributed Speech Recognition (DSR) is a technology that makes speech recognition practical on mobile devices. Instead of sending the speech itself to a server and doing all of the processing there, DSR splits the processing between the mobile device and network servers. The mobile device, or 'front-end', begins the processing by extracting spectral features from the speech. These features are compressed, error-protected, and transmitted over the wireless channel to the server, or 'back-end', which converts the incoming stream of features into text.

Compared with sending coded speech to a server-side recognizer, DSR improves recognition accuracy, and the memory and CPU requirements on the device stay low because the device only extracts features. The accuracy gain comes from two sources: a noise-robust front-end, and the elimination of the degradation caused by low bit-rate speech coding and channel errors when the speech itself is sent over a communications channel to the server.

The server can also reconstruct the speech: it combines pitch and voicing information with the information carried by the features themselves to synthesize a speech waveform, so the user's voice can be listened to at the receiving server. This is useful for voice response services that handle sensitive content (e.g., financial transactions) and for reviewing how live customers interact with voice service applications, which helps developers tune the grammar and dialog of those services.
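To make the front-end/back-end split concrete, here is a minimal Python sketch under stated assumptions. It is not IBM's implementation and not a standardized DSR front-end: standardized front-ends (such as ETSI's) use mel-cepstral features and vector quantization, whereas this sketch substitutes log energies in equal-width DFT bands and uniform 8-bit scalar quantization. It also uses a CRC-32 as a stand-in for error protection, which only detects corrupted packets and does not correct them. The frame length, hop, band count, and quantizer range are hypothetical values for 8 kHz audio, and `front_end` and `back_end` are illustrative names. The recognizer itself is not shown.

```python
import cmath
import math
import struct
import zlib

FRAME_LEN = 200                 # hypothetical: 25 ms frames at 8 kHz
HOP = 80                        # hypothetical: 10 ms frame shift
NUM_BANDS = 8                   # hypothetical number of spectral bands per frame
LOG_MIN, LOG_MAX = -10.0, 20.0  # hypothetical range covered by the quantizer


def log_band_energies(frame):
    """Front-end feature: log energy in NUM_BANDS equal-width bands of the DFT power spectrum."""
    n = len(frame)
    # Hamming window to reduce spectral leakage between bands.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    half = n // 2
    # Naive DFT (O(n^2)); a real front-end would use an FFT.
    power = [abs(sum(w * cmath.exp(-2j * math.pi * k * i / n)
                     for i, w in enumerate(windowed))) ** 2
             for k in range(half)]
    width = half // NUM_BANDS
    return [math.log(sum(power[b * width:(b + 1) * width]) + 1e-10)
            for b in range(NUM_BANDS)]


def quantize(value):
    """Compress one feature to a single byte (uniform scalar quantization, clamped to range)."""
    clamped = min(max(value, LOG_MIN), LOG_MAX)
    return round((clamped - LOG_MIN) / (LOG_MAX - LOG_MIN) * 255)


def dequantize(code):
    """Map a byte back to the centre of its quantization step."""
    return LOG_MIN + code / 255 * (LOG_MAX - LOG_MIN)


def front_end(samples):
    """Device side: frame the speech, extract features, compress them, and prepend a CRC."""
    payload = bytearray()
    for start in range(0, len(samples) - FRAME_LEN + 1, HOP):
        feats = log_band_energies(samples[start:start + FRAME_LEN])
        payload.extend(quantize(f) for f in feats)
    # CRC-32 lets the server detect packets corrupted on the channel.
    return struct.pack(">I", zlib.crc32(payload)) + bytes(payload)


def back_end(packet):
    """Server side: verify the CRC and recover per-frame features for the recognizer."""
    (crc,) = struct.unpack(">I", packet[:4])
    payload = packet[4:]
    if zlib.crc32(payload) != crc:
        raise ValueError("feature packet corrupted in transit")
    codes = list(payload)
    return [[dequantize(c) for c in codes[i:i + NUM_BANDS]]
            for i in range(0, len(codes), NUM_BANDS)]


if __name__ == "__main__":
    rate = 8000
    # A 0.1 s synthetic 440 Hz tone stands in for a speech signal.
    speech = [math.sin(2 * math.pi * 440 * t / rate) for t in range(800)]
    packet = front_end(speech)
    frames = back_end(packet)
    print(len(speech) * 2, "bytes if sent as 16-bit PCM vs", len(packet), "bytes of features")
    print(len(frames), "frames of", NUM_BANDS, "features each")
```

The device-side work is limited to framing, a spectral transform, and quantization; everything downstream of the feature packets, including the recognizer, stays on the server, which is the division of labor the summary describes.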