Commercially available large-vocabulary speech recognition systems can perform well when the speaker is calm and in a quiet environment, but accuracy degrades sharply when the speaker is under stress or when there is background noise. Background noise is a problem in part because it makes speech difficult to isolate, and in part because speakers raise their voices to make themselves understood, a phenomenon known as the Lombard effect (see, e.g., Pisoni et al., Junqua and Anglade, and Lippmann et al.). Even changing the style of speech to address a different audience can affect recognition performance.

Tanner Labs has studied innovative training procedures and system architectures to obtain a recognizer that is robust to noisy environments and speaker variability. One promising approach that we are currently developing uses both audio and visual cues to recognize speech.



