
Commercially available large-vocabulary speech recognition systems can
perform well when the speaker is calm and in a quiet environment,
but accuracy degrades sharply when the speaker is under
stress or when there is background noise. Background noise
is a problem partly because it makes the speech difficult to isolate,
and partly because speakers involuntarily raise their voices to be understood,
a phenomenon known as the Lombard effect (see, e.g., Pisoni et al.,
Junqua and Anglade, and Lippmann et al.). Even changing the
style of speech to address
a different audience can affect recognition performance.
Tanner Labs has studied innovative training procedures and
system architectures to obtain a recognizer that is robust
to noisy environments and speaker variability. One promising
approach that we are currently developing uses both audio
and visual cues to recognize speech.