Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition
We propose a multi-stream articulator model (MSAM) for audio visual speech recognition (AVSR). This model extends the articulator modelling technique recently used in audio-only speech recognition to the audio-visual domain. The model uses a multiple-stream structure with a shared articulator layer to mimic the speech production process. We also present an adaptive reliability measure (ARM), based on two local dispersion indicators, that integrates the audio and visual streams according to their local, temporal reliability. Experiments on the AVCONDIG database show that our model achieves recognition performance comparable to that of the multi-stream hidden Markov model (MSHMM) under various noisy conditions. With the ARM, our model even achieves the best performance at some test SNRs.
Keywords: Visual Speech; Visual Stream; Audio Speech; Babble Noise; Reliability Pair