Assistance of Speech Recognition in Noisy Environment with Sentence Level Lip-Reading

Wang, Jianzong; Wang, Yiwen; Liu, Aozhi; Xiao, Jing

doi:10.1007/978-3-319-69923-3_64

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10568)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

Acoustic speech recognition, a technique for decoding text from speech, has achieved great success in recent years. The model trained by Ping An Technology (Shenzhen) Co., Ltd. achieves a word error rate (WER) of 8.4%, which is competitive with popular commercial products. However, this result assumes that the speech is recorded in a quiet environment. In a noisy environment, accuracy decreases by 10%–20%. To improve performance in such environments, a multi-modal biometric system integrating acoustic speech recognition with sentence-level lip-reading is designed. Across several noisy conditions, our integrated system achieves an average WER of 5.7%, a significant improvement over the purely acoustic speech-recognition system.
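
For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional definition. The function name word_error_rate and the example sentences are illustrative assumptions, and the abstract does not state the scoring protocol the authors actually used.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); not the authors' scoring code.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Made-up example: one deleted word ("f") and one inserted word ("please").
ref = "place blue at f two now".split()
hyp = "place blue at two now please".split()
wer = word_error_rate(ref, hyp)
print(f"{wer:.3f}")  # 2 edits / 6 reference words -> 0.333
```

Because insertions are counted, WER under this definition can exceed 100% when the hypothesis is much longer than the reference.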

Acknowledgments

This work was primarily supported by PingAn Deep Learning Group.

Author information

Authors and Affiliations

Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, China
Jianzong Wang, Yiwen Wang, Aozhi Liu & Jing Xiao

Corresponding author

Correspondence to Jing Xiao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Zhou
Beihang University, Beijing, China
Yunhong Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Computing and Technology, Chinese Academy of Sciences, Beijing, China
Yong Xu
Shenzhen University, Shenzhen, China
Linlin Shen
Tsinghua University, Beijing, China
Jianjiang Feng
Chinese Academy of Sciences, Beijing, China
Shiguang Shan
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Yu Qiao
Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
Zhenhua Guo
Shenzhen University, Shenzhen, China
Shiqi Yu

About this paper

Cite this paper

Wang, J., Wang, Y., Liu, A., Xiao, J. (2017). Assistance of Speech Recognition in Noisy Environment with Sentence Level Lip-Reading. In: Zhou, J., et al. Biometric Recognition. CCBR 2017. Lecture Notes in Computer Science, vol 10568. Springer, Cham. https://doi.org/10.1007/978-3-319-69923-3_64

DOI: https://doi.org/10.1007/978-3-319-69923-3_64
Published: 20 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69922-6
Online ISBN: 978-3-319-69923-3
eBook Packages: Computer Science, Computer Science (R0)
