Assistance of Speech Recognition in Noisy Environment with Sentence Level Lip-Reading

  • Conference paper
Biometric Recognition (CCBR 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10568)

Abstract

Acoustic speech recognition, a technique for decoding text from speech, has achieved great success in recent years. The model trained by Ping An Technology (ShenZhen) Co., Ltd. achieves a word error rate (WER) of 8.4%, which is competitive with popular commercial products. However, this result assumes that the speech is recorded in a quiet environment. In a noisy environment, accuracy decreases by 10%–20%. To improve performance in such environments, we design a multi-modal biometric system that integrates acoustic speech recognition with sentence-level lip-reading. Across several noisy conditions, our integrated system attains an average WER of 5.7%, a significant improvement over the purely acoustic speech-recognition system.
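For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. A real evaluation
    pipeline would typically also normalize case and punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution ("sat" -> "sit") and one
# deletion (the second "the") against six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition WER can exceed 100% when the hypothesis contains many inserted words, which is one reason it is reported as an error rate rather than as an accuracy.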

Acknowledgments

This work was primarily supported by PingAn Deep Learning Group.

Author information

Corresponding author

Correspondence to Jing Xiao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wang, J., Wang, Y., Liu, A., Xiao, J. (2017). Assistance of Speech Recognition in Noisy Environment with Sentence Level Lip-Reading. In: Zhou, J., et al. Biometric Recognition. CCBR 2017. Lecture Notes in Computer Science, vol 10568. Springer, Cham. https://doi.org/10.1007/978-3-319-69923-3_64

  • DOI: https://doi.org/10.1007/978-3-319-69923-3_64

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69922-6

  • Online ISBN: 978-3-319-69923-3

  • eBook Packages: Computer Science, Computer Science (R0)