Lip reading of words with lip segmentation and deep learning

Multimedia Tools and Applications

Abstract

Speech perception is a multimodal task: it engages more than one sense. Lip reading, which complements auditory signals with visual ones, is useful and sometimes necessary for understanding a message. It matters for a wide range of applications, such as silent dictation, speech recognition in noisy environments, improved hearing aids, and biometrics. It is also a difficult problem in computer vision, where the goal is to identify the textual content of speech from the movement of a speaker's lips in video. Because lip movements vary only within a limited range while linguistic content is rich, lip recognition is hard, and this difficulty has slowed progress in lip reading research. Recent advances in deep learning across many fields give us enough confidence to take on the task. Unlike traditional lip recognition, which recognizes lip characteristics directly, lip reading based on deep learning typically extracts features and interprets images with a network model. This work focuses on the design of the data acquisition, processing, and recognition network framework for lip reading, and we develop an accurate and robust lip reading algorithm. First, we extract the mouth region and segment the mouth using a proposed hybrid model with a newly proposed edge based on a proposed filter. Next, we train a spatio-temporal model that combines Convolutional Neural Networks (CNN) and Bi-directional Gated Recurrent Units (Bi-GRU). Finally, we test the algorithm and obtain an accuracy of 90.38%. This result shows the performance of our system when segmented lips are used as inputs to the proposed spatio-temporal model.
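The recognition stage described above (per-frame convolutional features followed by a bi-directional GRU) can be illustrated with a minimal NumPy sketch. This is not the authors' network: the abstract does not give layer sizes, so the kernel count, hidden size, number of word classes, global-average pooling, temporal averaging, and every function name below are assumptions chosen for illustration. The weights are random and untrained, so the output only demonstrates the data flow from a clip of segmented mouth frames to a probability per word.

```python
import numpy as np

def conv2d_relu(frame, kernels):
    """Valid 2-D cross-correlation of one grayscale frame with K kernels, then ReLU."""
    k = kernels.shape[-1]
    # windows has shape (H-k+1, W-k+1, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(frame, (k, k))
    maps = np.einsum("hwij,kij->khw", windows, kernels)
    return np.maximum(maps, 0.0)

def frame_features(frame, kernels):
    """Global-average-pool each feature map: one number per kernel (assumed pooling)."""
    return conv2d_relu(frame, kernels).mean(axis=(1, 2))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, d_in, d_hid):
    """Random parameters for the update (z), reset (r) and candidate (h) paths."""
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(0.0, 0.1, (d_hid, d_in))
        p["U" + g] = rng.normal(0.0, 0.1, (d_hid, d_hid))
        p["b" + g] = np.zeros(d_hid)
    return p

def gru_pass(xs, p):
    """Run a GRU over xs of shape (T, D); return hidden states of shape (T, H)."""
    h = np.zeros(p["Uz"].shape[0])
    states = []
    for x in xs:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])  # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])  # reset gate
        cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * cand  # one common GRU convention
        states.append(h)
    return np.stack(states)

def bi_gru(xs, fwd, bwd):
    """Concatenate forward states with re-aligned backward states: (T, 2H)."""
    hf = gru_pass(xs, fwd)
    hb = gru_pass(xs[::-1], bwd)[::-1]
    return np.concatenate([hf, hb], axis=1)

def predict_word(clip, kernels, fwd, bwd, w_out, b_out):
    """clip: (T, H, W) segmented mouth frames -> probability per word class."""
    feats = np.stack([frame_features(f, kernels) for f in clip])
    hidden = bi_gru(feats, fwd, bwd).mean(axis=0)  # average over time (assumed)
    logits = w_out @ hidden + b_out
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Hypothetical sizes: 12 frames of 32x48 pixels, 8 kernels, 16 hidden units, 10 words.
rng = np.random.default_rng(0)
clip = rng.random((12, 32, 48))
kernels = rng.normal(0.0, 0.1, (8, 3, 3))
fwd, bwd = init_gru(rng, 8, 16), init_gru(rng, 8, 16)
w_out, b_out = rng.normal(0.0, 0.1, (10, 32)), np.zeros(10)
probs = predict_word(clip, kernels, fwd, bwd, w_out, b_out)
print(probs.shape, int(probs.argmax()))
```

Running the backward GRU on the reversed sequence and flipping its outputs back lets each time step see both earlier and later frames, which is the reason a bi-directional recurrent layer suits short word-level clips. In a trained system the parameters would be learned from labelled clips rather than sampled at random.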



Author information

Corresponding author

Correspondence to Malek Miled.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Miled, M., Messaoud, M.A.B. & Bouzid, A. Lip reading of words with lip segmentation and deep learning. Multimed Tools Appl 82, 551–571 (2023). https://doi.org/10.1007/s11042-022-13321-0

  • DOI: https://doi.org/10.1007/s11042-022-13321-0
