Lip reading of words with lip segmentation and deep learning

Miled, Malek; Messaoud, Mohammed Anouar Ben; Bouzid, Aicha

doi:10.1007/s11042-022-13321-0

Published: 08 June 2022

Volume 82, pages 551–571 (2023)

Multimedia Tools and Applications

Malek Miled¹,
Mohammed Anouar Ben Messaoud¹,² &
Aicha Bouzid¹

Abstract

Speech perception is recognized as a multimodal task: it engages more than one sense. Lip reading, which adds visual cues to the auditory signal, is useful and sometimes even necessary for understanding a message. It is important for a wide range of applications, such as silent dictation, speech recognition in noisy environments, improved hearing aids, and biometrics. It is also a difficult research subject in computer vision, whose main purpose is to identify the spoken text from the movement of a speaker's lips in video. However, lip movements vary only within a limited range while linguistic content is rich, and the resulting difficulty of lip recognition has slowed the development of lip-reading research. Recent progress in deep learning across various fields gives enough confidence to take on the task of lip recognition. Unlike traditional lip recognition, which relies on recognizing lip characteristics, lip reading based on deep learning typically extracts features and interprets images using a network model. In this work, we focus on the design of a network framework for the acquisition, processing, and recognition of lip-reading data, and we develop an accurate and robust lip-reading algorithm. First, we extract the mouth region and segment the mouth using a proposed hybrid model with a new edge based on a proposed filter. Then we train a spatio-temporal model that combines Convolutional Neural Networks (CNN) and Bi-directional Gated Recurrent Units (Bi-GRU). Finally, we test the algorithm and obtain an accuracy of 90.38%. This result shows the performance of our system when segmented lips are used as inputs to the proposed spatio-temporal model.
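The CNN plus Bi-GRU combination named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the abstract gives no layer sizes, so the single 3x3 convolution stage, the hidden size, the number of word classes, the random untrained weights, and every function name below are assumptions chosen only to show how per-frame spatial features feed a bidirectional recurrent layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_features(frames, kernels):
    """Per-frame CNN stand-in: valid 3x3 cross-correlation, ReLU, global average pooling.

    frames: (T, H, W) segmented lip images; kernels: (K, 3, 3). Returns (T, K).
    """
    # windows has shape (T, H-2, W-2, 3, 3)
    windows = np.lib.stride_tricks.sliding_window_view(frames, (3, 3), axis=(1, 2))
    maps = np.einsum("thwij,kij->tkhw", windows, kernels)
    return np.maximum(maps, 0.0).mean(axis=(2, 3))

def init_gru(rng, in_dim, hid_dim):
    """Random (untrained) parameters for one GRU direction; hypothetical sizes."""
    def mat(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))
    return {g: (mat(in_dim, hid_dim), mat(hid_dim, hid_dim), np.zeros(hid_dim))
            for g in ("z", "r", "n")}

def gru_run(x_seq, params):
    """Run one GRU direction over x_seq (T, in_dim); return hidden states (T, hid_dim)."""
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wn, Un, bn = params["n"]
    h = np.zeros(bz.shape[0])
    states = []
    for x in x_seq:
        z = sigmoid(x @ Wz + h @ Uz + bz)        # update gate
        r = sigmoid(x @ Wr + h @ Ur + br)        # reset gate
        n = np.tanh(x @ Wn + (r * h) @ Un + bn)  # candidate state
        h = (1.0 - z) * n + z * h                # blend candidate with previous state
        states.append(h)
    return np.stack(states)

def bigru(x_seq, fwd, bwd):
    """Bidirectional GRU: forward pass plus a pass over the time-reversed sequence."""
    h_f = gru_run(x_seq, fwd)
    h_b = gru_run(x_seq[::-1], bwd)[::-1]  # re-align backward states with time
    return np.concatenate([h_f, h_b], axis=1)

def build_model(seed=0, n_kernels=4, hid_dim=8, n_words=10):
    """Assemble random weights; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(0.0, 0.5, size=(n_kernels, 3, 3)),
        "fwd": init_gru(rng, n_kernels, hid_dim),
        "bwd": init_gru(rng, n_kernels, hid_dim),
        "W_out": rng.normal(0.0, 0.1, size=(2 * hid_dim, n_words)),
        "b_out": np.zeros(n_words),
    }

def predict_word(frames, model):
    """Return a probability vector over word classes for one clip."""
    feats = conv_features(frames, model["kernels"])
    states = bigru(feats, model["fwd"], model["bwd"])
    hid = states.shape[1] // 2
    # Final forward state (last frame) and final backward state (first frame).
    summary = np.concatenate([states[-1, :hid], states[0, hid:]])
    logits = summary @ model["W_out"] + model["b_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

model = build_model()
clip = np.random.default_rng(1).random((12, 16, 24))  # 12 synthetic frames of 16x24 pixels
probs = predict_word(clip, model)
```

Because the weights are random, `probs` carries no meaning beyond being a valid distribution over the assumed word classes. A trained system would learn the kernels and gate matrices from labelled clips.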

Author information

Authors and Affiliations

National Engineering School of Tunis, Tunis, Tunisia
Malek Miled, Mohammed Anouar Ben Messaoud & Aicha Bouzid
Faculty of Sciences of Tunis, Tunis, Tunisia
Mohammed Anouar Ben Messaoud

Corresponding author

Correspondence to Malek Miled.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Miled, M., Messaoud, M.A.B. & Bouzid, A. Lip reading of words with lip segmentation and deep learning. Multimed Tools Appl 82, 551–571 (2023). https://doi.org/10.1007/s11042-022-13321-0

Received: 11 February 2021
Revised: 02 February 2022
Accepted: 30 May 2022
Published: 08 June 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13321-0
