Silent Speech Recognition

  • Amaresh P. Kandagal
  • V. Udayashankara
  • M. A. Anusuya
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 801)


Speech is essential for exchanging information, and speech recognition is one of the interfaces for man-machine interaction. However, the performance of these systems is limited in noisy acoustic conditions. Silent speech, i.e. the visual dynamic features of speech, carries further potential information for Human-Computer Interaction. This paper presents lip localization and segmentation using the Otsu algorithm. The height and width parameters of lip movements are captured as visual cues for silent speech recognition. We develop stochastic visual word models with an in-house database of 20 subjects. The performance of these models is measured by word error rate. The accuracy recorded for speaker-dependent female subjects is 84.6%, and the overall accuracy is 65.8%.
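
The abstract gives no implementation details, so the following is only an illustrative sketch of Otsu thresholding followed by extraction of lip height and width. It is not the authors' code. It assumes an 8-bit grayscale mouth region in which lip pixels are brighter than the surrounding skin (in practice a color transform would usually be needed to achieve this). The function names `otsu_threshold` and `lip_height_width` and the toy frame are hypothetical.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold t of an 8-bit image given as a list of rows.

    Pixels with value > t are treated as foreground (assumed here: the lips).
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # no foreground pixels left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def lip_height_width(mask):
    """Return (height, width) of the bounding box of True pixels in mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        return (0, 0)
    return (rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1)


# Toy frame (assumption): dark background with a brighter 2 x 6 "lip" patch.
frame = [[50] * 8 for _ in range(6)]
for r in (2, 3):
    for c in range(1, 7):
        frame[r][c] = 200

t = otsu_threshold(frame)
mask = [[v > t for v in row] for row in frame]
print(lip_height_width(mask))
```

Applied frame by frame, the resulting (height, width) pairs would form the kind of visual feature sequence that a word model could be trained on.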


Lip reading, Otsu, Speech recognition, HMM



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Sri Siddhartha Institute of Technology, Tumkur, India
  2. Sri Jayachamarajendra College of Engineering, Mysuru, India
