Analysis of Induced Color for Automatic Detection of ROI in Multipose AVSR System

  • Amarsinh Varpe
  • Prashant Borde
  • Sadhana Sukale
  • Pallavi Perdeshi
  • Pravin Yannawar
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 340)


Visual speech information plays an important role in automatic speech recognition (ASR), but visual speech decoding under pose variation remains an open problem. The face detector proposed by Viola and Jones, which is based on image statistics, is the most popular, but it is not accurate enough to detect facial features in a multi-pose scenario. In this paper we propose an advanced skin color detection method for automatic isolation of the region of interest (ROI), based on induced and non-induced lip color, and compare it with the ‘Viola-Jones’ algorithm for a multi-pose audio-visual speech recognition (AVSR) system. The ‘Viola-Jones’ algorithm is widely used for detecting face components (eyes, nose and mouth) and offers accurate face detection for a full frontal visual stream, but its performance degrades dramatically for non-frontal poses. In contrast, the proposed isolation scheme achieves 100 % efficiency in each pose with induced lip color, and ROI isolation with non-induced lip color achieves 100, 92.67 and 93.4 % for the full frontal, 45° and side profile poses, respectively.
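The idea of isolating a mouth ROI from lip color can be illustrated with a minimal sketch. The abstract does not state the color space, decision rule or thresholds used by the authors, so everything below is an assumption made for illustration: the normalized-red rule, the `min_ratio` value of 0.5, and the function names `is_lip_like` and `isolate_roi` are all hypothetical. The sketch marks pixels whose red chromaticity is high and returns the bounding box of those pixels.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each in 0..255
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def is_lip_like(pixel: Pixel, min_ratio: float = 0.5) -> bool:
    """Illustrative rule (an assumption, not the paper's): a pixel is
    lip-like when its normalized red r = R / (R + G + B) reaches min_ratio."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        # Pure black has no defined chromaticity; treat it as non-lip.
        return False
    return r / total >= min_ratio


def isolate_roi(image: List[List[Pixel]], min_ratio: float = 0.5) -> Optional[Box]:
    """Return the bounding box of all lip-like pixels, or None if none exist."""
    rows: List[int] = []
    cols: List[int] = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_lip_like(pixel, min_ratio):
                rows.append(y)
                cols.append(x)
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    skin = (200, 160, 140)   # normalized red 0.40, below the threshold
    lip = (200, 40, 60)      # normalized red about 0.67, above the threshold
    frame = [[skin] * 6 for _ in range(5)]
    for y in (2, 3):
        for x in range(1, 5):
            frame[y][x] = lip
    print(isolate_roi(frame))
```

A purely color-based rule like this does not depend on head orientation, which is one plausible reason such a scheme could remain usable at 45° and side poses; a practical system would likely also need noise filtering and a tuned threshold.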


Keywords: Skin color pixels; Facial feature; Mouth detection; Multi-pose state mouth



The authors gratefully acknowledge the support of the Department of Science and Technology (DST), which provided financial assistance for the Major Research Project sanctioned under the Fast Track Scheme for Young Scientists, vide sanction number SERB/1766/2013/14, and the authorities of Dr. Babasaheb Ambedkar Marathwada University, Aurangabad (MS), India, for providing the infrastructure for this research work.



Copyright information

© Springer India 2015

Authors and Affiliations

  • Amarsinh Varpe
    • 1
  • Prashant Borde
    • 1
  • Sadhana Sukale
    • 1
  • Pallavi Perdeshi
    • 1
  • Pravin Yannawar
    • 1
  1. Vision and Intelligent System Lab, Department of Computer Science and IT, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India
