The Visual Computer, Volume 30, Issue 3, pp 245–257

Audio-visual speech recognition techniques in augmented reality environments

  • Mohammad Reza Mirzaei
  • Seyed Ghorshi
  • Mohammad Mortazavi
Original Article


Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can help people with disabilities, but most of these studies have stayed within their own specialized field. Audio-Visual Speech Recognition (AVSR) is an advance in ASR technology that combines audio, video, and facial expressions to capture a narrator’s voice. In this paper, we combine AR and AVSR technologies into a new system to help deaf and hard-of-hearing people. The proposed system takes a narrator’s speech, converts it instantly into readable text, and shows the text directly on an AR display. Deaf people can therefore read the narrator’s speech easily, and other people do not need to learn sign language to communicate with them. The evaluation results show that this system has a lower word error rate than ASR and Visual Speech Recognition (VSR) in different noisy conditions. Furthermore, using AVSR techniques improves the recognition accuracy of the system in noisy places. In addition, a survey of 100 deaf people shows that more than 80% of them are very interested in using our system as an assistant on portable devices to communicate with people.
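
The word error rate mentioned in the evaluation is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a lower value means better recognition, and the value can exceed 1 when the output contains many inserted words.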


Augmented reality, Audio-visual speech recognition, Augmented reality environments, Communication, Deaf people



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mohammad Reza Mirzaei (1)
  • Seyed Ghorshi (1)
  • Mohammad Mortazavi (1)
  1. School of Science and Engineering, Sharif University of Technology, Kish Island, Iran
