“Well Adjusted”: Using Robust and Flexible Speech Recognition Capabilities in Clean to Noisy Mobile Environments


Abstract

Speech-based interfaces increasingly penetrate environments that can benefit from hands-free and/or eyes-free operation. This chapter presents a new speech-enabled framework that aims to provide a rich interactive experience for smartphone users. The framework is based on a conceptualization that divides the mapping between the speech acoustical microstructure and the spoken implicit macrostructure into two distinct levels, namely the signal level and the linguistic level. At the signal level, front-end processing is performed to improve the performance of Distributed Speech Recognition (DSR) in noisy mobile environments. At this low level, Genetic Algorithms (GAs) are used to optimize the combination of conventional Mel-Frequency Cepstral Coefficients (MFCCs) with Line Spectral Frequencies (LSFs) and formant-like (FL) features. The linguistic level involves a dialog scheme to overcome the limitations of current human–computer interactive applications, which mostly rely on constrained grammars. For this purpose, conversational intelligent agents capable of learning from their past dialog experiences are used. The Carnegie Mellon PocketSphinx engine for speech recognition and the Artificial Intelligence Markup Language (AIML) for pattern matching are used throughout our experiments. The evaluation results show that including both the GA-based front-end processing and the AIML-based conversational agents leads to a significant improvement in the effectiveness and performance of an interactive spoken dialog system.
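The abstract describes the two levels without implementation detail, so the two sketches below are illustrative only. The first is a minimal genetic-algorithm sketch, assuming the GA searches per-stream weights for fusing the MFCC, LSF, and formant-like (FL) streams; the population size, arithmetic crossover, mutation scheme, and the surrogate fitness function are all assumptions, and the placeholder fitness would have to be replaced by recognition accuracy measured on a noisy development set with the DSR back end.

```python
# Hypothetical sketch of GA-based front-end optimization: search per-stream
# weights for fusing MFCC, LSF, and formant-like (FL) features. The fitness
# function is a stand-in; in the chapter's setting it would be word accuracy
# obtained by decoding a noisy development set with the fused features.
import random
import numpy as np

N_STREAMS = 3        # MFCC, LSF, FL
POP_SIZE = 20
GENERATIONS = 30
MUTATION_RATE = 0.2

def fitness(weights: np.ndarray) -> float:
    """Placeholder score; replace with recognition accuracy from the DSR back end."""
    return 1.0 - float(np.var(weights))    # dummy surrogate so the sketch runs

def random_individual() -> np.ndarray:
    w = np.random.rand(N_STREAMS)
    return w / w.sum()                     # weights normalized to sum to 1

def crossover(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    alpha = random.random()
    child = alpha * a + (1.0 - alpha) * b  # arithmetic crossover
    return child / child.sum()

def mutate(w: np.ndarray) -> np.ndarray:
    if random.random() < MUTATION_RATE:
        w = np.clip(w + np.random.normal(0.0, 0.05, N_STREAMS), 1e-3, None)
    return w / w.sum()

def run_ga() -> np.ndarray:
    population = [random_individual() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: POP_SIZE // 2]                  # truncation selection
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = run_ga()
    print("Best stream weights (MFCC, LSF, FL):", np.round(best, 3))
```

The second sketch, at the linguistic level, shows how a text hypothesis produced by the recognizer (PocketSphinx in the chapter's experiments) can be routed through an AIML pattern matcher. The PyAIML package (python-aiml) and the single CALL * category are illustrative stand-ins, not the chapter's actual knowledge base or toolchain.

```python
# Hypothetical sketch: match a recognized utterance against an AIML category.
import aiml   # pip install python-aiml

CATEGORY = """<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0.1">
  <category>
    <pattern>CALL *</pattern>
    <template>Dialing <star/> now.</template>
  </category>
</aiml>
"""

# Write the demo knowledge base to disk and load it into the AIML kernel.
with open("demo.aiml", "w", encoding="utf-8") as f:
    f.write(CATEGORY)

kernel = aiml.Kernel()
kernel.learn("demo.aiml")

# In the full system this string would come from the DSR/PocketSphinx decoder.
hypothesis = "call the main office"
print(kernel.respond(hypothesis))   # prints the template with the wildcard filled in
```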

Acknowledgments

This research was funded by the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation. The author would like to thank Yacine Benahmed, Kaoukeb Kifaya, and Djamel Addou for their contributions to the development of the experimental platforms.

Author information

Correspondence to Sid-Ahmed Selouani.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Selouani, SA. (2010). “Well Adjusted”: Using Robust and Flexible Speech Recognition Capabilities in Clean to Noisy Mobile Environments. In: Neustein, A. (eds) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_5

  • DOI: https://doi.org/10.1007/978-1-4419-5951-5_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5950-8

  • Online ISBN: 978-1-4419-5951-5

  • eBook Packages: Engineering, Engineering (R0)