Speech Enhancement Paradigm

  • Sid-Ahmed Selouani
Part of the SpringerBriefs in Electrical and Computer Engineering book series


Speech enhancement techniques aim to improve the quality and intelligibility of speech that has been degraded by noise. The goal of speech enhancement varies with the needs of specific applications: increasing overall speech quality or intelligibility, reducing listener fatigue, or improving the overall performance of an automatic speech recognition (ASR) system embedded in a voice communication system. This chapter begins with background on noise and noise estimation, then reviews some well-known speech enhancement methods. It also provides an overview of the assessment methods used to evaluate speech enhancement algorithms in terms of quality and intelligibility.


Keywords: Speech enhancement; Noise; Spectral subtraction; Statistical techniques; Subspace decomposition; Perceptual methods; Enhancement evaluation



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Université de Moncton, Moncton, Canada
