Computational Intelligence in Speech and Audio Processing: Recent Advances

  • Aboul Ella Hassanien
  • Gerald Schaefer
  • Ashraf Darwish
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 75)


Computational intelligence techniques have been used for speech and audio processing for several years. In speech processing, applications that make extensive use of computational intelligence include speech recognition, speaker recognition, speech enhancement, speech coding and speech synthesis; in audio processing, they include music classification, audio classification, and audio indexing and retrieval. In this paper we provide an overview of recent applications of modern computational intelligence theory in the field of speech and audio processing.


Speech Recognition; Speech Signal; Speaker Recognition; Speech Synthesis; Speech Enhancement
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Aboul Ella Hassanien (1)
  • Gerald Schaefer (2)
  • Ashraf Darwish (3)
  1. Information Technology Department, Cairo University, Giza, Egypt
  2. Department of Computer Science, Loughborough University, Loughborough, U.K.
  3. Computer Science Department, Helwan University, Cairo, Egypt