
Cognitive Computation, Volume 5, Issue 4, pp 545–550

Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions

  • Aitzol Ezeiza
  • Karmele López de Ipiña
  • Carmen Hernández
  • Nora Barroso
Article

Abstract

Mel frequency cepstral coefficients (MFCCs) are a standard tool for automatic speech recognition (ASR), but they fail to capture part of the dynamics of speech. Because speech is nonlinear in nature, the extra information provided by nonlinear features could be especially useful when training data are scarce or when the ASR task is very complex. In this paper, the fractal dimension of the observed time series is combined with the traditional MFCCs in the feature vector to improve the performance of two different ASR systems. The first is a simple Chinese digit recognition system trained on very few examples; the second is a large-vocabulary ASR system for Spanish broadcast news.
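The abstract does not say which fractal-dimension estimator, framing, or parameters are used. As a rough illustration of "combining the fractal dimension with the MFCCs in the feature vector", the sketch below assumes Higuchi's method (one common waveform FD estimator; Katz's method is another), 16 kHz audio framed at 25 ms with a 10 ms hop (`frame_len=400`, `hop=160`), and `kmax=8`. These values and the helper names `higuchi_fd` and `append_fd` are hypothetical choices, not the authors' configuration. The MFCCs are assumed to be computed beforehand by any standard front end on the same frame grid; the sketch only appends one FD value per frame.

```python
import math
from typing import List, Sequence


def higuchi_fd(x: Sequence[float], kmax: int = 8) -> float:
    """Estimate the fractal dimension of a 1-D series with Higuchi's method.

    kmax (the largest time interval) is a hypothetical default, not a value
    taken from the paper.
    """
    n = len(x)
    if kmax < 2 or n < 2 * kmax:
        raise ValueError("need kmax >= 2 and at least 2 * kmax samples")
    if max(x) == min(x):
        # Assumed convention: a constant frame is a straight line, so FD = 1.
        return 1.0

    log_inv_k: List[float] = []
    log_len: List[float] = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            # Number of increments in the sub-series x[m], x[m+k], x[m+2k], ...
            n_inc = (n - 1 - m) // k  # >= 1 because n >= 2 * kmax
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_inc + 1))
            # Higuchi's normalised curve length L_m(k).
            lengths.append(dist * (n - 1) / (n_inc * k) / k)
        mean_len = sum(lengths) / k  # L(k): average over the k offsets
        if mean_len <= 0.0:
            raise ValueError(f"curve length is zero at k={k}")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))

    # FD is the least-squares slope of ln L(k) against ln(1/k).
    mx = sum(log_inv_k) / kmax
    my = sum(log_len) / kmax
    sxx = sum((a - mx) ** 2 for a in log_inv_k)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    return sxy / sxx


def append_fd(mfcc_frames: Sequence[Sequence[float]], signal: Sequence[float],
              frame_len: int = 400, hop: int = 160,
              kmax: int = 8) -> List[List[float]]:
    """Append one FD value per frame to precomputed MFCC vectors.

    frame_len=400 and hop=160 correspond to 25 ms / 10 ms at 16 kHz (assumed).
    The MFCCs must have been computed on exactly this frame grid.
    """
    if len(signal) < frame_len:
        n_frames = 0
    else:
        n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames != len(mfcc_frames):
        raise ValueError(
            f"{n_frames} signal frames but {len(mfcc_frames)} MFCC frames")
    features = []
    for t, coeffs in enumerate(mfcc_frames):
        frame = signal[t * hop: t * hop + frame_len]
        features.append(list(coeffs) + [higuchi_fd(frame, kmax)])
    return features
```

For a frame whose samples lie on a straight line the estimate is exactly 1, and for white noise it approaches 2, so the extra coefficient roughly spans that range. In an HMM-based recognizer the resulting vectors (for example 13 MFCCs plus 1 FD value) would simply replace the plain MFCC vectors as observations.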

Keywords

Nonlinear speech processing; Automatic speech recognition; Mel frequency cepstral coefficients; Fractal dimensions

Notes

Acknowledgments

The authors thank Roger Jang and Infozazpi irratia for providing the basic resources for this work.


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Aitzol Ezeiza (1)
  • Karmele López de Ipiña (1)
  • Carmen Hernández (2)
  • Nora Barroso (1)
  1. Department of Systems Engineering and Automation, University of the Basque Country UPV/EHU, Donostia, Spain
  2. Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Donostia, Spain
