
Circuits, Systems, and Signal Processing, Volume 35, Issue 4, pp 1313–1331

Visualization of Babble–Speech Interactions Using Andrews Curves

  • Jamin Atkins
  • Davinder Pal Sharma

Article

Abstract

Visualizing multidimensional data such as mel frequency cepstral coefficients (MFCCs) is difficult, especially when the number of dimensions exceeds three, which makes trends in high-dimensional signal interactions hard to spot. Andrews curves aid the graphical analysis of such high-dimensional data. This study examines the properties of babble in the feature domain as well as the effect of babble noise on the MFCCs of clean speech. Experiments have been conducted using two babble models: the overlapping conversation model and the overlapping speaker model. The purpose of this paper is to provide insight into the effect of babble noise on the first thirteen MFCCs of clean speech through the use of Andrews curves. The investigations in this paper give a visual comparison of the signals that exposes trends which conventional visualization methods do not reveal. Andrews curves not only allow the signals to be observed but also allow statistical comparisons between signals. With a better understanding of the differences between the models, it would be possible to develop systems that are more robust in babble-corrupted environments.
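The Andrews transform underlying the visualization described above maps each d-dimensional observation x = (x1, …, xd) to a single curve f_x(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + x5·cos(2t) + …, plotted over t ∈ [−π, π]. A minimal sketch in Python (not code from the paper; the MFCC frame here is a placeholder, and all names are illustrative):

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews curve f_x(t) for one observation x.

    f_x(t) = x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t)
             + x[3]*sin(2t) + x[4]*cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0), dtype=float)
    for i, coeff in enumerate(x[1:], start=1):
        k = (i + 1) // 2              # harmonic index: 1, 1, 2, 2, 3, ...
        if i % 2 == 1:
            result += coeff * np.sin(k * t)   # odd positions pair with sin
        else:
            result += coeff * np.cos(k * t)   # even positions pair with cos
    return result

# One curve per 13-dimensional MFCC frame; a real pipeline would pass
# frames from an MFCC extractor instead of this random placeholder.
t = np.linspace(-np.pi, np.pi, 200)
mfcc_frame = np.random.default_rng(0).normal(size=13)
curve = andrews_curve(mfcc_frame, t)
```

Plotting one such curve per frame overlays clean and babble-corrupted signals in a single 2-D figure, which is what makes the visual and statistical comparisons described in the abstract possible.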

Keywords

Babble noise · MFCCs · Andrews curves · Speech recognition

Notes

Acknowledgments

The authors are thankful to The University of the West Indies for providing the necessary funding, through Grant No. CRP.4.MAR11.4, to carry out research on the project “Development of Algorithms and Systems for Robust speech Recognition in the Noisy Environments.”


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Physics, The University of the West Indies, St. Augustine, Trinidad and Tobago
