Mel scaled M-band wavelet filter bank for speech recognition

  • Prashant Upadhyaya
  • Omar Farooq
  • M. R. Abidi

Abstract

A Mel scaled M-band wavelet filter bank structure is used to extract robust acoustic features for speech recognition. The proposed filter bank offers flexible frequency partitioning, decomposing the speech signal into M frequency bands. To quantify the difference between the Mel scaled M-band wavelet filter bank and the dyadic wavelet filter bank, the relative bandwidth deviation (RBD) and root mean square bandwidth deviation (RMSBD) are calculated with respect to a baseline, the Mel filter bank bandwidth. The proposed filter bank reduces RBD by 40.90% and RMSBD by 49.84% relative to the 24-dyadic wavelet filter bank. Features extracted with the proposed filter bank on the AMUAV corpus improve word recognition accuracy (WRA) over baseline (MFCC) features across the entire SNR range (20 dB to 0 dB). On the AMUAV corpus, the maximum improvement in WRA is 3.93% over baseline features and 3.90% over dyadic wavelet filter bank features. On the VidTIMIT corpus, the maximum improvement in WRA is 1.64% over baseline features and 4.43% over dyadic features.
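The bandwidth comparison described above can be illustrated with a rough sketch. The abstract does not give the formulas for RBD and RMSBD, so the definitions below (mean absolute deviation normalised by the baseline bandwidth, and root mean square of the bandwidth differences) are assumptions, as are the function names, the 24-band count, the 0 to 4000 Hz range, and the simplification of the Mel filter bank to non-overlapping bands of equal Mel width. Only the Hz-to-Mel conversion is the standard formula.

```python
import math


def hz_to_mel(f_hz):
    """Standard Hz-to-Mel conversion."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_bandwidths(n_bands, f_low, f_high):
    """Bandwidths in Hz of n_bands bands that are equally wide on the Mel scale.

    Simplification (assumption): bands are treated as non-overlapping,
    unlike the overlapping triangular filters of a typical MFCC front end.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / n_bands
    edges = [mel_to_hz(m_low + i * step) for i in range(n_bands + 1)]
    return [edges[i + 1] - edges[i] for i in range(n_bands)]


def relative_bandwidth_deviation(candidate, baseline):
    """Assumed definition: mean of |candidate - baseline| / baseline."""
    if len(candidate) != len(baseline):
        raise ValueError("band counts must match")
    total = sum(abs(c - b) / b for c, b in zip(candidate, baseline))
    return total / len(baseline)


def rms_bandwidth_deviation(candidate, baseline):
    """Assumed definition: root mean square of (candidate - baseline)."""
    if len(candidate) != len(baseline):
        raise ValueError("band counts must match")
    total = sum((c - b) ** 2 for c, b in zip(candidate, baseline))
    return math.sqrt(total / len(baseline))


# Illustrative values only: 24 bands over 0 to 4000 Hz (assumed, not from the paper).
baseline = mel_bandwidths(24, 0.0, 4000.0)
# Hypothetical candidate: a uniform partition of the same frequency range.
uniform = [4000.0 / 24] * 24

rbd = relative_bandwidth_deviation(uniform, baseline)
rmsbd = rms_bandwidth_deviation(uniform, baseline)
```

Under these assumed definitions, a candidate filter bank whose band edges follow the Mel partition more closely yields smaller values of both measures, which is the sense in which the abstract reports a reduction.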

Keywords

M-band wavelet; Dyadic; MFCC; Filter bank and feature extraction

Notes

Acknowledgements

The authors would like to thank the Institution of Electronics and Telecommunication Engineers (IETE) for sponsoring the research fellowship during this research.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electronics Engineering, Aligarh Muslim University, Aligarh, India
