Mel scaled M-band wavelet filter bank for speech recognition
A Mel-scaled M-band wavelet filter bank structure is used to extract robust acoustic features for speech recognition. The proposed filter bank allows flexible frequency partitioning, decomposing the speech signal into M frequency bands. To quantify the difference between the Mel-scaled M-band wavelet filter bank and the dyadic wavelet filter bank, the relative bandwidth deviation (RBD) and the root mean square bandwidth deviation (RMSBD) are calculated with respect to a baseline (the Mel filter bank bandwidths). Relative to the 24-band dyadic wavelet filter bank, the proposed filter bank reduces RBD by 40.90% and RMSBD by 49.84%. On the AMUAV corpus, features extracted from the proposed filter bank improve word recognition accuracy (WRA) over baseline (MFCC) features across the entire SNR range (20 dB to 0 dB), with a maximum WRA improvement of 3.93% over baseline features and 3.90% over dyadic wavelet filter bank features. On the VidTIMIT corpus, the proposed features achieve a maximum WRA improvement of 1.64% over baseline features and 4.43% over dyadic features.
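The abstract does not give the formulas for RBD and RMSBD, so the sketch below is only one plausible reading, not the authors' method. It assumes (1) the O'Shaughnessy Mel scale m = 2595 log10(1 + f/700), (2) a baseline of 24 contiguous, non-overlapping bands of equal Mel width covering [0, fs/2] rather than overlapping triangular filters, (3) RBD as the mean per-band relative bandwidth error and RMSBD as the root mean square of the per-band bandwidth error in Hz, and (4) a hypothetical sampling rate of 16 kHz. The dyadic partition used in the demo is an illustrative 24-band wavelet-packet tiling made up for this example, not the one from the paper. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """O'Shaughnessy Mel scale (assumed): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_band_edges(n_bands, fs):
    """Edges of n_bands contiguous bands of equal Mel width over [0, fs/2].

    Assumption: the Mel baseline is modelled as non-overlapping bands.
    """
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), n_bands + 1)
    return mel_to_hz(mel_points)


def partition_edges(segments):
    """Build band edges from contiguous (start_hz, stop_hz, width_hz) segments."""
    edges = [float(segments[0][0])]
    for start, stop, width in segments:
        if not np.isclose(start, edges[-1]):
            raise ValueError("segments must be contiguous")
        n = int(round((stop - start) / width))
        edges.extend(start + width * np.arange(1, n + 1, dtype=float))
    return np.asarray(edges, dtype=float)


def bandwidth_deviation(edges, baseline_edges):
    """Return (RBD, RMSBD) of a partition relative to a baseline partition.

    Assumed definitions (the paper's exact formulas may differ):
      RBD   = mean_i |B_i - Bref_i| / Bref_i   (fraction)
      RMSBD = sqrt(mean_i (B_i - Bref_i)^2)    (Hz)
    """
    bw = np.diff(edges)
    ref = np.diff(baseline_edges)
    if bw.shape != ref.shape:
        raise ValueError("partitions must have the same number of bands")
    rbd = float(np.mean(np.abs(bw - ref) / ref))
    rmsbd = float(np.sqrt(np.mean((bw - ref) ** 2)))
    return rbd, rmsbd


def percent_reduction(old, new):
    """Relative reduction of a deviation measure, in percent."""
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    fs = 16000  # hypothetical sampling rate in Hz
    mel_edges = mel_band_edges(24, fs)
    # Illustrative 24-band dyadic wavelet-packet tiling of 0-8 kHz
    # (hypothetical; every band is a node of a full binary tree).
    dyadic_edges = partition_edges([
        (0, 1000, 125),     # 8 bands
        (1000, 3000, 250),  # 8 bands
        (3000, 6000, 500),  # 6 bands
        (6000, 8000, 1000), # 2 bands
    ])
    rbd, rmsbd = bandwidth_deviation(dyadic_edges, mel_edges)
    print(f"RBD = {100 * rbd:.2f} %, RMSBD = {rmsbd:.1f} Hz")
```

Given a second partition (for example an M-band tiling), the reductions quoted in the abstract would, under these assumptions, correspond to `percent_reduction(rbd_dyadic, rbd_mband)` and the analogous call for RMSBD.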
Keywords: M-band wavelet; Dyadic; MFCC; Filter bank and feature extraction
The authors would like to acknowledge the Institution of Electronics and Telecommunication Engineers (IETE) for sponsoring the research fellowship during this research.