Speech frame recognition based on less shift sensitive wavelet filter banks

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

The wavelet transform offers a multi-resolution property and good time-frequency localization; hence, it can be tailored to speech recognition. In our previous work, we showed that features derived from redundant wavelet filter banks perform better in speech recognition tasks because they are much less shift sensitive than those of the critically sampled discrete wavelet transform (DWT). In this paper, three types of wavelet representations are introduced: features based on the dual-tree complex wavelet transform (DT-CWT), the perceptual dual-tree complex wavelet transform, and the four-channel double-density discrete wavelet transform (FCDDDWT). Appropriate filter values for the DT-CWT and FCDDDWT are then proposed. The proposed wavelet representations are compared in a phoneme recognition task using a special form of time-delay neural network. The evaluations confirm that dual-tree complex wavelet filter banks outperform the conventional DWT in speech recognition systems. The proposed perceptual dual-tree complex wavelet filter bank yields an increase in recognition rate of up to approximately 9.82% compared with the critically sampled two-channel wavelet filter bank.
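The shift sensitivity mentioned in the abstract can be illustrated with a small sketch. It does not implement the DT-CWT or FCDDDWT filter banks proposed in the paper; it assumes a one-level Haar filter and a toy step signal, and the function names (`haar_detail_decimated`, `haar_detail_undecimated`, `energy`) are hypothetical helpers introduced only for this illustration. It compares the detail-band energy of a critically sampled transform with that of an undecimated (redundant) one when the input is circularly shifted by one sample.

```python
import numpy as np

def haar_detail_decimated(x):
    """One-level Haar detail coefficients, critically sampled (keeps one output per input pair)."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_detail_undecimated(x):
    """One-level Haar detail coefficients without downsampling (circular boundary)."""
    x = np.asarray(x, dtype=float)
    return (x - np.roll(x, -1)) / np.sqrt(2.0)

def energy(coeffs):
    """Sum of squared coefficients, a simple subband-energy feature."""
    return float(np.sum(coeffs ** 2))

# Toy signal (an assumption for illustration): a step edge in a 64-sample frame.
x = np.concatenate([np.zeros(32), np.ones(32)])
x_shifted = np.roll(x, 1)  # same frame, circularly delayed by one sample

dec_energy = (energy(haar_detail_decimated(x)),
              energy(haar_detail_decimated(x_shifted)))
und_energy = (energy(haar_detail_undecimated(x)),
              energy(haar_detail_undecimated(x_shifted)))

print("decimated   detail energy (original, shifted):", dec_energy)
print("undecimated detail energy (original, shifted):", und_energy)
```

In this toy case the decimated detail energy jumps from 0 to about 1 after a one-sample shift, because the edges fall between or inside the sample pairs depending on alignment, whereas the undecimated energy stays at about 1 for both inputs. This is only the simplest form of redundancy; the representations in the paper use different filters and tree structures.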

References

  1. Rahiminejad, M.: Improvement on Representation Parameters Extraction Methods in Speech Recognition Systems. M.Sc. Thesis, Department of Biomedical Engineering, Amirkabir University of Technology, Tehran (in Persian) (2002)

  2. Tohidypour, H.R., Seyyedsalehi, S.A., Behbood, H., Roshandel, H.: A new representation for speech frame recognition based on redundant wavelet filter banks. Speech Commun. 54(2), 256–271 (2012)

  3. Tohidypour, H.R., Seyyedsalehi, S.A., Roshandel, H., Behbood, H.: Speech recognition using three channel redundant wavelet filterbank. In: 2nd International Conference on Industrial Mechatronics and Automation (ICIMA), vol. 2, pp. 325–328, Wuhan, China, May (2010)

  4. Erzin, E., Cetin, A.E., Yardimci, Y.: Subband analysis for robust speech recognition in the presence of car noise. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 417–420, Detroit (1995)

  5. Sarikaya, R., Pellom, B.L., Hansen, J.H.: Wavelet packet transform features with application to speaker identification. In: Proceedings of IEEE Nordic Signal Processing Symp (NORSIG’98), pp. 81–84 (1998)

  6. Sarikaya, R., Gowdy, J.N.: Subband based classification of speech under stress. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 569–572 (1998)

  7. Jabloun, F., Cetin, A.E., Erzin, E.: Teager energy based feature parameters for speech recognition in car noise. IEEE Signal Process. Lett. 6(10), 259–261 (1999)

  8. Farooq, O., Datta, S.: Mel filter-like admissible wavelet packet structure for speech recognition. IEEE Signal Process. Lett. 8(7), 196–198 (2001)

  9. Tufekci, Z., Gowdy, J.N., Gurbuz, S., Patterson, E.: Applied mel-frequency discrete wavelet coefficients and parallel model compensation for noise-robust speech recognition. Speech Commun. 48(10), 1294–1307 (2006)

  10. Gowdy, J.N., Tufekci, Z.: Mel-scaled discrete wavelet coefficients for speech recognition. In: Proceedings of ICASSP, vol. 3, pp. 1351–1354, Istanbul (2000)

  11. Pinter, I.: Perceptual wavelet-representation of speech signals and its application to speech enhancement. Comput. Speech Lang. 10(1), 1–22 (1996)

  12. Xun, S., Du, L., Howng, W.: Wavelet linear prediction vocoder based on auditory model. In: Proceedings of ICSP ’98, Fourth International Conference on Signal Processing Proceedings, vol. 1, pp. 595–598, Beijing (1998)

  13. Zhang, X., Bai, J., Liang, W.: The speech recognition based on the bark wavelet and CZCPA features. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 318–321, Beijing (2006)

  14. Tohidypour, H.R., Seyyedsalehi, S.A., Behbood, H.: Comparison between wavelet packet transform, Bark Wavelet and MFCC for robust speech recognition tasks. In: Proceedings of International Conference on Industrial Mechatronics and Automation, pp. 329–332, Wuhan, China, May (2010)

  15. Abdelnour, A.F., Selesnick, I.W.: Symmetric nearly shift invariant tight frame wavelets. IEEE Trans. Signal Process. 53(1), 231–239 (2005)

  16. Selesnick, I.W.: A higher-density discrete wavelet transform. IEEE Trans. Signal Process. 54(8), 3039–3048 (2006)

  17. Selesnick, I.W., Baraniuk, R.G., Kingsbury, N.G.: The dual-tree complex wavelet transform. IEEE Signal Process. Mag. 22(6), 123–151 (2005)

  18. Selesnick, I.W.: The design of Hilbert transform pairs of wavelet bases via the flat delay filter. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 6, pp. 3673–3676, Salt Lake City (2001)

  19. Shao, Y., Chang, C.H.: A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system. IEEE Trans. Syst. Man Cybern. Part B 37(4), 877–889 (2007)

  20. Bijankhan, M., Sheikhzadegan, J., Roohani, M.R., Samareh, Y., Lucas, C., Tebyani, M.: FARSDAT—the speech database of farsi spoken language. In: Proceedings of Speech Science and Technology Conference, pp. 826–831, Perth (1994)

  21. Gillick, L., Cox, S.J.: Some statistical issues in the comparison of speech recognition algorithms. In: Proceedings of ICASSP, vol. 1, pp. 532–535, Glasgow (1989)

Author information

Corresponding author

Correspondence to Hamid Reza Tohidypour.

About this article

Cite this article

Tohidypour, H.R., Banitalebi-Dehkordi, A. Speech frame recognition based on less shift sensitive wavelet filter banks. SIViP 10, 633–637 (2016). https://doi.org/10.1007/s11760-015-0787-z

  • DOI: https://doi.org/10.1007/s11760-015-0787-z
