Speech frame recognition based on less shift sensitive wavelet filter banks

Tohidypour, Hamid Reza; Banitalebi-Dehkordi, Amin

doi:10.1007/s11760-015-0787-z

Original Paper
Published: 10 June 2015

Volume 10, pages 633–637 (2016)

Signal, Image and Video Processing

Hamid Reza Tohidypour¹ &
Amin Banitalebi-Dehkordi¹

Abstract

The wavelet transform possesses the multi-resolution property and high localization performance; hence, it can be optimized for speech recognition. In our previous work, we showed that redundant wavelet filter bank parameters work better in speech recognition tasks, because they are much less shift sensitive than those of the critically sampled discrete wavelet transform (DWT). In this paper, three types of wavelet representations are introduced: features based on the dual-tree complex wavelet transform (DT-CWT), the perceptual dual-tree complex wavelet transform, and the four-channel double-density discrete wavelet transform (FCDDDWT). Appropriate filter values for the DT-CWT and FCDDDWT are then proposed. The performances of the proposed wavelet representations are compared in a phoneme recognition task using a special form of time-delay neural networks. Performance evaluations confirm that dual-tree complex wavelet filter banks outperform the conventional DWT in speech recognition systems. The proposed perceptual dual-tree complex wavelet filter bank yields a recognition rate increase of up to approximately 9.82% compared to the critically sampled two-channel wavelet filter bank.
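The shift sensitivity mentioned in the abstract can be illustrated with a small numerical sketch. The example below is not the authors' method: it uses a one-level Haar filter as a stand-in, and an undecimated (redundant) Haar filter bank in place of the DT-CWT or FCDDDWT filters, whose values are not given here. The function names, the test pulse, and the energy-based measure are all assumptions chosen for illustration. It compares how much the detail-band energy changes when the input is circularly shifted by a few samples.

```python
import numpy as np

def haar_detail_decimated(x):
    """Detail band of a one-level, critically sampled Haar DWT (even-length input)."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_detail_undecimated(x):
    """Detail band of a one-level undecimated (redundant) Haar filter bank,
    using circular boundary handling and no downsampling."""
    x = np.asarray(x, dtype=float)
    return (x - np.roll(x, -1)) / np.sqrt(2.0)

def band_energy(coeffs):
    """Sum of squared coefficients, a simple energy-style feature."""
    return float(np.sum(coeffs ** 2))

def energy_spread(transform, x, shifts):
    """Max minus min band energy over circular shifts of the input.
    A spread of zero means the feature does not change with the shift."""
    energies = np.array([band_energy(transform(np.roll(x, s))) for s in shifts])
    return float(energies.max() - energies.min())

# Hypothetical test signal: a two-sample pulse in a 16-sample frame.
x = np.zeros(16)
x[4] = 1.0
x[5] = 1.0

shifts = range(4)
decimated_spread = energy_spread(haar_detail_decimated, x, shifts)
undecimated_spread = energy_spread(haar_detail_undecimated, x, shifts)

print("critically sampled spread:", decimated_spread)
print("undecimated spread:", undecimated_spread)
```

In this toy setting the critically sampled detail energy depends on whether the pulse edges align with the downsampling grid, while the undecimated detail energy is the same for every circular shift. The redundant transforms studied in the paper reduce shift sensitivity through different filter designs, so this sketch only conveys the general effect.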



Author information

Authors and Affiliations

Digital Multimedia Lab, Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
Hamid Reza Tohidypour & Amin Banitalebi-Dehkordi


Corresponding author

Correspondence to Hamid Reza Tohidypour.


About this article

Cite this article

Tohidypour, H.R., Banitalebi-Dehkordi, A. Speech frame recognition based on less shift sensitive wavelet filter banks. SIViP 10, 633–637 (2016). https://doi.org/10.1007/s11760-015-0787-z

Received: 03 August 2014
Revised: 29 May 2015
Accepted: 31 May 2015
Published: 10 June 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s11760-015-0787-z
