Robust Features for Speaker-Independent Speech Recognition Based on a Certain Class of Translation-Invariant Transformations

Müller, Florian; Mertins, Alfred

doi:10.1007/978-3-642-11509-7_15

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5933)

Abstract

The spectral effects of vocal tract length (VTL) differences are one reason why today's speaker-independent automatic speech recognition (ASR) systems achieve lower recognition rates than speaker-dependent ones. With certain types of filter banks, the VTL-related effects can be described as a translation in subband-index space. In this paper, nonlinear translation-invariant transformations that were originally proposed in the field of pattern recognition are investigated for their applicability to speaker-independent ASR tasks. It is shown that combining different types of such transformations yields features that are more robust against VTL changes than the standard mel-frequency cepstral coefficients, and that these features almost reach the performance of vocal tract length normalization without any adaptation to individual speakers.
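
The core idea, that a translation along the subband-index axis can leave a suitably chosen nonlinear feature unchanged, can be illustrated with a small sketch. It assumes the rapid transform (butterfly operations a + b and |a - b|) as one representative translation-invariant transformation; the abstract does not state which transformations the paper combines. It also idealizes the VTL-induced translation as a cyclic shift. The function name rapid_transform and the random test vector are illustrative only.

```python
import numpy as np

def rapid_transform(x):
    """Rapid transform of a vector whose length is a power of two.

    Each stage maps the two halves (a, b) to (a + b, |a - b|) and then
    recurses on both results. Both operations are symmetric in a and b,
    which is what makes the output invariant to cyclic shifts.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n == 1:
        return x.copy()
    if n % 2:
        raise ValueError("length must be a power of two")
    half = n // 2
    a, b = x[:half], x[half:]
    return np.concatenate([rapid_transform(a + b),
                           rapid_transform(np.abs(a - b))])

# Hypothetical subband energies of one frame (random stand-in data).
rng = np.random.default_rng(0)
subbands = rng.random(16)

# Idealized VTL effect: cyclic translation by 3 subband indices.
shifted = np.roll(subbands, 3)

features = rapid_transform(subbands)
features_shifted = rapid_transform(shifted)

# The features agree although the inputs differ.
invariant = np.allclose(features, features_shifted)
print(invariant)
```

In a real filter-bank output the translation is not cyclic, since energy leaves one end of the subband axis without re-entering at the other, so the invariance would hold only approximately there.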

This work has been supported by the German Research Foundation under Grant No. ME1170/2-1.

Author information

Authors and Affiliations

Institute for Signal Processing, University of Lübeck, 23538, Lübeck, Germany
Florian Müller & Alfred Mertins

Editor information

Editors and Affiliations

Department of Computer Science, Escola Politecnica Superior, Universitat de Vic, c/ Sagrada Familia, 7, 08500, Vic (Barcelona), Spain
Jordi Solé-Casals & Vladimir Zaiats

Cite this paper

Müller, F., Mertins, A. (2010). Robust Features for Speaker-Independent Speech Recognition Based on a Certain Class of Translation-Invariant Transformations. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_15

DOI: https://doi.org/10.1007/978-3-642-11509-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11508-0
Online ISBN: 978-3-642-11509-7
eBook Packages: Computer Science, Computer Science (R0)
