Posterior-Based Features and Distances in Template Matching for Speech Recognition

  • Guillermo Aradilla
  • Hervé Bourlard
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4892)

Abstract

The use of large speech corpora in example-based approaches to speech recognition has mainly focused on increasing the number of examples. This strategy has a drawback: databases may not provide enough examples of rare words. In this paper we present a different method for incorporating the information contained in such corpora into example-based systems. A multilayer perceptron is trained on these databases to estimate speaker- and task-independent phoneme posterior probabilities, which are then used as speech features. Because these features are less variable, fewer examples are needed to characterize a word properly. As a result, performance can be greatly improved when only a limited number of examples is available. We also study posterior-based local distances, which prove more effective than the traditional Euclidean distance. Experiments on the Phonebook database support the idea that posterior features combined with a proper local distance can yield competitive results.
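As a rough illustration of the idea in the abstract, the sketch below compares posterior-based local distances (KL divergence and Bhattacharyya distance, both named in the keywords) with the Euclidean distance inside a basic dynamic time warping (DTW) template matcher. It is not the authors' implementation: the function names, the probability floor `EPS`, the direction of the KL divergence (template as reference), the symmetric DTW step pattern, and the path-length normalization are all assumptions made for this example. Each frame is assumed to be a vector of phoneme posteriors that sums to one, such as the output of a trained multilayer perceptron.

```python
import numpy as np

EPS = 1e-10  # assumed floor that keeps log() and division finite


def euclidean(p, q):
    """Traditional Euclidean local distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))


def kl_divergence(p, q):
    """KL divergence KL(p || q) between two posterior vectors.

    Taking the template frame as p (the reference) is an assumption.
    """
    p = np.clip(np.asarray(p, dtype=float), EPS, 1.0)
    q = np.clip(np.asarray(q, dtype=float), EPS, 1.0)
    return float(np.sum(p * np.log(p / q)))


def bhattacharyya(p, q):
    """Bhattacharyya distance: -log of the sum of sqrt(p_k * q_k)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    coefficient = np.sum(np.sqrt(p * q))
    return float(-np.log(max(coefficient, EPS)))


def dtw_distance(template, test, local_dist):
    """Accumulated DTW cost between two frame sequences.

    Uses the basic symmetric step pattern (vertical, horizontal, diagonal)
    and divides by the combined length so sequences of different duration
    are comparable. Both choices are illustrative assumptions.
    """
    n, m = len(template), len(test)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(template[i - 1], test[j - 1])
            acc[i, j] = d + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m] / (n + m))


def recognize(test, templates, local_dist):
    """Return the word whose closest template has the lowest DTW cost.

    `templates` maps each word to a list of posterior sequences.
    """
    best_word, best_cost = None, np.inf
    for word, examples in templates.items():
        for example in examples:
            cost = dtw_distance(example, test, local_dist)
            if cost < best_cost:
                best_word, best_cost = word, cost
    return best_word


# Toy data: two hypothetical "words" over a two-phoneme posterior space.
templates = {
    "a": [np.array([[0.9, 0.1], [0.1, 0.9]])],
    "b": [np.array([[0.1, 0.9], [0.9, 0.1]])],
}
test_utterance = np.array([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]])

for name, dist in [("euclidean", euclidean),
                   ("kl", kl_divergence),
                   ("bhattacharyya", bhattacharyya)]:
    print(name, recognize(test_utterance, templates, dist))
```

Passing the local distance as a parameter keeps the DTW search unchanged, so the three distances can be swapped and compared on the same templates.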

Keywords

Speech Recognition, Template Matching, Posterior Features, KL-divergence, Bhattacharyya, Multi-Layer Perceptron

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Guillermo Aradilla (1)
  • Hervé Bourlard (1)
  1. IDIAP Research Institute, Martigny, Switzerland
