Acoustic Modelling Using Continuous Rational Kernels



Many discriminative classification algorithms are designed for tasks where samples can be represented by fixed-length vectors. However, many examples in the fields of text processing, computational biology and speech recognition are best represented as variable-length sequences of vectors. Although several dynamic kernels have been proposed for mapping sequences of discrete observations into fixed-dimensional feature spaces, few kernels exist for sequences of continuous observations. This paper introduces continuous rational kernels, an extension of standard rational kernels, as a general framework for classifying sequences of continuous observations. In addition to allowing new task-dependent kernels to be defined, continuous rational kernels allow existing continuous dynamic kernels, such as Fisher and generative kernels, to be calculated using standard weighted finite-state transducer algorithms. Preliminary results are presented on both a large vocabulary continuous speech recognition (LVCSR) task and the TIMIT database.
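To make the rational-kernel idea concrete, the sketch below computes a standard discrete-sequence example: the n-gram count kernel, whose value is the inner product of the two sequences' n-gram count vectors. In the rational-kernel framework this is exactly the weight that a composition of the input sequences with an n-gram counting transducer and its inverse would yield; the direct dynamic-programming form shown here is an illustrative stand-in, not the paper's continuous extension, and all function names are our own.

```python
from collections import Counter

def ngram_counts(seq, n=2):
    # Count every length-n contiguous subsequence of a discrete
    # symbol sequence (a string or list of symbols).
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def ngram_kernel(x, y, n=2):
    # K(x, y) = sum over n-grams g of c_x(g) * c_y(g), the inner
    # product of n-gram count vectors. This is the quantity the
    # transducer composition x o T o T^{-1} o y computes when T is
    # the n-gram counting transducer of standard rational kernels.
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    return sum(c * cy[g] for g, c in cx.items())

# "abab" has bigrams {ab: 2, ba: 1}; "abba" has {ab: 1, bb: 1, ba: 1},
# so the kernel value is 2*1 + 1*1 = 3.
print(ngram_kernel("abab", "abba"))  # -> 3
```

The continuous rational kernels introduced in the paper generalise the transition weights of such transducers from symbol counts to functions of continuous observation vectors, which is what lets Fisher and generative kernels be recovered with the same composition and shortest-distance algorithms.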


Keywords: augmented statistical models, rational kernels, speech recognition, TIMIT database





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

Department of Engineering, University of Cambridge, Cambridge, UK
