Non-uniform Kernel Allocation Based Parsimonious HMM

  • Peng Liu
  • Jian-Lai Zhou
  • Frank Soong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)


In a conventional Gaussian mixture based Hidden Markov Model (HMM), all states are usually modeled with a uniform, fixed number of Gaussian kernels. In this paper, we propose to allocate kernels non-uniformly to construct a more parsimonious HMM. Different numbers of Gaussian kernels are allocated to states in a non-uniform and parsimonious way so as to optimize the Minimum Description Length (MDL) criterion, which is a combination of data likelihood and a model complexity penalty. By using the likelihoods obtained in Baum-Welch training, we develop an efficient backward kernel pruning algorithm, which is shown to be optimal under two mild assumptions. Two databases, Resource Management and the Microsoft Mandarin Speech Toolbox, are used to test the proposed parsimonious modeling algorithm. The new parsimonious models improve the baseline word recognition error rate by 11.1% and 5.7%, relatively. Alternatively, at the same performance level, a 35-50% model compression can be obtained.
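The core idea in the abstract, trading data likelihood against a complexity penalty, can be sketched in a few lines. The following is a minimal illustration, not the paper's algorithm: it assumes per-state training log-likelihoods for each candidate kernel count are already available (e.g. collected during Baum-Welch training), and picks the count per state that minimizes an MDL-style score. The function names, the diagonal-covariance parameter count, and the toy inputs are all hypothetical.

```python
import math

def mdl_score(log_lik, n_kernels, dim, n_frames):
    # MDL-style score: -log-likelihood plus (#params / 2) * log(#frames).
    # Assumes diagonal-covariance kernels: per kernel, `dim` means,
    # `dim` variances, and one mixture weight (a modeling assumption here).
    n_params = n_kernels * (2 * dim + 1)
    return -log_lik + 0.5 * n_params * math.log(n_frames)

def allocate_kernels(state_log_liks, dim, n_frames):
    # state_log_liks[s][k-1]: log-likelihood of state s modeled with k kernels
    # (hypothetical input). Returns the MDL-minimizing kernel count per state,
    # so states whose likelihood saturates early get fewer kernels.
    alloc = []
    for liks in state_log_liks:
        best_k = min(
            range(1, len(liks) + 1),
            key=lambda k: mdl_score(liks[k - 1], k, dim, n_frames),
        )
        alloc.append(best_k)
    return alloc

# Toy example: the first state still gains likelihood up to 3 kernels,
# the second saturates immediately, so the allocation is non-uniform.
print(allocate_kernels(
    [[-5000.0, -4900.0, -4880.0, -4875.0],
     [-3000.0, -2995.0, -2994.0]],
    dim=2, n_frames=1000))
```

In the paper the search is organized as a backward pruning pass over Baum-Welch likelihoods rather than this independent per-state scan, but the score being minimized has the same likelihood-plus-penalty shape.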


Keywords: Hidden Markov Model · Gaussian Kernel · Minimum Description Length · Word Error Rate · Bayesian Information Criterion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



  1. Ariew, R.: Occam's razor: A historical and philosophical analysis of Ockham's principle of parsimony. Department of Philosophy, University of Illinois, Urbana-Champaign (1976)
  2. Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Second International Symposium on Information Theory, pp. 267–281. Akademiai Kiado, Budapest (1973)
  3. Schwarz, G.: Estimating the dimension of a model. Annals of Statistics 6(2), 461–464 (1978)
  4. Rissanen, J.: Stochastic complexity in statistical inquiry. World Scientific Publishing Company, Singapore (1989)
  5. Li, X.B., Soong, F.K., Myrvoll, T.A., Wang, R.H.: Optimal Clustering and Non-uniform Allocation of Gaussian Kernels in Scalar Dimension for HMM Compression. In: Proc. ICASSP 2005, vol. 1, pp. 669–672 (2005)
  6. Takami, J., Sagayama, S.: A Successive State Splitting Algorithm for Efficient Allophone Modeling. In: Proc. ICASSP 1992, vol. 1, pp. 573–576 (1992)
  7. Huang, X.D., et al.: The SPHINX-II speech recognition system: An overview. Computer Speech and Language 7(2), 137–148 (1993)
  8. Baum, L.E., Eagon, J.A.: An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bulletin of the American Mathematical Society 73, 360–363 (1967)
  9. Kullback, S., Leibler, R.A.: On Information and Sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
  10. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Interscience, New York (1991)
  11. Chang, E., Shi, Y., Zhou, J.-L., Huang, C.: Speech lab in a box: A Mandarin speech toolbox to jumpstart speech related research. In: Eurospeech 2001, pp. 2799–2782 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Peng Liu (1)
  • Jian-Lai Zhou (1)
  • Frank Soong (1)

  1. Microsoft Research Asia, Beijing
