Large Margin Learning of Bayesian Classifiers Based on Gaussian Mixture Models

  • Franz Pernkopf
  • Michael Wohlmayr
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)


Abstract

We present a discriminative learning framework, based on the extended Baum-Welch (EBW) algorithm [1], for Gaussian mixture models (GMMs) used for classification. We suggest two criteria for discriminative optimization: maximizing the class conditional likelihood (CL) and maximizing the margin (MM). We report results for synthetic data, broad phonetic classification, and a remote sensing application. The experiments show that CL-optimized GMMs (CL-GMMs) perform worse than MM-optimized GMMs (MM-GMMs), whereas both kinds of discriminative GMMs (DGMMs) perform significantly better than generatively learned GMMs. We also show that discriminatively parameterized GMM classifiers, being generative models, still allow marginalization over missing features, a case where generative classifiers have an advantage over purely discriminative classifiers such as support vector machines or neural networks.
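To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes diagonal covariances, so that marginalizing over a missing feature reduces to skipping that dimension. It uses one common log-domain definition of the multiclass margin, which may differ from the definition in the paper. The toy `model`, the function names, and all parameter values are hypothetical. The EBW parameter updates themselves are not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal Gaussian.

    Entries of x that are None are treated as missing and marginalized out:
    integrating a diagonal Gaussian over one dimension contributes a factor 1.
    """
    total = 0.0
    for xd, md, vd in zip(x, mean, var):
        if xd is None:
            continue
        total += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return total

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def scores(x, model):
    """Return {class: log P(c) + log p(x | c)} for every class in the model."""
    out = {}
    for c, (prior, gmm) in model.items():
        comps = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
        out[c] = math.log(prior) + logsumexp(comps)
    return out

def classify(x, model):
    """Bayes decision: pick the class with the largest joint log score."""
    s = scores(x, model)
    return max(s, key=s.get)

def conditional_log_likelihood(data, model):
    """CL criterion: sum over samples of log P(c_n | x_n)."""
    total = 0.0
    for x, c in data:
        s = scores(x, model)
        total += s[c] - logsumexp(list(s.values()))
    return total

def log_margin(x, c, model):
    """Log-domain margin: true-class score minus the best competing score."""
    s = scores(x, model)
    return s[c] - max(v for k, v in s.items() if k != c)

# Hypothetical two-class model: class -> (prior, [(weight, mean, variance), ...])
model = {
    "a": (0.5, [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 2.0], [1.0, 0.5])]),
    "b": (0.5, [(1.0, [4.0, 4.0], [1.0, 1.0])]),
}

data = [([0.2, 0.1], "a"), ([4.1, 3.8], "b")]
print(classify([0.2, 0.1], model))             # complete feature vector
print(classify([4.1, None], model))            # second feature missing
print(conditional_log_likelihood(data, model))
print(log_margin([0.2, 0.1], "a", model))
```

A positive log margin means the sample is classified correctly. With full covariance matrices, marginalization would instead select the sub-vector of each mean and the sub-matrix of each covariance that correspond to the observed features.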


Keywords: Support Vector Machine, Bayesian Network, Speech Recognition, Gaussian Mixture Model, Decision Boundary


References

  1. Gopalakrishnan, P., Kanevsky, D., Nádas, A., Nahamoo, D.: An inequality for rational functions with applications to some statistical estimation problems. IEEE Transactions on Information Theory 37(1), 107–113 (1991)
  2. Vapnik, V.: Statistical learning theory. Wiley & Sons, Chichester (1998)
  3. Schölkopf, B., Smola, A.: Learning with kernels: Support Vector Machines, regularization, optimization, and beyond. MIT Press, Cambridge (2001)
  4. Taskar, B., Guestrin, C., Koller, D.: Max-margin Markov networks. In: Advances in Neural Information Processing Systems, NIPS (2003)
  5. Guo, Y., Wilkinson, D., Schuurmans, D.: Maximum margin Bayesian networks. In: International Conference on Uncertainty in Artificial Intelligence, UAI (2005)
  6. Roos, T., Wettig, H., Grünwald, P., Myllymäki, P., Tirri, H.: On discriminative Bayesian network classifiers and logistic regression. Machine Learning 59, 267–296 (2005)
  7. Sha, F., Saul, L.: Large margin Gaussian mixture modeling for phonetic classification and recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP (2006)
  8. Sha, F., Saul, L.: Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 313–316 (2007)
  9. Heigold, G., Deselaers, T., Schlüter, R., Ney, H.: Modified MMI/MPE: A direct evaluation of the margin in speech recognition. In: International Conference on Machine Learning (ICML), pp. 384–391 (2008)
  10. Collobert, R., Sinz, F., Weston, J., Bottou, L.: Trading convexity for scalability. In: International Conference on Machine Learning (ICML), pp. 201–208 (2006)
  11. Schlüter, R., Macherey, W., Müller, B., Ney, H.: Comparison of discriminative training criteria and optimization methods for speech recognition. Speech Communication 34, 287–310 (2001)
  12. Bahl, L., Brown, P., de Souza, P., Mercer, R.: Maximum Mutual Information estimation of HMM parameters for speech recognition. In: IEEE Conf. on Acoustics, Speech, and Signal Proc., pp. 49–52 (1986)
  13. Woodland, P., Povey, D.: Large scale discriminative training of hidden Markov models for speech recognition. Computer Speech and Language 16, 25–47 (2002)
  14. Klautau, A., Jevtić, N., Orlitsky, A.: Discriminative Gaussian mixture models: A comparison with kernel classifiers. In: Inter. Conf. on Machine Learning (ICML), pp. 353–360 (2003)
  15. Pernkopf, F., Van Pham, T., Bilmes, J.: Broad phonetic classification using discriminative Bayesian networks. Speech Communication 143(1), 123–138 (2008)
  16. Bishop, C.M.: Pattern recognition and machine learning. Springer, Heidelberg (2006)
  17. Pernkopf, F., Bouchaffra, D.: Genetic-based EM algorithm for learning Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1344–1348 (2005)
  18. Merialdo, B.: Phonetic recognition using hidden Markov models and maximum mutual information training. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 111–114 (1988)
  19. Normandin, Y., Morgera, S.: An improved MMIE training algorithm for speaker-independent small vocabulary, continuous speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 537–540 (1991)
  20. Normandin, Y., Cardin, R., De Mori, R.: High-performance connected digit recognition using maximum mutual information estimation. IEEE Trans. on Speech and Audio Proc. 2(2), 299–311 (1994)
  21. Lamel, L., Kassel, R., Seneff, S.: Speech database development: Design and analysis of the acoustic-phonetic corpus. In: DARPA Speech Recognition Workshop, Report No. SAIC-86/1546 (1986)
  22. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research 2, 265–292 (2001)
  23. Jain, A., Chandrasekaran, B.: Dimensionality and sample size considerations in pattern recognition practice. Handbook of Statistics, vol. 2. North-Holland, Amsterdam (1982)
  24. Baum, L., Eagon, J.: An inequality with applications to statistical prediction for functions of Markov processes and to a model of ecology. Bull. Amer. Math. Soc. 73, 360–363 (1967)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Franz Pernkopf (1)
  • Michael Wohlmayr (1)
  1. Graz University of Technology, Graz, Austria
