Discriminative learning of generative models: large margin multinomial mixture models for document classification

Jiang, Hui; Pan, Zhenyu; Hu, Pingzhao

doi:10.1007/s10044-014-0382-x

Theoretical Advances
Published: 06 June 2014

Pattern Analysis and Applications, volume 18, pages 535–551 (2015)

Hui Jiang¹, Zhenyu Pan¹ & Pingzhao Hu¹

Abstract

In this paper, we propose a novel discriminative learning method for estimating generative models in multi-class pattern classification tasks. A discriminative objective function is formulated in terms of separation margins according to a discriminative learning criterion such as large margin estimation (LME). We further propose the approximation-maximization (AM) method to optimize this objective function with respect to the parameters of the generative models. The AM approach provides a good framework for handling latent variables in generative models, and it is flexible enough to discriminatively learn many rather complicated generative models. Here we focus on a group of generative models derived from multinomial distributions. Under some minor relaxation conditions, we show that AM-based discriminative learning of these models leads to linear programming (LP) problems that can be solved effectively and efficiently even for rather large-scale models. As a case study, we learn multinomial mixture models (MMMs) for text document classification based on the large margin criterion. The proposed methods are evaluated on the standard RCV1 text corpus. Experimental results show that large margin MMMs significantly outperform conventional MMMs as well as purely discriminative models such as support vector machines (SVMs), with over 25% relative reduction in classification error observed on three independent RCV1 test sets.
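To make the objects named in the abstract concrete, the following is a minimal Python sketch of how a multinomial mixture model can score a bag-of-words document and how a separation margin can be read off the class scores. It is an illustration only, not the authors' implementation: the function names, the toy vocabulary of three words, the uniform class priors, and the margin definition (true-class score minus the best rival score, as is common in large margin estimation) are all assumptions introduced here.

```python
import math

def log_multinomial_kernel(counts, word_probs):
    """Log-likelihood of word counts under one multinomial component.

    The multinomial coefficient is dropped because it is the same for
    every class and does not affect the comparison between classes.
    Assumes every word with a non-zero count has non-zero probability.
    """
    return sum(c * math.log(p) for c, p in zip(counts, word_probs) if c > 0)

def log_mixture_score(counts, weights, components):
    """Log-likelihood of a document under a mixture of multinomials."""
    terms = [math.log(w) + log_multinomial_kernel(counts, comp)
             for w, comp in zip(weights, components)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

def classify(counts, models):
    """Return the best-scoring label and all class scores (uniform priors assumed)."""
    scores = {label: log_mixture_score(counts, w, comps)
              for label, (w, comps) in models.items()}
    return max(scores, key=scores.get), scores

def separation_margin(scores, true_label):
    """True-class score minus the best competing score (assumed definition)."""
    rival = max(s for label, s in scores.items() if label != true_label)
    return scores[true_label] - rival

# Hypothetical toy models: two classes, two mixture components each.
models = {
    "sports":  ([0.5, 0.5], [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]),
    "finance": ([0.5, 0.5], [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6]]),
}
doc = [3, 1, 0]  # word counts over the three-word vocabulary
label, scores = classify(doc, models)
margin = separation_margin(scores, "sports")
```

A positive margin means the document is classified correctly with some slack. A large margin criterion adjusts the model parameters so that this slack grows for the training documents closest to the decision boundary.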

Notes

A linear program is an optimization problem whose objective function and constraints are all linear. Linear programming is a special case of convex optimization, and linear programs can be solved reliably and efficiently.
Since the auxiliary function is tangent to the original objective function, the whole optimization process can be approximately viewed as a gradient ascent method as long as the box sizes are small enough.
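The two notes above can be illustrated together with a small Python sketch. Maximizing a tangent (linear) approximation of an objective inside a box around the current point is itself a linear program, and in this simplest case its solution is available in closed form: each coordinate moves to the box edge in the direction of its gradient sign. The quadratic objective, the starting point, and the box size below are hypothetical choices for illustration; the paper's actual problem includes further constraints that are not modeled here.

```python
def box_lp_step(grad, delta):
    """Solve max_d sum_i grad[i] * d[i] subject to -delta <= d[i] <= delta.

    The objective and constraints are linear, so this is a linear program.
    With only box constraints, the optimum moves every coordinate to the
    box edge indicated by the sign of its gradient entry.
    """
    def sign(v):
        return (v > 0) - (v < 0)
    return [delta * sign(g) for g in grad]

def objective(x):
    """Hypothetical concave objective with its maximum at (1, -2)."""
    return -(x[0] - 1.0) ** 2 - (x[1] + 2.0) ** 2

def gradient(x):
    """Gradient of the objective, i.e. the slope of its tangent at x."""
    return [-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 2.0)]

def maximize(x, delta, iterations):
    """Repeatedly maximize the tangent approximation inside a box of size delta."""
    for _ in range(iterations):
        step = box_lp_step(gradient(x), delta)
        x = [xi + di for xi, di in zip(x, step)]
    return x

x_final = maximize([0.0, 0.0], delta=0.1, iterations=50)
```

Each iteration follows the gradient direction coordinate by coordinate, which is why the overall procedure behaves like gradient ascent. The iterate ends up within one box size of the maximizer and then oscillates around it, so a smaller box gives a more accurate result at the cost of more iterations.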

Author information

Authors and Affiliations

Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, ON, M3J 1P3, Canada
Hui Jiang, Zhenyu Pan & Pingzhao Hu

Corresponding author

Correspondence to Hui Jiang.

About this article

Cite this article

Jiang, H., Pan, Z. & Hu, P. Discriminative learning of generative models: large margin multinomial mixture models for document classification. Pattern Anal Applic 18, 535–551 (2015). https://doi.org/10.1007/s10044-014-0382-x

Received: 09 August 2012
Accepted: 21 May 2014
Published: 06 June 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s10044-014-0382-x
