
Efficient integration of generative topic models into discriminative classifiers using robust probabilistic kernels


We propose an alternative to the generative classifier, which usually models the class conditionals and class priors separately and then applies Bayes' theorem to compute the posterior distribution of the classes given the training set, from which the decision boundary is obtained. Because the support vector machine (SVM) is not a probabilistic framework, it is difficult to implement a discriminative classifier based directly on a posterior distribution. Since the SVM lacks a full Bayesian analysis, we propose a hybrid (generative–discriminative) technique in which generative topic features obtained from Bayesian learning are fed to the SVM. The standard latent Dirichlet allocation topic model can be described as a Dir–Dir topic model, reflecting the Dirichlet (Dir) priors placed on both the document and corpus parameters. Using very flexible conjugate priors to the multinomial, namely the generalized Dirichlet (GD) and Beta-Liouville (BL) distributions, we define two new topic models: BL–GD and GD–BL. We take advantage of the geometric interpretation of our generative topic (latent) models, which associate a K-dimensional manifold (K is the number of topics) embedded in a V-dimensional feature space (the word simplex), where V is the vocabulary size. Under this structure, the low-dimensional topic simplex (the subspace) represents each document as a single point on its manifold and associates each document with a single probability. The SVM, with its kernel trick, operates on these document probabilities for classification, using maximum-margin learning to define the decision boundary. The key observation is that points (documents) that are close to each other on the manifold are expected to belong to the same class. Experimental results on text documents and images show the merits of the proposed framework.
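As a rough illustration of the hybrid idea, the sketch below computes a probabilistic kernel between documents represented as points on a topic simplex. It rests on several assumptions that are not taken from the abstract: the topic proportions in `theta` are made-up values standing in for the output of a fitted topic model, and the Bhattacharyya kernel (a probability product kernel) is used as one well-known example of a probabilistic kernel, since the abstract does not say which kernels the authors use. The function names are hypothetical.

```python
import math


def bhattacharyya_kernel(p, q):
    """Bhattacharyya kernel between two discrete distributions.

    k(p, q) = sum_k sqrt(p_k * q_k); equals 1 when p == q and
    decreases as the two distributions move apart on the simplex.
    """
    return sum(math.sqrt(pk * qk) for pk, qk in zip(p, q))


def gram_matrix(docs):
    """Pairwise kernel values for a list of topic-proportion vectors."""
    return [[bhattacharyya_kernel(a, b) for b in docs] for a in docs]


# Hypothetical topic proportions for four documents over K = 3 topics.
# In the hybrid setting these would come from the generative topic model.
theta = [
    [0.80, 0.15, 0.05],  # class 0
    [0.70, 0.20, 0.10],  # class 0
    [0.05, 0.15, 0.80],  # class 1
    [0.10, 0.10, 0.80],  # class 1
]
labels = [0, 0, 1, 1]

gram = gram_matrix(theta)

# Documents of the same class lie close together on the simplex,
# so their kernel value is higher than for documents of different classes.
same_class = gram[0][1]
different_class = gram[0][2]
```

A Gram matrix of this kind could then be handed to any SVM implementation that accepts a precomputed kernel (for example, scikit-learn's `SVC(kernel="precomputed")`), which is where the maximum-margin decision boundary would be learned.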





The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information



Corresponding author

Correspondence to Nizar Bouguila.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ihou, K.E., Bouguila, N. & Bouachir, W. Efficient integration of generative topic models into discriminative classifiers using robust probabilistic kernels. Pattern Anal Applic 24, 217–241 (2021).



  • Hybrid (generative–discriminative) models
  • Support vector machine
  • Conjugate priors
  • Beta-Liouville
  • Generalized Dirichlet
  • Probabilistic kernels
  • Document classification