Supervised topic models with weighted words: multi-label document classification

  • Yue-peng Zou
  • Ji-hong Ouyang
  • Xi-ming Li


Supervised topic modeling algorithms have been successfully applied to multi-label document classification tasks. Representative models include labeled latent Dirichlet allocation (L-LDA) and dependency-LDA. However, these models neglect the class frequency of words (i.e., the number of classes in which a word occurs in the training data), which is significant for classification. To address this, we propose the class frequency weight (CF-weight), a method that weights words according to their class frequency. The CF-weight is based on the intuition that a word with a higher class frequency is less discriminative, and a word with a lower class frequency is more discriminative. In this study, the CF-weight is used to improve L-LDA and dependency-LDA. Experiments on real-world multi-label datasets demonstrate that CF-weight-based algorithms are competitive with existing supervised topic models.
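As a minimal sketch of the class frequency idea described above, the following Python snippet counts, for each word, the number of distinct labels it co-occurs with in a toy training set, then derives a weight that decreases as class frequency grows. Two points are assumptions made for illustration and are not taken from the paper: a word in a multi-label document is counted as occurring in every label of that document, and the weight uses an IDF-style logarithm. The abstract does not state the actual CF-weight formula, and the function names `class_frequency` and `cf_weights` are hypothetical.

```python
import math
from collections import defaultdict


def class_frequency(docs):
    """Map each word to the number of distinct labels it co-occurs with.

    docs: iterable of (tokens, labels) pairs from the training data.
    ASSUMPTION: a word in a multi-label document counts toward every
    label attached to that document.
    """
    labels_per_word = defaultdict(set)
    for tokens, labels in docs:
        for word in set(tokens):
            labels_per_word[word].update(labels)
    return {word: len(labels) for word, labels in labels_per_word.items()}


def cf_weights(cf, num_classes):
    """Turn class frequencies into weights (higher frequency, lower weight).

    ASSUMPTION: IDF-style log weighting chosen for illustration only;
    this is not the paper's CF-weight definition.
    """
    return {word: math.log(1.0 + num_classes / count)
            for word, count in cf.items()}


# Toy training set with three classes: sports, politics, business.
train = [
    (["goal", "team", "the"], ["sports"]),
    (["vote", "party", "the"], ["politics"]),
    (["team", "budget", "the"], ["sports", "business"]),
]

cf = class_frequency(train)
weights = cf_weights(cf, num_classes=3)

# Print words from most to least discriminative under this toy weighting.
for word in sorted(weights, key=weights.get, reverse=True):
    print(f"{word}: cf={cf[word]}, weight={weights[word]:.3f}")
```

In this toy data, "the" appears under all three classes and receives the smallest weight, while "goal" appears under a single class and receives the largest, which matches the stated intuition. How such weights enter L-LDA or dependency-LDA inference is not covered by this sketch.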

Key words

Supervised topic model; Multi-label classification; Class frequency; Labeled latent Dirichlet allocation (L-LDA); Dependency-LDA






Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. College of Computer Science and Technology, Jilin University, Changchun, China
  2. MOE Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun, China
