Knowledge and Information Systems

Volume 59, Issue 1, pp 93–115

Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity

  • Sophie Burkhardt
  • Stefan Kramer
Regular Paper


Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow the number of topics to be discovered automatically from the data. The computational complexity of standard Gibbs sampling techniques for model training is linear in the number of topics. Recently, it was reduced to be linear in the number of topics per word by combining a technique called alias sampling with Metropolis–Hastings (MH) sampling. We propose a different proposal distribution for the MH step, based on the observation that distributions on the upper level of the hierarchy change more slowly than the document-specific distributions at the lower level. The resulting MH approximation reduces the sampling complexity further, making it linear in the number of topics per document. By utilizing a single global distribution, we are able to further improve the test set log-likelihood of this approximation. Furthermore, we propose a novel model of stacked HDPs utilizing this sampling method. An extensive analysis reveals the importance of correctly setting the hyperparameters for classification and shows the convergence properties of our method. Experiments demonstrate the effectiveness of the proposed approach for multi-label classification, compared to previous Dependency-LDA models.
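To make the sampling idea in the abstract concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of the two building blocks it refers to: Walker's alias method, which turns a discrete distribution into a table that supports O(1) draws, and a Metropolis–Hastings step that proposes from such a (possibly stale) table and corrects for the mismatch with the true target via the acceptance ratio. All function and variable names here are illustrative.

```python
import random

def build_alias_table(probs):
    """Construct Walker's alias table in O(K); afterwards each draw is O(1)."""
    K = len(probs)
    scaled = [p * K for p in probs]
    prob, alias = [0.0] * K, [0] * K
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l   # bucket s keeps scaled[s], overflows to l
        scaled[l] -= 1.0 - scaled[s]       # l donates the remainder of bucket s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                # leftovers are exactly full buckets
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng=random):
    """Draw one index in O(1) from a pre-built alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

def mh_step(current, target, proposal_probs, prob, alias, rng=random):
    """One Metropolis-Hastings step: propose from the (possibly stale) alias
    table and accept with ratio target(new)*q(old) / (target(old)*q(new)),
    so the chain still targets `target` even if the table is outdated."""
    candidate = alias_sample(prob, alias, rng)
    ratio = (target[candidate] * proposal_probs[current]) / (
        target[current] * proposal_probs[candidate] + 1e-12)
    return candidate if rng.random() < min(1.0, ratio) else current
```

The point of the combination is that the expensive table construction is amortized over many draws: the table is rebuilt only occasionally, and the MH correction keeps the sampler exact in the meantime. The paper's contribution concerns *which* distribution to build the proposal table from (a slowly changing upper-level distribution shared across documents), which this sketch does not attempt to reproduce.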


Keywords: Multi-label classification · Hierarchical Dirichlet process · Topic model · Alias sampling



We thank the anonymous reviewers for useful comments and suggestions, Jinseok Nam for providing the source code of the neural network classifier and Andrey Tyukin for helpful discussions on Stirling numbers. The first author was supported by a scholarship from PRIME Research, Mainz.


References

  1. Antoniak CE (1974) Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann Stat 2(6):1152–1174
  2. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
  3. Buntine W, Hutter M (2010) A Bayesian view of the Poisson–Dirichlet process. arXiv preprint arXiv:1007.0296
  4. Buntine WL, Mishra S (2014) Experiments with non-parametric topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '14, ACM, New York, NY, USA, pp 881–890
  5. Burkhardt S, Kramer S (2017) Multi-label classification using stacked hierarchical Dirichlet processes with reduced sampling complexity. In: ICBK 2017—international conference on big knowledge, IEEE, pp 1–8
  6. Burkhardt S, Kramer S (2017) Online sparse collapsed hybrid variational-Gibbs algorithm for hierarchical Dirichlet process topic models. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Proceedings of ECML-PKDD 2017. Springer International Publishing, Cham, pp 189–204
  7. Chen C, Du L, Buntine W (2011) Sampling table configurations for the hierarchical Poisson–Dirichlet process. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Proceedings of ECML-PKDD. Springer, Heidelberg, pp 296–311
  8. Katakis I, Tsoumakas G, Vlahavas I (2008) Multilabel text classification for automated tag suggestion. In: ECML-PKDD discovery challenge, vol 75
  9. Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397
  10. Li AQ, Ahmed A, Ravi S, Smola AJ (2014) Reducing the sampling complexity of topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '14, ACM, New York, NY, USA, pp 891–900
  11. Li C, Cheung WK, Ye Y, Zhang X, Chu D, Li X (2015) The author-topic-community model for author interest profiling and community discovery. Knowl Inf Syst 44(2):359–383
  12. Li W (2007) Pachinko allocation: DAG-structured mixture models of topic correlations. Ph.D. thesis, University of Massachusetts Amherst
  13. Loza Mencía E, Fürnkranz J (2010) Efficient multilabel classification algorithms for large-scale problems in the legal domain. In: Francesconi E, Montemagni S, Peters W, Tiscornia D (eds) Semantic processing of legal texts—where the language of law meets the law of language. Lecture notes in artificial intelligence, vol 6036, 1st edn. Springer, pp 192–215
  14. Nam J, Kim J, Loza Mencía E, Gurevych I, Fürnkranz J (2014) Large-scale multi-label text classification—revisiting neural networks. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Proceedings of ECML-PKDD, part II. Springer, Heidelberg, pp 437–452
  15. Papanikolaou Y, Foulds JR, Rubin TN, Tsoumakas G (2015) Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA. arXiv e-prints
  16. Prabhu Y, Varma M (2014) FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '14, ACM, New York, NY, USA, pp 263–272
  17. Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of the 2009 conference on empirical methods in natural language processing: volume 1, EMNLP '09, Association for Computational Linguistics, Stroudsburg, PA, USA, pp 248–256
  18. Ramage D, Manning CD, Dumais S (2011) Partially labeled topic models for interpretable text mining. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '11, ACM, New York, NY, USA, pp 457–465
  19. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359
  20. Ren L, Dunson DB, Carin L (2008) The dynamic hierarchical Dirichlet process. In: Proceedings of the 25th ICML international conference on machine learning, ACM, pp 824–831
  21. Rodríguez A, Dunson DB, Gelfand AE (2008) The nested Dirichlet process. J Am Stat Assoc 103(483):1131–1154
  22. Rubin TN, Chambers A, Smyth P, Steyvers M (2012) Statistical topic models for multi-label document classification. Mach Learn 88(1–2):157–208
  23. Salakhutdinov R, Tenenbaum JB, Torralba A (2013) Learning with hierarchical-deep models. IEEE Trans Pattern Anal Mach Intell 35(8):1958–1971
  24. Shimosaka M, Tsukiji T, Tominaga S, Tsubouchi K (2016) Coupled hierarchical Dirichlet process mixtures for simultaneous clustering and topic modeling. In: Frasconi P, Landwehr N, Manco G, Vreeken J (eds) Proceedings of ECML-PKDD. Springer International Publishing, Cham, pp 230–246
  25. Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101(476):1566–1581
  26. Tsoumakas G, Katakis I, Vlahavas IP (2008) Effective and efficient multilabel classification in domains with large number of labels. In: ECML/PKDD 2008 workshop on mining multidimensional data
  27. Wood F, Archambeau C, Gasthaus J, James L, Teh YW (2009) A stochastic memoizer for sequence data. In: Proceedings of the 26th ICML international conference on machine learning, ACM, pp 1129–1136
  28. Yen IEH, Huang X, Ravikumar P, Zhong K, Dhillon I (2016) PD-Sparse: a primal and dual sparse approach to extreme multiclass and multilabel classification. In: Proceedings of the 33rd international conference on machine learning, ACM, pp 3069–3077
  29. Zuo Y, Zhao J, Xu K (2016) Word network topic model: a simple but general solution for short and imbalanced texts. Knowl Inf Syst 48(2):379–398

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Computer Science, Johannes Gutenberg-University of Mainz, Mainz, Germany
