Volume 45, Issue 2, pp 363–387

Experiments with learning graphical models on text

  • Joan Capdevila
  • He Zhao
  • François Petitjean
  • Wray Buntine
Invited Paper


A rich variety of models is now in use for unsupervised modelling of text documents; in particular, many graphical models exist, with and without latent variables. To date, their comparative performance is inadequately understood, partly because the models differ subtly and partly because they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For matrix factorisation models, we use different hierarchical priors, asymmetric priors on components. We use Boolean matrix factorisation rather than topic models so that the evaluations are comparable. The experiments cover several evaluations: the probability of each document, omni-directional prediction (predicting different variables), and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning generally performed best and, probably owing to its lower bias, often outperformed hierarchical latent trees.
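To make the link between the document-probability and anomaly-detection evaluations concrete, here is a minimal sketch that scores Boolean documents by their log-probability and treats the least probable ones as anomaly candidates. It assumes a deliberately simple baseline, an independent Bernoulli model per word with Laplace smoothing, which is not one of the models compared in the paper. The function names, the smoothing constant `alpha`, and the toy data are all hypothetical.

```python
import math

# Assumption: each document is a set of word indices (a Boolean bag of words).
# Assumption: words are modelled as independent Bernoulli variables; this is an
# illustrative baseline, not a model evaluated in the paper.

def fit_bernoulli(docs, vocab_size, alpha=1.0):
    """Estimate P(word present) for each word, with Laplace smoothing."""
    n_docs = len(docs)
    counts = [0] * vocab_size
    for doc in docs:
        for w in doc:
            counts[w] += 1
    # Smoothing keeps every probability strictly between 0 and 1.
    return [(c + alpha) / (n_docs + 2 * alpha) for c in counts]

def log_prob(doc, probs):
    """Log-probability of one Boolean document under the independence model."""
    return sum(
        math.log(p) if w in doc else math.log(1.0 - p)
        for w, p in enumerate(probs)
    )

def rank_anomalies(docs, probs):
    """Return document indices ordered from least to most probable."""
    scores = [log_prob(d, probs) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i])

# Toy data: words 3 and 4 never occur in training.
train = [{0, 1}, {0, 1, 2}, {0, 1}, {1, 2}, {0, 1, 2}]
test = [{0, 1}, {3, 4}, {0, 1, 2}]

probs = fit_bernoulli(train, vocab_size=5)
ranking = rank_anomalies(test, probs)
print(ranking)  # the document {3, 4} is ranked first, as the least probable
```

Any model that assigns a probability to a whole document could replace `log_prob` here; the ranking step would stay the same.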


Graphical models, Document analysis, Unsupervised learning, Matrix factorisation, Latent variables, Evaluation



Copyright information

© The Behaviormetric Society 2018

Authors and Affiliations

  1. Universitat Politècnica de Catalunya & Barcelona Supercomputing Center, Barcelona, Spain
  2. Monash University, Clayton, Australia
