Experiments with learning graphical models on text

Abstract

A rich variety of models is now in use for unsupervised modelling of text documents. In particular, many graphical models exist, both with and without latent variables. To date, their comparative performance is poorly understood, partly because the models differ in subtle ways and partly because they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For the matrix factorisation models, we use different hierarchical priors (asymmetric priors on components). We use Boolean matrix factorisation rather than topic models so that the evaluations are comparable. The experiments cover three evaluations: the probability of each document, omni-directional prediction (in which different variables are predicted), and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning generally performed best and, probably because of its lower bias, often outperformed hierarchical latent trees.
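As a rough illustration of two of the evaluations named in the abstract (per-document probability and anomaly detection), the sketch below scores binary bag-of-words documents under an independent Bernoulli baseline and ranks the least probable documents first. This baseline is an assumption chosen for brevity; it is not one of the models compared in the paper, and the function names, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math


def fit_bernoulli(docs, vocab_size, alpha=1.0):
    """Estimate a smoothed presence probability for each word id.

    docs is a list of sets of word ids (binary bag-of-words).
    alpha is a hypothetical smoothing constant, not taken from the paper.
    """
    n = len(docs)
    counts = [0] * vocab_size
    for doc in docs:
        for w in doc:
            counts[w] += 1
    return [(c + alpha) / (n + 2 * alpha) for c in counts]


def log_prob(doc, probs):
    """Log-probability of one document, treating words as independent."""
    return sum(
        math.log(p) if w in doc else math.log(1.0 - p)
        for w, p in enumerate(probs)
    )


def rank_anomalies(docs, probs):
    """Return document indices ordered from least to most probable."""
    scores = [log_prob(d, probs) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i])


# Toy data: word ids 0 and 1 are common in training, 2 and 3 are rare.
train = [{0, 1}, {0, 1}, {0, 1, 2}, {0}]
test = [{0, 1}, {2, 3}]

probs = fit_bernoulli(train, vocab_size=4)
ranking = rank_anomalies(test, probs)
print(ranking)  # the document made of rare words is listed first
```

A model with dependencies between words, such as those studied in the paper, would replace `log_prob` with its own document probability; the ranking step would stay the same.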

Notes

  1. https://github.com/fpetitjean/Chordalysis.

  2. https://github.com/kmpoon/hlta.

  3. https://github.com/kmpoon/hlta.

  4. http://www.cs.nyu.edu/~roweis/data.html.

  5. http://qwone.com/~jason/20Newsgroups.

  6. http://catalog.ldc.upenn.edu/LDC2008T19.

Author information

Corresponding author

Correspondence to Wray Buntine.

Additional information

J. Capdevila is supported by Obra Social “la Caixa”; Dr F. Petitjean is the recipient of an Australian Research Council Discovery Early Career Award (project number DE170100037) funded by the Australian Government.

Communicated by Brandon Malone and Joe Suzuki.

Appendices

A Log-likelihood

[Figures a and b]

B Omni-directional prediction

[Figures c, d and e]

C Anomaly detection

See Figs. 9 and 10.

Fig. 9 Anomaly detection in WS data set

Fig. 10 Anomaly detection in 20Newsgroups data set

D Running times

See Figs. 11 and 12.

Fig. 11 Running time in WS data set

Fig. 12 Running time in 20Newsgroups data set

About this article

Cite this article

Capdevila, J., Zhao, H., Petitjean, F. et al. Experiments with learning graphical models on text. Behaviormetrika 45, 363–387 (2018). https://doi.org/10.1007/s41237-018-0050-3

Keywords

  • Graphical models
  • Document analysis
  • Unsupervised learning
  • Matrix factorisation
  • Latent variables
  • Evaluation