Experiments with learning graphical models on text

Capdevila, Joan; Zhao, He; Petitjean, François; Buntine, Wray

doi:10.1007/s41237-018-0050-3

Invited Paper
Published: 08 May 2018

Volume 45, pages 363–387, (2018)
Behaviormetrika

Joan Capdevila¹, He Zhao², François Petitjean² & Wray Buntine² (ORCID: orcid.org/0000-0001-9292-1015)

Abstract

A rich variety of models are now in use for unsupervised modelling of text documents; in particular, many graphical models exist, with and without latent variables. To date, the comparative performance of these models is inadequately understood, partly because they are subtly different and partly because they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For matrix factorisation models, we use different hierarchical priors, asymmetric priors on components. We use Boolean matrix factorisation rather than topic models, so we can do comparable evaluations. The experiments perform a number of evaluations: probability for each document, omni-directional prediction which predicts different variables, and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning performed the best generally and, probably due to its lower bias, often outperformed hierarchical latent trees.
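
As a rough illustration of two of the evaluations named in the abstract (probability for each document, and anomaly detection), the following minimal sketch scores Boolean word-presence documents under an independent Bernoulli baseline and ranks them by log-probability. This baseline is an assumption made purely for illustration; it is not one of the models studied in the paper, and the names fit_bernoulli, log_prob, rank_anomalies and the toy data are hypothetical.

```python
import math

# Illustrative baseline only: independent Bernoulli per word.
# This is NOT a chordal graph, matrix factorisation or latent tree model.

def fit_bernoulli(docs, vocab_size, alpha=1.0):
    """Estimate P(word present) for each word, with add-alpha smoothing."""
    n = len(docs)
    counts = [0] * vocab_size
    for doc in docs:          # each doc is a set of word indices that are present
        for w in doc:
            counts[w] += 1
    return [(c + alpha) / (n + 2 * alpha) for c in counts]

def log_prob(doc, theta):
    """Log-probability of one Boolean document under the fitted model."""
    return sum(
        math.log(p) if w in doc else math.log(1.0 - p)
        for w, p in enumerate(theta)
    )

def rank_anomalies(docs, theta):
    """Return document indices ordered from lowest to highest log-probability."""
    scores = [log_prob(d, theta) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i])

# Toy data (hypothetical): word 3 never occurs in training.
train = [{0, 1}, {0, 1, 2}, {0, 1}, {1, 2}]
theta = fit_bernoulli(train, vocab_size=4)
test = [{0, 1}, {3}]
ranking = rank_anomalies(test, theta)  # least probable document first
```

Under this sketch, a document is flagged as more anomalous the lower its log-probability; any model that assigns a probability to a whole document could be substituted for the Bernoulli baseline in the same way.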

Author information

Authors and Affiliations

Universitat Politècnica de Catalunya & Barcelona Supercomputing Center, 1-3 Jordi Girona, 08034, Barcelona, Spain
Joan Capdevila
Monash University, 25 Exhibition walk, Clayton, 3800, Australia
He Zhao, François Petitjean & Wray Buntine

Corresponding author

Correspondence to Wray Buntine.

Additional information

Communicated by Brandon Malone and Joe Suzuki.

J. Capdevila is supported by Obra Social “la Caixa”; Dr F. Petitjean is the recipient of an Australian Research Council Discovery Early Career Award (project number DE170100037) funded by the Australian Government.

Appendices

A Log-likelihood

B Omni-directional prediction

C Anomaly detection

See Figs. 9 and 10.

D Running times

See Figs. 11 and 12.

About this article

Cite this article

Capdevila, J., Zhao, H., Petitjean, F. et al. Experiments with learning graphical models on text. Behaviormetrika 45, 363–387 (2018). https://doi.org/10.1007/s41237-018-0050-3

Received: 26 January 2018
Accepted: 27 April 2018
Published: 08 May 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s41237-018-0050-3
