Snapshot ensembles of non-negative matrix factorization for stability of topic modeling

Qiang, Jipeng; Li, Yun; Yuan, Yunhao; Liu, Wei

doi:10.1007/s10489-018-1192-4

Published: 10 May 2018

Applied Intelligence, Volume 48, pages 3963–3975 (2018)

Abstract

Topic models such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) have recently made important progress towards extracting high-level knowledge from large corpora. However, because these algorithms rely on random initialization, they produce different results on the same corpus with the same parameters, a behavior known as the instability problem. Ensembles of NMF models are known to be much more stable and accurate than individual NMF models, but training multiple models for an ensemble is computationally expensive. In this paper, we propose a novel scheme that achieves the seemingly contradictory goal of ensembling multiple NMF models without any additional training cost. We train a single NMF model with a cyclical learning rate schedule, so that it converges to several local minima along its optimization path. Each time the model converges, we save the result to the ensemble and then restart the optimization with a large learning rate that helps it escape the current local minimum. Experiments on text corpora, assessed with a number of measures, show that our method reduces instability at no additional training cost while simultaneously yielding more accurate topic models than traditional single and ensemble methods.
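
The training loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the update rule, the exact shape of the learning rate schedule, or how the saved results are combined, so everything concrete here is an assumption: the sketch uses projected gradient descent on the squared reconstruction error, a cosine-annealed learning rate that restarts at its maximum at the start of every cycle, and a snapshot taken at the end of each cycle as a stand-in for "when the model converges". The names `cyclic_lr` and `snapshot_nmf` and all default parameter values are hypothetical.

```python
import numpy as np


def cyclic_lr(step, steps_per_cycle, lr_max):
    """Cosine-annealed learning rate that jumps back to lr_max each cycle.

    Assumption: the abstract only says "cyclical learning rate schedule";
    cosine annealing is one common choice, not necessarily the authors'.
    """
    t = step % steps_per_cycle
    return 0.5 * lr_max * (1.0 + np.cos(np.pi * t / steps_per_cycle))


def snapshot_nmf(X, k, n_cycles=3, steps_per_cycle=200, lr_max=1e-3, seed=0):
    """Factorize X ~ W @ H with W, H >= 0 and collect one snapshot per cycle.

    Returns a list of (W, H) pairs, one per learning rate cycle.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, k))   # document-topic weights
    H = rng.random((k, n_terms))  # topic-term weights
    snapshots = []

    for step in range(n_cycles * steps_per_cycle):
        lr = cyclic_lr(step, steps_per_cycle, lr_max)

        # Gradient of 0.5 * ||W @ H - X||_F^2 with respect to W and H.
        residual = W @ H - X
        grad_W = residual @ H.T
        grad_H = W.T @ residual

        # Gradient step, then projection onto the non-negative orthant.
        W = np.maximum(W - lr * grad_W, 0.0)
        H = np.maximum(H - lr * grad_H, 0.0)

        # End of cycle: the learning rate is near zero, so treat the current
        # factors as a settled solution and save them. The next step restarts
        # with lr_max, which is what lets the model leave this minimum.
        if (step + 1) % steps_per_cycle == 0:
            snapshots.append((W.copy(), H.copy()))

    return snapshots
```

Calling `snapshot_nmf(X, k=4)` on a non-negative document-term matrix `X` returns three `(W, H)` pairs from a single training run. Combining the snapshots into one ensemble topic model is left out of the sketch, since the abstract does not describe that step.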

Acknowledgements

This research is partially supported by the National Natural Science Foundation of China under grants 61703362, 61702441, and 61402203; the Natural Science Foundation of Jiangsu Province of China under grants BK20170513 and BK20161338; the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province of China under grant 17KJB520045; and the Science and Technology Planning Project of Yangzhou of China under grant YZ2016238.

Author information

Authors and Affiliations

Department of Computer Science, Yangzhou University, Yangzhou, China
Jipeng Qiang, Yun Li, Yunhao Yuan & Wei Liu

Corresponding author

Correspondence to Jipeng Qiang.

About this article

Cite this article

Qiang, J., Li, Y., Yuan, Y. et al. Snapshot ensembles of non-negative matrix factorization for stability of topic modeling. Appl Intell 48, 3963–3975 (2018). https://doi.org/10.1007/s10489-018-1192-4

Published: 10 May 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s10489-018-1192-4
