
Learning Latent Topics from the Word Co-occurrence Network

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 768)

Abstract

Topic modeling is widely used to uncover the latent thematic structure of corpora. Under the separability assumption, the spectral method exploits word co-occurrence patterns at the document level and proceeds in two steps: anchor selection and topic recovery. The Biterm Topic Model (BTM), in contrast, exploits word co-occurrence patterns across the whole corpus. Inspired by the word-pair (biterm) pattern in BTM, we build a Word Co-occurrence Network (WCN) in which nodes correspond to words and edge weights represent the empirical co-occurrence probabilities of word pairs. We then apply existing graph algorithms to the WCN for anchor selection: we find either a K-clique in the unweighted complement graph or a maximum edge-weight clique in the weighted complement graph, and take its nodes as the anchor words. Experiments on real-world corpora, evaluated on topic quality and interpretability, demonstrate the effectiveness of the proposed approach.
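The following is a minimal Python sketch of the two ideas summarized above, written under explicit assumptions; it is not the authors' implementation. The names `build_wcn`, `cooc`, `k_clique_anchors` and `max_weight_clique_anchors` are hypothetical. It assumes that every unordered pair of distinct words in a document counts as one co-occurrence (BTM itself extracts biterms from pairs of word tokens in a short text), that the unweighted complement graph links two words exactly when they never co-occur, and that the weighted complement graph uses the hypothetical edge weight 1 - p(wi, wj). Brute-force enumeration stands in for the existing clique solvers the abstract refers to, so it only works on toy vocabularies.

```python
from collections import Counter
from itertools import combinations


def build_wcn(docs):
    """Word co-occurrence network as {(wi, wj): p} with wi < wj.

    Assumption: every unordered pair of distinct words in a document counts
    as one co-occurrence (a simplified biterm); counts are normalized over
    the whole corpus into empirical co-occurrence probabilities.
    """
    counts = Counter()
    for doc in docs:
        for wi, wj in combinations(sorted(set(doc)), 2):
            counts[(wi, wj)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def cooc(wcn, a, b):
    """Edge weight between words a and b; 0.0 if the pair never co-occurs."""
    return wcn.get((a, b) if a < b else (b, a), 0.0)


def k_clique_anchors(vocab, wcn, k):
    """First K words (in sorted order) that pairwise never co-occur, i.e. a
    K-clique in the unweighted complement graph; None if none exists.
    Brute force over all K-subsets: toy vocabularies only."""
    for cand in combinations(sorted(vocab), k):
        if all(cooc(wcn, a, b) == 0.0 for a, b in combinations(cand, 2)):
            return cand
    return None


def max_weight_clique_anchors(vocab, wcn, k):
    """K words maximizing the total complement edge weight.

    Hypothetical complement weight: 1 - p(a, b). All weights are then
    nonnegative and the complement graph is complete, so every K-subset is
    a clique and we simply maximize over all of them (ties: first in order).
    """
    def score(cand):
        return sum(1.0 - cooc(wcn, a, b) for a, b in combinations(cand, 2))

    return max(combinations(sorted(vocab), k), key=score)


if __name__ == "__main__":
    # Toy corpus with three themes: fruit, sports, politics.
    docs = [["apple", "banana", "fruit"], ["apple", "fruit"],
            ["goal", "match", "team"], ["team", "goal"], ["vote", "party"]]
    vocab = {w for doc in docs for w in doc}
    wcn = build_wcn(docs)
    print(k_clique_anchors(vocab, wcn, 3))           # ('apple', 'goal', 'party')
    print(max_weight_clique_anchors(vocab, wcn, 3))  # ('apple', 'goal', 'party')
```

On this toy corpus both variants pick one word per theme, which is the property expected of anchor words. On a real vocabulary the enumeration of all K-subsets is infeasible, so a dedicated exact or local-search clique solver would replace the brute-force loops.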

Keywords

Topic model; Word co-occurrence network; Maximum edge-weight clique; K-clique

Notes

Acknowledgments

This research work is supported by the National Natural Science Foundation of China (61772219, 61472147), the US Army Research Office (W911NF-14-1-0477), and the Shenzhen Science and Technology Planning Project (JCYJ20170307154749425). We also thank Junru Shao for valuable discussions.

Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • Wu Wang (1, 3)
  • Houquan Zhou (1)
  • Kun He (1, 2)
  • John E. Hopcroft (2)

  1. Huazhong University of Science and Technology, Wuhan, China
  2. Computer Science Department, Cornell University, Ithaca, USA
  3. Shenzhen Research Institute of Huazhong University of Science and Technology, Shenzhen, China
