
Entity-Centric Topic Extraction and Exploration: A Network-Based Approach

  • Andreas Spitz
  • Michael Gertz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)

Abstract

Topic modeling is an important tool in the analysis of corpora and the classification and clustering of documents. Various extensions of the underlying graphical models have been proposed to address hierarchical or dynamical topics. However, despite their popularity, topic models face problems in the exploration and correlation of the (often unknown number of) topics extracted from a document collection, and rely on compute-intensive graphical models. In this paper, we present a novel framework for exploring evolving corpora of news articles in terms of topics covered over time. Our approach is based on implicit networks representing the cooccurrences of entities and terms in the documents as weighted edges. Edges with high weight between entities are indicative of topics, allowing the context of a topic to be explored incrementally by growing network sub-structures. Since the exploration of topics corresponds to local operations in the network, it is efficient and interactive. Adding new news articles to the collection simply updates the network, thus avoiding expensive recomputations of term and topic distributions.
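The abstract describes the mechanism only at a high level. Entities and terms become nodes, cooccurrences become weighted edges, heavy entity-entity edges suggest topics, and a topic is explored by growing a subgraph around such an edge. The Python sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's. Documents are assumed to arrive already split into sentences with entities and terms tagged. Every cooccurrence within a sentence adds a weight of 1. A topic is expanded greedily by adding the outside node with the largest summed edge weight to the current subgraph. The class and method names (`ImplicitNetwork`, `add_document`, `topic_seeds`, `expand_topic`) are hypothetical, and the temporal dimension of evolving corpora is omitted.

```python
from collections import defaultdict
from itertools import combinations


class ImplicitNetwork:
    """Hypothetical weighted cooccurrence network of entities and terms."""

    def __init__(self):
        # Adjacency map: node -> {neighbor: accumulated edge weight}.
        self.adj = defaultdict(lambda: defaultdict(float))
        # Node type: "entity" or "term".
        self.kind = {}

    def add_document(self, sentences):
        """Incrementally add one document, given as a list of sentences.

        Each sentence is a list of (token, kind) pairs. Every pair of
        distinct nodes in the same sentence has its edge weight increased
        by 1 (an assumed, simplified weighting scheme). Only the edges
        touched by this document change.
        """
        for sentence in sentences:
            nodes = dict(sentence)  # duplicates within a sentence collapse
            self.kind.update(nodes)
            for u, v in combinations(sorted(nodes), 2):
                self.adj[u][v] += 1.0
                self.adj[v][u] += 1.0

    def topic_seeds(self, k):
        """Return the k heaviest entity-entity edges as (u, v, weight)."""
        edges = []
        for u, nbrs in self.adj.items():
            for v, w in nbrs.items():
                # u < v visits each undirected edge exactly once.
                if u < v and self.kind[u] == "entity" and self.kind[v] == "entity":
                    edges.append((w, u, v))
        edges.sort(key=lambda e: (-e[0], e[1], e[2]))
        return [(u, v, w) for w, u, v in edges[:k]]

    def expand_topic(self, seed, steps):
        """Grow a node set from a seed edge, one node per step.

        Each step adds the outside node with the largest total edge weight
        to the current members (ties broken alphabetically). Only the
        neighborhoods of current members are inspected.
        """
        members = set(seed)
        for _ in range(steps):
            scores = defaultdict(float)
            for m in members:
                for nbr, w in self.adj.get(m, {}).items():
                    if nbr not in members:
                        scores[nbr] += w
            if not scores:
                break
            best = max(sorted(scores), key=lambda n: scores[n])
            members.add(best)
        return members


# Usage with two toy "news articles" (illustrative data only).
net = ImplicitNetwork()
net.add_document([
    [("Merkel", "entity"), ("Macron", "entity"), ("summit", "term")],
    [("Merkel", "entity"), ("Macron", "entity"), ("budget", "term")],
])
net.add_document([
    [("Merkel", "entity"), ("Berlin", "entity"), ("summit", "term")],
])
print(net.topic_seeds(1))  # [('Macron', 'Merkel', 2.0)]
print(sorted(net.expand_topic(("Macron", "Merkel"), 1)))  # ['Macron', 'Merkel', 'summit']
```

Because `add_document` only increments the weights of edges touched by the new article, adding articles does not require rebuilding the network. Likewise, `expand_topic` only reads the neighborhoods of the current subgraph, which is the kind of local operation the abstract credits for efficient, interactive exploration.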

Keywords

Networks · Topic models · Evolving networks

Notes

Acknowledgements

We would like to thank the Ambiverse Ambinauts for kindly providing access to their named entity linking and disambiguation API.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Heidelberg University, Heidelberg, Germany
