Probabilistic Topic Modelling with Semantic Graph

  • Long Chen
  • Joemon M. Jose
  • Haitao Yu
  • Fajie Yuan
  • Huaizhi Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)


In this paper we propose a novel framework, topic model with semantic graph (TMSG), which couples topic modelling with the rich knowledge in DBpedia. We first extract disambiguated entities from the document collection using a document entity linking system, DBpedia Spotlight. From these entities we build two types of entity graphs over DBpedia, capturing local and global contextual knowledge, respectively. Given this semantic graph representation of the documents, we propagate the inherent topic-document distribution using the disambiguated entities of the semantic graphs. Experiments on two real-world datasets show that TMSG significantly outperforms state-of-the-art techniques, namely the author-topic model (ATM) and the topic model with biased propagation (TMBP).
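The abstract does not state the propagation rule, so the following is only a minimal sketch of the general idea, under stated assumptions: each document's topic distribution is blended with the mean topic distribution of its linked entities and then renormalised. The function name `propagate`, the mixing weight `lam`, the neighbour-averaging rule and the entity identifier are all hypothetical illustrations, not the TMSG update itself.

```python
def propagate(doc_topics, doc_entities, entity_topics, lam=0.5):
    """Blend each document's topic distribution with the mean of its entities'.

    Assumption: simple neighbour averaging with weight `lam`; this is an
    illustrative rule, not the update used in the paper.
    """
    updated = {}
    for doc, p_doc in doc_topics.items():
        # Topic distributions of the entities linked to this document.
        linked = [entity_topics[e] for e in doc_entities.get(doc, [])
                  if e in entity_topics]
        if not linked:
            # No linked entities: leave the distribution unchanged.
            updated[doc] = list(p_doc)
            continue
        k = len(p_doc)
        mean_ent = [sum(p[z] for p in linked) / len(linked) for z in range(k)]
        mixed = [(1 - lam) * p_doc[z] + lam * mean_ent[z] for z in range(k)]
        total = sum(mixed)
        updated[doc] = [v / total for v in mixed]  # renormalise to sum to 1
    return updated


# Toy data; the entity identifier is a made-up example.
doc_topics = {"d1": [0.9, 0.1], "d2": [0.5, 0.5]}
doc_entities = {"d1": ["dbpedia:Topic_model"], "d2": []}
entity_topics = {"dbpedia:Topic_model": [0.5, 0.5]}
result = propagate(doc_topics, doc_entities, entity_topics, lam=0.5)
```

In this toy run, document `d1` is pulled halfway towards its entity's distribution, while `d2`, which has no linked entities, is left as it was.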


Keywords: Topic model, Semantic graph, DBpedia



We thank the anonymous reviewer for their helpful comments. We acknowledge support from the EPSRC funded project named A Situation Aware Information Infrastructure Project (EP/L026015) and the Integrated Multimedia City Data (IMCD), a project within the ESRC-funded Urban Big Data Centre (ES/L011921/1). This work was also partly supported by NSF grant #61572223. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of the sponsor.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Long Chen (1)
  • Joemon M. Jose (1)
  • Haitao Yu (1)
  • Fajie Yuan (1)
  • Huaizhi Zhang (1)
  1. School of Computing Science, University of Glasgow, Glasgow, UK
