Knowledge and Information Systems, Volume 44, Issue 3, pp 529–558

Constructing topical hierarchies in heterogeneous information networks

  • Chi Wang
  • Jialu Liu
  • Nihit Desai
  • Marina Danilevsky
  • Jiawei Han
Regular Paper


Many digital document collections (e.g., scientific publications, enterprise reports, news articles, and social media) can be modeled as a heterogeneous information network that links text with multiple types of entities. Constructing high-quality hierarchies that represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work, we present an algorithm for recursively constructing multi-typed topical hierarchies. Unlike traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities, using a newly designed clustering and ranking algorithm for heterogeneous network data together with the mining and ranking of topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high-quality, multi-typed topical hierarchies.
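To make the network model concrete, the following is a minimal sketch of how a small document collection might be turned into a heterogeneous network of typed co-occurrence links. It is an illustration only and is not the clustering and ranking algorithm of the paper. The function name `build_network`, the node types (`term`, `author`, `venue`), and the toy documents are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations


def build_network(docs):
    """Count co-occurrence links between typed nodes across documents.

    Each document is a dict mapping a node type (e.g. "term", "author")
    to the list of nodes of that type attached to the document.
    Returns a Counter keyed by a sorted pair of (type, name) nodes.
    """
    links = Counter()
    for doc in docs:
        # Flatten the document into a deduplicated, sorted list of typed nodes.
        nodes = sorted({(t, n) for t, names in doc.items() for n in names})
        # Every unordered pair of nodes in the same document gets one link.
        for a, b in combinations(nodes, 2):
            links[(a, b)] += 1
    return links


# Hypothetical toy collection: two papers sharing an author and a venue.
docs = [
    {"term": ["topic", "hierarchy"], "author": ["A"], "venue": ["V1"]},
    {"term": ["topic", "network"], "author": ["A", "B"], "venue": ["V1"]},
]
network = build_network(docs)

for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

In this representation a link weight is simply the number of documents in which two nodes co-occur, whatever their types. A hierarchy construction method could then operate on these weights rather than on raw text, though how links are clustered and ranked is beyond what this sketch shows.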


  • Topic hierarchy
  • Information network
  • Link mining
  • Text mining
  • Topic modeling



Research was sponsored in part by the Army Research Lab. under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), the Army Research Office under Cooperative Agreement No. W911NF-13-1-0193, National Science Foundation IIS-1017362, IIS-1320617, and IIS-1354329, DTRA, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC. Chi Wang was supported by a Microsoft Research PhD Fellowship. Marina Danilevsky was supported by a National Science Foundation Graduate Research Fellowship Grant NSF DGE 07-15088.



Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Chi Wang
    • 1
  • Jialu Liu
    • 1
  • Nihit Desai
    • 1
  • Marina Danilevsky
    • 1
  • Jiawei Han
    • 1
  1. University of Illinois at Urbana-Champaign, Urbana, USA
