Exploring heterogeneous information networks and random walk with restart for academic search

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In this paper, we explore heterogeneous information networks in which each vertex represents one entity and the edges reflect linkage relationships. Heterogeneous information networks contain vertices of several entity types, such as papers, authors and terms, and hence can reflect multiple linkage relationships among different entities. Such a heterogeneous information network is similar to a mixed media graph (MMG). By representing a bibliographic dataset as an MMG, the quality of search for relevant entities (e.g., papers) can be improved. Furthermore, our academic search enables multiple-entity search, in which a single query returns several types of entity results, such as relevant papers, authors and conferences. Specifically, given a bibliographic dataset, we propose Global-MMG, in which a global heterogeneous information network is built. When a user submits a query keyword, we perform a random walk with restart (RWR) to retrieve papers or other types of entity objects. To reduce the query response time, we develop the algorithm Net-MMG (NetClus-based MMG). Net-MMG first divides a heterogeneous information network into a collection of sub-networks and then performs an RWR on a set of selected relevant sub-networks. We implemented our academic search and conducted extensive experiments using the ACM Digital Library. The experimental results show that, by exploring heterogeneous information networks and RWR, both Global-MMG and Net-MMG achieve better search quality than existing academic search services. In addition, Net-MMG has a shorter query response time while still maintaining good search quality.
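
To make the query step concrete, the following is a minimal sketch of a random walk with restart over a tiny bibliographic graph. It is an illustration under stated assumptions, not the authors' implementation: the node names, the unweighted undirected edges, the restart probability of 0.15 and the helper names `build_graph`, `rwr` and `rank` are all hypothetical, and the abstract does not specify how Global-MMG weights edges or sets the restart probability.

```python
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rwr(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Approximate steady-state visiting probabilities of a walk that
    jumps back to `seed` with probability `restart` at every step.
    Uniform transition weights are an assumption of this sketch."""
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            # Spread the non-restarting mass evenly over u's neighbours.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[seed] += restart  # restarting mass returns to the query node
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank(scores, prefix):
    """Return nodes of one entity type, highest score first."""
    items = [(n, s) for n, s in scores.items() if n.startswith(prefix)]
    return sorted(items, key=lambda item: -item[1])


# Hypothetical toy data: three papers, two authors, two terms.
graph = build_graph([
    ("term:clustering", "paper:P1"),
    ("term:clustering", "paper:P2"),
    ("term:ranking", "paper:P2"),
    ("term:ranking", "paper:P3"),
    ("author:A1", "paper:P1"),
    ("author:A1", "paper:P2"),
    ("author:A2", "paper:P3"),
])

# One walk from the query keyword scores every entity type at once.
scores = rwr(graph, "term:clustering")
papers = rank(scores, "paper:")
authors = rank(scores, "author:")
```

Because a single walk assigns a score to every vertex, filtering the same `scores` by entity type gives the multiple-entity results described above. In the same spirit, a Net-MMG-style variant would presumably run `rwr` only on the adjacency sets of the selected sub-networks rather than on the whole graph.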


Author information

Corresponding author

Correspondence to Wen-Chih Peng.

About this article

Cite this article

Chiang, MF., Liou, JJ., Wang, JL. et al. Exploring heterogeneous information networks and random walk with restart for academic search. Knowl Inf Syst 36, 59–82 (2013). https://doi.org/10.1007/s10115-012-0523-8

  • DOI: https://doi.org/10.1007/s10115-012-0523-8
