Relevance Measure in Large-Scale Heterogeneous Networks

  • Xiaofeng Meng
  • Chuan Shi
  • Yitong Li
  • Lei Zhang
  • Bin Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8709)

Abstract

Recently, there has been a surge of interest in heterogeneous information network analysis, where a network contains multiple types of objects or links. Many data mining tasks have been studied on such networks, among which similarity measurement is a basic and important function. Several similarity measures have been proposed for heterogeneous information networks. However, they suffer from high computation and memory demands. In this paper, we propose a novel measure, called AvgSim, which measures the similarity of same-typed or different-typed object pairs in a uniform framework and has some good properties. The AvgSim value of two objects is evaluated through two random walk processes, one along the given meta-path and one along the reverse meta-path. In addition, we implement AvgSim using the MapReduce parallel model to enable its application to large-scale networks. Experiments on real data sets verify the effectiveness and efficiency of AvgSim.
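
The two random walk processes mentioned in the abstract can be illustrated with a small single-machine sketch. The abstract does not state the formula, so this sketch assumes, as the name AvgSim suggests, that the score is the arithmetic mean of the probability of reaching the target from the source along the meta-path and the probability of reaching the source from the target along the reverse meta-path. The function names, the toy author-paper-venue network, and the plain-list matrix representation are all hypothetical, and the sketch does not attempt the MapReduce implementation.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(adj: Matrix) -> Matrix:
    """Turn an adjacency matrix into one-step transition probabilities."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def walk_probability(adjs: List[Matrix]) -> Matrix:
    """Probability of reaching each end object from each start object
    when a random walker follows the relations in `adjs` in order."""
    result = row_normalize(adjs[0])
    for adj in adjs[1:]:
        result = matmul(result, row_normalize(adj))
    return result


def avg_sim(adjs: List[Matrix]) -> Matrix:
    """Assumed definition: mean of the forward and reverse walk probabilities."""
    forward = walk_probability(adjs)                      # [source][target]
    reverse_adjs = [transpose(a) for a in reversed(adjs)]  # reverse meta-path
    backward = transpose(walk_probability(reverse_adjs))   # re-index to [source][target]
    return [[0.5 * (f + b) for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward, backward)]


# Toy network (hypothetical): 2 authors, 3 papers, 2 venues.
AP = [[1, 1, 0],   # author 0 wrote papers 0 and 1
      [0, 1, 1]]   # author 1 wrote papers 1 and 2
PC = [[1, 0],      # paper 0 appeared in venue 0
      [1, 0],      # paper 1 appeared in venue 0
      [0, 1]]      # paper 2 appeared in venue 1

# Relevance of each author to each venue along the meta-path author-paper-venue.
scores = avg_sim([AP, PC])
```

Because the score combines a walk in each direction, swapping the roles of source and target and reversing the path gives the same value under this assumed definition, which is why it applies to object pairs of different types as well as the same type.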

Keywords

Heterogeneous information network; Similarity search; Random walk; MapReduce

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Xiaofeng Meng (1)
  • Chuan Shi (1)
  • Yitong Li (1)
  • Lei Zhang (1)
  • Bin Wu (1)
  1. Beijing University of Posts and Telecommunications, Beijing, China
