Measuring in-network node similarity based on neighborhoods: a unified parametric approach

  • Regular Paper
Knowledge and Information Systems

Abstract

In many applications, we need to measure similarity between nodes in a large network based on features of their neighborhoods. Although in-network node similarity based on proximity has been well investigated, measuring in-network node similarity based on neighborhoods remains a largely untouched problem in the literature. One challenge is that different applications may need different measures that capture different meanings of similarity. Furthermore, we often want to trade off the specificity of neighborhood matching against efficiency. In this paper, we investigate the problem in a principled and systematic manner. We develop a unified parametric model and a series of four instance measures. These instance similarity measures not only address a spectrum of meanings of similarity, but also present a series of trade-offs between computational cost and the strictness of matching between the neighborhoods of the nodes being compared. Through extensive experiments and case studies, we demonstrate the effectiveness of the proposed model and its instances.

Notes

  1. http://www.iam.unibe.ch/fki/databases/iam-graph-database.

  2. The code for FaBP [17] is available at http://web.eecs.umich.edu/~dkoutra/CODE/fabp.zip. Since FaBP is for binary classification and generates a belief of being positive for every node, we ran FaBP for each label in the dataset, and the label of an unlabeled node is the label that has the highest belief value.
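The one-label-at-a-time procedure described in note 2 can be sketched as follows. This is a minimal illustration, not the authors' code: `one_vs_rest_labels` and `neighbor_fraction_scorer` are hypothetical names, and the scorer is a toy stand-in for a binary classifier such as FaBP (it is not belief propagation). The only part taken from the note is the wrapper logic: run the binary scorer once per label and assign each unlabeled node the label with the highest belief.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Node = Hashable
Label = Hashable
# Assumption: a binary scorer takes the nodes known to be positive and
# returns a belief of being positive for every node (as FaBP is described).
BinaryScorer = Callable[[List[Node]], Dict[Node, float]]


def one_vs_rest_labels(
    labeled: Dict[Node, Label],
    unlabeled: Iterable[Node],
    binary_scorer: BinaryScorer,
) -> Dict[Node, Label]:
    """Run the binary scorer once per label and keep the best label per node."""
    labels = sorted(set(labeled.values()), key=repr)
    beliefs: Dict[Label, Dict[Node, float]] = {}
    for label in labels:
        # Nodes carrying this label are the positives for this run.
        positives = [node for node, l in labeled.items() if l == label]
        beliefs[label] = binary_scorer(positives)
    # Assign each unlabeled node the label with the highest belief value.
    return {
        node: max(labels, key=lambda l: beliefs[l].get(node, float("-inf")))
        for node in unlabeled
    }


def neighbor_fraction_scorer(adjacency: Dict[Node, List[Node]]) -> BinaryScorer:
    """Toy stand-in scorer: belief = fraction of a node's neighbors that are positive."""
    def score(positives: List[Node]) -> Dict[Node, float]:
        positive_set = set(positives)
        return {
            node: (sum(1 for n in nbrs if n in positive_set) / len(nbrs)) if nbrs else 0.0
            for node, nbrs in adjacency.items()
        }
    return score


# Small made-up graph, used only to exercise the wrapper.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
labeled = {"a": "x", "b": "x", "f": "y"}
predicted = one_vs_rest_labels(labeled, ["c", "e"], neighbor_fraction_scorer(adjacency))
print(predicted)  # {'c': 'x', 'e': 'y'}
```

Ties between labels are broken here by the sorted label order, which is an arbitrary choice of this sketch; the note does not say how ties were handled.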

References

  1. Borgatti SP, Everett MG (1993) Two algorithms for computing regular equivalence. Soc Netw 15(4):361–376

  2. Chein M, Mugnier M-L (2008) Graph-based knowledge representation: computational foundations of conceptual graphs. Springer Science & Business Media, Berlin

  3. Deza MM, Deza E (2009) Encyclopedia of distances. Springer, New York

  4. Fei H, Huan J (2008) Structure feature selection for graph classification. In: Proceedings of the 17th ACM conference on information and knowledge management, pp 991–1000. ACM

  5. Gärtner T, Flach P, Wrobel S (2003) On graph kernels: hardness results and efficient alternatives. In: Schölkopf B, Warmuth M (eds) Proceedings of the sixteenth annual conference on computational learning theory and the seventh annual workshop on kernel machines. Lecture notes in computer science, vol 2777. Springer, Heidelberg, pp 129–143

  6. Gilpin S, Eliassi-Rad T, Davidson I (2013) Guided learning for role discovery (glrd): framework, algorithms, and applications. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 113–121. ACM

  7. Gregson RAM (1975) Psychometrics of similarity. Academic, New York

  8. Han J, Wen J-R (2013) Mining frequent neighborhood patterns in a large labeled graph. In: Proceedings of the 22nd ACM international conference on information and knowledge management, pp 259–268. ACM

  9. Han J, Wen J-R, Pei J (2014) Within-network classification using radius-constrained neighborhood patterns. In: Proceedings of the 23rd ACM international conference on information and knowledge management. ACM

  10. Henderson K, Gallagher B, Eliassi-Rad T, Tong H, Basu S, Akoglu L, Koutra D, Faloutsos C, Li L (2012) Rolx: structural role extraction & mining in large graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1231–1239. ACM

  11. Henderson K, Gallagher B, Li L, Akoglu L, Eliassi-Rad T, Tong H, Faloutsos C (2011) It’s who you know: graph mining using recursive structural features. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 663–671. ACM

  12. Jeh G, Widom J (2002) Simrank: a measure of structural-context similarity. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 538–543. ACM

  13. Jeh G, Widom J (2003) Scaling personalized web search. In: Proceedings of the 12th international conference on World Wide Web, pp 271–279. ACM

  14. Jin R, Lee VE, Hong H (2011) Axiomatic ranking of network role similarity. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 922–930. ACM

  15. Kashima H, Tsuda K, Inokuchi A (2003) Marginalized kernels between labeled graphs. ICML 3:321–328

  16. Kleinberg J (2000) The small-world phenomenon: An algorithmic perspective. In: Proceedings of the thirty-second annual ACM symposium on theory of computing, pp 163–170. ACM

  17. Koutra D, Ke T-Y, Kang U, Chau DH, Pao H-KK, Faloutsos C (2011) Unifying guilt-by-association approaches: theorems and fast algorithms. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD), Athens, Greece, pp 245–260

  18. Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C, Ghahramani Z (2010) Kronecker graphs: an approach to modeling networks. J Mach Learn Res 11:985–1042

  19. Lorrain F, White HC (1971) Structural equivalence of individuals in social networks. J Math Sociol 1(1):49–80

  20. Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104

  21. Ng AY, Jordan MI, Weiss Y et al (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856

  22. Shervashidze N, Petri T, Mehlhorn K, Borgwardt KM, Vishwanathan S (2009) Efficient graphlet kernels for large graph comparison. In: International conference on artificial intelligence and statistics, pp 488–495

  23. Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler-lehman graph kernels. J Mach Learn Res 12:2539–2561

  24. Sparrow MK (1993) A linear algorithm for computing automorphic equivalence classes: the numerical signatures approach. Soc Netw 15(2):151–170

  25. Sun Y, Han J, Yan X, Yu PS, Wu T (2011) Pathsim: meta path-based top-k similarity search in heterogeneous information networks. Proc VLDB Endow 4(11):992–1003

  26. Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 797–806. ACM

  27. Tong H, Faloutsos C, Pan J-Y (2006) Fast random walk with restart and its applications. In: Proceedings of the Sixth International Conference on Data Mining, pp 613–622. IEEE

  28. Yedidia JS, Freeman WT, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans Inf Theory 51(7):2282–2312

  29. Yu W, Lin X, Zhang W, Chang L, Pei J (2013) More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. Proc VLDB Endow 7(1):13–24

Author information

Corresponding author

Correspondence to Jian Pei.

About this article

Cite this article

Yang, Y., Pei, J. & Al-Barakati, A. Measuring in-network node similarity based on neighborhoods: a unified parametric approach. Knowl Inf Syst 53, 43–70 (2017). https://doi.org/10.1007/s10115-017-1033-5

  • DOI: https://doi.org/10.1007/s10115-017-1033-5
