Abstract
In many applications, we need to measure similarity between nodes in a large network based on features of their neighborhoods. Although in-network node similarity based on proximity has been well investigated, measuring in-network node similarity based on neighborhoods remains, surprisingly, a largely untouched problem in the literature. One challenge is that different applications may need different measures that capture different meanings of similarity. Furthermore, we often want to make trade-offs between the specificity of neighborhood matching and efficiency. In this paper, we investigate the problem in a principled and systematic manner. We develop a unified parametric model and a series of four instance measures. These instance similarity measures not only address a spectrum of meanings of similarity, but also present a series of trade-offs between computational cost and the strictness of matching between the neighborhoods of the nodes being compared. Through extensive experiments and case studies, we demonstrate the effectiveness of the proposed model and its instances.
Notes
The code is available at http://web.eecs.umich.edu/ dkoutra/CODE/fabp.zip (FaBP) [17]. Since FaBP performs binary classification and generates, for every node, a belief of being positive, we ran FaBP once for each label in the dataset and assigned each unlabeled node the label with the highest belief value.
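The per-label procedure described in this note can be sketched as follows. This is a minimal illustration only: the scorer `neighbor_fraction_beliefs`, the toy graph `GRAPH`, and the helper `one_vs_rest_labels` are hypothetical names introduced here, and the scorer is a simple stand-in rather than FaBP itself. Any binary method that returns a per-node belief of being positive could be plugged in instead.

```python
from typing import Callable, Dict, Iterable, Set

# Toy undirected graph as an adjacency list (hypothetical data).
GRAPH: Dict[str, list] = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}


def neighbor_fraction_beliefs(positives: Set[str], negatives: Set[str]) -> Dict[str, float]:
    """Stand-in binary scorer (NOT FaBP): the belief of a node is the
    fraction of its neighbors that are known positives.
    `negatives` is accepted for interface parity but unused here."""
    return {
        node: sum(nb in positives for nb in nbrs) / len(nbrs)
        for node, nbrs in GRAPH.items()
    }


def one_vs_rest_labels(
    known: Dict[str, str],
    unlabeled: Iterable[str],
    binary_beliefs: Callable[[Set[str], Set[str]], Dict[str, float]],
) -> Dict[str, str]:
    """Run the binary scorer once per label and give each unlabeled node
    the label whose run produced the highest belief for that node."""
    labels = sorted(set(known.values()))
    beliefs_per_label = {}
    for label in labels:
        positives = {n for n, l in known.items() if l == label}
        negatives = {n for n, l in known.items() if l != label}
        beliefs_per_label[label] = binary_beliefs(positives, negatives)
    # Ties go to the alphabetically first label, because max() keeps
    # the first maximal element of the sorted label list.
    return {
        node: max(labels, key=lambda l: beliefs_per_label[l].get(node, float("-inf")))
        for node in unlabeled
    }


known_labels = {"a": "red", "b": "red", "f": "blue"}
result = one_vs_rest_labels(known_labels, ["c", "e"], neighbor_fraction_beliefs)
print(result)
```

How ties are broken is a design choice of this sketch; the note does not say how equal belief values were handled.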
References
Borgatti SP, Everett MG (1993) Two algorithms for computing regular equivalence. Soc Netw 15(4):361–376
Chein M, Mugnier M-L (2008) Graph-based knowledge representation: computational foundations of conceptual graphs. Springer Science & Business Media, Berlin
Deza MM, Deza E (2009) Encyclopedia of distances. Springer, New York
Fei H, Huan J (2008) Structure feature selection for graph classification. In: Proceedings of the 17th ACM conference on information and knowledge management, pp 991–1000. ACM
Gärtner T, Flach P, Wrobel S (2003) On graph kernels: hardness results and efficient alternatives. In: Schölkopf B, Warmuth M (eds) Proceedings of the sixteenth annual conference on computational learning theory and the seventh annual workshop on kernel machines. Lecture notes in computer science, vol 2777. Springer, Heidelberg, pp 129–143
Gilpin S, Eliassi-Rad T, Davidson I (2013) Guided learning for role discovery (GLRD): framework, algorithms, and applications. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 113–121. ACM
Gregson RAM (1975) Psychometrics of similarity. Academic, New York
Han J, Wen J-R (2013) Mining frequent neighborhood patterns in a large labeled graph. In: Proceedings of the 22nd ACM international conference on information and knowledge management, pp 259–268. ACM
Han J, Wen J-R, Pei J (2014) Within-network classification using radius-constrained neighborhood patterns. In: Proceedings of the 23rd ACM international conference on information and knowledge management. ACM
Henderson K, Gallagher B, Eliassi-Rad T, Tong H, Basu S, Akoglu L, Koutra D, Faloutsos C, Li L (2012) RolX: structural role extraction & mining in large graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1231–1239. ACM
Henderson K, Gallagher B, Li L, Akoglu L, Eliassi-Rad T, Tong H, Faloutsos C (2011) It’s who you know: graph mining using recursive structural features. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 663–671. ACM
Jeh G, Widom J (2002) SimRank: a measure of structural-context similarity. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 538–543. ACM
Jeh G, Widom J (2003) Scaling personalized web search. In: Proceedings of the 12th international conference on World Wide Web, pp 271–279. ACM
Jin R, Lee VE, Hong H (2011) Axiomatic ranking of network role similarity. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 922–930. ACM
Kashima H, Tsuda K, Inokuchi A (2003) Marginalized kernels between labeled graphs. ICML 3:321–328
Kleinberg J (2000) The small-world phenomenon: An algorithmic perspective. In: Proceedings of the thirty-second annual ACM symposium on theory of computing, pp 163–170. ACM
Koutra D, Ke T-Y, Kang U, Chau DH, Pao H-KK, Faloutsos C (2011) Unifying guilt-by-association approaches: theorems and fast algorithms. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD), Athens, Greece, pp 245–260
Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C, Ghahramani Z (2010) Kronecker graphs: an approach to modeling networks. J Mach Learn Res 11:985–1042
Lorrain F, White HC (1971) Structural equivalence of individuals in social networks. J Math Sociol 1(1):49–80
Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
Ng AY, Jordan MI, Weiss Y et al (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856
Shervashidze N, Petri T, Mehlhorn K, Borgwardt KM, Vishwanathan S (2009) Efficient graphlet kernels for large graph comparison. In: International conference on artificial intelligence and statistics, pp 488–495
Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler-Lehman graph kernels. J Mach Learn Res 12:2539–2561
Sparrow MK (1993) A linear algorithm for computing automorphic equivalence classes: the numerical signatures approach. Soc Netw 15(2):151–170
Sun Y, Han J, Yan X, Yu PS, Wu T (2011) PathSim: meta path-based top-k similarity search in heterogeneous information networks. Proc VLDB Endow 4(11):992–1003
Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 797–806. ACM
Tong H, Faloutsos C, Pan J-Y (2006) Fast random walk with restart and its applications. In: Proceedings of the Sixth International Conference on Data Mining, pp 613–622. IEEE
Yedidia JS, Freeman WT, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans Inf Theory 51(7):2282–2312
Yu W, Lin X, Zhang W, Chang L, Pei J (2013) More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. Proc VLDB Endow 7(1):13–24
About this article
Cite this article
Yang, Y., Pei, J. & Al-Barakati, A. Measuring in-network node similarity based on neighborhoods: a unified parametric approach. Knowl Inf Syst 53, 43–70 (2017). https://doi.org/10.1007/s10115-017-1033-5
DOI: https://doi.org/10.1007/s10115-017-1033-5