Abstract
Assortative matching is a network phenomenon that arises when nodes exhibit a bias towards connections to others with similar characteristics. While mixing patterns in networks have been studied in the literature, and there are well-defined metrics that capture the degree of assortativity (e.g., the assortativity coefficient), these metrics deal only with single-dimensional enumerative or scalar features. Nevertheless, various complex behaviors of network entities (e.g., human behaviors in social networks) are captured through vector attributes. To date, no formal metric able to cope with such situations has been defined. In this paper, we propose a novel, two-step process that extends the applicability of the assortativity coefficient to multi-dimensional attributes. In brief, we first cluster the vertices on their vector characteristic. After clustering is completed, each network node is assigned a cluster label, which is an enumerative characteristic, so we can compute the assortativity coefficient on the cluster labels. We further compare this method with an alternative baseline, which is an immediate extension of the assortativity coefficient, namely, the assortativity vector. The latter treats each element of the node’s attribute vector separately and then combines the independent results into a single value. Finally, we apply our method and the baseline to two different social network datasets. We also use synthetic network data to delve into the details of each metric/method. Our findings indicate that while the assortativity vector baseline performs satisfactorily when the variance of the elements of the vector attribute across the network population is kept low, it provides biased results as this variance increases. In contrast, our approach appears to be robust in such scenarios.
Notes
We will overview these metrics in Sect. 2.
Features 1, 2, 3 and 4 correspond to smoking, drug use, alcohol use and sporting activity, respectively.
Edges between vertices of the same type are never generated.
Appendix: Toy example for biases introduced by assortativity vector
Consider a toy network in which each node is described by one of the vectors \({\bf u}_z=(0,0)\), \({\bf u}_w=(0,1)\), \({\bf u}_t=(1,0)\) and \({\bf u}_q=(1,1)\). For ease of presentation, let us assume that we have \(p\) nodes of each type, and that \(z\)-vertices are connected only with \(w\)-vertices and \(t\)-vertices only with \(q\)-vertices. Every edge then joins two nodes that agree on the first element of \({\bf u}\) and disagree on the second, so the assortativity vector for this network is \({\bf r}=(1,-1)\). Hence, we get \(r^{\rm mean}=0\). The latter implies that associations between vertices in this network are made at random, regardless of their vector attribute. However, in the 2-dimensional space the vectors that describe the nodes of the network form four distinct classes (each of which contains \(p\) vertices), and clearly all the connections in this network are among vertices belonging to different groups (i.e., dissimilar nodes). Even if connections were made at random, one might have expected approximately 25% of the edges to connect vertices with the same vector attribute, which is clearly not the case in our toy example. Hence, \(r^{\rm mean}\) for this network with regard to \({\bf u}\) should be negative to capture the underlying mixing patterns. Of course, as discussed in the main text, negative mixing is closer to random mixing (compared to positive mixing), but this example still illustrates possible biases introduced by considering the elements of \({\bf u}\) independently.
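The toy example can be checked numerically. The following is a minimal sketch, not the implementation used in the paper: it assumes the standard mixing-matrix form of the assortativity coefficient for enumerative attributes, \(r = (\sum_i e_{ii} - \sum_i a_i b_i)/(1 - \sum_i a_i b_i)\), applied once per element of \({\bf u}\) to obtain the assortativity vector. The function name `assortativity`, the choice \(p=5\), and the one-to-one wiring of \(z\)–\(w\) and \(t\)–\(q\) nodes are illustrative assumptions. For the last quantity, the four distinct vectors are used directly as class labels, standing in for an ideal clustering that recovers the four groups.

```python
from collections import defaultdict


def assortativity(edges, label):
    """Assortativity coefficient of a categorical node label on an undirected graph.

    edges: list of (u, v) node pairs; label: dict mapping node -> category.
    Undefined (division by zero) if every node carries the same label.
    """
    e = defaultdict(float)  # e[(i, j)]: fraction of edge ends joining category i to j
    weight = 1.0 / (2 * len(edges))
    for u, v in edges:
        # Count each undirected edge in both directions so that e is symmetric.
        e[(label[u], label[v])] += weight
        e[(label[v], label[u])] += weight
    a = defaultdict(float)  # a[i]: fraction of edge ends attached to category i
    for (i, _), frac in e.items():
        a[i] += frac
    trace = sum(frac for (i, j), frac in e.items() if i == j)
    expected = sum(x * x for x in a.values())  # sum_i a_i * b_i, with a == b here
    return (trace - expected) / (1.0 - expected)


# Toy network: p nodes of each type (p = 5 is an arbitrary illustrative choice).
p = 5
vectors = {"z": (0, 0), "w": (0, 1), "t": (1, 0), "q": (1, 1)}
u = {(kind, k): vec for kind, vec in vectors.items() for k in range(p)}

# Assumed wiring: the k-th z-node links to the k-th w-node, and likewise t to q.
edges = [(("z", k), ("w", k)) for k in range(p)]
edges += [(("t", k), ("q", k)) for k in range(p)]

# Baseline: one coefficient per element of u, then the mean of the two.
r_vector = [assortativity(edges, {n: vec[d] for n, vec in u.items()}) for d in range(2)]
r_mean = sum(r_vector) / len(r_vector)

# Label-based alternative: treat each distinct vector as one class label.
r_label = assortativity(edges, u)

print(r_vector, r_mean, r_label)
```

Under these assumptions the per-element coefficients come out as approximately \(1\) and \(-1\), so their mean is approximately \(0\), whereas the coefficient computed on the four class labels is approximately \(-1/3\), i.e., negative, as argued above.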
Cite this article
Pelechrinis, K. Matching patterns in networks with multi-dimensional attributes: a machine learning approach. Soc. Netw. Anal. Min. 4, 188 (2014). https://doi.org/10.1007/s13278-014-0188-2
DOI: https://doi.org/10.1007/s13278-014-0188-2