Matching patterns in networks with multi-dimensional attributes: a machine learning approach

  • Original Article
Social Network Analysis and Mining

Abstract

Assortative matching is a network phenomenon that arises when nodes exhibit a bias towards connecting to others with similar characteristics. Mixing patterns in networks have been studied in the literature, and there are well-defined metrics that capture the degree of assortativity (e.g., the assortativity coefficient), but these metrics deal only with single-dimensional enumerative or scalar features. However, various complex behaviors of network entities (e.g., human behaviors in social networks) are captured through vector attributes. To date, no formal metric able to cope with such situations has been defined. In this paper, we propose a novel two-step process that extends the applicability of the assortativity coefficient to multi-dimensional attributes. In brief, we first cluster the vertices on their vector characteristic. Once clustering is completed, each network node is assigned a cluster label, which is an enumerative characteristic, so we can compute the assortativity coefficient on the cluster labels. We further compare this method with an alternative baseline that is an immediate extension of the assortativity coefficient, namely the assortativity vector. The latter treats each element of the node’s attribute vector separately and then combines the independent results into a single value. Finally, we apply our method and the baseline to two different social network datasets. We also use synthetic network data to delve into the details of each metric/method. Our findings indicate that while the assortativity vector baseline performs satisfactorily when the variance of the elements of the vector attribute across the network population is kept low, it provides biased results as this variance increases. In contrast, our approach appears to be robust in such scenarios.

Notes

  1. We will overview these metrics in Sect. 2.

  2. Features 1, 2, 3 and 4 correspond to smoking, drug use, alcohol use and sporting activity, respectively.

  3. Edges between vertices of the same type are never generated.

Author information

Corresponding author

Correspondence to Konstantinos Pelechrinis.

Appendix: Toy example for biases introduced by assortativity vector

Consider a toy network in which each node is described by one of the vectors \({\bf u}_z=(0,0)\), \({\bf u}_w=(0,1)\), \({\bf u}_t=(1,0)\) and \({\bf u}_q=(1,1)\). For ease of presentation, let us assume that we have \(p\) nodes of each type, and that \(z\)-vertices are connected only with \(w\)-vertices and \(t\)-vertices only with \(q\)-vertices. The assortativity vector for this network is \({\bf r}=(1,-1)\): every edge joins vertices that agree on the first element and disagree on the second. Hence, we get \(r^{\rm mean}=0\). The latter implies that associations between vertices in this network are made at random, regardless of their vector attribute. However, in the 2-dimensional space the vectors that describe the nodes form four distinct classes (each containing \(p\) vertices), and clearly all the connections in this network are between vertices belonging to different classes (i.e., dissimilar nodes). Even if connections were made at random, one would expect approximately 25% of the edges to connect vertices with the same vector attribute, which is clearly not the case in our toy example. Hence, \(r^{\rm mean}\) for this network with regard to \({\bf u}\) should be negative to capture the underlying mixing patterns. Of course, as mentioned in the main text, negative mixing is closer to random mixing (compared to positive mixing), but this example still illustrates the possible biases introduced by considering the elements of \({\bf u}\) independently.
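The arithmetic of this toy example can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the edges form a one-to-one matching (the i-th z-vertex with the i-th w-vertex, the i-th t-vertex with the i-th q-vertex) and picks p = 5 arbitrarily. It uses Newman's mixing-matrix formula for the assortativity coefficient of an enumerative attribute, and treats the four distinct vectors as the cluster labels that a clustering step would plausibly recover. The function name `categorical_assortativity` and all variable names are hypothetical.

```python
from collections import Counter


def categorical_assortativity(edges, label):
    """Newman's assortativity coefficient for an enumerative node attribute
    on an undirected graph: r = (Tr(e) - sum_i a_i^2) / (1 - sum_i a_i^2).
    Not defined (division by zero) when all nodes share one label."""
    ends = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions so e is symmetric.
        ends[(label[u], label[v])] += 1
        ends[(label[v], label[u])] += 1
    total = sum(ends.values())
    e = {pair: count / total for pair, count in ends.items()}

    # a_i: fraction of edge ends attached to nodes with label i.
    a = Counter()
    for (i, _), frac in e.items():
        a[i] += frac

    trace = sum(frac for (i, j), frac in e.items() if i == j)
    sum_a_sq = sum(x * x for x in a.values())
    return (trace - sum_a_sq) / (1 - sum_a_sq)


# Toy network from the appendix; p = 5 is an arbitrary choice.
p = 5
vectors = {"z": (0, 0), "w": (0, 1), "t": (1, 0), "q": (1, 1)}

node_type = {}  # node id -> type name (z, w, t or q)
edges = []
for i in range(p):
    for kind in vectors:
        node_type[(kind, i)] = kind
    # Assumed wiring: a one-to-one matching between the paired types.
    edges.append((("z", i), ("w", i)))
    edges.append((("t", i), ("q", i)))

# Baseline: one coefficient per vector element, then their mean.
r_vector = [
    categorical_assortativity(
        edges, {node: vectors[kind][d] for node, kind in node_type.items()}
    )
    for d in range(2)
]
r_mean = sum(r_vector) / len(r_vector)

# Label-based variant: each distinct vector is treated as its own class.
r_labels = categorical_assortativity(edges, node_type)

print(r_vector, r_mean, r_labels)
```

Under these assumptions the per-element coefficients are 1 and -1, so their mean is 0, while the coefficient computed on the four class labels is -1/3, which is negative as the argument above requires.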

About this article

Cite this article

Pelechrinis, K. Matching patterns in networks with multi-dimensional attributes: a machine learning approach. Soc. Netw. Anal. Min. 4, 188 (2014). https://doi.org/10.1007/s13278-014-0188-2

  • DOI: https://doi.org/10.1007/s13278-014-0188-2
