Matching patterns in networks with multi-dimensional attributes: a machine learning approach

Original Article

Abstract

Assortative matching is a network phenomenon that arises when nodes exhibit a bias towards connections to others with similar characteristics. While mixing patterns in networks have been studied in the literature, and there are well-defined metrics that capture the degree of assortativity (e.g., the assortativity coefficient), these metrics handle only single-dimensional enumerative or scalar features. However, many complex behaviors of network entities (e.g., human behaviors in social networks) are captured through vector attributes. To date, no formal metric able to cope with such situations has been defined. In this paper, we propose a novel, two-step process that extends the applicability of the assortativity coefficient to multi-dimensional attributes. In brief, we first cluster the vertices on their vector attributes. After clustering is completed, each network node is assigned a cluster label, which is an enumerative characteristic, and we can therefore compute the assortativity coefficient on the cluster labels. We further compare this method with an alternative baseline that is an immediate extension of the assortativity coefficient, namely, the assortativity vector. The latter treats each element of the node's attribute vector separately and then combines the independent results into a single value. Finally, we apply our method and the baseline to two different social network datasets. We also use synthetic network data to examine the details of each metric/method. Our findings indicate that while the assortativity vector baseline performs satisfactorily when the variance of the elements of the vector attribute across the network population is kept low, it provides biased results as this variance increases. In contrast, our approach appears to be robust in such scenarios.
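The two-step process described in the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, so plain k-means with a deterministic farthest-point initialisation is used here purely as an assumption; the function names `kmeans_labels` and `label_assortativity` are hypothetical and are not taken from the paper. The second step uses the standard assortativity coefficient for an enumerative attribute on an undirected network, r = (Σ_i e_ii − Σ_i a_i²) / (1 − Σ_i a_i²), where e_ij is the fraction of edge ends joining label i to label j and a_i = Σ_j e_ij.

```python
import math
from collections import defaultdict


def kmeans_labels(vectors, k, iters=50):
    """Step 1 (assumed algorithm): k-means; returns one cluster label per node."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centers = [tuple(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        centers.append(tuple(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every node to its nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def label_assortativity(edges, labels):
    """Step 2: assortativity coefficient of an enumerative node label."""
    ends = defaultdict(float)  # ends[(i, j)]: edge ends joining label i to label j
    for u, v in edges:
        # Count each undirected edge in both directions (symmetric mixing matrix).
        ends[(labels[u], labels[v])] += 1.0
        ends[(labels[v], labels[u])] += 1.0
    total = sum(ends.values())
    trace = sum(n for (i, j), n in ends.items() if i == j) / total
    row = defaultdict(float)  # row[i] = a_i, the share of edge ends with label i
    for (i, _), n in ends.items():
        row[i] += n / total
    expected = sum(a * a for a in row.values())
    if expected == 1.0:
        return float("nan")  # only one label present: coefficient is undefined
    return (trace - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy data (made up for illustration): six nodes with 2-D attribute vectors.
    attrs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    labels = kmeans_labels(attrs, k=2)
    # Every edge stays inside one cluster, so the coefficient is 1.0.
    print(labels, label_assortativity(edges, labels))
```

Any clustering method that yields one label per node could replace the first step; only the resulting labels enter the coefficient. The sketch does not cover the assortativity vector baseline.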

Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  1. School of Information Sciences, University of Pittsburgh, Pittsburgh, USA
