Matching patterns in networks with multi-dimensional attributes: a machine learning approach

Pelechrinis, Konstantinos

doi:10.1007/s13278-014-0188-2

Original Article
Published: 12 April 2014

Volume 4, article number 188, (2014)
Social Network Analysis and Mining

Konstantinos Pelechrinis¹

Abstract

Assortative matching is a network phenomenon that arises when nodes exhibit a bias towards connecting to others with similar characteristics. While mixing patterns in networks have been studied in the literature, and there are well-defined metrics that capture the degree of assortativity (e.g., the assortativity coefficient), these metrics deal only with single-dimensional enumerative or scalar features. Nevertheless, various complex behaviors of network entities (e.g., human behaviors in social networks) are captured through vector attributes. To date, no formal metric able to cope with such situations has been defined. In this paper, we propose a novel, two-step process that extends the applicability of the assortativity coefficient to multi-dimensional attributes. In brief, we first cluster the vertices on their vector characteristic. After clustering is completed, each network node is assigned a cluster label, which is an enumerative characteristic, and we can compute the assortativity coefficient on the cluster labels. We further compare this method with an alternative baseline, which is an immediate extension of the assortativity coefficient, namely, the assortativity vector. The latter treats each element of the node’s attribute vector separately and then combines the independent results into a single value. Finally, we apply our method and the baseline to two different social network datasets. We also use synthetic network data to delve into the details of each metric/method. Our findings indicate that while the assortativity vector baseline performs satisfactorily when the variance of the elements of the vector attribute across the network population is kept low, it provides biased results as this variance increases. In contrast, our approach appears to be robust in such scenarios.

Notes

We will overview these metrics in Sect. 2.
Features 1, 2, 3 and 4 correspond to smoking, drug use, alcohol use and sporting activity, respectively.
Edges between vertices of the same type are never generated.

Author information

Authors and Affiliations

School of Information Sciences, University of Pittsburgh, Pittsburgh, USA
Konstantinos Pelechrinis

Corresponding author

Correspondence to Konstantinos Pelechrinis.

Appendix: Toy example for biases introduced by assortativity vector

Consider a toy network in which each node is described by one of the vectors \({\bf u}_z=(0,0)\), \({\bf u}_w=(0,1)\), \({\bf u}_t=(1,0)\) and \({\bf u}_q=(1,1)\). For ease of presentation, let us assume that we have \(p\) nodes of each type, and that \(z\)-vertices are connected with \(w\)-vertices and \(t\)-vertices with \(q\)-vertices. The assortativity vector for this network is \({\bf r}=(1,-1)\): every edge joins vertices that agree on the first element and disagree on the second. Hence, we get \(r^{\rm mean}=0\). The latter implies that associations between vertices in this network are made at random, regardless of their vector attribute. However, in the 2-dimensional space the vectors that describe the nodes of the network form four distinct classes (each of which contains \(p\) vertices), and clearly all the connections in this network are among vertices belonging to different groups (i.e., dissimilar nodes). If connections were made at random, one would expect approximately 25% of the edges to connect vertices with the same vector attribute, which is clearly not the case in our toy example. Hence, \(r^{\rm mean}\) for this network with regard to \({\bf u}\) should be negative to capture the underlying mixing patterns. Of course, as mentioned in the main text, negative mixing is closer to random mixing than positive mixing is, but this example still illustrates possible biases introduced by considering the elements of \({\bf u}\) independently.
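
The toy example can be checked numerically. The following is a minimal sketch, not the author's implementation. It assumes that the assortativity coefficient is Newman's coefficient for enumerative attributes, that the \(z\)-\(w\) and \(t\)-\(q\) connections form complete bipartite groups (the text above does not fix the exact wiring), and that the clustering step recovers the four distinct vectors as four clusters. The helper name `assortativity` and the choice \(p=5\) are hypothetical.

```python
from itertools import product


def assortativity(edges, label):
    """Newman's assortativity coefficient for a categorical node label.

    edges: list of undirected (u, v) pairs; label: dict mapping node -> category.
    Undefined (division by zero) if every node carries the same category.
    """
    e = {}  # e[(i, j)]: number of edge ends joining category i to category j
    total = 0
    for u, v in edges:
        # Count each undirected edge in both directions so that e is symmetric.
        for i, j in ((label[u], label[v]), (label[v], label[u])):
            e[(i, j)] = e.get((i, j), 0) + 1
            total += 1
    cats = {c for pair in e for c in pair}
    # a[c]: fraction of edge ends attached to category c.
    a = {c: sum(e.get((c, d), 0) for d in cats) / total for c in cats}
    trace = sum(e.get((c, c), 0) for c in cats) / total  # same-category fraction
    expected = sum(a[c] ** 2 for c in cats)  # same-category fraction at random
    return (trace - expected) / (1 - expected)


# Toy network: p nodes per type; nodes are (type, index) pairs.
p = 5  # hypothetical; any p >= 1 gives the same coefficients
vectors = {"z": (0, 0), "w": (0, 1), "t": (1, 0), "q": (1, 1)}
attr = {(t, i): vectors[t] for t in vectors for i in range(p)}

# Assumed wiring: every z-vertex is joined to every w-vertex, every t to every q.
edges = [(("z", i), ("w", j)) for i, j in product(range(p), repeat=2)]
edges += [(("t", i), ("q", j)) for i, j in product(range(p), repeat=2)]

# Baseline: one coefficient per element of the attribute vector, then the mean.
r_vec = [assortativity(edges, {n: attr[n][k] for n in attr}) for k in range(2)]
r_mean = sum(r_vec) / len(r_vec)

# Label-based variant: each distinct vector is treated as its own cluster.
r_label = assortativity(edges, {n: n[0] for n in attr})

print("assortativity vector:", r_vec)
print("mean of the vector:", r_mean)
print("coefficient on class labels:", r_label)
```

Under these assumptions the sketch gives \({\bf r}=(1,-1)\) and \(r^{\rm mean}=0\), while the coefficient computed on the four class labels is \(-1/3\), i.e., negative, as the argument above requires.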

About this article

Cite this article

Pelechrinis, K. Matching patterns in networks with multi-dimensional attributes: a machine learning approach. Soc. Netw. Anal. Min. 4, 188 (2014). https://doi.org/10.1007/s13278-014-0188-2

Received: 23 December 2013
Accepted: 26 March 2014
Published: 12 April 2014
DOI: https://doi.org/10.1007/s13278-014-0188-2