
The social distributional hypothesis: a pragmatic proxy for homophily in online social networks


Applications of the Social Web are ubiquitous and have become an integral part of everyday life: users make friends with the help of online social networks, share thoughts via Twitter, or collaboratively write articles in Wikipedia. All such interactions leave digital traces; thus, users participate in the creation of heterogeneous, distributed, collaborative data collections. In linguistics, the Distributional Hypothesis states that words with similar distributional characteristics tend to be semantically related, i.e., words which occur in similar contexts are assumed to have a similar meaning. Considering users as (social) entities, their distributional characteristics can be observed by collecting interactions in social web applications. Accordingly, we state the social distributional hypothesis: we presume that users with similar interaction characteristics tend to be related. We conduct a series of experiments on social interaction networks from Twitter, Flickr, and BibSonomy and investigate relatedness with respect to the interactions, their frequency, and the specific interaction characteristics. The results indicate interrelations between the structural similarity of interaction characteristics and the semantic relatedness of users, supporting the social distributional hypothesis.
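The core idea can be illustrated with a minimal sketch. It is not the experimental setup of the article: the toy interaction list, the helper names `interaction_profile` and `cosine_similarity`, and the choice of cosine similarity as the measure are all assumptions made for illustration. Each user is described by a vector counting how often they interacted with every other user, and two users are compared by the similarity of these vectors, in the same way that distributional semantics compares words by their contexts.

```python
from math import sqrt


def interaction_profile(interactions, user):
    """Count how often `user` interacted with each other user.

    `interactions` is a list of (source, target) pairs; this format is an
    assumption for illustration, not the data model used in the article.
    """
    profile = {}
    for source, target in interactions:
        if source == user:
            profile[target] = profile.get(target, 0) + 1
    return profile


def cosine_similarity(p, q):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(weight * q.get(key, 0) for key, weight in p.items())
    norm_p = sqrt(sum(weight * weight for weight in p.values()))
    norm_q = sqrt(sum(weight * weight for weight in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # a user without interactions is similar to nobody
    return dot / (norm_p * norm_q)


# Hypothetical toy data: alice and bob interact with the same users,
# erin interacts with someone else entirely.
interactions = [
    ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("bob", "dave"),
    ("erin", "frank"),
]

sim_alice_bob = cosine_similarity(
    interaction_profile(interactions, "alice"),
    interaction_profile(interactions, "bob"),
)
sim_alice_erin = cosine_similarity(
    interaction_profile(interactions, "alice"),
    interaction_profile(interactions, "erin"),
)
```

Under the hypothesis, the high similarity between `alice` and `bob` would be read as evidence that the two users are related, although they never interact with each other directly, while `alice` and `erin` share no interaction context at all.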







  5. Note: For privacy reasons a user may deactivate this feature.





This work has been partially supported by the Commune project funded by the Hertie foundation.



Corresponding author

Correspondence to Folke Mitzlaff.

Additional information

This article is part of the Topical Collection on Social Systems as Complex Networks.


About this article


Cite this article

Mitzlaff, F., Atzmueller, M., Hotho, A. et al. The social distributional hypothesis: a pragmatic proxy for homophily in online social networks. Soc. Netw. Anal. Min. 4, 216 (2014).



  • Social networks
  • Social interactions
  • Social media
  • Analysis
  • Distributional semantics