The social distributional hypothesis: a pragmatic proxy for homophily in online social networks

  • Folke Mitzlaff
  • Martin Atzmueller
  • Andreas Hotho
  • Gerd Stumme
Original Article
Part of the following topical collections:
  1. Social Systems as Complex Networks


Applications of the Social Web are ubiquitous and have become an integral part of everyday life: users make friends with the help of online social networks, share thoughts via Twitter, or collaboratively write articles in Wikipedia. All such interactions leave digital traces; thus, users participate in the creation of heterogeneous, distributed, collaborative data collections. In linguistics, the Distributional Hypothesis states that words with similar distributional characteristics tend to be semantically related, i.e., words which occur in similar contexts are assumed to have a similar meaning. Considering users as (social) entities, their distributional characteristics can be observed by collecting their interactions in social web applications. Accordingly, we state the social distributional hypothesis: we presume that users with similar interaction characteristics tend to be related. We conduct a series of experiments on social interaction networks from Twitter, Flickr, and BibSonomy and investigate user relatedness with respect to the interactions, their frequency, and the specific interaction characteristics. The results indicate interrelations between the structural similarity of interaction characteristics and the semantic relatedness of users, supporting the social distributional hypothesis.
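To make the distributional idea concrete, the following is a minimal sketch, not the study's method: each user is represented by a vector counting interactions with other users (the user's "context"), and two users are compared with cosine similarity, one standard distributional similarity measure. The toy interaction log, the user names, the choice to treat interactions as undirected, and the helper names `context_vectors` and `cosine` are all hypothetical illustrations; the abstract does not specify which similarity measures the experiments use.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy interaction log: (source_user, target_user) pairs,
# e.g. one pair per retweet or mention. Not data from the study.
INTERACTIONS = [
    ("alice", "carol"), ("alice", "dave"), ("alice", "dave"),
    ("bob", "carol"), ("bob", "dave"),
    ("eve", "frank"),
]


def context_vectors(interactions):
    """Map each user to a Counter of interaction partners (its 'context')."""
    vectors = defaultdict(Counter)
    for src, dst in interactions:
        # Assumption: interactions are treated as undirected, so each
        # user counts as context for the other.
        vectors[src][dst] += 1
        vectors[dst][src] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vecs = context_vectors(INTERACTIONS)
# alice and bob never interact directly, yet share partners (carol, dave).
print(round(cosine(vecs["alice"], vecs["bob"]), 3))  # 0.949
# alice and eve share no interaction partners.
print(cosine(vecs["alice"], vecs["eve"]))  # 0.0
```

The example mirrors the linguistic analogy: just as two words can be related without co-occurring, alice and bob score as highly similar purely because they interact with the same users, which is the kind of structural similarity the social distributional hypothesis links to relatedness.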


Keywords: Social networks · Social interactions · Social media analysis · Distributional semantics



This work has been partially supported by the Commune project funded by the Hertie Foundation.



Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  • Folke Mitzlaff (1, corresponding author)
  • Martin Atzmueller (1)
  • Andreas Hotho (2)
  • Gerd Stumme (1)

  1. Knowledge and Data Engineering Group, University of Kassel, Kassel, Germany
  2. Data Mining and Information Retrieval Group, University of Wuerzburg, Wuerzburg, Germany
