Identifying and validating personality traits-based homophilies for an egocentric network


Social network sites (SNS) have touched the lives of millions of people around the world. People share interests, ideas, photos, and activities on social networks with their family, colleagues, friends, and acquaintances. However, the degree of interaction among members varies widely. According to a principle from sociology, people with similar personalities tend to interact with each other more frequently. A group of connected people with similar personality traits is termed a homophily. In this paper, we develop a method to identify homophilies by analyzing the Big5 personality traits of users from their interactions in an egocentric network such as Facebook. We observe that our homophilies correctly cluster 73 to 87% of users, depending on the personality trait. We also present a novel validation technique to verify the extracted homophilies in real life. Note that we are the first to validate extracted homophilies in real life using an interview-based method and to compare them with baseline techniques derived from SNS usage. Our validation results show agreement among the raters of those homophilies in real life ranging from 0.207 (fair) to 0.709 (substantial).
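The qualitative labels attached to the agreement scores can be illustrated with a short sketch. It assumes the labels follow the Landis and Koch style bands commonly used for agreement coefficients such as kappa. The abstract does not state which scale or coefficient the authors used, so the band boundaries and the function name `agreement_label` are assumptions made here for illustration only.

```python
# Assumed interpretation bands (Landis and Koch style); not taken from the paper.
# Each entry is (inclusive upper bound, label); negative scores are "poor".
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def agreement_label(score: float) -> str:
    """Map an agreement coefficient in [-1, 1] to a qualitative label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("agreement score must lie in [-1, 1]")
    if score < 0.0:
        return "poor"
    for upper, label in BANDS:
        if score <= upper:
            return label
    return "almost perfect"  # unreachable for valid input; kept for safety


if __name__ == "__main__":
    # The two endpoints reported in the abstract.
    for value in (0.207, 0.709):
        print(value, agreement_label(value))
```

Under these assumed bands, the two endpoints reported above fall into the "fair" and "substantial" ranges, which matches the labels given in the abstract.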





This research is funded by ICT Division, Ministry of Posts, Telecommunications and Information Technology, Government of the People’s Republic of Bangladesh.

Corresponding author

Correspondence to Md. Saddam Hossain Mukta.

Cite this article

Mukta, M.S.H., Ali, M.E. & Mahmud, J. Identifying and validating personality traits-based homophilies for an egocentric network. Soc. Netw. Anal. Min. 6, 74 (2016).

Keywords


  • Regression
  • Classification
  • Clustering
  • Intra-class correlation