
Identifying and validating personality traits-based homophilies for an egocentric network


Social network sites (SNS) have touched the lives of millions of people around the world. People share interests, ideas, photos, and activities on social networks with their family, colleagues, friends, and acquaintances. However, the degree of interaction among members varies widely. According to a principle from sociology, people with similar personalities tend to interact with each other more frequently. A group of connected people with similar personality traits is termed a homophily. In this paper, we develop a method to identify homophilies by analyzing the Big5 personality traits of users from their interactions in an egocentric network such as Facebook. We observe that our homophilies correctly cluster between 73 and 87% of users, depending on the personality trait. We also present a novel validation technique to verify the extracted homophilies in real life. We are the first to validate extracted homophilies, and to compare them with baseline techniques, against real-life SNS usage using an interview-based method. Our validation results show agreement among the raters of those homophilies in real life ranging from 0.207 (fair) to 0.709 (substantial).
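To make the reported agreement range concrete, the sketch below shows one common way an inter-rater agreement score can be computed and labelled. It is only an illustration: it assumes two raters and uses Cohen's kappa, whereas the abstract does not state which statistic the authors used (the article's keywords mention intra-class correlation). The function names `cohen_kappa` and `agreement_label` are hypothetical, and the label bands follow the commonly used kappa interpretation scale (slight, fair, moderate, substantial, almost perfect), applied after rounding to two decimals.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    Illustrative only; the paper may use a different agreement statistic.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map a kappa value to a conventional verbal label (assumed bands)."""
    k = round(kappa, 2)  # the bands are usually stated to two decimals
    if k <= 0.0:
        return "less than chance"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical ratings: does each friend belong to the homophily? (toy data)
    rater_1 = ["yes", "yes", "no", "no"]
    rater_2 = ["yes", "no", "no", "no"]
    kappa = cohen_kappa(rater_1, rater_2)
    print(kappa, agreement_label(kappa))  # kappa is 0.5 for this toy data
```

Under these assumed bands, the two endpoints quoted in the abstract, 0.207 and 0.709, map to "fair" and "substantial" respectively, which matches the labels the authors give.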





This research is funded by the ICT Division, Ministry of Posts, Telecommunications and Information Technology, Government of the People’s Republic of Bangladesh.

Author information

Corresponding author

Correspondence to Md. Saddam Hossain Mukta.

About this article

Cite this article

Mukta, M.S.H., Ali, M.E. & Mahmud, J. Identifying and validating personality traits-based homophilies for an egocentric network. Soc. Netw. Anal. Min. 6, 74 (2016).


  • Regression
  • Classification
  • Clustering
  • Intra-class correlation