Affinity Groups: A Linguistic Analysis for Social Network Groups Identification

  • Jonathan Mendieta
  • Gabriela Baquerizo
  • Mónica Villavicencio
  • Carmen Vaca
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10540)


Socially cohesive groups tend to share similar ideas and to express themselves in similar ways when posting in online social networks. Accordingly, some researchers have studied the issues discussed by groups that are structurally connected in a network. In this study, we take advantage of the language usage patterns present in online communication to unveil affinity groups, i.e., like-minded people who are not necessarily interacting in the network at present. We analyze 735K tweets written by 620 unique users and compute scores for 14 grammatical categories using the Linguistic Inquiry and Word Count (LIWC) software. With the LIWC scores, we build a vector for each user, apply a similarity measure, and feed the resulting similarities to an affinity propagation clustering algorithm to find the affinity groups. Following the proposed method, clusters of religious activists, journalists, and entrepreneurs, among others, emerge. We automatically characterize each cluster using a topic modeling algorithm and validate the generated topics with a user study conducted with 200 people. As a result, more than 70% of the participants agreed on their selection. These results confirm that communities share certain similarities in their use of language, traits that characterize their behavior and grouping.
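The pipeline summarized in the abstract (one score vector per user, a similarity measure, then affinity propagation) can be illustrated with a minimal sketch. It is not the authors' implementation: the user names and three-category score vectors are invented toy data, negative squared Euclidean distance is an assumed similarity measure (the abstract does not name the one used), and the preference is set to the median similarity, a common default. The function names `similarity`, `build_similarity_matrix` and `affinity_propagation` are hypothetical.

```python
from statistics import median

# Hypothetical scores for three made-up linguistic categories per user.
# Real LIWC output would supply the 14 category scores per user instead.
user_vectors = {
    "user_a": [10.0, 2.0, 1.0],
    "user_b": [11.0, 2.0, 1.0],
    "user_c": [12.0, 3.0, 1.0],
    "user_d": [2.0, 9.0, 4.0],
    "user_e": [2.0, 10.0, 4.0],
    "user_f": [3.0, 11.0, 4.0],
}


def similarity(u, v):
    """Negative squared Euclidean distance; the measure is an assumption."""
    return -sum((a - b) ** 2 for a, b in zip(u, v))


def build_similarity_matrix(vectors):
    """Pairwise similarities, with the median similarity as the preference."""
    n = len(vectors)
    S = [[similarity(vectors[i], vectors[k]) for k in range(n)] for i in range(n)]
    preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = preference  # diagonal: how readily a point becomes an exemplar
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # self-responsibility is not clipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback so every point still receives a label
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


names = list(user_vectors)
S = build_similarity_matrix([user_vectors[name] for name in names])
labels = affinity_propagation(S)

# Group user names under the name of their exemplar.
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(names[label], []).append(name)
for exemplar, members in groups.items():
    print(exemplar, "->", members)
```

Affinity propagation does not require the number of groups to be fixed in advance; the preference value on the diagonal controls how many exemplars emerge. On this toy data the first three and the last three users are well separated, so they are expected to fall into two different groups.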


Keywords: Twitter, LIWC, Affinity propagation clustering, Linguistic clustering



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jonathan Mendieta (1, 2)
  • Gabriela Baquerizo (3)
  • Mónica Villavicencio (1)
  • Carmen Vaca (1)

  1. Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador
  2. Facultad de Ciencias Naturales y Matemáticas, Guayaquil, Ecuador
  3. Universidad Casa Grande, Guayaquil, Ecuador
