Abstract
Socially cohesive groups tend to share similar ideas and express themselves in similar ways when posting in online social networks. Accordingly, some researchers have conducted studies to uncover the issues discussed by groups that are structurally connected in a network. In this study, we take advantage of the language usage patterns present in online communication to unveil affinity groups, i.e., like-minded people who are not necessarily interacting in the network at present. We analyze 735K tweets written by 620 unique users and compute scores for 14 grammatical categories using the Linguistic Inquiry and Word Count (LIWC) software. With the LIWC scores, we build a vector for each user, apply a similarity measure, and feed the resulting similarities to an affinity propagation clustering algorithm to find the affinity groups. Following the proposed method, clusters of religious activists, journalists, and entrepreneurs, among others, emerge. We automatically characterize each cluster using a topic modeling algorithm and validate the generated topics with a user study conducted with 200 people, in which more than 70% of the participants agreed on their selection. These results confirm that communities share certain similarities in their use of language, traits that characterize their behavior and grouping.
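The pipeline summarized in the abstract (per-user vectors of LIWC scores, a pairwise similarity measure, affinity propagation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the similarity measure, so cosine similarity is an assumption here. The user names, category names, and scores are hypothetical placeholders, since LIWC is a separate tool whose output is not reproduced. Using the median similarity as the exemplar preference is likewise an assumed default, not a setting taken from the paper.

```python
import math
from statistics import median

# Hypothetical scores for three of the 14 categories; real values would come
# from running LIWC over each user's tweets.
CATEGORIES = ["pronoun", "verb", "article"]
USER_SCORES = {
    "user_a": {"pronoun": 9.1, "verb": 14.2, "article": 4.0},
    "user_b": {"pronoun": 8.7, "verb": 13.9, "article": 4.4},
    "user_c": {"pronoun": 2.1, "verb": 6.0, "article": 9.8},
    "user_d": {"pronoun": 2.5, "verb": 5.5, "article": 10.3},
}


def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise similarities, with the exemplar preference on the diagonal."""
    n = len(vectors)
    S = [[cosine(vectors[i], vectors[k]) for k in range(n)] for i in range(n)]
    # Assumed default: preference = median of the off-diagonal similarities.
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.9, iterations=200):
    """Plain message-passing affinity propagation; returns an exemplar index per point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total - pos[k]
                else:
                    new = min(0.0, R[k][k] + total - pos[i] - pos[k])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate exemplar k maximizing a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


users = sorted(USER_SCORES)
vectors = [[USER_SCORES[u][c] for c in CATEGORIES] for u in users]
S = similarity_matrix(vectors)
labels = affinity_propagation(S)
for user, exemplar in zip(users, labels):
    print(user, "-> exemplar:", users[exemplar])
```

Users that share an exemplar form one candidate affinity group. For a data set of the size reported above, a library implementation that accepts a precomputed similarity matrix, such as scikit-learn's `AffinityPropagation(affinity="precomputed")`, would be more practical than this pure-Python loop.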
A Appendix
A.1 Survey Example
In your opinion, which of the following topics best describes the set of words presented below? Please choose only one or two topics. Underline the words that justify your selection.
A.2 Participants’ Demographics
The demographics of the surveyed participants are described in Table 2.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mendieta, J., Baquerizo, G., Villavicencio, M., Vaca, C. (2017). Affinity Groups: A Linguistic Analysis for Social Network Groups Identification. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol 10540. Springer, Cham. https://doi.org/10.1007/978-3-319-67256-4_21
DOI: https://doi.org/10.1007/978-3-319-67256-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67255-7
Online ISBN: 978-3-319-67256-4
eBook Packages: Computer Science, Computer Science (R0)