Abstract
Controversial topics on social media are often linked to hate speech, the propagation of fake news, and the spread of biased information or misinformation. Detecting controversy in online discussions is a challenging task, but it is essential for stopping these unhealthy behaviours.
In this work, we develop a general pipeline to quantify controversy on social media through content analysis, and we test it extensively on Twitter.
Our approach has four phases: an initial graph building phase, a community identification phase based on graph partitioning, an embedding phase that uses language models, and a final controversy score computation phase. The result is an index that quantifies the intuitive notion of controversy.
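The four phases can be illustrated with a toy sketch. This is not the authors' implementation: connected components stand in for graph partitioning, bag-of-words counts stand in for language-model embeddings, and the score (one minus the cosine similarity of the two community centroids) is a hypothetical placeholder for the paper's index. All function names and the sample data are assumptions made for illustration.

```python
from collections import defaultdict, deque
from math import sqrt


def build_graph(retweets):
    """Phase 1: undirected user graph from (retweeter, author) pairs."""
    graph = defaultdict(set)
    for a, b in retweets:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def two_largest_components(graph):
    """Phase 2 stand-in: connected components, not real graph partitioning."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:
            node = queue.popleft()
            members.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(members)
    if len(components) < 2:
        raise ValueError("need at least two communities")
    components.sort(key=len, reverse=True)
    return components[0], components[1]


def embed(text, vocabulary):
    """Phase 3 stand-in: word counts instead of a language model."""
    words = text.lower().split()
    return [words.count(word) for word in vocabulary]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toy_controversy_score(retweets, tweets):
    """Phase 4 stand-in: 1 - cosine similarity of the community centroids."""
    side_a, side_b = two_largest_components(build_graph(retweets))
    vocabulary = sorted({w for t in tweets.values() for w in t.lower().split()})
    cent_a = centroid([embed(tweets[u], vocabulary) for u in sorted(side_a)])
    cent_b = centroid([embed(tweets[u], vocabulary) for u in sorted(side_b)])
    return 1.0 - cosine(cent_a, cent_b)


# Hypothetical data: two groups of users who only retweet within their group.
retweets = [("ana", "bob"), ("bob", "cat"), ("dan", "eve"), ("eve", "fay")]
polarized_tweets = {
    "ana": "ban the law", "bob": "ban the law now", "cat": "ban it",
    "dan": "keep the law", "eve": "keep the law forever", "fay": "keep it",
}
shared_tweets = {user: "happy new year" for user in polarized_tweets}

polarized = toy_controversy_score(retweets, polarized_tweets)
shared = toy_controversy_score(retweets, shared_tweets)
print(polarized > shared)
```

In this toy setting the two communities use different vocabularies in the polarized case, so their centroids diverge and the placeholder score is higher than when both communities post the same text.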
To test that our method is general and not domain-, language-, geography- or size-dependent, we collect, clean and analyze 30 Twitter datasets about different topics, half controversial and half not, varying in domain and size, in six different languages from all over the world.
The results confirm that our pipeline correctly quantifies the notion of controversy, reaching a ROC AUC score of 0.996 over the score distributions of controversial and non-controversial discussions. It outperforms the state-of-the-art approaches in terms of both accuracy and computational speed.
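For readers unfamiliar with the metric, ROC AUC over two score distributions equals the probability that a randomly chosen controversial discussion receives a higher score than a randomly chosen non-controversial one, with ties counted as one half. The sketch below computes it by direct pair counting. The scores are hypothetical and are not taken from the paper's datasets.

```python
def roc_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical controversy scores, for illustration only.
controversial = [0.81, 0.74, 0.69, 0.55]
non_controversial = [0.60, 0.42, 0.38, 0.31]

auc = roc_auc(controversial, non_controversial)
print(auc)
```

A value of 1.0 means the two distributions are perfectly separated by some threshold, and 0.5 means the score carries no ranking information.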
J. M. O. de Zarate and M. Di Giovanni contributed equally.
Notes
- 1.
- 2. Kullback–Leibler divergence measures how one probability distribution differs from a reference probability distribution.
- 3. Code and datasets used in this work are available at https://github.com/jmanuoz/Measuring-controversy-in-Social-Networks-through-NLP.
- 4.
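The Kullback–Leibler divergence mentioned in the notes has the standard discrete form D(P || Q) = sum over i of p_i * log(p_i / q_i), where Q is the reference distribution. The following is a minimal sketch of that textbook definition, not code from the authors' repository. The function name and the example distributions are assumptions.

```python
from math import log


def kl_divergence(p, q):
    """D(P || Q) in nats for two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing by convention
        if q_i == 0:
            return float("inf")  # P puts mass where the reference Q has none
        total += p_i * log(p_i / q_i)
    return total


# Hypothetical distributions, for illustration only.
uniform = [0.5, 0.5]
skewed = [0.9, 0.1]

forward = kl_divergence(uniform, skewed)
backward = kl_divergence(skewed, uniform)
print(forward, backward)
```

The divergence is zero when the two distributions are identical, and it is not symmetric: swapping the distribution and the reference generally changes the value.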
Appendix A Details on the discussions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
de Zarate, J.M.O., Di Giovanni, M., Feuerstein, E.Z., Brambilla, M. (2020). Measuring Controversy in Social Networks Through NLP. In: Boucher, C., Thankachan, S.V. (eds) String Processing and Information Retrieval. SPIRE 2020. Lecture Notes in Computer Science, vol 12303. Springer, Cham. https://doi.org/10.1007/978-3-030-59212-7_14
DOI: https://doi.org/10.1007/978-3-030-59212-7_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59211-0
Online ISBN: 978-3-030-59212-7
eBook Packages: Computer Science, Computer Science (R0)