Abstract
In the digital era, individuals are increasingly profiled and grouped based on the traces they leave behind in online social networks such as Twitter and Facebook. In this paper, we develop and evaluate a novel text analysis approach for studying user identity and social roles by redefining identity as a sequence of timestamped items (e.g., tweet texts). We operationalise this idea through a novel text distance metric, the time-sensitive semantic edit distance (t-SED), which accounts for the temporal context across multiple traces. To evaluate the method, we undertake a case study of Russian online-troll activity within US political discourse. The metric allows us to classify the social roles of trolls, based on their traces (in this case, tweets), into one of three predefined categories: left-leaning, right-leaning, and news feed. We show the effectiveness of t-SED for measuring the similarity between tweets while accounting for temporal context, and we use novel data-visualisation techniques and qualitative analysis to uncover empirical insights into Russian troll activity that have not been identified in prior work. In addition, we highlight a connection with actor–network theory and the related hypotheses of Gabriel Tarde, and we discuss how social sequence analysis using t-SED may open new avenues for tackling a longstanding problem in social theory: how to analyse society without separating reality into micro- and macro-levels.
Notes
Available at https://github.com/fivethirtyeight/russian-troll-tweets/.
MAGA is an acronym for Make America Great Again, the slogan used by Donald Trump during his 2016 election campaign; it has subsequently become a central theme of his presidency.
Bernie Sanders was the main rival candidate for the Democratic Party's 2016 presidential nomination.
Available at https://github.com/s/preprocessor.
A bag-of-words representation is used to map a token sequence to a vector.
Length normalisation: \(\text{sed}(\boldsymbol{a},\boldsymbol{b})/\max(|\boldsymbol{a}|,|\boldsymbol{b}|)\).
Ratio normalisation: \(\text{sed}(\boldsymbol{a},\boldsymbol{b})/\text{ed}(\boldsymbol{a},\boldsymbol{b})\).
Twitter imposed a 140-character limit on tweets before November 2017.
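The bag-of-words mapping mentioned in the notes above can be sketched minimally as follows. This is an illustrative sketch only; the vocabulary and tokenisation shown here are hypothetical and not taken from the paper.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary):
    # Map a token sequence to a fixed-length count vector over a vocabulary.
    # Word order is discarded: only per-word counts survive.
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

vocab = ["america", "great", "make", "news"]
vec = bag_of_words("make america great again make".split(), vocab)
print(vec)  # → [1, 1, 2, 0]  ("again" is out of vocabulary and dropped)
```

Because this mapping discards word order, sequence-aware measures such as edit distance retain information that the bag-of-words vector cannot.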
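The two normalisations of the semantic edit distance noted above can be illustrated with a toy sketch. This is not the authors' implementation: `semantic_sub_cost` is a hypothetical stand-in (character-set Jaccard distance) for the embedding-based word-substitution cost used in the paper, and the temporal weighting that makes t-SED time-sensitive is omitted. `sed` and `ed` both follow the standard dynamic-programming edit-distance recurrence.

```python
def edit_distance(a, b, sub_cost):
    # Standard dynamic-programming edit distance over token sequences,
    # parameterised by the substitution-cost function.
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)                                      # deletions
    for j in range(n + 1):
        d[0][j] = float(j)                                      # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                      # delete a[i-1]
                          d[i][j - 1] + 1,                      # insert b[j-1]
                          d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return d[m][n]

def ed(a, b):
    # Plain edit distance: substitution costs 0 (match) or 1 (mismatch).
    return edit_distance(a, b, lambda u, v: 0.0 if u == v else 1.0)

def semantic_sub_cost(u, v):
    # Hypothetical stand-in for an embedding-based substitution cost:
    # Jaccard distance between the words' character sets, in [0, 1].
    su, sv = set(u), set(v)
    return 1.0 - len(su & sv) / len(su | sv)

def sed(a, b):
    # Semantic edit distance: substitutions are charged by word
    # dissimilarity instead of a flat cost of 1.
    return edit_distance(a, b, semantic_sub_cost)

a = "make america great again".split()
b = "keep america great".split()
length_norm = sed(a, b) / max(len(a), len(b))  # length normalisation
ratio_norm = sed(a, b) / ed(a, b)              # ratio normalisation
```

Under length normalisation, similar-but-not-identical substitutions (e.g. "make" vs. "keep") are charged less than a full edit, so the normalised distance falls below the plain edit-distance baseline.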
References
Abbott, A. (1995). Sequence analysis: New methods for old ideas. Annual Review of Sociology, 21(1), 93–113.
Abbott, A., & Tsay, A. (2000). Sequence analysis and optimal matching methods in sociology: Review and prospect. Sociological Methods & Research, 29(1), 3–33.
Badawy, A., Ferrara, E., & Lerman, K. (2018). Analyzing the digital traces of political manipulation: The 2016 Russian interference Twitter campaign. arXiv:1802.04291 (preprint).
Bessi, A., & Ferrara, E. (2016). Social bots distort the 2016 U.S. presidential election online discussion. First Monday, 21(11). https://doi.org/10.5210/fm.v21i11.7090
Broniatowski, D. A., Jamison, A. M., Qi, S., AlKulaib, L., Chen, T., Benton, A., Quinn, S. C., & Dredze, M. (2018). Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. American Journal of Public Health, 108(10). https://doi.org/10.2105/AJPH.2018.304567
Buckels, E. E., Trapnell, P. D., & Paulhus, D. L. (2014). Trolls just want to have fun. Personality and Individual Differences, 67, 97–102. https://doi.org/10.1016/j.paid.2014.01.016
Cook, D. M., Waugh, B., Abdipanah, M., Hashemi, O., & Rahman, S. A. (2014). Twitter deception and influence: Issues of identity, slacktivism, and puppetry. Journal of Information Warfare, 13(1), 58–71.
Cornwell, B. (2015). Social sequence analysis: Methods and applications, (Vol. 37). Cambridge: Cambridge University Press.
Davis, C. A., Varol, O., Ferrara, E., Flammini, A., & Menczer, F. (2016). BotOrNot: A system to evaluate social bots. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 273–274). International World Wide Web Conferences Steering Committee.
Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96–104.
Flores-Saviaga, C., Keegan, B., & Savage, S. (2018). Mobilizing the Trump train: Understanding collective action in a political trolling community. In Twelfth International AAAI Conference on Web and Social Media (ICWSM 2018).
Herring, S., Job-Sluder, K., Scheckler, R., & Barab, S. (2002). Searching for safety online: Managing “trolling” in a feminist forum. The Information Society, 18(5), 371–384.
Kollanyi, B., Howard, P. N., & Woolley, S. C. (2016). Bots and automation over Twitter during the first U.S. presidential debate. COMPROP Data Memo No. 1. http://blogs.oii.ox.ac.uk/politicalbots/wp-content/uploads/sites/89/2016/10/Data-Memo-First-Presidential-Debate.pdf. Accessed 1 Nov 2018.
Kumar, S., Cheng, J., Leskovec, J., & Subrahmanian, V. (2017). An army of me: Sockpuppets in online discussion communities. In Proceedings of the 26th International Conference on World Wide Web (pp. 857–866). International World Wide Web Conferences Steering Committee.
Latour, B. (2002). Gabriel Tarde and the end of the social. In P. Joyce (Ed.), The Social in Question: New Bearings in History and the Social Sciences (pp. 117–133). London: Routledge.
Latour, B., Jensen, P., Venturini, T., Grauwin, S., & Boullier, D. (2012). ‘The whole is always smaller than its parts’—A digital test of Gabriel Tarde's monads. The British Journal of Sociology, 63(4), 590–615.
Leskovec, J., Backstrom, L., Kumar, R., & Tomkins, A. (2008). Microscopic evolution of social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 462–470). ACM.
Boatwright, B. C., Linvill, D. L., & Warren, P. L. (2018). Troll factories: The internet research agency and state-sponsored agenda building. Resource Centre on Media Freedom in Europe. http://pwarren.people.clemson.edu/Linvill_Warren_TrollFactory.pdf. Accessed 1 Nov 2018.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Mihaylov, T., Georgiev, G., & Nakov, P. (2015). Finding opinion manipulation trolls in news community forums. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning (pp. 310–314).
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781. Accessed 10 Dec 2018.
Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys (CSUR), 33(1), 31–88.
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).
Phinney, J. S. (2000). Ethnic and racial identity: Ethnic identity. In A. E. Kazdin (Ed.), Encyclopedia of Psychology (Vol. 3, pp. 254–259). Washington, DC: American Psychological Association.
Rizoiu, M. A., Graham, T., Zhang, R., Zhang, Y., Ackland, R., & Xie, L. (2018a). #DebateNight: The role and influence of socialbots on Twitter during the first 2016 US presidential debate. In 12th International AAAI Conference on Web and Social Media (ICWSM 2018).
Rizoiu, M. A., Lee, Y., Mishra, S., & Xie, L. (2018b). Hawkes processes for events in social media. In S. F. Chang (Ed.), Frontiers of Multimedia Research (pp. 191–218). New York: Springer. https://doi.org/10.1145/3122865.3122874
Shao, C., Ciampaglia, G. L., Varol, O., Flammini, A., & Menczer, F. (2017). The spread of fake news by social bots (pp. 96–104). https://www.andyblackassociates.co.uk/wp-content/uploads/2015/06/fakenewsbots.pdf. Accessed 20 Oct 2018.
Stewart, L. G., Arif, A., & Starbird, K. (2018). Examining trolls and polarization with a retweet network. In Proceedings of Web Search and Data Mining (2018), Workshop on Misinformation and Misbehavior Mining on the Web. http://faculty.washington.edu/kstarbi/examining-trolls-polarization.pdf. Accessed 1 Dec 2018.
Tarde, G. (2012 [1895]). Monadology and sociology. Melbourne: Re.press.
Varol, O., Ferrara, E., Davis, C. A., Menczer, F., & Flammini, A. (2017). Online human-bot interactions: Detection, estimation, and characterization. arXiv:1703.03107 (preprint).
Zannettou, S., Caulfield, T., Setzer, W., Sirivianos, M., Stringhini, G., & Blackburn, J. (2018). Who let the trolls out? Towards understanding state-sponsored trolls. arXiv:1811.03130 (preprint).
Ethics declarations
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Kim, D., Graham, T., Wan, Z. et al. Analysing user identity via time-sensitive semantic edit distance (t-SED): a case study of Russian trolls on Twitter. J Comput Soc Sc 2, 331–351 (2019). https://doi.org/10.1007/s42001-019-00051-x