Abstract
Online social networks (OSNs) are used by millions of users daily, who share and discover opinions on popular topics. The social influence of large groups may be driven by user beliefs or by interest in particular news or products. A large number of accounts, gathered in a single group or follower base, increases the probability of influencing more OSN users. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to infiltrate the social graph over time and create an illusion of community behavior, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities, and their evolution over time. We analyze a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users over a period of 36 months. Users are labeled as botnet members based on long-term and frequent content-similarity events. We detect over a million events in which seemingly unrelated accounts tweeted nearly identical content at almost the same time. Filtering these concurrent content-injection events, we detect a set of 1850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or partly controlled and orchestrated by the same entity. We find botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow over the entire duration of our dataset. We analyze the statistical differences between bot accounts and human users, as well as botnet interactions with user communities and Twitter trending topics.
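The abstract describes labeling accounts through concurrent content-injection events: near-identical tweets posted by different accounts at almost the same time, with the keyword list below suggesting token-set Jaccard similarity as the comparison measure. The following is a minimal sketch of that idea, not the paper's actual pipeline; the tweet tuples, time window, and similarity threshold are hypothetical, and the pairwise scan shown is quadratic, whereas a system processing 36 months of traffic would need scalable candidate filtering.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def concurrent_injections(tweets, window=60, threshold=0.9):
    """Flag pairs of tweets posted by *different* accounts within `window`
    seconds of each other whose token sets are nearly identical.
    `tweets` is a list of (user_id, timestamp_seconds, text) tuples."""
    events = []
    for (u1, t1, x1), (u2, t2, x2) in combinations(tweets, 2):
        if u1 == u2 or abs(t1 - t2) > window:
            continue
        if jaccard(tokens(x1), tokens(x2)) >= threshold:
            events.append((u1, u2, min(t1, t2)))
    return events

# Hypothetical example: two accounts post near-identical text 30 s apart.
tweets = [
    ("a", 100, "Breaking news: buy this amazing product now"),
    ("b", 130, "Breaking news, buy this amazing product now!"),
    ("c", 5000, "A totally unrelated tweet about the weather"),
]
print(concurrent_injections(tweets))  # [('a', 'b', 100)]
```

Accounts that accumulate such events above a threshold, as with the threshold T used in the appendix figures, would then be flagged as botnet candidates.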
Data Availability Statement
We consulted our Data Protection Officer (DPO) regarding the usage and sharing of this dataset. The dataset used for the analysis described in this study was obtained in compliance with the Twitter API restrictions and does not violate any terms of the Developer Agreement and Policies. According to Twitter policy, we are not allowed to share the entire dataset, but only 100K user IDs. This subset is available here: https://zenodo.org/record/4715885#.YILrU6kzadZ.
Code availability
The source code used for data collection and initial analysis is located here: https://github.com/polyvios/twAwler.
Funding
This document is the result of research projects co-funded by the European Commission (Directorate-General for Communications Networks, Content and Technology): project CONCORDIA (Grant number 830927), project CyberSANE (Grant number 833683), and project PUZZLE (Grant number 883540), and by Greek national funds through the Operational Program Competitiveness, Entrepreneurship, and Innovation, under the call RESEARCH—CREATE—INNOVATE (project codes T1EDK-02857 and T1EDK-01800). We would also like to thank the anonymous reviewers for their meaningful and constructive comments, which shaped the final version of this study.
Author information
Authors and Affiliations
Contributions
All authors made substantial contributions to the conception of this study. Alexander Shevtsov, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, and Alexandros Kanterakis contributed to the design, the acquisition of the dataset, and the analysis and interpretation of the data. Alexander Shevtsov, Maria Oikonomidou, Despoina Antonakaki, and Polyvios Pratikakis were involved in drafting, writing, revising, and finalizing the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
Not applicable.
Consent to participate
Not applicable. All authors have consented to the writing of this study and no additional experiments were made for the purposes of this study.
Consent for publication
All authors have consented to the submission of this manuscript to the journal.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
See Figs. 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 and 35.
CDF of all-capital words (percentage of all words) over two sets: clear users (users with zero copied events in our dataset) and users identified as bots (with a threshold T of copied events). The figure shows that bot accounts tend to use fewer all-capital words in their tweet text
CDF of digit/number usage (percentage) over two sets: clear users (users with zero copied events in our dataset) and users identified as bots (with a threshold T of copied events). The figure shows that bot accounts tend to use fewer digits/numbers in their tweet text
CDF of digit/number usage (count) over two sets: clear users (users with zero copied events in our dataset) and users identified as bots (with a threshold T of copied events). The figure shows that bot accounts tend to use fewer digits/numbers in their tweet text
CDF of words per tweet (standard deviation) over two sets: clear users (users with zero copied events in our dataset) and users identified as bots (with a threshold T of copied events). Users marked as bots exhibit a narrower range of words per tweet than regular users
CDF of unique retweeted hashtags over two sets: clear users (users with zero copied events in our dataset) and users identified as bots (with a threshold T of copied events). The figure shows that bot accounts tend to use more unique hashtags in retweeted text
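The captions above compare empirical cumulative distribution functions (CDFs) of per-user features between clear users and bot-flagged users. A minimal sketch of computing such an empirical CDF follows; the feature values are hypothetical placeholders, since the real figures use per-user statistics from the paper's dataset.

```python
def empirical_cdf(values):
    """Return sorted sample points xs and CDF values F(x_i) = (i + 1) / n,
    i.e. the fraction of samples less than or equal to each point."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

# Hypothetical per-user feature, e.g. fraction of all-capital words per tweet
clear_users = [0.05, 0.10, 0.12, 0.20, 0.30]
bot_users = [0.01, 0.02, 0.03, 0.05, 0.08]

# Plotting both curves on one axis would reproduce the style of the figures:
# a bot curve rising earlier than the clear-user curve indicates smaller
# feature values for bots, as in the all-capital-words caption above.
xs, ys = empirical_cdf(bot_users)
print(list(zip(xs, ys)))
```

A bot curve that reaches 1.0 at a smaller feature value than the clear-user curve is what the captions summarize as bots using "fewer" capital words or digits.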
Rights and permissions
About this article
Cite this article
Shevtsov, A., Oikonomidou, M., Antonakaki, D. et al. Discovery and Classification of Twitter Bots. SN COMPUT. SCI. 3, 255 (2022). https://doi.org/10.1007/s42979-022-01154-5
Keywords
- Online social networks
- Botnets
- Concurrent content injection
- Trending topics
- Jaccard similarity
- Bot graph
- Bot evolution
- Classification