
Discovery and Classification of Twitter Bots

  • Original Research
  • Published in SN Computer Science

Abstract

Online social networks (OSN) are used by millions of users daily. This user base shares and discovers opinions on popular topics. The social influence of large groups may be shaped by user beliefs or by interest in particular news or products. A large number of users, gathered in a single group or follower base, increases the probability of influencing more OSN users. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and create an illusion of community behavior, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities, and their evolution over time. We analyze a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users over a period of 36 months. The collected accounts are labeled as bots based on long-term and frequent content-similarity events. We detect over a million events in which seemingly unrelated accounts tweeted nearly identical content at almost the same time. We filter these concurrent content-injection events and detect a set of 1850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or partly controlled and orchestrated by the same entity. We find botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze the statistical differences between bot accounts and human users, as well as the botnet interactions with user communities and the Twitter trending topics.
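The core detection signal described above can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it flags pairs of distinct accounts that post nearly identical text within a short time window and keeps accounts whose number of such copied events reaches a threshold T. The tweet representation, normalization, window size, and threshold value are assumptions made for the example.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of detecting
# "concurrent content injection" events: nearly identical tweets posted by
# different accounts at almost the same time.
from collections import defaultdict
from datetime import datetime, timedelta
import re

WINDOW = timedelta(minutes=5)   # assumed co-posting window
T = 10                          # assumed threshold of copied events per account

def normalize(text):
    """Lowercase and strip URLs/whitespace so near-identical tweets collide."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def copied_event_counts(tweets):
    """tweets: iterable of (user_id, timestamp, text); returns copied-event counts."""
    counts = defaultdict(int)
    by_text = defaultdict(list)            # normalized text -> [(time, user), ...]
    for user, ts, text in sorted(tweets, key=lambda t: t[1]):
        key = normalize(text)
        if not key:
            continue
        for prev_ts, prev_user in by_text[key]:
            if prev_user != user and ts - prev_ts <= WINDOW:
                counts[user] += 1
                counts[prev_user] += 1
        by_text[key].append((ts, user))
    return counts

# Accounts repeatedly involved in such events become bot candidates.
tweets = [
    ("acct_a", datetime(2020, 1, 1, 12, 0), "Breaking: example story https://t.co/x"),
    ("acct_b", datetime(2020, 1, 1, 12, 2), "Breaking: example story https://t.co/y"),
]
suspects = {u for u, c in copied_event_counts(tweets).items() if c >= T}
```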


Data Availability Statement

We consulted our Data Protection Officer (DPO) regarding both the usage and the sharing of this dataset. The dataset used for the analysis described in this study was collected in accordance with the Twitter API restrictions and does not violate any terms of the Developer Agreement and Policies. According to Twitter policy, we are not allowed to share the entire dataset, but only 100K user IDs. This subset is available at: https://zenodo.org/record/4715885#.YILrU6kzadZ.
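For readers who wish to work with the shared subset, a hypothetical sketch of re-hydrating the published user IDs through the Twitter (X) API v2 users lookup endpoint is shown below. The file name, token handling, and selected fields are assumptions, and API access terms may have changed since publication.

```python
# Hypothetical sketch: look up the shared user IDs via the Twitter API v2
# users endpoint, 100 IDs per request. Not part of the authors' pipeline.
import os
import requests

BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]   # assumed to be set by the reader
LOOKUP_URL = "https://api.twitter.com/2/users"

def hydrate(user_ids, batch_size=100):
    """Yield user objects for the given IDs, batched per the API limit."""
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    for i in range(0, len(user_ids), batch_size):
        batch = user_ids[i:i + batch_size]
        resp = requests.get(
            LOOKUP_URL,
            headers=headers,
            params={"ids": ",".join(batch),
                    "user.fields": "created_at,public_metrics"},
            timeout=30,
        )
        resp.raise_for_status()
        yield from resp.json().get("data", [])

# Assumed format of the downloaded file: one numeric user ID per line.
with open("user_ids.txt") as f:
    ids = [line.strip() for line in f if line.strip()]
for user in hydrate(ids):
    print(user["id"], user.get("public_metrics", {}).get("followers_count"))
```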

Code availability

The source code used for data collection and initial analysis is located here: https://github.com/polyvios/twAwler.


Funding

This document is the result of research projects co-funded by the European Commission (Directorate-General for Communications Networks, Content and Technology): project CONCORDIA (Grant number 830927), project CyberSANE (Grant number 833683), and project PUZZLE (Grant number 883540), and by Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH-CREATE-INNOVATE (project codes T1EDK-02857 and T1EDK-01800). We would also like to thank the anonymous reviewers for their meaningful and constructive comments, which shaped the final version of this study.

Author information

Authors and Affiliations

Authors

Contributions

All authors made substantial contributions to the conception of this study. Alexander Shevtsov, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, and Alexandros Kanterakis contributed to the design, the acquisition of the dataset, and the analysis and interpretation of the data. Alexander Shevtsov, Maria Oikonomidou, Despoina Antonakaki, and Polyvios Pratikakis were involved in drafting, writing, revising, and finalizing the manuscript.

Corresponding author

Correspondence to Despoina Antonakaki.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable. All authors have consented to the writing of this study, and no additional experiments were conducted for the purposes of this study.

Consent for publication

All authors have consented to the submission of this manuscript to the journal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Figs. 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 and 35.

Fig. 11

CDF of all-capital words (percentage of all words) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to use fewer all-capital words in their tweet text

Fig. 12

CDF of all-capital words (count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to use fewer all-capital words in their tweet text

Fig. 13

CDF of digit/number usage (percentage) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to use fewer digits/numbers in their tweet text

Fig. 14

CDF of digit/number usage (count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to use fewer digits/numbers in their tweet text

Fig. 15

CDF of emoji usage (count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to use more emojis in their tweet text

Fig. 16

CDF of emoticon usage (count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to use more emoticons in their tweet text

Fig. 17

CDF of the number of favorited accounts (percentage) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to have more favorited accounts

Fig. 18

CDF of the number of favoriter accounts (percentage) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 19

CDF of the number of favoriter accounts (count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to have more favoriter accounts

Fig. 20

CDF of friends-to-followers Jaccard similarity over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that the two distributions are very similar

Fig. 21

CDF of the number of mentions of each user (count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 22

CDF of the number of mentioned users (count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 23

CDF of being quoted by other users (weighted average) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 24

CDF of quoted users (weighted average) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 25

CDF of being quoted by other users (degree count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 26

CDF of quoted users (degree count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 27

CDF of being quoted by other users (weighted count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 28

CDF of quoted users (weighted count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 29

CDF of incoming retweets (degree count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 30

CDF of outgoing retweets (degree count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 31

CDF of incoming retweets (weighted count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 32

CDF of outgoing retweets (weighted count) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events)

Fig. 33

CDF of words per tweet (standard deviation) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). Users marked as bots exhibit a narrower range of words per tweet than regular users

Fig. 34

CDF of unique retweeted hashtags over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). The figure shows that bot accounts tend to use more unique hashtags in retweeted text

Fig. 35

CDF of URLs per tweet (average) over two sets of users: clear users (zero copied events in our dataset) and users identified as bots (with threshold T of copied events). Bot accounts tend to use fewer URLs on average in their tweets
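
The comparisons in Figs. 11 through 35 all follow the same pattern: an empirical CDF of a per-user feature computed separately for clear users and for bot-labeled users. A minimal sketch of that procedure, under assumed input structures (a feature value per user and a copied-events count per user; names are illustrative, not the authors' code), is given below.

```python
# Minimal sketch: plot the empirical CDF of one per-user feature for
# "clear" users (zero copied events) versus bot-labeled users (>= T events).
import numpy as np
import matplotlib.pyplot as plt

def ecdf(values):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def plot_feature_cdf(feature_by_user, copied_events, T=10, label="feature"):
    # Split users into the two groups compared throughout the appendix figures.
    clear = [v for u, v in feature_by_user.items() if copied_events.get(u, 0) == 0]
    bots = [v for u, v in feature_by_user.items() if copied_events.get(u, 0) >= T]
    for values, name in [(clear, "clear users"), (bots, "bot-labeled users")]:
        x, y = ecdf(values)
        plt.step(x, y, where="post", label=name)
    plt.xlabel(label)
    plt.ylabel("CDF")
    plt.legend()
    plt.show()
```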


About this article


Cite this article

Shevtsov, A., Oikonomidou, M., Antonakaki, D. et al. Discovery and Classification of Twitter Bots. SN COMPUT. SCI. 3, 255 (2022). https://doi.org/10.1007/s42979-022-01154-5
