Detection of automated behavior on Twitter through approximate entropy and sample entropy

Gilmary, Rosario; Venkatesan, Akila; Vaiyapuri, Govindasamy

doi:10.1007/s00779-021-01647-9

Original Article
Published: 20 September 2021

Personal and Ubiquitous Computing
Volume 27, pages 91–105 (2023)

Rosario Gilmary ORCID: orcid.org/0000-0003-3754-9809¹,
Akila Venkatesan¹ &
Govindasamy Vaiyapuri²

Abstract

Twitter is an Online Social Network (OSN) and a significant forum for public expression and relationship building. By 2020, Twitter had reached nearly 353.1 million monthly active users. However, a considerable number of these accounts may be automated profiles: roughly 52 million Twitter profiles are estimated to be bots. Some bots perform beneficial tasks such as publishing news and scientific articles or supporting emergency response. Others deceive genuine users by sharing spurious content or distributing malware. Hence, detecting suspicious accounts is necessary to ensure a safe Twitter environment. This paper proposes novel approaches that identify bots by measuring the randomness and regularity present in the temporal tweet attribute of a user. The real-time tweets posted by individual Twitter profiles are collected, and the number of tweets posted by the user over a sampling period is extracted as an activity signal. The degree of regularity in the activity signals is then measured through the lens of entropy. In this work, two probabilistic measures, Approximate Entropy and Sample Entropy, are used to quantify the global degree of regularity in the signal. Accounts with entropy values below a fixed threshold are labeled as bots, so the nature of a Twitter profile (bot or human) can be determined. The technique yields F1 scores of 0.8759 and 0.8349 for Approximate Entropy and Sample Entropy, respectively. Point-biserial correlation is employed to establish the association between the entropy values and the class of Twitter users.
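
The pipeline summarized in the abstract (activity signal, entropy, threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the default embedding dimension `m=2`, the absolute tolerance `r=0.2`, and the threshold passed to `label_account` are assumptions chosen for illustration; the abstract does not state the values used in the paper. The entropy definitions follow the standard formulations of Approximate Entropy (Pincus 1991) and Sample Entropy (Richman and Moorman 2000).

```python
import math
from typing import List, Sequence


def activity_signal(timestamps: Sequence[float], period: float) -> List[int]:
    """Count tweets per sampling period (timestamps and period in seconds)."""
    if not timestamps:
        return []
    start = min(timestamps)
    n_bins = int((max(timestamps) - start) // period) + 1
    counts = [0] * n_bins
    for t in timestamps:
        counts[int((t - start) // period)] += 1
    return counts


def _chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum absolute component-wise difference between two templates."""
    return max(abs(x - y) for x, y in zip(a, b))


def approximate_entropy(signal: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """ApEn(m, r): self-matches are counted, so every logarithm is defined."""
    n = len(signal)
    if n <= m + 1:
        raise ValueError("signal is too short for the chosen m")

    def phi(k: int) -> float:
        templates = [signal[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for a in templates:
            matches = sum(1 for b in templates if _chebyshev(a, b) <= r)
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


def sample_entropy(signal: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """SampEn(m, r): self-matches are excluded; returns inf if no matches exist."""
    n = len(signal)
    if n <= m + 1:
        raise ValueError("signal is too short for the chosen m")

    def matching_pairs(k: int) -> int:
        # Use the same number of templates (n - m) for lengths m and m + 1.
        templates = [signal[i:i + k] for i in range(n - m)]
        pairs = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if _chebyshev(templates[i], templates[j]) <= r:
                    pairs += 1
        return pairs

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


def label_account(entropy: float, threshold: float) -> str:
    """Label an account as a bot when its entropy falls below the threshold.

    The threshold is a hypothetical parameter; the abstract only says it is fixed.
    """
    return "bot" if entropy < threshold else "human"


if __name__ == "__main__":
    # A strictly alternating activity signal is highly regular (low entropy).
    regular = [0, 1] * 20
    print(approximate_entropy(regular), sample_entropy(regular))
```

In this sketch a perfectly periodic posting pattern produces entropy close to zero and would fall below any positive threshold, while an irregular pattern produces a larger value. In practice the tolerance `r` is often scaled by the standard deviation of the signal, which is a design choice this sketch leaves to the caller.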

Funding

This work has been supported by Research Grant No. SPG/2020/000594 under the SERB POWER grant scheme, Science and Engineering Research Board, Government of India, awarded to Akila Venkatesan, Pondicherry Engineering College, India.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Pondicherry Engineering College, Pondicherry, India
Rosario Gilmary & Akila Venkatesan
Department of Information Technology, Pondicherry Engineering College, Pondicherry, India
Govindasamy Vaiyapuri

Corresponding author

Correspondence to Rosario Gilmary.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gilmary, R., Venkatesan, A. & Vaiyapuri, G. Detection of automated behavior on Twitter through approximate entropy and sample entropy. Pers Ubiquit Comput 27, 91–105 (2023). https://doi.org/10.1007/s00779-021-01647-9

Received: 08 November 2020
Accepted: 17 August 2021
Published: 20 September 2021
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00779-021-01647-9
