Spam detection on Twitter using a support vector machine and users’ features by identifying their interactions

Ahmad, Saleh Beyt Sheikh; Rafie, Mahnaz; Ghorabie, Seyed Mojtaba

doi:10.1007/s11042-020-10405-7

Published: 06 January 2021

Volume 80, pages 11583–11605, (2021)

Multimedia Tools and Applications

Saleh Beyt Sheikh Ahmad¹,
Mahnaz Rafie² (ORCID: orcid.org/0000-0003-2546-826X) &
Seyed Mojtaba Ghorabie³

Abstract

Spam tweets can cause numerous problems for users. We propose an automatic method for detecting spam tweets, based on pre-processing and feature extraction steps. Pre-processing is especially important for this problem because of the specific structure of tweets: it is performed so that each tweet retains only the words that can play a key role in determining whether the tweet is spam or non-spam. In the proposed method, the features are grouped into five classes (user profile features, account information features, user activity-based features, user interaction-based features, and tweet content-based features), comprising 28 different features. In the feature selection step, an optimal subset of these features is selected for the learning process. A support vector classifier is then used for the learning process with two kernels, Gaussian and polynomial. Finally, the proposed method is compared with multi-layer perceptron (MLP), Naive Bayes (NB), random forest (RF), and k-nearest neighbors (KNN) methods in terms of standard criteria. The results show that the proposed method, using the support vector machine (SVM) algorithm with a polynomial kernel, performs better overall than the other methods, with a precision of 0.988, an efficiency of 0.953, an accuracy of 0.96, an F-measure of 0.969, and an area under the ROC curve of 0.985.
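
The abstract names two SVM kernels, Gaussian and polynomial, but does not state their parameters. As a minimal sketch of what these two kernels compute, the following uses the standard textbook definitions; the function names, the default values of `gamma`, `degree`, and `coef0`, and the two example feature vectors are all assumptions made for illustration and are not taken from the article.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2).

    gamma=0.5 is an illustrative default, not a value from the article.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree.

    degree, gamma and coef0 are illustrative defaults, not values from the article.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Two hypothetical, already-scaled feature vectors (e.g. three selected features per tweet).
u = [1.0, 0.0, 2.0]
v = [1.0, 1.0, 0.0]

print(gaussian_kernel(u, v))    # similarity decays with squared distance
print(polynomial_kernel(u, v))  # similarity grows with the dot product
```

Both functions return a similarity score for a pair of feature vectors; an SVM trained with either kernel uses these scores in place of plain dot products, which is what lets it draw a non-linear boundary between spam and non-spam.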

Author information

Authors and Affiliations

Department of Computer Engineering, Arvandan Nonprofit Higher Education Institute, Khorramshahr, Iran
Saleh Beyt Sheikh Ahmad
Department of Computer Engineering, Ramhormoz Branch, Islamic Azad University, Ramhormoz, Iran
Mahnaz Rafie
Department of Computer Engineering, International Branch, Islamic Azad University, Qeshm, Iran
Seyed Mojtaba Ghorabie

Corresponding author

Correspondence to Mahnaz Rafie.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ahmad, S.B.S., Rafie, M. & Ghorabie, S.M. Spam detection on Twitter using a support vector machine and users’ features by identifying their interactions. Multimed Tools Appl 80, 11583–11605 (2021). https://doi.org/10.1007/s11042-020-10405-7

Received: 11 May 2020
Revised: 08 September 2020
Accepted: 22 December 2020
Published: 06 January 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11042-020-10405-7
