Abstract
Spam tweets can cause numerous problems for users. This paper proposes an automatic method for detecting spam tweets, based on pre-processing and feature extraction steps. Pre-processing is significant for this problem because of the specific structure of tweets: it leaves in each tweet only the words that can play a key role in determining whether the tweet is spam or non-spam. The method uses 28 features grouped into five classes: user profile features, account information features, user activity-based features, user interaction-based features, and tweet content-based features. In the feature selection step, an optimal subset of these features is selected for the learning process. A support vector classifier with Gaussian and polynomial kernels is then used for learning. Finally, the proposed method is compared with multi-layer perceptron (MLP), Naive Bayes (NB), random forest (RF), and k-nearest neighbors (KNN) methods in terms of standard criteria. The results show that the proposed method, using a support vector machine (SVM) with a polynomial kernel, outperforms the other methods overall, with 0.988 precision, 0.953 efficiency, 0.96 accuracy, an F-measure of 0.969, and an area under the ROC curve of 0.985.
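The first and last stages of the pipeline described in the abstract, pre-processing a tweet and scoring a classifier, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The cleaning rules, the stop-word list, and the function names `preprocess` and `binary_metrics` are assumptions made for illustration, since the abstract does not specify them. The sketch computes recall, and whether the abstract's "efficiency" denotes recall is also an assumption.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(tweet: str) -> List[str]:
    """Reduce a tweet to lowercase content words (assumed cleaning rules)."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @user mentions
    text = text.replace("#", " ")     # keep the hashtag word, drop the symbol
    tokens = TOKEN_RE.findall(text)   # alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Precision, recall, accuracy and F-measure, with spam (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators instead of raising ZeroDivisionError.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up example tweet and labels, for illustration only.
    print(preprocess("WIN a FREE iPhone!!! Click http://spam.example/x @bob #giveaway"))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```

In a fuller pipeline, the tokens returned by `preprocess` would feed the tweet content-based features, and the predictions passed to `binary_metrics` would come from the trained classifier. The area under the ROC curve needs ranked scores rather than hard labels, so it is left out of this sketch.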
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
About this article
Cite this article
Ahmad, S.B.S., Rafie, M. & Ghorabie, S.M. Spam detection on Twitter using a support vector machine and users’ features by identifying their interactions. Multimed Tools Appl 80, 11583–11605 (2021). https://doi.org/10.1007/s11042-020-10405-7
DOI: https://doi.org/10.1007/s11042-020-10405-7