Spam detection on Twitter using a support vector machine and users’ features by identifying their interactions

Multimedia Tools and Applications

Abstract

Spam tweets can cause numerous problems for users. This paper proposes an automatic method for detecting spam tweets, built on pre-processing and feature extraction steps. Pre-processing matters for this problem because of the specific structure of tweets: it is performed so that each tweet retains only the words that can play a key role in determining whether the tweet is spam or non-spam. The method uses 28 features grouped into five classes: user profile features, account information features, user activity-based features, user interaction-based features, and tweet content-based features. In the feature selection step, an optimal subset of these features is selected for the learning process. A support vector classifier is then trained with two kernels, Gaussian and polynomial. Finally, the proposed method is compared with multi-layer perceptron (MLP), Naive Bayes (NB), random forest (RF), and k-nearest neighbors (KNN) methods in terms of standard criteria. The results show that the proposed method, using the support vector machine (SVM) algorithm with the polynomial kernel, outperforms the other methods overall, achieving 0.988 precision, 0.953 efficiency, 0.96 accuracy, an F-measure of 0.969, and an area under the ROC curve of 0.985.
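The pipeline summarized in the abstract (feature selection, then an SVM with a Gaussian or polynomial kernel, compared against MLP, NB, RF, and KNN) can be illustrated with a minimal scikit-learn sketch. Everything specific here is an assumption for illustration only: the data are synthetic stand-ins for the 28 features, `SelectKBest` with `k=15` replaces whatever feature selection the authors used, the hyperparameters are arbitrary or library defaults, and the helper `evaluate` is hypothetical. The sketch does not reproduce the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 28 extracted features (NOT the paper's data).
X, y = make_classification(n_samples=600, n_features=28, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The two SVM kernels named in the abstract plus the four baselines.
classifiers = {
    "SVM (Gaussian kernel)": SVC(kernel="rbf"),
    "SVM (polynomial kernel)": SVC(kernel="poly", degree=3),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}


def evaluate(clf, k=15):
    """Hypothetical helper: scale, keep k features, fit, and score."""
    # SelectKBest is an assumed selector; the abstract does not name one.
    model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), clf)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # ROC AUC needs a continuous score: use the SVM margin when available,
    # otherwise the predicted probability of the positive class.
    if hasattr(clf, "decision_function"):
        scores = model.decision_function(X_test)
    else:
        scores = model.predict_proba(X_test)[:, 1]
    return {
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, scores),
    }


results = {name: evaluate(clf) for name, clf in classifiers.items()}
for name, metrics in results.items():
    summary = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Placing the scaler and the selector inside the pipeline means both are fitted on the training split only, so the held-out scores are not influenced by test data. On real tweets, the synthetic matrix would be replaced by the extracted feature vectors and their spam labels.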



Author information

Corresponding author

Correspondence to Mahnaz Rafie.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

About this article

Cite this article

Ahmad, S.B.S., Rafie, M. & Ghorabie, S.M. Spam detection on Twitter using a support vector machine and users’ features by identifying their interactions. Multimed Tools Appl 80, 11583–11605 (2021). https://doi.org/10.1007/s11042-020-10405-7

  • DOI: https://doi.org/10.1007/s11042-020-10405-7
