Arabic spam tweets classification using deep learning

Kaddoura, Sanaa; Alex, Suja A.; Itani, Maher; Henno, Safaa; AlNashash, Asma; Hemanth, D. Jude

doi:10.1007/s00521-023-08614-w

Arabic spam tweets classification using deep learning

Original Article
Published: 29 April 2023

Volume 35, pages 17233–17246, (2023)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

Sanaa Kaddoura ORCID: orcid.org/0000-0002-4384-4364¹,
Suja A. Alex²,
Maher Itani³,
Safaa Henno¹,
Asma AlNashash⁴ &
…
D. Jude Hemanth⁵

385 Accesses
3 Citations
Explore all metrics

Abstract

With the increased use of social network sites, such as Twitter, attackers exploit these platforms to spread counterfeit content. Such content can be fake advertisements or illegal content. Classifying such content is a challenging task, especially in Arabic. The Arabic language has a complex structure and makes classification tasks more difficult. This paper presents an approach to classifying Arabic tweets using classical machine learning (non-deep machine learning) and deep learning techniques. Tweets corpus were collected through Twitter API and labelled manually to get a reliable dataset. For an efficient classifier, feature extraction is applied to the corpus dataset. Then, two learning techniques are used for each feature extraction technique on the created dataset using N-gram models (uni-gram, bi-gram, and char-gram). The applied classical machine learning algorithms are support vector machines, neural networks, logistics regression, and naïve Bayes. Global vector (GloVe) and fastText learning models are utilised for the deep learning approaches. The Precision, Recall, and F1-score are the suggested performance measures calculated in this paper. Afterwards, the dataset is increased using the synthetic minority oversampling technique class to create a balanced dataset. After applying the classical machine learning models, the experimental results show that the neural network algorithm outperforms the other algorithms. Moreover, the GloVe outperforms the fastText model for the deep learning approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Sentiment Analysis in Social Media Data for Depression Detection Using Artificial Intelligence: A Review

Article 19 November 2021

Automatic detection of hate speech in code-mixed Indian languages in twitter social media interaction using DConvBLSTM-MuRIL ensemble method

Article 25 May 2024

A comprehensive review on Arabic offensive language and hate speech detection on social media: methods, challenges and solutions

Article 30 May 2024

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

References

Alharbi AR, Aljaedi A (2019) Predicting rogue content and arabic spammers on twitter. Future Internet 11(11):229
Article Google Scholar
Benevenuto F, Magno G, Rodrigues T and Almeida V (2010) Detecting spammers on twitter. In: Collaboration, electronic messaging, anti-abuse and spam conference (CEAS) (Vol. 6, No. 2010, p. 12)
Wang AH (2010) Don't follow me: Spam detection in twitter. In: 2010 international conference on security and cryptography (SECRYPT), pp 1–10. IEEE
Wu T, Wen S, Xiang Y, Zhou W (2018) Twitter spam detection: survey of new approaches and comparative study. Comput Secur 76:265–284
Article Google Scholar
Kaddoura S, Chandrasekaran G, Elena Popescu D, Duraisamy JH (2022) A systematic literature review on spam content detection and classification. PeerJ Comput Sci 8:830. https://doi.org/10.7717/peerj-cs.830
Article Google Scholar
Kaddoura S, Alfandi O and Dahmani N (2020) A spam email detection mechanism for english language text emails using deep learning approach. In: 2020 IEEE 29th international conference on enabling technologies: infrastructure for collaborative enterprises (WETICE). IEEE, pp 193–198. https://doi.org/10.1109/WETICE49692.2020.00045
Kaddoura S (2021) Classification of malicious and benign websites by network features using supervised machine learning algorithms. In: 2021 5th Cyber security in networking conference (CSNet). IEEE, pp 36–40. https://doi.org/10.1109/CSNet52717.2021.9614273
Kaddoura S, Arid AE and Moukhtar M (2021) Evaluation of supervised machine learning algorithms for multi-class intrusion detection systems. In: Proceedings of the future technologies conference. Springer, Cham, pp 1–16. https://doi.org/10.1007/978-3-030-89912-7_1
Ahmed I, Aljahdali S, Khan MS, Kaddoura S (2022) Classification of parkinson disease based on patient’s voice signal using machine learning. Intell Autom Soft Comput 32(2):705–722. https://doi.org/10.32604/iasc.2022.022037
Article Google Scholar
Mubarak, H., Abdelali, A., Hassan, S. and Darwish, K., 2020, October. Spam detection on arabic twitter. In International Conference on Social Informatics (pp. 237–251). Springer, Cham.
Saeed RM, Rady S, Gharib TF (2022) An ensemble approach for spam detection in Arabic opinion texts. J King Saud University-Comput Inf Sci 34(1):1407–1416
Google Scholar
Kaddoura S, Itani M, Roast C (2021) Analyzing the effect of negation in sentiment polarity of facebook dialectal arabic text. Appl Sci 11(11):4768. https://doi.org/10.3390/app11114768
Article Google Scholar
Kaddoura S, Ahmed DR (2022) A comprehensive review on Arabic word sense disambiguation for natural language processing applications. Wiley Interdisciplinary Rev Data Mining Knowl Discov 12:e1447. https://doi.org/10.1002/widm.1447
Article Google Scholar
Ekmekcioglu FC, Lynch MF, Willett P (1996) Stemming and n-gram matching for term conflation in Turkish texts. Inf Res 2(2):2–2
Google Scholar
Daneshvar S and Inkpen D (2018) Gender identification in twitter using n-grams and lsa. In: Proceedings of the Ninth international conference of the CLEF association (CLEF 2018).
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
Article Google Scholar
Ma Y, He H (Eds.). (2013) Imbalanced learning: foundations, algorithms, and applications
Markines B, Cattuto C, Menczer F (2009) Social spam detection. In: Proceedings of the 5th international workshop on adversarial information retrieval on the web, pp 41–48)
Wang AH (2010) Machine learning for the detection of spam in twitter networks. In: International conference on e-business and telecommunications, pp 319–333. Springer, Berlin, Heidelberg
Wang AH (2010) Don't follow me: Spam detection in twitter. In: 2010 international conference on security and cryptography (SECRYPT), pp 1–10. IEEE
Mccord M, Chuah M (2011) Spam detection on twitter using traditional classifiers. In: international conference on Autonomic and trusted computing, pp 175–186. Springer, Berlin, Heidelberg
Shirani-Mehr H (2013) SMS spam detection using machine learning approach. unpublished) http://cs229stanford.edu/proj2013/ShiraniMehr-SMSSpamDetectionUsingMachineLearningApproach.pdf
Meda C, Bisio F, Gastaldo P, Zunino R (2014) A machine learning approach for Twitter spammers detection. In: 2014 international carnahan conference on security technology (iccst), pp 1–6. IEEE
Chen C, Zhang J, Xie Y, Xiang Y, Zhou W, Hassan MM, Alrubaian M (2015) A performance evaluation of machine learning-based streaming spam tweets detection. IEEE Trans Comput Soc Syst 2(3):65–76
Article Google Scholar
Chen C, Zhang J, Chen X, Xiang Y, Zhou W (2015) 6 million spam tweets: a large ground truth for timely Twitter spam detection. In: 2015 IEEE international conference on communications (ICC), pp 7065–7070. IEEE
Trivedi SK (2016) A study of machine learning classifiers for spam detection. In: 2016 4th international symposium on computational and business Intelligence (ISCBI), pp 176–180. IEEE
Chen C, Wang Y, Zhang J, Xiang Y, Zhou W, Min G (2016) Statistical features-based real-time detection of drifted twitter spam. IEEE Trans Inf Forens Secur 12(4):914–925
Article Google Scholar
Wu T, Liu S, Zhang J, Xiang Y (2017) Twitter spam detection based on deep learning. In: Proceedings of the australasian computer science week multiconference, pp 1–8
Mateen M, Iqbal MA, Aleem M, Islam MA (2017) A hybrid approach for spam detection for Twitter. In: 2017 14th international bhurban conference on applied sciences and technology (IBCAST), pp 466–471. IEEE
Liu S, Wang Y, Zhang J, Chen C, Xiang Y (2017) Addressing the class imbalance problem in twitter spam detection using ensemble learning. Comput Secur 69:35–49
Article Google Scholar
Li C, Liu S (2018) A comparative study of the class imbalance problem in Twitter spam detection. Concurr Comput Pract Exp 30(5):e4281
Article Google Scholar
Gupta H, Jamal MS, Madisetty S, Desarkar MS (2018) A framework for real-time spam detection in Twitter. In: 2018 10th international conference on communication systems & networks (COMSNETS), pp 380–383. IEEE
Madisetty S, Desarkar MS (2018) A neural network-based ensemble approach for spam detection in Twitter. IEEE Trans Comput Soc Syst 5(4):973–984
Article Google Scholar
Itani M (2018) Sentiment analysis and resources for informal Arabic text on social media, Doctoral dissertation, Sheffield Hallam University.
Falak A, Ghous H, Malik M (2021) Twitter spam detection using machine learning. Int J Sci Eng Res, 12(2)
Ding Z, Xia R, Yu J, Li X, Yang J (2018) Densely connected bidirectional lstm with applications to sentence classification. In: Natural language processing and chinese computing: 7th CCF international conference, NLPCC 2018, Hohhot, China, August 26–30, 2018, Proceedings, Part II 7, pp 278–287. Springer International Publishing.
Rojas RF, Romero J, Lopez-Aparicio J, Ou KL (2021) Pain assessment based on fnirs using bi-lstm rnns. In: 2021 10th international IEEE/EMBS conference on neural engineering (NER, pp 399–402). IEEE
Jaihuni M, Basak JK, Khan F, Okyere FG, Sihalath T, Bhujel A, Kim HT (2022) A novel recurrent neural network approach in forecasting short term solar irradiance. ISA transactions 121:63–74
Article Google Scholar
Sunny MAI, Maswood MMS, Alharbi AG (2020) Deep learning-based stock price prediction using LSTM and bi-directional LSTM model. In: 2020 2nd novel intelligent and leading emerging sciences conference (NILES), pp 87–92. IEEE
Hegde A, Coelho S, Shashirekha H (2022) MUCS@ DravidianLangTech@ ACL2022: ensemble of logistic regression penalties to identify Emotions in Tamil Text. In: Proceedings of the second workshop on speech and language technologies for Dravidian languages, (pp 145–150
Liu J, Rong Y, Takáč M, Huang J (2019) Accelerating distributed stochastic L-BFGS by sampled 2nd Order Information. Beyond first order methods in ML@ NeurIPS.
Koh K, Kim SJ, Boyd S (2007) A Method for large-scale l~ 1-regularized logistic regression. In: AAAI, pp 565–571
Zhang P, Shen C (2019) Choice of the number of hidden layers for back propagation neural network driven by stock price data and application to price prediction. In: Journal of physics: conference series (Vol. 1302, No. 2, p. 022017). IOP Publishing
Zhang C, Woodland PC (2015) Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling. In: Sixteenth annual conference of the international speech communication association
Huda NS, Mubarok MS (2019) A multi-label classification on topics of quranic verses (english translation) using backpropagation neural network with stochastic gradient descent and adam optimiser. In: 2019 7th International conference on information and communication technology (ICoICT), pp 1–5. IEEE
Goel A, and Srivastava SK (2016) Role of kernel parameters in performance evaluation of SVM. In: 2016 Second international conference on computational Intelligence & communication technology (CICT), pp 166–169. IEEE
Xu PF, Cheng C, Cheng HX, Shen YL, Ding YX (2020) Identification-based 3 DOF model of unmanned surface vehicle using support vector machines enhanced by cuckoo search algorithm. Ocean Eng 197:106898
Article Google Scholar
Wang H, and Hu D (2005) Comparison of SVM and LS-SVM for regression. In: 2005 International conference on neural networks and brain (Vol. 1, pp. 279–283). IEEE
Vergara D, Hernández S, Jorquera F (2016) Multinomial Naive Bayes for real-time gender recognition. In: 2016 XXI Symposium on signal processing, images and artificial vision (STSIVA), pp 1–6. IEEE
Alkadri AM, Elkorany A, Ahmed C (2022) Enhancing detection of arabic social spam using data augmentation and machine learning. Appl Sci 12(22):11388
Article Google Scholar
Al-Azani S, El-Alfy ESM (2018) Detection of arabic spam tweets using word embedding and machine learning. In: 2018 international conference on innovation and intelligence for informatics, computing, and technologies (3ICT), pp 1–5. IEEE
Kardaş Berk et al. (2021) Detecting spam tweets using machine learning and effective preprocessing. In: Proceedings of the 2021 IEEE/ACM international conference on advances in social networks analysis and mining
Alom Z, Carminati B, Ferrari E (2018) Detecting spam accounts on Twitter. In: 2018 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 1191–1198. IEEE
Mostafa M, Abdelwahab A, Sayed HM (2020) Detecting spam campaign in twitter with semantic similarity. In: Journal of physics: conference series (Vol. 1447, No. 1, p. 012044). IOP Publishing
Ahmad SBS, Rafie M, Ghorabie SM (2021) Spam detection on Twitter using a support vector machine and users’ features by identifying their interactions. Multimed Tools Appl 80(8):11583–11605
Article Google Scholar
Ban X, Chen C, Liu S, Wang Y, Zhang J (2018) Deep-learnt features for Twitter spam detection. In: 2018 International symposium on security and privacy in social networks and big data (SocialSec), pp 208–212. IEEE.

Download references

Funding

This work was funded by Zayed University—Start-up research grant [Grant Number R20081].

Author information

Authors and Affiliations

Department of Computing and Applied Technology, College of Technological Innovation, Zayed University, Abu Dhabi, UAE
Sanaa Kaddoura & Safaa Henno
Department of Information Technology, St. Xavier’s Catholic College of Engineering, Nagercoil, India
Suja A. Alex
Computing Department, Academic Development Division, Sabis Educational Services, Choueifat, Lebanon
Maher Itani
Department of Data Science, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Amman, Jordan
Asma AlNashash
Department of ECE, Karunya Institute of Technology and Sciences, Coimbatore, India
D. Jude Hemanth

Authors

Sanaa Kaddoura
View author publications
You can also search for this author in PubMed Google Scholar
Suja A. Alex
View author publications
You can also search for this author in PubMed Google Scholar
Maher Itani
View author publications
You can also search for this author in PubMed Google Scholar
Safaa Henno
View author publications
You can also search for this author in PubMed Google Scholar
Asma AlNashash
View author publications
You can also search for this author in PubMed Google Scholar
D. Jude Hemanth
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sanaa Kaddoura.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Kaddoura, S., Alex, S.A., Itani, M. et al. Arabic spam tweets classification using deep learning. Neural Comput & Applic 35, 17233–17246 (2023). https://doi.org/10.1007/s00521-023-08614-w

Download citation

Received: 13 February 2022
Accepted: 17 April 2023
Published: 29 April 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00521-023-08614-w

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Arabic spam tweets classification using deep learning

Abstract

Access this article

Similar content being viewed by others

Sentiment Analysis in Social Media Data for Depression Detection Using Artificial Intelligence: A Review

Automatic detection of hate speech in code-mixed Indian languages in twitter social media interaction using DConvBLSTM-MuRIL ensemble method

A comprehensive review on Arabic offensive language and hate speech detection on social media: methods, challenges and solutions

Data availability

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Arabic spam tweets classification using deep learning

Abstract

Access this article

Similar content being viewed by others

Sentiment Analysis in Social Media Data for Depression Detection Using Artificial Intelligence: A Review

Automatic detection of hate speech in code-mixed Indian languages in twitter social media interaction using DConvBLSTM-MuRIL ensemble method

A comprehensive review on Arabic offensive language and hate speech detection on social media: methods, challenges and solutions

Data availability

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation