Abstract
Many social media (SM) platforms have emerged as a result of the rapid expansion of online social networks (OSNs). As SM has become part of everyday life, spammers have turned their attention to it. Spam detection (SD) follows two approaches: machine learning (ML)-based detection and expert-based detection. The accuracy of expert-based detection depends on expert knowledge, and detecting spam this way takes a long time. ML-based spam detection is therefore preferred in OSNs. Identifying spam on social networks is difficult because many factors are involved. In addition, the unequal numbers of spam and ham (legitimate) messages produce an imbalanced data distribution, which spammers can exploit to compromise users' devices. ML algorithms such as logistic regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), support vector machine (SVM), eXtreme gradient boosting (XGB), voting classifier (VC) and extra trees classifier (ETC) are used to address the class imbalance and to achieve high accuracy on imbalanced datasets. The ETC method reduces bias by training on the original sample rather than on bootstrap replicas. To reduce computational cost, ETC draws split thresholds at random instead of searching for the optimal cut point, which lowers the constant factor of its training complexity. As a result, ETC produces better data splits than the DT and RF techniques. Message text is converted into numerical feature vectors by vectorizers, and the results are stored for use by the classifiers. The VC is an ensemble method that combines the predictions of several models and outputs the class with the highest predicted probability. The individual predictions are aggregated, and the class that receives the majority of votes is returned. Experimental results show that, compared with KNN, naive Bayes (NB), ETC, RF, SVM, LR, XGB and DT, the proposed VC achieves a higher classification accuracy of 97.96%, with a precision of 97.56%, a recall of 89.95% and an F1-measure of 91.96%. ETC achieves an accuracy of 97.77%, a precision of 98.31%, a recall of 84.78% and an F1-measure of 91.05%.
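The voting step can be illustrated with a minimal sketch in plain Python. It is a hypothetical illustration, not the authors' implementation. The helper names `hard_vote` and `soft_vote` and the example classifier outputs are assumptions made for this example. `hard_vote` returns the label predicted by most base classifiers (majority voting). `soft_vote` averages the class probabilities of the base classifiers and returns the most probable class.

```python
from collections import Counter
from typing import Dict, List, Sequence


def hard_vote(predictions: Sequence[str]) -> str:
    """Majority (hard) voting: return the label predicted by most classifiers.

    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities: Sequence[Dict[str, float]]) -> str:
    """Soft voting: average each class's probability over classifiers, return the argmax."""
    if not probabilities:
        raise ValueError("need at least one probability distribution")
    totals: Dict[str, float] = {}
    for dist in probabilities:
        for label, p in dist.items():
            totals[label] = totals.get(label, 0.0) + p
    averaged = {label: total / len(probabilities) for label, total in totals.items()}
    return max(averaged, key=averaged.get)


# Hypothetical class probabilities from three base classifiers for one message.
probs: List[Dict[str, float]] = [
    {"spam": 0.40, "ham": 0.60},
    {"spam": 0.90, "ham": 0.10},
    {"spam": 0.45, "ham": 0.55},
]

# Hard voting uses each model's own top label: ham, spam, ham.
labels = [max(d, key=d.get) for d in probs]
print(hard_vote(labels))  # ham
# Soft voting averages probabilities: spam ≈ 0.583, ham ≈ 0.417.
print(soft_vote(probs))   # spam
```

With these made-up outputs the two schemes disagree. Two of the three models individually favour ham, so hard voting returns ham. The one confident spam prediction, however, raises the average spam probability to about 0.58, so soft voting returns spam. This shows that the "highest probability" and "majority vote" criteria can give different answers.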
Compared with conventional ML algorithms, VC and ETC provide higher accuracy, precision, recall and F1-measure, so ETC and VC are preferable for spam detection. A website has also been designed to classify messages as spam or not spam.
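The F1-measure is the harmonic mean of precision and recall, F1 = 2PR/(P + R), so the reported figures can be cross-checked. The short sketch below applies this formula to the precision and recall values quoted in the abstract. The helper name `f1_score` is hypothetical and is not the authors' evaluation code. For ETC the formula reproduces the reported 91.05%. For VC it gives about 93.60% rather than the reported 91.96%, which suggests that the VC F1-measure was computed in a way not described here (for example, by a different averaging over classes).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, converted to fractions.
reported = {
    "VC": (0.9756, 0.8995),   # reported F1: 91.96%
    "ETC": (0.9831, 0.8478),  # reported F1: 91.05%
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {100 * f1_score(p, r):.2f}%")
# VC: F1 = 93.60%
# ETC: F1 = 91.05%
```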
Data availability
Data will be made available on request.
Funding
Not applicable.
Author information
Contributions
MS developed the methodology. SPR wrote and drafted the manuscript. All authors are aware of the submission.
Ethics declarations
Conflict of interest
We declare that there is no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sumathi, M., Raja, S.P. Machine learning algorithm-based spam detection in social networks. Soc. Netw. Anal. Min. 13, 104 (2023). https://doi.org/10.1007/s13278-023-01108-6