Abstract
The emergence of online social network (OSN) platforms has resulted in the production of enormous amounts of data from billions of active users. The ease of access to personal information on OSN platforms facilitates the creation of fake profiles, leaving users vulnerable. The primary objective of fake accounts is to disseminate unsolicited messages, unverified information, and other deceptive content on OSN platforms. No study has yet determined whether traditional machine learning techniques or deep learning techniques are more effective at detecting fake profiles on OSN platforms as a function of dataset size. The present study fills this gap by evaluating the performance of artificial intelligence techniques (traditional machine learning and deep learning) on benchmark datasets of different sizes. An ablation study is conducted to determine the optimal combination of features, and data augmentation techniques are employed to address class imbalance in the dataset. The results show that data augmentation, particularly the synthetic minority over-sampling technique (SMOTE), yields better results on an imbalanced dataset. Furthermore, deep learning techniques (LSTM, with an accuracy of 97%) perform better at detecting fake profiles on large datasets, whereas traditional machine learning techniques (XGBoost, also with an accuracy of 97%) perform better on small datasets. The top-performing techniques are also compared with state-of-the-art techniques to validate the results. The study may aid future researchers in developing a comprehensive methodology for detecting fake profiles on OSN platforms.
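The abstract reports that SMOTE gave the best results on the imbalanced dataset. The following is a minimal sketch of the core SMOTE idea in plain Python, not the authors' implementation. Each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The helper names `smote` and `balance`, the two toy numeric features per profile, and the convention that label 1 marks the minority (fake-profile) class are assumptions for illustration. The code handles binary labels only.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: create n_synthetic minority samples by interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest neighbours within the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of `base`, excluding itself
        # (brute-force sort, fine for a sketch but slow on large datasets)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(features: Sequence[Sequence[float]], labels: Sequence[int],
            minority_label: int = 1, k: int = 5,
            seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: oversample the minority class of a binary dataset
    until both classes have the same number of samples."""
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    n_majority = sum(1 for y in labels if y != minority_label)
    extra = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    new_x = [list(x) for x in features] + extra
    new_y = list(labels) + [minority_label] * len(extra)
    return new_x, new_y


if __name__ == "__main__":
    # Toy data: two hypothetical numeric features per profile; label 1 = fake.
    X = [[0.1, 10.0], [0.2, 12.0], [0.15, 11.0],
         [0.9, 300.0], [0.8, 250.0], [0.85, 280.0], [0.95, 320.0], [0.7, 260.0]]
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    X_bal, y_bal = balance(X, y, minority_label=1, k=2)
    print(y_bal.count(1), y_bal.count(0))  # 5 5
```

In practice, the `SMOTE` class from the imbalanced-learn library (`imblearn.over_sampling.SMOTE`, used through `fit_resample`) is the usual choice. Oversampling should generally be applied to the training split only. Otherwise, synthetic points derived from test samples can inflate evaluation scores.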
Data Availability
The datasets used in this research are cited in the References section.
Notes
https://datareportal.com/social-media-users and Global Social Media User Statistics (2023) — Demography & Facts (demandsage.com) (Accessed on 10th January 2024).
https://backlinko.com/social-media-users and https://datareportal.com/social-media-users (Accessed on 10th January 2024).
Global Social Media User Statistics (2023) — Demography & Facts (demandsage.com) (Accessed on 10th January 2024).
https://www.smartinsights.com/social-media-marketing/social-media-strategy/new-global-social-media-research/ (Accessed on 10th January 2024).
https://www.bankmycell.com/blog/how-many-phones-are-in-the-world (Accessed on 10th January 2024).
References
Aïmeur E, Amri S, Brassard G. Fake news, disinformation and misinformation in social media: a review. Soc Netw Anal Min. 2023. https://doi.org/10.1007/s13278-023-01028-5.
Priyadharshini VM, Valarmathi A. A novel spam detection technique for detecting and classifying malicious profiles in online social networks. J Intell Fuzzy Syst. 2021;41(1):993–1007. https://doi.org/10.3233/JIFS-202937.
Gupta S, Verma B, Gupta P, Goel L, Yadav AK, Yadav D. Identification of fake news using deep neural network-based hybrid model. SN Comput Sci. 2023;4:679. https://doi.org/10.1007/s42979-023-02117-0.
Clark EM, Williams JR, Jones CA, Galbraith RA, Danforth CM, Dodds PS. Sifting robotic from organic text: a natural language approach for detecting automation on Twitter. J Comput Sci. 2016;16:1–7. https://doi.org/10.1016/J.JOCS.2015.11.002.
Mughaid A, Obeidat I, AlZu’bi S, Elsoud EA, Alnajjar A, Alsoud AR, Abualigah L. A novel machine learning and face recognition technique for fake accounts detection system on cyber social networks. Multimed Tools Appl. 2023;82:26353–78. https://doi.org/10.1007/s11042-023-14347-8.
Van Der Walt E, Eloff J. Using machine learning to detect fake identities: bots vs humans. IEEE Access. 2018;6:6540–9. https://doi.org/10.1109/ACCESS.2018.2796018.
Swe MM, Nyein Myo N. Fake accounts detection on Twitter using blacklist. In: Proceedings – 17th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2018; 2018. pp. 562–566. https://doi.org/10.1109/ICIS.2018.8466499.
Shahid W, Li Y, Staples D, Amin G, Hakak S, Ghorbani A. Are you a cyborg, bot or human? A survey on detecting fake news spreaders. IEEE Access. 2022;10:27069–83. https://doi.org/10.1109/ACCESS.2022.3157724.
Islam MM, Uddin MA, Islam L, Akter A, Sharmin S, Acharjee UK. Cyberbullying detection on social networks using machine learning approaches. In: 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2020; 2020. https://doi.org/10.1109/CSDE50874.2020.9411601.
Shah A, Varshney S, Mehrotra M. DeepMUI: a novel method to identify malicious users on online social network platforms. Concurr Comput. 2023. https://doi.org/10.1002/CPE.7917.
Sahoo SR, Gupta BB. Classification of various attacks and their defence mechanism in online social networks: a survey. Enterp Inf Syst. 2019;13(6):832–64. https://doi.org/10.1080/17517575.2019.1605542.
Singh N, Sharma T, Thakral A, Choudhury T. Detection of fake profile in online social networks using machine learning. In: Proceedings on 2018 International Conference on Advances in Computing and Communication Engineering, ICACCE 2018; 2018. pp. 231–234. https://doi.org/10.1109/ICACCE.2018.8441713.
Ambareen K, Meenakshi Sundaram S. A survey of cyberbullying detection and performance: its impact in social media using artificial intelligence. SN Comput Sci. 2023;4:859. https://doi.org/10.1007/s42979-023-02301-2.
Thaokar C, Rout JK, Rout M, Ray NK. N-Gram based sarcasm detection for news and social media text using hybrid deep learning models. SN Comput Sci. 2024;5:163. https://doi.org/10.1007/s42979-023-02506-5.
Ramalingam D, Chinnaiah V. Fake profile detection techniques in large-scale online social networks: a comprehensive review. Comput Electr Eng. 2018;65:165–77. https://doi.org/10.1016/J.COMPELECENG.2017.05.020.
Wanda P, Jie HJ. DeepProfile: finding fake profile in online social network using dynamic CNN. J Inform Secur Appl. 2020;52:102465. https://doi.org/10.1016/J.JISA.2020.102465.
Pv S, Bhanu SMS. UbCadet: detection of compromised accounts in twitter based on user behavioural profiling. Multimed Tools Appl. 2020;79:27–8. https://doi.org/10.1007/S11042-020-08721-Z.
Bharti KK, Pandey S. Fake account detection in twitter using logistic regression with particle swarm optimization. Soft Comput. 2021;25(16):11333–45. https://doi.org/10.1007/S00500-021-05930-Y.
Roy PK, Chahar S. Fake profile detection on social networking websites: a comprehensive review. IEEE Trans Artif Intell. 2020;1(3):271–85. https://doi.org/10.1109/TAI.2021.3064901.
Kaushik K, Bhardwaj A, Kumar M, Gupta SK, Gupta A. A novel machine learning-based framework for detecting fake Instagram profiles. Concurr Comput. 2022;34(28):e7349. https://doi.org/10.1002/CPE.7349.
Jia J, Wang B, Gong NZ. Random walk based fake account detection in online social networks. In: Proceedings – 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2017; 2017. pp 273–284. https://doi.org/10.1109/DSN.2017.55.
Fire M, Goldschmidt R, Elovici Y. Online social networks: threats and solutions. IEEE Commun Surv Tutorials. 2014;16(4):2019–36. https://doi.org/10.1109/COMST.2014.2321628.
Alsaleh M, Alarifi A, Al-Salman AM, Alfayez M, Almuhaysin A. TSD: Detecting sybil accounts in twitter. In: Proceedings – 2014 13th International Conference on Machine Learning and Applications, ICMLA 2014; 2014. pp. 463–469. https://doi.org/10.1109/ICMLA.2014.81.
Erşahin B, Aktaş Ö, Kılınç D, Akyol C. Twitter fake account detection. In: 2nd International Conference on Computer Science and Engineering, UBMK 2017; 2017. pp. 388–392. https://doi.org/10.1109/UBMK.2017.8093420.
Alom Z, Carminati B, Ferrari E. Detecting spam accounts on Twitter. In: Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2018; 2018. pp. 1191–1198. https://doi.org/10.1109/ASONAM.2018.8508495.
Senthil Raja M, Arun Raj L. Detection of malicious profiles and protecting users in online social networks. Wirel Pers Commun. 2022;127(1):107–24. https://doi.org/10.1007/S11277-021-08095-X.
Awan MJ, Khan MA, Ansari ZK, Yasin A, Shehzad HMF. Fake profile recognition using big data analytics in social media platforms. Int J Comput Appl Technol. 2022;68(3):215–22. https://doi.org/10.1504/IJCAT.2022.124942.
David I, Siordia OS, Moctezuma D. Features combination for the detection of malicious Twitter accounts. In: 2016 IEEE International Autumn Meeting on Power, Electronics and Computing, ROPEC 2016; 2017. https://doi.org/10.1109/ROPEC.2016.7830626.
Revathi S, Suriakala M. Profile similarity communication matching approaches for detection of duplicate profiles in online social network. In: Proceedings 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions, CSITSS 2018; 2018. pp. 174–182. https://doi.org/10.1109/CSITSS.2018.8768751.
Sheikhi S. An efficient method for detection of fake accounts on the Instagram platform. Rev Intell Artif. 2020;34(4):429–36. https://doi.org/10.18280/RIA.340407.
Singh M, Bansal D, Sofat S. Detecting malicious users in Twitter using classifiers. In: ACM International Conference Proceeding Series, vol. 2014. 2014. pp. 247–253. https://doi.org/10.1145/2659651.2659736.
Akyon FC, Esat Kalfaoglu M. Instagram fake and automated account detection. In: Proceedings – 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019; 2019. https://doi.org/10.1109/ASYU48272.2019.8946437.
Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106. https://doi.org/10.1007/BF00116251.
Schölkopf B. SVMs: a practical consequence of learning theory. IEEE Intell Syst Appl. 1998;13(4):18–21. https://doi.org/10.1109/5254.708428.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324.
Lewis DD. Naive (Bayes) at forty: the independence assumption in information retrieval. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1398. 1998. pp. 4–15. https://doi.org/10.1007/BFB0026666.
Maalouf M. Logistic regression in data analysis: an overview. Int J Data Anal Tech Strateg. 2011;3(3):281–99. https://doi.org/10.1504/IJDATS.2011.041335.
Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Pineda FJ. Generalization of back-propagation to recurrent neural networks. Phys Rev Lett. 1987;59(19):2229. https://doi.org/10.1103/PhysRevLett.59.2229.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. https://doi.org/10.1162/NECO.1997.9.8.1735.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–323. https://doi.org/10.1109/5.726791.
Cresci S, Spognardi A, Petrocchi M, Tesconi M, Di Pietro R. The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. In: 26th International World Wide Web Conference 2017, WWW 2017 Companion; 2017. pp. 963–972. https://doi.org/10.1145/3041021.3055135.
Cresci S, Di Pietro R, Petrocchi M, Spognardi A, Tesconi M. Fame for sale: efficient detection of fake Twitter followers. Decis Support Syst. 2015;80:56–71. https://doi.org/10.1016/J.DSS.2015.09.003.
Ghanem R, Erbay H. Spam detection on social networks using deep contextualized word representation. Multimed Tools Appl. 2023;82(3):3697–712. https://doi.org/10.1007/S11042-022-13397-8.
Ghanem R, Erbay H, Bakour K. Contents-based spam detection on social networks using RoBERTa embedding and stacked BLSTM. SN Comput Sci. 2023. https://doi.org/10.1007/s42979-023-01798-x.
Funding
This research received no specific funding from any public, private, or nonprofit funding source. The figures and tables used in this publication do not require copyright permission.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest pertaining to this research.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shah, A., Varshney, S. & Mehrotra, M. Detection of Fake Profiles on Online Social Network Platforms: Performance Evaluation of Artificial Intelligence Techniques. SN COMPUT. SCI. 5, 489 (2024). https://doi.org/10.1007/s42979-024-02839-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s42979-024-02839-9