Abstract
This study explores the convergence of cybersecurity, machine learning (ML), generative adversarial networks (GANs), and natural language processing (NLP) to counter the threat posed by phishing emails in the digital landscape. The surge in online business models and email communication has fueled the proliferation of malicious content, making phishing emails a significant cybersecurity challenge. ML and artificial intelligence (AI) algorithms offer a dynamic solution that adapts to the evolving threat landscape, but their effectiveness is contingent upon the availability of pertinent data, which privacy concerns often limit. To address this, the study investigates the potential of GANs for synthetic data generation in cybersecurity, focusing specifically on phishing emails. A major advantage of using AI for phishing email detection is that such a system can adapt to the dynamic cybersecurity landscape without explicit direction from human operators. Text mining is used to convert the data into a representation suitable for AI algorithms, which are designed with numerical values in mind. Through experiments with real-world datasets, the research evaluates the performance of contemporary ML classifiers that incorporate NLP techniques and introduces a GAN-based approach to generating synthetic training data. The outcomes aim to contribute to the development of robust intrusion detection techniques and to provide insights into mitigating cybersecurity risks in the face of advanced digital threats.
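The text-mining step mentioned above, turning raw email text into numbers a classifier can consume, can be illustrated with a small sketch. The abstract does not say which representation the study uses; the example below assumes TF-IDF weighting, and the function names (`tokenize`, `tfidf_vectors`) and the sample emails are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?:;\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumed weighting: tf = term count / document length,
    idf = log(N / document frequency). Other variants exist.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocabulary}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


# Hypothetical sample emails, not taken from the study's datasets.
emails = [
    "Verify your account now to avoid suspension",
    "Meeting notes attached for your review",
    "Your account statement is attached",
]
vocabulary, vectors = tfidf_vectors(emails)
```

Each email becomes a fixed-length numeric vector indexed by `vocabulary`. A word that occurs in every email, such as "your", receives zero weight under this scheme, while a word confined to one email, such as "verify", receives a positive weight only there. Vectors of this kind are what a downstream ML classifier would be trained on.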
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jovanovic, L., Bacanin, N., Ravikumar, R., Antonijevic, M., Radic, G., Zivkovic, M. (2024). Generative Adversarial Networks for Synthetic Training Data Replacement in Phishing Email Detection Using Natural Language Processing. In: Asokan, R., Ruiz, D.P., Piramuthu, S. (eds) Smart Data Intelligence. ICSMDI 2024. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-97-3191-6_46
DOI: https://doi.org/10.1007/978-981-97-3191-6_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-3190-9
Online ISBN: 978-981-97-3191-6
eBook Packages: Intelligent Technologies and Robotics (R0)