Generative Adversarial Networks for Synthetic Training Data Replacement in Phishing Email Detection Using Natural Language Processing

  • Conference paper
Smart Data Intelligence (ICSMDI 2024)

Abstract

This study explores the convergence of cybersecurity, machine learning (ML), generative adversarial networks (GANs), and natural language processing (NLP) to counter the threat posed by phishing emails in the digital landscape. The surge in online business models and email communication has fueled the proliferation of malicious content, making phishing emails a significant cybersecurity challenge. ML and artificial intelligence (AI) algorithms offer a dynamic solution that adapts to the evolving threat landscape, but their effectiveness is contingent upon the availability of pertinent data, which privacy concerns often restrict. To address this, the study investigates the potential of GANs for synthetic data generation in cybersecurity, focusing specifically on phishing emails. A major advantage of using AI for phishing email detection is that such a system can adapt to the dynamic landscape of cybersecurity without explicit direction from human operators. Text mining is used to reformat the data into a representation suitable for AI algorithms, which are designed with numerical values in mind. Through experiments with real-world datasets, the research evaluates the performance of contemporary ML classifiers that incorporate NLP techniques and introduces a GAN-based approach to generating synthetic training data. The outcomes aim to contribute to the development of robust intrusion detection techniques and to provide insights into mitigating cybersecurity risks in the face of advanced digital threats.
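The text-mining step mentioned in the abstract, turning raw email text into numbers that a classifier can consume, can be illustrated with a small sketch. The abstract does not say which representation the authors use; TF-IDF weighting is assumed here purely for illustration, and the function names (`tokenize`, `tfidf_vectors`) and the three sample emails are hypothetical rather than taken from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, keep alphabetic runs only.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    # Build one TF-IDF vector per document over a shared, sorted vocabulary.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Unsmoothed inverse document frequency; terms in every document get 0.
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical sample emails, not drawn from the paper's dataset.
emails = [
    "Verify your account now",
    "Meeting notes for your team",
    "Your account password expires, verify now",
]
vocab, vectors = tfidf_vectors(emails)
```

Each email becomes a fixed-length numeric vector whose entries weight a term by how often it occurs in that email and how rare it is across the collection. Vectors of this kind are the sort of numerical input that ML classifiers, and a tabular synthetic-data generator, would operate on.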

Notes

  1. https://www.kaggle.com/datasets/subhajournal/phishingemails

Corresponding author

Correspondence to Nebojsa Bacanin.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jovanovic, L., Bacanin, N., Ravikumar, R., Antonijevic, M., Radic, G., Zivkovic, M. (2024). Generative Adversarial Networks for Synthetic Training Data Replacement in Phishing Email Detection Using Natural Language Processing. In: Asokan, R., Ruiz, D.P., Piramuthu, S. (eds) Smart Data Intelligence. ICSMDI 2024. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-97-3191-6_46
