Abstract
This research investigates the potential of Natural Language Processing (NLP) for discerning the intent behind phishing emails. Traditional systems, which focus primarily on tracking origin points, subject lines, and IP addresses, struggle to detect emerging networking threats that involve the virtual relocation of large-scale scam operations. The intent expressed in the email content, however, tends to remain consistent. This study therefore proposes combining NLP with robust classification techniques as a promising approach to detecting phishing emails and strengthening cybersecurity. Specifically, it explores BERT encoding combined with an XGBoost classifier to identify phishing emails from their body content. Because the performance of machine learning classifiers depends on appropriate hyperparameter selection, a modified version of the sinh cosh optimizer (SCHO) is introduced, adapted to overcome the original algorithm's inherent limitations, and benchmarked against other contemporary optimizers. Simulations on real-world samples show promising results, with a precision exceeding 75% for phishing email identification.
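The wrapper-style tuning outlined in the abstract, where an optimizer proposes classifier hyperparameters and each proposal is scored by validating the resulting model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses a generic population-based search with a shrinking step size rather than the actual sinh cosh update rules or the authors' modifications. The search space (`learning_rate`, `max_depth`, `subsample` and their bounds) is hypothetical, and `toy_objective` is a hypothetical stand-in for the validation error of a BERT-plus-XGBoost pipeline, so the sketch runs without either library.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical search space for a few XGBoost-style hyperparameters.
# Bounds are illustrative assumptions, not values from the paper.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "learning_rate": (0.01, 0.9),
    "max_depth": (3, 10),
    "subsample": (0.5, 1.0),
}


def clip(value: float, low: float, high: float) -> float:
    """Keep a coordinate inside its allowed range."""
    return max(low, min(high, value))


def random_solution(rng: random.Random) -> Dict[str, float]:
    """Sample one candidate uniformly from the search space."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def population_search(
    fitness: Callable[[Dict[str, float]], float],
    pop_size: int = 10,
    iterations: int = 50,
    seed: int = 42,
) -> Tuple[Dict[str, float], float]:
    """Generic population-based minimizer (NOT the SCHO update rules)."""
    rng = random.Random(seed)
    population = [random_solution(rng) for _ in range(pop_size)]
    scores = [fitness(s) for s in population]
    best_idx = min(range(pop_size), key=scores.__getitem__)
    best, best_score = dict(population[best_idx]), scores[best_idx]

    for t in range(iterations):
        # Step size shrinks linearly: wide exploration early, fine exploitation late.
        step = 1.0 - t / iterations
        for i in range(pop_size):
            candidate = {}
            for k, (lo, hi) in BOUNDS.items():
                # Pull toward the current best plus a random perturbation scaled by the step.
                pull = rng.random() * (best[k] - population[i][k])
                noise = rng.uniform(-1.0, 1.0) * step * (hi - lo) * 0.1
                candidate[k] = clip(population[i][k] + pull + noise, lo, hi)
            score = fitness(candidate)
            # Greedy replacement: an agent only moves if the candidate improves it.
            if score < scores[i]:
                population[i], scores[i] = candidate, score
                if score < best_score:
                    best, best_score = dict(candidate), score
    return best, best_score


def toy_objective(params: Dict[str, float]) -> float:
    """Hypothetical stand-in for validation error (e.g. 1 - precision)."""
    depth = round(params["max_depth"])  # tree depth must be an integer
    return (
        (params["learning_rate"] - 0.1) ** 2
        + 0.01 * (depth - 6) ** 2
        + (params["subsample"] - 0.8) ** 2
    )


if __name__ == "__main__":
    best, best_score = population_search(toy_objective)
    print(best, best_score)
```

In a real pipeline, `toy_objective` would be replaced by a function that trains an XGBoost model on BERT embeddings with the proposed hyperparameters and returns, for example, one minus validation precision. The greedy, per-agent replacement keeps every evaluation budget-bounded (`pop_size * iterations` fitness calls), which matters when each call means fitting a classifier.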
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Antonijevic, M., Jovanovic, L., Bacanin, N., Zivkovic, M., Kaljevic, J., Zivkovic, T. (2024). Using BERT with Modified Metaheuristic Optimized XGBoost for Phishing Email Identification. In: Manoharan, S., Tugui, A., Baig, Z. (eds) Proceedings of 4th International Conference on Artificial Intelligence and Smart Energy. ICAIS 2024. Information Systems Engineering and Management, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-031-61475-0_28
DOI: https://doi.org/10.1007/978-3-031-61475-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61474-3
Online ISBN: 978-3-031-61475-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
