Prediction of phishing websites using machine learning

Pandey, Mithilesh Kumar; Singh, Munindra Kumar; Pal, Saurabh; Tiwari, B. B.

doi:10.1007/s41324-022-00489-8

Published: 06 October 2022

Spatial Information Research, Volume 31, pages 157–166 (2023)

Mithilesh Kumar Pandey¹,
Munindra Kumar Singh¹,
Saurabh Pal ORCID: orcid.org/0000-0001-9545-7481¹ &
…
B. B. Tiwari²

Abstract

With the growing popularity of information science, more applications are being integrated with websites that can be accessed directly through the internet. This has increased the opportunity for malicious actors to steal personal information. Several strategies have been proposed to identify phishing attacks, but there is still room for improvement. The objective of this paper is to develop a more accurate prediction model using Decision Tree (DT), Random Forest (RF), and Gradient Boosting Classifier (GBC) models with three feature selection techniques: Extra Tree (ET), Chi-Square, and Recursive Feature Elimination (RFE). Because the phishing websites dataset contains 89 features, we first apply the extra tree and chi-square feature selection methods to identify a limited set of important features, and then use recursive feature elimination to reduce the dataset to an optimal subset of features. We compare the performance of the resulting models and find that GBC gives the best prediction performance, followed by RF and DT. The models capture the trends across various cases of phishing and are assessed in each case using R-squared, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE).
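The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic (only the count of 89 features comes from the abstract), and the way the extra tree and chi-square rankings are combined (a union of each method's top features), the values of `TOP_K` and `N_FINAL`, and all hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the phishing dataset (assumption: only the
# feature count, 89, is taken from the abstract).
X, y = make_classification(n_samples=600, n_features=89,
                           n_informative=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# chi2 requires non-negative inputs, so scale features to [0, 1]
# using statistics from the training split only.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Stage 1: shortlist features with extra trees and chi-square.
# Taking the union of both top-K lists is an assumption of this sketch.
TOP_K = 20
et = ExtraTreesClassifier(n_estimators=100, random_state=0)
et.fit(X_train_s, y_train)
et_top = {int(i) for i in np.argsort(et.feature_importances_)[-TOP_K:]}
chi = SelectKBest(chi2, k=TOP_K).fit(X_train_s, y_train)
chi_top = {int(i) for i in chi.get_support(indices=True)}
candidates = sorted(et_top | chi_top)

# Stage 2: recursive feature elimination on the shortlisted columns.
N_FINAL = 10
rfe = RFE(DecisionTreeClassifier(random_state=0),
          n_features_to_select=N_FINAL)
rfe.fit(X_train_s[:, candidates], y_train)
selected = [candidates[i] for i in np.flatnonzero(rfe.support_)]

# Stage 3: train the three classifiers and report the three metrics.
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "GBC": GradientBoostingClassifier(random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train_s[:, selected], y_train)
    pred = model.predict(X_test_s[:, selected])
    r2 = r2_score(y_test, pred)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    mae = mean_absolute_error(y_test, pred)
    results[name] = (r2, rmse, mae)
    print(f"{name}: R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

Note that when these regression-style metrics are computed on 0/1 class predictions, MAE equals the misclassification rate and RMSE is its square root, so the three scores rank models in the same order as accuracy would.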

Acknowledgements

This work was supported by VBS Purvanchal University, Jaunpur. I am indebted to the people who supported the research and shared their ideas. I thank Prof. Surjeet Kumar for his scientific advice on the subject of this research.

Author information

Authors and Affiliations

Department of Computer Applications, VBS Purvanchal University, Jaunpur, Uttar Pradesh, 222001, India
Mithilesh Kumar Pandey, Munindra Kumar Singh & Saurabh Pal
Department of Electronics and Communication, VBS Purvanchal University, Jaunpur, Uttar Pradesh, 222001, India
B. B. Tiwari

Corresponding author

Correspondence to Saurabh Pal.

Ethics declarations

Conflict of Interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pandey, M.K., Singh, M.K., Pal, S. et al. Prediction of phishing websites using machine learning. Spat. Inf. Res. 31, 157–166 (2023). https://doi.org/10.1007/s41324-022-00489-8

Received: 29 July 2022
Revised: 21 September 2022
Accepted: 22 September 2022
Published: 06 October 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s41324-022-00489-8
