Phish-Sight: a new approach for phishing detection using dominant colors on web pages and machine learning

Pandey, Pankaj; Mishra, Nishchol

doi:10.1007/s10207-023-00672-4

Regular contribution
Published: 01 March 2023

Volume 22, pages 881–891 (2023)

International Journal of Information Security

Abstract

Phishing is a dangerous threat in which an attacker impersonates a person, company, or government agency to lure and deceive victims. Machine-learning-based anti-phishing solutions are gaining popularity. However, most of them rely heavily on features extracted from third-party services such as WHOIS lookups, DNS lookups, and web traffic data. As a result, they are slow and require substantial computing resources. This paper introduces Phish-Sight, a machine-learning-based framework that detects phishing websites through a visual inspection strategy. Phish-Sight combines dominant color features with the names of popular, frequently targeted brands found in the web pages behind URLs, and uses machine learning techniques on these features to detect phishing web pages. The predictive performance of the dominant color and brand name features was evaluated using five machine learning algorithms. The Random Forest algorithm outperformed the others, achieving a 98.43% true positive rate and 99.13% accuracy in detecting phishing pages. The measured prediction run time of 7.6 s per web page suggests that Phish-Sight has potential for real-time applications.
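
The feature idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the coarse histogram quantization used to pick dominant colors, the bucket width `step`, the brand list `BRANDS`, and all function names are assumptions made for illustration. The last function only restates the standard definitions of the two reported metrics, true positive rate TP / (TP + FN) and accuracy (TP + TN) / N.

```python
from collections import Counter

# Hypothetical brand list for illustration; the paper's actual list is not given here.
BRANDS = ("paypal", "microsoft", "facebook")


def dominant_colors(pixels, k=3, step=64):
    """Return the k most frequent quantized RGB colors with their pixel share."""
    # Snap each channel to the start of its `step`-wide bucket so that
    # near-identical shades are counted as one color (an assumed method).
    buckets = Counter(
        (r // step * step, g // step * step, b // step * step)
        for r, g, b in pixels
    )
    total = sum(buckets.values())
    return [(color, count / total) for color, count in buckets.most_common(k)]


def brand_flags(page_text, brands=BRANDS):
    """Return one 0/1 flag per brand: 1 if the brand name occurs in the text."""
    text = page_text.lower()
    return [1 if brand in text else 0 for brand in brands]


def feature_vector(pixels, page_text, k=3):
    """Flatten dominant colors (R, G, B, share) and brand flags into one list."""
    colors = dominant_colors(pixels, k)
    # Pad with zero-share entries so every page yields the same vector length.
    colors += [((0, 0, 0), 0.0)] * (k - len(colors))
    vector = []
    for (r, g, b), share in colors:
        vector.extend([r, g, b, share])
    return vector + brand_flags(page_text)


def tpr_and_accuracy(y_true, y_pred):
    """Compute true positive rate and accuracy, with label 1 meaning phishing."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return tp / (tp + fn), correct / len(pairs)


if __name__ == "__main__":
    # Ten synthetic "screenshot" pixels: mostly near-white, some blue, one red.
    pixels = [(250, 250, 250)] * 6 + [(10, 20, 200)] * 3 + [(200, 30, 30)]
    print(feature_vector(pixels, "Log in to your PayPal account", k=2))
    print(tpr_and_accuracy([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Vectors of this shape could then be passed to any off-the-shelf classifier, such as a Random Forest. Pixel data would in practice come from a rendered screenshot of the page, which this sketch leaves out.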

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, Madhya Pradesh, India
Pankaj Pandey & Nishchol Mishra

Corresponding author

Correspondence to Pankaj Pandey.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interest or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pandey, P., Mishra, N. Phish-Sight: a new approach for phishing detection using dominant colors on web pages and machine learning. Int. J. Inf. Secur. 22, 881–891 (2023). https://doi.org/10.1007/s10207-023-00672-4

Published: 01 March 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10207-023-00672-4
