Phish-Sight: a new approach for phishing detection using dominant colors on web pages and machine learning

  • Regular contribution
International Journal of Information Security

Abstract

Phishing is one of the most dangerous online threats: an attacker impersonates a person, company, or government agency to lure and deceive victims. Machine-learning-based anti-phishing solutions are gaining popularity. However, most of them rely heavily on features extracted from third-party services such as WHOIS services, DNS lookups, and web traffic. As a result, they are slow and require substantial computing resources. This paper introduces Phish-Sight, a machine-learning-based framework that detects phishing websites through a visual inspection strategy. Phish-Sight combines the dominant color features of a web page with popular, highly targeted brand names embedded in the web pages behind the URLs, and uses machine learning techniques to detect phishing web pages. The predictive performance of these features was investigated using five machine learning algorithms. Random Forest outperformed the others, with a 98.43% true positive rate and 99.13% accuracy in detecting phishing pages. The measured prediction run time of 7.6 s per web page suggests that Phish-Sight has potential for real-time applications.
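To make the feature idea concrete, the following is a minimal sketch of how dominant colors and brand-name indicators could be turned into a numeric feature vector. It is not the authors' implementation: the coarse RGB binning used to find dominant colors, the brand list `BRANDS`, and the function names are all assumptions chosen for illustration, and the pixels are assumed to come from an already rendered screenshot.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]

# Hypothetical brand list for illustration only; the abstract does not give one.
BRANDS = ("paypal", "microsoft", "amazon")


def dominant_colors(pixels: Iterable[Pixel], k: int = 3, bin_size: int = 32) -> List[Pixel]:
    """Return the k most frequent colors after coarse RGB quantization.

    Assumption: simple histogram binning stands in for whatever
    dominant-color method the paper actually uses.
    """
    counts = Counter(
        (r // bin_size, g // bin_size, b // bin_size) for r, g, b in pixels
    )
    # Rank bins by frequency (descending); break ties by bin value
    # so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    half = bin_size // 2
    # Report each bin by the color at its center.
    return [tuple(c * bin_size + half for c in bin_) for bin_, _ in ranked[:k]]


def brand_flags(page_text: str, brands: Sequence[str] = BRANDS) -> List[int]:
    """Return one 0/1 flag per brand, set when the brand name occurs in the text."""
    text = page_text.lower()
    return [1 if brand in text else 0 for brand in brands]


def feature_vector(pixels: Iterable[Pixel], page_text: str, k: int = 3) -> List[int]:
    """Concatenate k dominant colors (flattened RGB) with the brand flags."""
    colors = dominant_colors(pixels, k)
    # Pad with black when the page has fewer than k distinct color bins.
    colors += [(0, 0, 0)] * (k - len(colors))
    flat = [channel for color in colors for channel in color]
    return flat + brand_flags(page_text)


if __name__ == "__main__":
    # Toy "screenshot": mostly dark blue, some white, one yellow pixel.
    pixels = [(0, 48, 135)] * 6 + [(255, 255, 255)] * 3 + [(255, 196, 57)]
    print(feature_vector(pixels, "Log in to your PayPal account", k=2))
```

Vectors of this shape could then be passed to any standard classifier, such as the Random Forest that the abstract reports as the best performer. Fixed-length padding is used here so that every page yields a vector of the same size.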

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Pankaj Pandey.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pandey, P., Mishra, N. Phish-Sight: a new approach for phishing detection using dominant colors on web pages and machine learning. Int. J. Inf. Secur. 22, 881–891 (2023). https://doi.org/10.1007/s10207-023-00672-4

  • DOI: https://doi.org/10.1007/s10207-023-00672-4
