Abstract
Phishing is one of the most dangerous cyber threats: an attacker impersonates a person, company, or government agency to lure and deceive victims. Machine-learning-based anti-phishing solutions are increasingly popular. However, most of them rely heavily on features obtained from third-party services such as WHOIS lookups, DNS queries, and web-traffic statistics, which makes them slow and computationally expensive. This paper introduces Phish-Sight, a machine-learning-based framework that detects phishing websites through a visual inspection strategy. Phish-Sight combines the dominant colors of a web page with popular, frequently targeted brand names embedded in the web pages of the examined URLs, and applies machine learning techniques to these features to detect phishing web pages. The predictive performance of the dominant color and brand name features was evaluated with five machine learning algorithms. Random Forest outperformed the others, detecting phishing pages with a 98.43% true positive rate and 99.13% accuracy. The measured prediction time of 7.6 s per web page suggests that Phish-Sight has potential for real-time applications.
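The abstract names two feature families (dominant colors and brand names) and two metrics (true positive rate and accuracy) without defining them. The following is a minimal, hypothetical sketch of how such features could be computed, assuming the page screenshot is already available as a list of (r, g, b) pixels (for example via Pillow's `Image.getdata()`) and the page text has already been extracted. The color quantization scheme, the top-k encoding, the `BRANDS` list, and all function names are illustrative assumptions, not the authors' method; the resulting vectors could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`.

```python
import re
from collections import Counter

# Hypothetical brand list for illustration only; the paper uses a list of
# frequently phished brands that is not reproduced here.
BRANDS = ("paypal", "microsoft", "apple", "amazon", "netflix")


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with channels in 0..255 to a coarse color bin."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)


def bin_index(color_bin, levels=4):
    """Encode a quantized (r, g, b) bin as a single integer."""
    r, g, b = color_bin
    return r * levels * levels + g * levels + b


def dominant_colors(pixels, k=3, levels=4):
    """Return the k most frequent color bins with their pixel fractions."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = len(pixels)
    return [(color_bin, n / total) for color_bin, n in counts.most_common(k)]


def brand_features(page_text, brands=BRANDS):
    """One binary flag per brand: does the brand occur as a whole word?"""
    text = page_text.lower()
    return [int(re.search(r"\b" + re.escape(b) + r"\b", text) is not None)
            for b in brands]


def feature_vector(pixels, page_text, k=3, levels=4):
    """Concatenate (bin index, fraction) for the top-k colors with brand flags.

    Missing colors (fewer than k distinct bins) are padded with (-1, 0.0).
    """
    colors = []
    for color_bin, fraction in dominant_colors(pixels, k, levels):
        colors += [bin_index(color_bin, levels), fraction]
    while len(colors) < 2 * k:
        colors += [-1, 0.0]
    return colors + brand_features(page_text)


def tpr_and_accuracy(tp, fn, tn, fp):
    """True positive rate and accuracy from confusion-matrix counts."""
    return tp / (tp + fn), (tp + tn) / (tp + fn + tn + fp)


if __name__ == "__main__":
    # Toy "screenshot": mostly white, some dark blue, a little red.
    pixels = [(250, 250, 250)] * 6 + [(0, 48, 135)] * 3 + [(200, 0, 0)]
    print(feature_vector(pixels, "Sign in to your PayPal account"))
```

For the toy input, the three dominant bins cover 60%, 30%, and 10% of the pixels, and only the `paypal` flag is set. Coarse quantization keeps the color features small and fast to compute, which fits the abstract's emphasis on avoiding slow third-party lookups.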
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pandey, P., Mishra, N. Phish-Sight: a new approach for phishing detection using dominant colors on web pages and machine learning. Int. J. Inf. Secur. 22, 881–891 (2023). https://doi.org/10.1007/s10207-023-00672-4
DOI: https://doi.org/10.1007/s10207-023-00672-4