Phishing Web Page Detection with Semi-Supervised Deep Anomaly Detection

Ouyang, Linshu; Zhang, Yongzheng

doi:10.1007/978-3-030-90022-9_20

Linshu Ouyang^20,21 &
Yongzheng Zhang^20,21

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering ((LNICST,volume 399))

Included in the following conference series:

International Conference on Security and Privacy in Communication Systems

918 Accesses

Abstract

Phishing web page is one of the most serious threats to the users of the Internet. Recently, deep learning-based phishing detection methods have achieved significant improvement. However, these supervised deep neural networks require a large number of training samples. They also have difficulties in detecting novel phishing web pages. Using anomaly detection approaches is a possible way out yet is currently less explored, possibly due to two reasons. First, HTML codes lie in high dimensional discrete space which is difficult to handle for existing anomaly detection methods. Second, existing anomaly detection methods may find other types of anomalies that are beyond the scope of phishing.

In this paper, we propose a novel semi-supervised deep anomaly detection-based phishing webpage detection method. We first utilize a multi-head self-attention network to learn feature representation that is suitable for anomaly detection from HTML codes. Then we build a semi-supervised learner with Gaussian prior and contrastive loss to fulfill an end-to-end anomaly detector that is specifically optimized for detecting phishing webpages. Extensive experiments on a real-world dataset demonstrate that the accuracy of our method outperforms other state-of-the-art methods by a large margin.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

AlEroud, A., Zhou, L.: Phishing environments, techniques, and countermeasures: a survey. Comput. Secur. 68, 160–196 (2017)
Article Google Scholar
Cao, Y., Han, W., Le, Y.: Anti-phishing based on automated individual white-list. In: Proceedings of the 4th Workshop on Digital Identity Management, pp. 51–60. ACM (2008)
Google Scholar
Chiew, K., Fatt, J.C.S., Sze, S., Yong, K.S.C.: Leverage website favicon to detect phishing websites. Secur. Commun. Netw 2018, 7251750:1-7251750:11 (2018)
Article Google Scholar
Das, A., Baki, S., Aassal, A.E., Verma, R.M., Dunbar, A.: Sok: a comprehensive reexamination of phishing research from the security perspective. IEEE Commun. Surv. Tutor. 22(1), 671–708 (2020)
Article Google Scholar
Dou, Z., Khalil, I., Khreishah, A., Al-Fuqaha, A.I., Guizani, M.: Systematization of knowledge (sok): a systematic review of software-based web phishing detection. IEEE Commun. Surv. Tutor. 19(4), 2797–2819 (2017)
Article Google Scholar
Huang, Y., Yang, Q., Qin, J., Wen, W.: Phishing URL detection via CNN and attention-based hierarchical RNN. In: 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering, TrustCom/BigDataSE, pp. 112–119. IEEE (2019)
Google Scholar
Lin, Z., et al.: A structured self-attentive sentence embedding. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017, Conference Track Proceedings. OpenReview.net (2017)
Google Scholar
Liu, F.T., Ting, K.M., Zhou, Z.: Isolation forest. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, 15–19 December 2008, pp. 413–422. IEEE Computer Society (2008). https://doi.org/10.1109/ICDM.2008.17
Mao, J., Li, P., Li, K., Wei, T., Liang, Z.: Baitalarm: detecting phishing sites using similarity in fundamental visual features. In: 2013 5th International Conference on Intelligent Networking and Collaborative Systems, pp. 790–795. IEEE (2013)
Google Scholar
Opara, C., Wei, B., Chen, Y.: Htmlphish: enabling phishing web page detection by applying deep learning techniques on HTML analysis. In: 2020 International Joint Conference on Neural Networks, pp. 1–8. IEEE (2020)
Google Scholar
Ramesh, G., Krishnamurthi, I., Kumar, K.S.S.: An efficacious method for detecting phishing webpages through target domain identification. Decis. Supp. Syst. 61, 12–22 (2014)
Article Google Scholar
Ruff, L., et al.: Deep one-class classification. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018, vol. 80, pp. 4390–4399. Proceedings of Machine Learning Research, PMLR (2018). http://proceedings.mlr.press/v80/ruff18a.html
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001). https://doi.org/10.1162/089976601750264965
Article MATH Google Scholar
Sheng, S., Wardman, B., Warner, G., Cranor, L., Hong, J., Zhang, C.: An empirical analysis of phishing blacklists. In: CEAS 2009 (2009)
Google Scholar
Stobbs, J., Issac, B., Jacob, S.M.: Phishing web page detection using optimised machine learning. In: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 483–490 (2020)
Google Scholar
Wang, W., Zhang, F., Luo, X., Zhang, S.: PDRCNN: precise phishing detection with recurrent convolutional neural networks. Secur. Commun. Netw 2019, 2595794:1-2595794:15 (2019)
Google Scholar
Whittaker, C., Ryner, B., Nazif, M.: Large-scale automatic classification of phishing pages. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2010. The Internet Society (2010)
Google Scholar
Xiang, G., Hong, J.I., Rosé, C.P., Cranor, L.F.: CANTINA+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. 14(2), 21:1-21:28 (2011)
Article Google Scholar
Zhao, P., Hoi, S.C.H.: Cost-sensitive online active learning with application to malicious URL detection. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 919–927. ACM (2013)
Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Linshu Ouyang & Yongzheng Zhang
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Linshu Ouyang & Yongzheng Zhang

Authors

Linshu Ouyang
View author publications
You can also search for this author in PubMed Google Scholar
Yongzheng Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Linshu Ouyang .

Editor information

Editors and Affiliations

Télécom SudParis, Institut Polytechnique de Paris, Palaiseau, France
Joaquin Garcia-Alfaro
University of Kent Canterbury, Canterbury, Kent, UK
Shujun Li
University of Washington, Seattle, WA, USA
Radha Poovendran
Télécom SudParis, Institut Polytechnique de Paris, Palaiseau, France
Hervé Debar
Google Inc., New York, NY, USA
Moti Yung

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ouyang, L., Zhang, Y. (2021). Phishing Web Page Detection with Semi-Supervised Deep Anomaly Detection. In: Garcia-Alfaro, J., Li, S., Poovendran, R., Debar, H., Yung, M. (eds) Security and Privacy in Communication Networks. SecureComm 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 399. Springer, Cham. https://doi.org/10.1007/978-3-030-90022-9_20

Download citation

DOI: https://doi.org/10.1007/978-3-030-90022-9_20
Published: 04 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90021-2
Online ISBN: 978-3-030-90022-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics