Skip to main content

Phishing Web Page Detection with Semi-Supervised Deep Anomaly Detection

  • Conference paper
  • First Online:
Security and Privacy in Communication Networks (SecureComm 2021)

Abstract

Phishing web page is one of the most serious threats to the users of the Internet. Recently, deep learning-based phishing detection methods have achieved significant improvement. However, these supervised deep neural networks require a large number of training samples. They also have difficulties in detecting novel phishing web pages. Using anomaly detection approaches is a possible way out yet is currently less explored, possibly due to two reasons. First, HTML codes lie in high dimensional discrete space which is difficult to handle for existing anomaly detection methods. Second, existing anomaly detection methods may find other types of anomalies that are beyond the scope of phishing.

In this paper, we propose a novel semi-supervised deep anomaly detection-based phishing webpage detection method. We first utilize a multi-head self-attention network to learn feature representation that is suitable for anomaly detection from HTML codes. Then we build a semi-supervised learner with Gaussian prior and contrastive loss to fulfill an end-to-end anomaly detector that is specifically optimized for detecting phishing webpages. Extensive experiments on a real-world dataset demonstrate that the accuracy of our method outperforms other state-of-the-art methods by a large margin.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. AlEroud, A., Zhou, L.: Phishing environments, techniques, and countermeasures: a survey. Comput. Secur. 68, 160–196 (2017)

    Article  Google Scholar 

  2. Cao, Y., Han, W., Le, Y.: Anti-phishing based on automated individual white-list. In: Proceedings of the 4th Workshop on Digital Identity Management, pp. 51–60. ACM (2008)

    Google Scholar 

  3. Chiew, K., Fatt, J.C.S., Sze, S., Yong, K.S.C.: Leverage website favicon to detect phishing websites. Secur. Commun. Netw 2018, 7251750:1-7251750:11 (2018)

    Article  Google Scholar 

  4. Das, A., Baki, S., Aassal, A.E., Verma, R.M., Dunbar, A.: Sok: a comprehensive reexamination of phishing research from the security perspective. IEEE Commun. Surv. Tutor. 22(1), 671–708 (2020)

    Article  Google Scholar 

  5. Dou, Z., Khalil, I., Khreishah, A., Al-Fuqaha, A.I., Guizani, M.: Systematization of knowledge (sok): a systematic review of software-based web phishing detection. IEEE Commun. Surv. Tutor. 19(4), 2797–2819 (2017)

    Article  Google Scholar 

  6. Huang, Y., Yang, Q., Qin, J., Wen, W.: Phishing URL detection via CNN and attention-based hierarchical RNN. In: 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering, TrustCom/BigDataSE, pp. 112–119. IEEE (2019)

    Google Scholar 

  7. Lin, Z., et al.: A structured self-attentive sentence embedding. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017, Conference Track Proceedings. OpenReview.net (2017)

    Google Scholar 

  8. Liu, F.T., Ting, K.M., Zhou, Z.: Isolation forest. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, 15–19 December 2008, pp. 413–422. IEEE Computer Society (2008). https://doi.org/10.1109/ICDM.2008.17

  9. Mao, J., Li, P., Li, K., Wei, T., Liang, Z.: Baitalarm: detecting phishing sites using similarity in fundamental visual features. In: 2013 5th International Conference on Intelligent Networking and Collaborative Systems, pp. 790–795. IEEE (2013)

    Google Scholar 

  10. Opara, C., Wei, B., Chen, Y.: Htmlphish: enabling phishing web page detection by applying deep learning techniques on HTML analysis. In: 2020 International Joint Conference on Neural Networks, pp. 1–8. IEEE (2020)

    Google Scholar 

  11. Ramesh, G., Krishnamurthi, I., Kumar, K.S.S.: An efficacious method for detecting phishing webpages through target domain identification. Decis. Supp. Syst. 61, 12–22 (2014)

    Article  Google Scholar 

  12. Ruff, L., et al.: Deep one-class classification. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018, vol. 80, pp. 4390–4399. Proceedings of Machine Learning Research, PMLR (2018). http://proceedings.mlr.press/v80/ruff18a.html

  13. Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001). https://doi.org/10.1162/089976601750264965

    Article  MATH  Google Scholar 

  14. Sheng, S., Wardman, B., Warner, G., Cranor, L., Hong, J., Zhang, C.: An empirical analysis of phishing blacklists. In: CEAS 2009 (2009)

    Google Scholar 

  15. Stobbs, J., Issac, B., Jacob, S.M.: Phishing web page detection using optimised machine learning. In: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 483–490 (2020)

    Google Scholar 

  16. Wang, W., Zhang, F., Luo, X., Zhang, S.: PDRCNN: precise phishing detection with recurrent convolutional neural networks. Secur. Commun. Netw 2019, 2595794:1-2595794:15 (2019)

    Google Scholar 

  17. Whittaker, C., Ryner, B., Nazif, M.: Large-scale automatic classification of phishing pages. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2010. The Internet Society (2010)

    Google Scholar 

  18. Xiang, G., Hong, J.I., Rosé, C.P., Cranor, L.F.: CANTINA+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. 14(2), 21:1-21:28 (2011)

    Article  Google Scholar 

  19. Zhao, P., Hoi, S.C.H.: Cost-sensitive online active learning with application to malicious URL detection. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 919–927. ACM (2013)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Linshu Ouyang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ouyang, L., Zhang, Y. (2021). Phishing Web Page Detection with Semi-Supervised Deep Anomaly Detection. In: Garcia-Alfaro, J., Li, S., Poovendran, R., Debar, H., Yung, M. (eds) Security and Privacy in Communication Networks. SecureComm 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 399. Springer, Cham. https://doi.org/10.1007/978-3-030-90022-9_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-90022-9_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-90021-2

  • Online ISBN: 978-3-030-90022-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics