DeltaPhish: Detecting Phishing Webpages in Compromised Websites

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10492)


The large-scale deployment of modern phishing attacks relies on the automatic exploitation of vulnerable websites in the wild, to maximize profit while hindering attack traceability, detection and blacklisting. To the best of our knowledge, this is the first work that specifically leverages this adversarial behavior for detection purposes. We show that phishing webpages can be accurately detected by highlighting differences in HTML code and visual appearance with respect to other (legitimate) pages hosted on the same compromised website. Our system, named DeltaPhish, can be deployed as part of a web application firewall to detect anomalous content on a website after it has been compromised and, possibly, to prevent access to it. DeltaPhish is also robust against adversarial attempts in which the HTML code of the phishing page is carefully manipulated to evade detection. We empirically evaluate it on more than 5,500 webpages collected in the wild from compromised websites, showing that it detects more than 99% of phishing webpages while misclassifying less than 1% of legitimate pages. We further show that the detection rate remains higher than 70% even under very sophisticated attacks carefully designed to evade our system.
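To make the "delta" idea more concrete, the following is a minimal sketch of comparing a candidate page against a legitimate page of the same website. It is not DeltaPhish's actual feature set or classifier. It covers only the HTML side (no screenshots or visual features), and it assumes that each page is reduced to two token sets (tag names and hostnames of linked resources), parsed with Python's standard `html.parser`. The candidate is then scored by its Jaccard similarity to the site's homepage. The functions `delta_features` and `looks_anomalous` and the 0.3 threshold are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TokenExtractor(HTMLParser):
    """Collects tag names and hostnames of linked resources (illustrative feature choice)."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.link_hosts = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # html.parser already lowercases tag names
        for name, value in attrs:
            # Links, embedded resources and form targets may point to other hosts.
            if name in ("href", "src", "action") and value:
                host = urlparse(value).netloc
                if host:
                    self.link_hosts.add(host.lower())


def extract(html):
    """Return (tag set, linked-host set) for one HTML document."""
    parser = TokenExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.link_hosts


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def delta_features(page_html, home_html):
    """Hypothetical 'delta' features: similarity of a candidate page to the site's homepage."""
    page_tags, page_hosts = extract(page_html)
    home_tags, home_hosts = extract(home_html)
    return [jaccard(page_tags, home_tags), jaccard(page_hosts, home_hosts)]


def looks_anomalous(page_html, home_html, threshold=0.3):
    """Hypothetical rule: flag a page whose average similarity to the homepage is low."""
    feats = delta_features(page_html, home_html)
    return sum(feats) / len(feats) < threshold
```

In this sketch, a page that shares the homepage's structure and link targets scores close to 1. An injected login form that posts to an unrelated host scores close to 0. A learned classifier over many such similarity features, as opposed to a fixed threshold, would be the natural next step, but the features and model used by DeltaPhish itself are not specified here.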



This work has been partially supported by the DOGANA project, funded by the EU Horizon 2020 framework programme, under Grant Agreement no. 653618.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Pluribus One, Cagliari, Italy
  2. Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
