A Hierarchical Adaptive Probabilistic Approach for Zero Hour Phish Detection

  • Guang Xiang
  • Bryan A. Pendleton
  • Jason Hong
  • Carolyn P. Rose
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6345)


Phishing attacks are a significant threat to users of the Internet, causing tremendous economic loss every year. In combating phish, industry relies heavily on manual verification to achieve a low false positive rate, which, however, tends to be slow in responding to the huge volume of unique phishing URLs created by toolkits. Our goal here is to combine the best aspects of human verified blacklists and heuristic-based methods, i.e., the low false positive rate of the former and the broad and fast coverage of the latter. To this end, we present the design and evaluation of a hierarchical blacklist-enhanced phish detection framework. The key insight behind our detection algorithm is to leverage existing human-verified blacklists and apply the shingling technique, a popular near-duplicate detection algorithm used by search engines, to detect phish in a probabilistic fashion with very high accuracy. To achieve an extremely low false positive rate, we use a filtering module in our layered system, harnessing the power of search engines via information retrieval techniques to correct false positives. Comprehensive experiments over a diverse spectrum of data sources show that our method achieves 0% false positive rate (FP) with a true positive rate (TP) of 67.15% using search-oriented filtering, and 0.03% FP and 73.53% TP without the filtering module. With incremental model building capability via a sliding window mechanism, our approach is able to adapt quickly to new phishing variants, and is thus more responsive to the evolving attacks.
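The shingling idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the word-level shingle size `n = 3`, the helper names (`shingles`, `resemblance`, `max_resemblance`), and the sample strings are all assumptions made for illustration. It shows only the general technique of comparing a candidate page with blacklisted page content by the Jaccard resemblance of their shingle sets.

```python
from typing import Iterable, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 3) -> Set[Shingle]:
    """Return the set of word-level n-grams (shingles) of `text`.

    The shingle size n=3 is an illustrative assumption, not a value
    taken from the paper.
    """
    words = text.lower().split()
    if len(words) < n:
        # Short texts yield a single shingle, or none if the text is empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard resemblance: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_resemblance(page: str, blacklist: Iterable[str], n: int = 3) -> float:
    """Highest resemblance between `page` and any blacklisted page text."""
    page_shingles = shingles(page, n)
    return max(
        (resemblance(page_shingles, shingles(known, n)) for known in blacklist),
        default=0.0,
    )


# Hypothetical sample texts, invented for this sketch.
known_phish = "please verify your account password to continue"
candidate = "please verify your account password to proceed"
score = max_resemblance(candidate, [known_phish])
```

In a layered detector of the kind the abstract describes, a score above some chosen threshold would mark the page as a likely phish, and a later filtering stage would then try to correct false positives. The threshold and the filtering logic are left out here because this excerpt does not specify them.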


False Positive, Search Engine, True Positive, Detection Engine, Legitimate Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Guang Xiang (1)
  • Bryan A. Pendleton (1)
  • Jason Hong (1)
  • Carolyn P. Rose (1)
  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
