Sophisticated Phishers Make More Spelling Mistakes: Using URL Similarity against Phishing
Phishing attacks are rising in both quantity and quality. Because these attacks stay online only briefly, classical blacklist-based approaches are not sufficient to protect online users. Attackers manage to achieve high similarity between original and fraudulent websites, but this similarity can also be used to detect attacks. In many cases attackers try to make the Internet address (URL) of a fraudulent website look similar to the original. In this work, we present a way of using the URL itself for automated detection of phishing websites by extracting different terms from a URL and verifying them using search engine spelling recommendation.
We evaluate our concept against a large test set of 8730 real phishing URLs. In addition, we collected scores for the visual quality of a subset of those attacks so that we can compare the performance of our tests across different attack qualities. Results suggest that our heuristics are able to mark 54.3% of the malicious URLs as suspicious. As the visual quality of the phishing websites increases, the number of URL characteristics that allow detection increases as well.
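The idea of extracting terms from a URL and checking them for near-miss spellings can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes querying a search engine's spelling recommendation, whereas this sketch substitutes a local edit-distance comparison against a small, hypothetical list of brand terms (`KNOWN_TERMS`). The function names, the minimum term length of 4, and the distance threshold of 2 are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list of well-known terms. The paper instead relies on a
# search engine's spelling recommendation; this list is only a stand-in.
KNOWN_TERMS = ["paypal", "google", "amazon", "facebook"]


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def extract_terms(url):
    """Split host name and path into lowercase alphanumeric terms."""
    parts = urlsplit(url)
    text = f"{parts.hostname or ''} {parts.path}".lower()
    # Assumption: very short terms (e.g. "www", "com") are ignored.
    return [t for t in re.split(r"[^a-z0-9]+", text) if len(t) >= 4]


def suggest(term, max_distance=2):
    """Stand-in for a spelling recommendation: return the closest known
    term if the input is a near miss, or None if it matches exactly or
    is too far from every known term."""
    best = min(KNOWN_TERMS, key=lambda known: edit_distance(term, known))
    distance = edit_distance(term, best)
    return best if 0 < distance <= max_distance else None


def suspicious_terms(url):
    """Map each URL term that looks like a misspelling to its correction."""
    result = {}
    for term in extract_terms(url):
        correction = suggest(term)
        if correction is not None:
            result[term] = correction
    return result
```

Under these assumptions, `suspicious_terms("http://www.paypa1.com/login")` flags `paypa1` as a likely misspelling of `paypal`, while the correctly spelled `https://www.paypal.com/signin` yields no suspicious terms.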