Equivocal URLs: Understanding the Fragmented Space of URL Parser Implementations

  • Conference paper
Computer Security – ESORICS 2022 (ESORICS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13556)

Abstract

Uniform Resource Locators (URLs) are integral to the Web and have existed for nearly three decades. Yet URL parsing differs subtly among parser implementations, leading to ambiguity that can be abused by attackers. We measure agreement between widely used URL parsers and find that each has made design decisions that deviate from parsing standards, creating a fractured implementation space where assumptions of uniform interpretation are unreliable. In some cases, deviations are severe enough that clients using different parsers will make requests to different hosts based on a single, “equivocal” URL. We systematize the thousands of differences we observed into seven pitfalls in URL parsing that application developers should beware of. We demonstrate that this ambiguity can be weaponized through misdirection attacks that evade the Google Safe Browsing and VirusTotal URL classifiers. URL parsing libraries have made a tradeoff to favor permissiveness over strict standards adherence. We hope this work will motivate the systemic adoption of a more unified URL parsing standard, enabling a more secure Web.
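The idea of an “equivocal” URL can be illustrated with a minimal differential check. This is a sketch under stated assumptions, not the authors' measurement harness: it compares the host reported by Python's standard-library `urllib.parse.urlsplit` against a deliberately simplified, hypothetical model of one WHATWG URL Standard rule (for special schemes such as `http`, a backslash ends the authority just as a forward slash does). The helper names `host_rfc_style`, `host_whatwg_backslash_model`, `PARSERS`, and `find_equivocal` are illustrative only, and the second function is not a conformant WHATWG parser.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit

# Schemes the WHATWG URL Standard designates as "special".
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def host_rfc_style(url: str) -> Optional[str]:
    """Host as reported by Python's urllib.parse, which ends the
    authority only at '/', '?' or '#' and takes the host after the last '@'."""
    return urlsplit(url).hostname


def host_whatwg_backslash_model(url: str) -> Optional[str]:
    """HYPOTHETICAL, heavily simplified model of a single WHATWG rule:
    for special schemes, '\\' is treated as a path separator like '/'.
    This is NOT a conformant WHATWG URL parser."""
    if urlsplit(url).scheme in SPECIAL_SCHEMES:
        scheme, colon, rest = url.partition(":")
        url = scheme + colon + rest.replace("\\", "/")
    return urlsplit(url).hostname


# Hypothetical registry of the parsers under comparison.
PARSERS: Dict[str, Callable[[str], Optional[str]]] = {
    "urllib": host_rfc_style,
    "whatwg_backslash_model": host_whatwg_backslash_model,
}


def find_equivocal(urls: Iterable[str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Return each URL on which the parsers disagree about the host,
    mapped to the host that every parser reported."""
    equivocal = {}
    for url in urls:
        hosts = {name: parse(url) for name, parse in PARSERS.items()}
        if len(set(hosts.values())) > 1:
            equivocal[url] = hosts
    return equivocal


if __name__ == "__main__":
    corpus = [
        "https://example.com/path",
        "http://evil.example\\@good.example/",
    ]
    for url, hosts in find_equivocal(corpus).items():
        print(url, hosts)
```

Under these assumptions only the second URL is reported: `urllib` takes the host after the last `@` and returns `good.example`, while the simplified model stops the authority at the backslash and returns `evil.example`. Comparing only the host keeps the check focused on the most security-relevant kind of disagreement, where two clients would contact different servers.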

Acknowledgements

This work was partially supported by the NSF under grants GR0005987 and CNS 1955228. We thank our anonymous peer reviewers as well as Zane Ma, Joshua Mason, Kent Seamons, Jay Misra, Kaylia M. Reynolds, Deepak Kumar, and Paul Murley for their feedback and suggestions.

Author information

Corresponding author

Correspondence to Joshua Reynolds.

Appendix A: Tested Parser Details

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Reynolds, J., Bates, A., Bailey, M. (2022). Equivocal URLs: Understanding the Fragmented Space of URL Parser Implementations. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13556. Springer, Cham. https://doi.org/10.1007/978-3-031-17143-7_9

  • DOI: https://doi.org/10.1007/978-3-031-17143-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17142-0

  • Online ISBN: 978-3-031-17143-7

  • eBook Packages: Computer Science, Computer Science (R0)
