Abstract
Uniform Resource Locators (URLs) are integral to the Web and have existed for nearly three decades. Yet URL parsing differs subtly among parser implementations, leading to ambiguity that attackers can abuse. We measure agreement between widely used URL parsers and find that each has made design decisions that deviate from parsing standards, creating a fractured implementation space in which assumptions of uniform interpretation are unreliable. In some cases, the deviations are severe enough that clients using different parsers will make requests to different hosts based on a single, “equivocal” URL. We systematize the thousands of differences we observed into seven pitfalls in URL parsing that application developers should beware of. We demonstrate that this ambiguity can be weaponized through misdirection attacks that evade the Google Safe Browsing and VirusTotal URL classifiers. URL parsing libraries have made a tradeoff that favors permissiveness over strict standards adherence. We hope this work will motivate the systemic adoption of a more unified URL parsing standard, enabling a more secure Web.
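To make the notion of an "equivocal" URL concrete, the following minimal sketch compares two host-extraction strategies on one string. It is an illustration only and is not taken from the paper's measurement code: `host_strict` uses Python's standard `urllib.parse`, while `host_backslash_tolerant` is a hypothetical toy stand-in for browser-style parsers that treat a backslash as a forward slash in `http` URLs. The function names and the example hostnames are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def host_strict(url: str) -> str:
    """Host as reported by urllib.parse, which keeps a backslash inside the authority."""
    return urlsplit(url).hostname or ""


def host_backslash_tolerant(url: str) -> str:
    """Toy model (assumption): rewrite each backslash to a slash, then split."""
    return urlsplit(url.replace("\\", "/")).hostname or ""


def is_equivocal(url: str) -> bool:
    """True when the two strategies disagree about which host the URL names."""
    return host_strict(url) != host_backslash_tolerant(url)


# A single backslash before '@' makes the two strategies pick different hosts.
url = "http://good.example\\@evil.example/"
print(host_strict(url))              # host after the '@'
print(host_backslash_tolerant(url))  # host before the backslash
print(is_equivocal(url))
```

A filter that checks the host with one strategy while the client fetches with the other would, under these assumptions, validate one host and contact another, which is the kind of misdirection the abstract describes.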
Acknowledgements
This work was partially supported by the NSF under grants GR0005987 and CNS 1955228. We thank our anonymous peer reviewers as well as Zane Ma, Joshua Mason, Kent Seamons, Jay Misra, Kaylia M. Reynolds, Deepak Kumar, and Paul Murley for their feedback and suggestions.
Appendix A: Tested Parser Details
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Reynolds, J., Bates, A., Bailey, M. (2022). Equivocal URLs: Understanding the Fragmented Space of URL Parser Implementations. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13556. Springer, Cham. https://doi.org/10.1007/978-3-031-17143-7_9
DOI: https://doi.org/10.1007/978-3-031-17143-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17142-0
Online ISBN: 978-3-031-17143-7
eBook Packages: Computer Science (R0)