Equivocal URLs: Understanding the Fragmented Space of URL Parser Implementations

Reynolds, Joshua; Bates, Adam; Bailey, Michael

doi:10.1007/978-3-031-17143-7_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13556)

Included in the following conference series:

European Symposium on Research in Computer Security

Abstract

Uniform Resource Locators (URLs) are integral to the Web and have existed for nearly three decades. Yet URL parsing differs subtly among parser implementations, leading to ambiguity that can be abused by attackers. We measure agreement between widely used URL parsers and find that each has made design decisions that deviate from parsing standards, creating a fractured implementation space where assumptions of uniform interpretation are unreliable. In some cases, deviations are severe enough that clients using different parsers will make requests to different hosts based on a single, “equivocal” URL. We systematize the thousands of differences we observed into seven pitfalls in URL parsing that application developers should beware of. We demonstrate that this ambiguity can be weaponized through misdirection attacks that evade the Google Safe Browsing and VirusTotal URL classifiers. URL parsing libraries have made a tradeoff to favor permissiveness over strict standards adherence. We hope this work will motivate the systemic adoption of a more unified URL parsing standard, enabling a more secure Web.
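To make the notion of an "equivocal" URL concrete, the following is a minimal sketch, not the authors' measurement harness. It compares the host reported by Python's standard-library `urllib.parse.urlsplit` with a deliberately simplistic, hypothetical extractor (`host_naive`) that mishandles the userinfo delimiter `@`. The function names, the `EXTRACTORS` table, and the example URLs are assumptions introduced only for illustration.

```python
import re
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlsplit


def host_stdlib(url: str) -> Optional[str]:
    """Host as reported by Python's standard-library URL parser."""
    return urlsplit(url).hostname


def host_naive(url: str) -> Optional[str]:
    """Hypothetical simplistic extractor (illustration only).

    It takes everything after '://' up to the first '/', '?', '#', '@'
    or ':' and therefore mistakes the userinfo part for the host.
    """
    match = re.match(r"[A-Za-z][A-Za-z0-9+.-]*://([^/?#@:]+)", url)
    return match.group(1).lower() if match else None


# Hypothetical registry of the parsers being compared.
EXTRACTORS: Dict[str, Callable[[str], Optional[str]]] = {
    "stdlib": host_stdlib,
    "naive": host_naive,
}


def is_equivocal(
    url: str, extractors: Dict[str, Callable[[str], Optional[str]]]
) -> Tuple[bool, Dict[str, Optional[str]]]:
    """Return (True, hosts) when the extractors disagree on the host."""
    hosts = {name: extract(url) for name, extract in extractors.items()}
    return len(set(hosts.values())) > 1, hosts


if __name__ == "__main__":
    for candidate in (
        "http://example.com/index.html",
        "http://trusted.example@other.example/login",
    ):
        disagree, hosts = is_equivocal(candidate, EXTRACTORS)
        print(candidate, "->", hosts, "equivocal" if disagree else "agreed")
```

In this sketch the second URL is reported as equivocal: the standard-library parser treats `trusted.example` as userinfo and returns `other.example` as the host, while the naive extractor stops at `@`. A real comparison would substitute actual parser libraries for the hypothetical extractor.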

Acknowledgements

This work was partially supported by the NSF under grants GR0005987 and CNS 1955228. We thank our anonymous peer reviewers as well as Zane Ma, Joshua Mason, Kent Seamons, Jay Misra, Kaylia M. Reynolds, Deepak Kumar, and Paul Murley for their feedback and suggestions.

Author information

Authors and Affiliations

New Mexico State University, Las Cruces, USA
Joshua Reynolds
University of Illinois at Urbana-Champaign, Champaign, USA
Joshua Reynolds, Adam Bates & Michael Bailey
Georgia Institute of Technology, Atlanta, USA
Michael Bailey

Corresponding author

Correspondence to Joshua Reynolds.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Vijayalakshmi Atluri
Hamad Bin Khalifa University, Doha, Qatar
Roberto Di Pietro
Technical University of Denmark, Kongens Lyngby, Denmark
Christian D. Jensen
Technical University of Denmark, Kongens Lyngby, Denmark
Weizhi Meng

Appendix A: Tested Parser Details

About this paper

Cite this paper

Reynolds, J., Bates, A., Bailey, M. (2022). Equivocal URLs: Understanding the Fragmented Space of URL Parser Implementations. In: Atluri, V., Di Pietro, R., Jensen, C.D., Meng, W. (eds) Computer Security – ESORICS 2022. ESORICS 2022. Lecture Notes in Computer Science, vol 13556. Springer, Cham. https://doi.org/10.1007/978-3-031-17143-7_9

DOI: https://doi.org/10.1007/978-3-031-17143-7_9
Published: 24 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17142-0
Online ISBN: 978-3-031-17143-7
eBook Packages: Computer Science, Computer Science (R0)
