Funny Accents: Exploring Genuine Interest in Internationalized Domain Names

  • Conference paper
Passive and Active Measurement (PAM 2019)

Abstract

Internationalized Domain Names (IDNs) were introduced to support non-ASCII characters in domain names. In this paper, we explore IDNs that hold genuine interest, i.e., that owners of brands with diacritical marks may want to register and use. We generate 15 276 candidate IDNs from the page titles of popular domains, and see that 43% are readily available for registration, allowing for spoofing or phishing attacks. Meanwhile, 9% are not allowed by the respective registry to be registered, preventing brand owners from owning the IDN. Based on WHOIS records, DNS records and a web crawl, we estimate that at least 50% of the 3 189 registered IDNs have the same owner as the original domain, but that 35% are owned by a different entity, mainly domain squatters; malicious activity was not observed. Finally, we see that application behavior toward these IDNs remains inconsistent, hindering user experience and therefore widespread uptake of IDNs, and even uncover a phishing vulnerability in iOS Mail.

Notes

  1. https://tranco-list.eu/list/RQ4M/1000000.

  2. The other deviations are the Greek final sigma (ς), converted to σ in IDNA2003, and the zero width non-joiner and joiner, both deleted by the IDNA2003 Punycode algorithm.
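
The IDNA2003 behavior described in note 2 can be observed with a short sketch. It assumes Python's standard `idna` codec, which implements IDNA2003 (RFC 3490); this is an illustration only, not the tooling used in the paper. The helper name `to_ascii_idna2003` and the `.example` labels are hypothetical. The sharp s (ß), which UTS #46 also classifies as a deviation, is included for comparison.

```python
# Sketch: how IDNA2003 treats the UTS #46 "deviation" characters.
# Assumption: Python's built-in "idna" codec is used because it implements
# IDNA2003 (RFC 3490). It is not the tooling used in the paper.

ZWNJ = "\u200c"  # zero width non-joiner
ZWJ = "\u200d"   # zero width joiner


def to_ascii_idna2003(domain: str) -> bytes:
    """Return the ASCII form of a domain name under IDNA2003."""
    return domain.encode("idna")


if __name__ == "__main__":
    # "example" is a placeholder TLD chosen for illustration.
    samples = [
        "straße.example",        # sharp s is mapped to "ss"
        "ς.example",             # final sigma is mapped to sigma
        "σ.example",             # same result as the line above
        f"ab{ZWNJ}cd.example",   # ZWNJ is removed
        f"ab{ZWJ}cd.example",    # ZWJ is removed
    ]
    for name in samples:
        print(ascii(name), "->", to_ascii_idna2003(name))
```

Under IDNA2003, the two sigma labels encode to the same ASCII name and the joiner characters vanish, so distinct Unicode strings can collapse onto one registered domain.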

Acknowledgments

We would like to thank our shepherd Ignacio Castro for his valuable feedback, and Gertjan Franken and Katrien Janssens for their help in the user agent survey. This research is partially funded by the Research Fund KU Leuven. Victor Le Pochat holds a PhD Fellowship of the Research Foundation - Flanders (FWO).

Author information

Corresponding author

Correspondence to Victor Le Pochat.

Appendices

A Common Character Substitutions

 

Original      ä   ö   ü   ß   æ   ø   å   œ   þ
Substitution  ae  oe  ue  ss  ae  oe  aa  oe  th
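
The substitutions in this appendix can be applied mechanically. The following is a minimal sketch, assuming a simple per-character replacement on lowercased text; the function names `substitute` and `matches_label` are hypothetical and do not come from the paper, and the matching step is only one plausible way to use the table.

```python
# Substitution table transcribed from Appendix A.
SUBSTITUTIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "æ": "ae",
    "ø": "oe", "å": "aa", "œ": "oe", "þ": "th",
}


def substitute(text: str) -> str:
    """Lowercase the text and replace each listed character.

    Characters that are not in the table (for example "é") pass
    through unchanged.
    """
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in text.lower())


def matches_label(word: str, label: str) -> bool:
    """Hypothetical helper: does the substituted word equal an ASCII label?"""
    return substitute(word) == label.lower()


if __name__ == "__main__":
    for word in ["Müller", "Smørrebrød", "Straße"]:
        print(word, "->", substitute(word))
```

For instance, `matches_label("Müller", "mueller")` holds while `matches_label("Müller", "muller")` does not, because the table replaces ü by "ue" rather than dropping the diacritic.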

B Tested User Agent Versions

 

 

                 Client             Version               Operating system
Browser desktop  Google Chrome      69.0.3497.100         Ubuntu Linux 18.04.1
                 Firefox            62.0                  Ubuntu Linux 18.04.1
                 Safari             12.0.1 (13606.2.100)  macOS 10.13.6 (17G65)
                 Opera              55.0.2994.61          Ubuntu Linux 18.04.1
                 Internet Explorer  11.0.9600.18894       Windows 8.1
                 Microsoft Edge     42.17134.1.0          Windows 10 17.17134
Browser mobile   Google Chrome      69.0.3497.100         Android 7.0.0
                 Safari                                   iOS 12.0 (16A366)
                 Firefox            62.0.2                Android 7.0.0
                 UC Browser         12.9.3.1144           Android 7.0.0
                 Samsung Internet   7.4.00.70             Android 7.0.0
                 Opera              47.3.2249.130976      Android 7.0.0
                 Microsoft Edge     42.0.0.2529           Android 7.0.0
Email desktop    Outlook 2016       16.0.4738.1000        Windows 10 17.17134
                 macOS Mail         11.5 (3445.9.1)       macOS 10.13.6 (17G65)
                 Thunderbird        52.9.1                Ubuntu Linux 18.04.1
Email mobile     Gmail              8.9.9.213351932       Android 7.0.0
                 Outlook            2.2.219               Android 7.0.0
                 iOS Mail                                 iOS 12.0 (16A366)
                                                          iOS 12.1.2 (16C104)
Webmail          Gmail
                 Yahoo
                 Yandex
                 Outlook
                 RoundCube          1.2.9

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Le Pochat, V., Van Goethem, T., Joosen, W. (2019). Funny Accents: Exploring Genuine Interest in Internationalized Domain Names. In: Choffnes, D., Barcellos, M. (eds) Passive and Active Measurement. PAM 2019. Lecture Notes in Computer Science, vol 11419. Springer, Cham. https://doi.org/10.1007/978-3-030-15986-3_12

  • DOI: https://doi.org/10.1007/978-3-030-15986-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15985-6

  • Online ISBN: 978-3-030-15986-3

  • eBook Packages: Computer Science, Computer Science (R0)
