Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research

  • Walter Rweyemamu
  • Tobias Lauinger
  • Christo Wilson
  • William Robertson
  • Engin Kirda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11419)


Top domain rankings (e.g., Alexa) are commonly used in security research, for example to survey the security features or vulnerabilities of “relevant” websites. Because these rankings play a central role in selecting the sample of sites to study, an inappropriate choice or use of them can introduce unwanted biases into research results. We quantify several previously unreported characteristics of three top domain lists. For example, the weekend effect in Alexa and Umbrella causes the geographical diversity of these rankings to change between the workweek and the weekend. Furthermore, up to 91% of ranked domains appear in alphabetically sorted clusters of up to 87k domains of presumably equivalent popularity. We discuss the practical implications of these findings and propose novel best practices for using top domain lists in the security community.
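The idea of alphabetically sorted clusters can be made concrete with a short sketch. It assumes that a cluster is a maximal run of consecutively ranked domains in strictly ascending lexicographic order; this is a simplification for illustration, not necessarily the authors' exact definition. The functions `sorted_runs` and `clustered_fraction` and the `min_len` threshold are hypothetical. A threshold is needed because two adjacent domains are in alphabetical order by chance about half the time, and k randomly ordered domains appear sorted with probability 1/k!, so only long runs suggest that a list provider broke ties alphabetically.

```python
from typing import List, Tuple


def sorted_runs(ranked: List[str], min_len: int = 10) -> List[Tuple[int, int]]:
    """Return (first_rank, last_rank) for every maximal run of consecutively
    ranked domains in strictly ascending lexicographic order.

    Ranks are 1-based and inclusive. Runs shorter than min_len (a hypothetical
    threshold) are discarded, since short sorted runs occur by chance.
    """
    runs: List[Tuple[int, int]] = []
    start = 0  # 0-based index where the current run begins
    for i in range(1, len(ranked) + 1):
        # A run ends at the end of the list or where alphabetical order breaks.
        if i == len(ranked) or ranked[i] <= ranked[i - 1]:
            if i - start >= min_len:
                runs.append((start + 1, i))  # convert to 1-based inclusive ranks
            start = i
    return runs


def clustered_fraction(ranked: List[str], min_len: int = 10) -> float:
    """Share of ranked domains that fall inside a sorted run of length >= min_len."""
    if not ranked:
        return 0.0
    covered = sum(last - first + 1 for first, last in sorted_runs(ranked, min_len))
    return covered / len(ranked)


if __name__ == "__main__":
    # Toy list (invented for illustration): ranks 3-7 form an alphabetical run.
    ranked = ["google.com", "youtube.com", "aaa.com", "abc.net",
              "b.org", "c.io", "zzz.com", "a.com"]
    print(sorted_runs(ranked, min_len=3))         # [(3, 7)]
    print(clustered_fraction(ranked, min_len=3))  # 0.625
```

On a real list, the choice of `min_len` trades off false positives from chance orderings against missing small tie groups; applying the sketch to daily snapshots would also show whether cluster boundaries shift from day to day.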



This work was supported by Secure Business Austria and the National Science Foundation under grants CNS-1563320, CNS-1703454, and IIS-1553088.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Walter Rweyemamu (1)
  • Tobias Lauinger (1)
  • Christo Wilson (1)
  • William Robertson (1)
  • Engin Kirda (1)
  1. Northeastern University, Boston, USA
