Knockin’ on Trackers’ Door: Large-Scale Automatic Analysis of Web Tracking

  • Iskander Sanchez-Rola
  • Igor Santos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10885)

Abstract

In this paper, we present the first generic large-scale analysis of known and unknown web tracking scripts on the Internet, aimed at understanding the current tracking ecosystem and the behavior of these scripts. To this end, we implemented TrackingInspector, the first automatic method capable of generically detecting different types of web tracking scripts. This method automatically retrieves the scripts present on a website and, through code similarity and machine learning, detects modifications of known tracking scripts and discovers unknown tracking script candidates.
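
As an illustration of the code-similarity step, the following is a minimal sketch only. The abstract does not say which similarity measure TrackingInspector uses; the sketch assumes a bag-of-tokens TF-IDF representation compared by cosine similarity. The regex tokenizer, the function names (tokenize, tfidf_vectors, cosine, most_similar_known), and the sample scripts are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Assumption: identifier-like tokens are a crude stand-in for a real JavaScript lexer.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def tokenize(source):
    """Split script source into identifier-like tokens."""
    return TOKEN_RE.findall(source)


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: token -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            tok: (cnt / total) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, cnt in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_known(candidate, known_scripts):
    """Return (name, score) of the known script closest to the candidate."""
    names = list(known_scripts)
    if not names:
        return None, 0.0
    vectors = tfidf_vectors([known_scripts[n] for n in names] + [candidate])
    candidate_vec = vectors[-1]
    scores = [(cosine(candidate_vec, vec), name) for vec, name in zip(vectors, names)]
    best_score, best_name = max(scores)
    return best_name, best_score


if __name__ == "__main__":
    # Hypothetical sample scripts, for illustration only.
    known = {
        "canvas_fp": "var c = document.createElement('canvas'); "
                     "var ctx = c.getContext('2d'); ctx.fillText('x', 2, 2); "
                     "send(c.toDataURL());",
        "slider": "function slide(el) { el.style.left = el.offsetLeft + 10 + 'px'; "
                  "setTimeout(slide, 50); }",
    }
    renamed = ("var cv = document.createElement('canvas'); "
               "var g = cv.getContext('2d'); g.fillText('y', 1, 1); "
               "report(cv.toDataURL());")
    print(most_similar_known(renamed, known))
```

In this toy setting, a candidate whose local variables were renamed still shares its API-level tokens (createElement, getContext, toDataURL) with the known canvas script, so it scores higher against that script than against the unrelated one. A real system would also need a decision threshold and a proper lexer, neither of which is specified in the abstract.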

TrackingInspector analyzed the Alexa top 1M websites, measuring the prevalence of web tracking and its ecosystem, as well as the influence of hosting, website category, and website reputation. More than 90% of websites performed some sort of tracking, and more than 50% of scripts were used for web tracking. Over 2,000,000 versions of known tracking scripts were found. We discovered several script renaming techniques used to avoid blacklists and performed a comprehensive analysis of them. In addition, 5,500,000 completely unknown, likely tracking scripts were found, including more than 700 new, distinct potential device fingerprinting scripts. Our system also automatically detected the fingerprinting behavior of a previously reported targeted, fingerprinting-driven malware campaign on two websites not previously documented.

Keywords

Device fingerprinting, Privacy, Web tracking

Notes

Acknowledgments

We would like to thank the reviewers for their insightful comments and our shepherd, Nick Nikiforakis, for his assistance in improving the quality of this paper. This work is partially supported by the Basque Government under a pre-doctoral grant given to Iskander Sanchez-Rola.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. DeustoTech, University of Deusto, Bilbao, Spain
