A Machine Learning Approach for Detecting Third-Party Trackers on the Web

  • Qianru Wu
  • Qixu Liu
  • Yuqing Zhang
  • Peng Liu
  • Guanxing Wen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9878)

Abstract

Privacy violations caused by third-party tracking have become a serious problem, yet the most effective defense against third-party tracking is still based on blacklists. Such a method depends heavily on the quality of the blacklist database, whose records need to be updated frequently. However, most records are curated manually and are very difficult to maintain. To generate blacklists efficiently, we propose a highly accurate system, named DMTrackerDetector, that detects third-party trackers automatically. Existing methods for detecting online tracking have two shortcomings. First, they treat first-party tracking and third-party tracking the same. Second, they focus on one particular tracking technique and can therefore detect only a limited set of trackers. Since blacklist-based anti-tracking technology depends heavily on the coverage of the blacklist database, these methods cannot generate high-quality blacklists. To solve these problems, we first use structural hole theory to preserve first-party trackers, and then detect only third-party trackers with supervised machine learning, exploiting the fact that trackers and non-trackers call different JavaScript APIs for different purposes. The results show that 97.8% of the third-party trackers in our test set are correctly detected. The blacklist generated by our system not only covers almost all records in the Ghostery list (Ghostery is one of the most popular anti-tracking tools), but also detects 35 previously unrevealed trackers.
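
The idea of separating trackers from non-trackers by the JavaScript APIs they call can be illustrated with a minimal sketch. The abstract does not state which classifier or feature encoding the system uses, so everything below is an assumption made for illustration: the API names, the six training samples, their labels, and the choice of a multinomial naive Bayes classifier with add-one smoothing are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical training data (not from the paper): each sample lists the
# JavaScript APIs a third-party script was observed calling, with a label.
TRAINING = [
    (["document.cookie", "navigator.userAgent", "screen.width"], "tracker"),
    (["document.cookie", "navigator.plugins", "localStorage.setItem"], "tracker"),
    (["navigator.userAgent", "canvas.toDataURL", "document.cookie"], "tracker"),
    (["document.createElement", "element.appendChild"], "non-tracker"),
    (["document.getElementById", "element.addEventListener"], "non-tracker"),
    (["document.createElement", "element.addEventListener", "window.setTimeout"], "non-tracker"),
]


def train(samples):
    """Count API calls per class for a multinomial naive Bayes model."""
    class_counts = Counter()   # number of scripts per label
    api_counts = {}            # label -> Counter of API call occurrences
    vocabulary = set()         # every API name seen during training
    for apis, label in samples:
        class_counts[label] += 1
        api_counts.setdefault(label, Counter()).update(apis)
        vocabulary.update(apis)
    return class_counts, api_counts, vocabulary


def predict(model, apis):
    """Return the label with the highest smoothed log-probability."""
    class_counts, api_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        denom = sum(api_counts[label].values()) + len(vocabulary)
        for api in apis:
            if api in vocabulary:  # ignore APIs never seen in training
                score += math.log((api_counts[label][api] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(predict(model, ["document.cookie", "navigator.userAgent"]))
print(predict(model, ["document.createElement", "element.addEventListener"]))
```

In this toy setting, a script that reads cookies and browser properties is scored closer to the tracker class, while a script that only manipulates the page is scored closer to the non-tracker class. A real system would need far more labeled scripts and a properly evaluated model.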

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Qianru Wu (1)
  • Qixu Liu (2)
  • Yuqing Zhang (1)
  • Peng Liu (3)
  • Guanxing Wen (4)

  1. National Computer Network Intrusion Protection Center, University of Chinese Academy of Science, Beijing, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  3. College of Information Sciences and Technology, Pennsylvania State University, University Park, USA
  4. Team Pangu, Shanghai, China
