Using mixed methods to study the historical use of web beacons in web tracking

  • Original Paper
International Journal of Digital Humanities

Abstract

Historical studies of the use of tracking technologies that collect data about web users and their behaviour can help us understand the spread and implications of web tracking. This article presents a historical study of the use of a specific tracking technology, the web beacon, on the Danish web from 2006 to 2015, based on archived web materials from the national Danish web archive. The study combines a large-scale quantitative mapping of the use of web beacons on the Danish web with a qualitative study of specific websites. Using this mixed-method design, the article identifies the most prevalent third-party domains setting web beacons and the different purposes for which beacons are used. The findings show the ratio of Danish to international third-party domains involved in the tracking and how the types of beacon providers dominant on the Danish web changed over time. The article also addresses the methodological challenges of using the archived web for a mixed-method historical study of web tracking.
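The quantitative mapping summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's pipeline, and it rests on several assumptions:

- A web beacon is approximated as a 1x1 image.
- "Third party" means a registered domain that differs from the embedding site's domain.
- A naive suffix rule, using a small hypothetical set of ccSLDs such as co.uk, stands in for the full Public Suffix List.

All function names and sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical subset of second-level suffixes (ccSLDs); a real study
# would use the full Public Suffix List instead of this small set.
CCSLDS = {"co.uk", "co.jp", "com.au", "com.tw", "me.uk", "pp.ua"}


def registered_domain(host: str) -> str:
    """Naively reduce a hostname to its registered domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in CCSLDS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def map_beacons(records):
    """records: iterable of (page_domain, image_url, width, height).

    Assumption: a beacon is a 1x1 image whose registered domain differs
    from the embedding site's. Returns {third_party: {embedding domains}}.
    """
    embedders = defaultdict(set)
    for page_domain, image_url, width, height in records:
        host = urlsplit(image_url).hostname
        if host is None or (width, height) != (1, 1):
            continue  # not an image URL with a host, or not 1x1
        site = registered_domain(page_domain)
        third_party = registered_domain(host)
        if third_party != site:
            embedders[third_party].add(site)
    return embedders


def top_domains(embedders, n):
    """Rank third-party domains by the number of distinct embedding domains."""
    return sorted(embedders, key=lambda d: (-len(embedders[d]), d))[:n]


def danish_share(domains):
    """Fraction of the given third-party domains under the .dk ccTLD."""
    domains = list(domains)
    if not domains:
        return 0.0
    return sum(d.endswith(".dk") for d in domains) / len(domains)


if __name__ == "__main__":
    # Hypothetical capture records: (embedding site, image URL, width, height)
    records = [
        ("example.dk", "http://stats.zipstat.dk/pixel.gif", 1, 1),
        ("shop.dk", "https://www.zipstat.dk/p.gif", 1, 1),
        ("shop.dk", "https://track.adform.net/b.gif", 1, 1),
        ("shop.dk", "https://shop.dk/logo.png", 120, 40),  # not a beacon
    ]
    top = top_domains(map_beacons(records), 2)
    print(top)                # ['zipstat.dk', 'adform.net']
    print(danish_share(top))  # 0.5
```

The sketch counts distinct embedding domains rather than raw requests. A site that repeats the same beacon on thousands of pages therefore counts once, which fits a ranking by how many domains embed images from each third-party domain.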

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Data availability

Not applicable.

Code availability

Not applicable.

Notes

  1. A cryptographic hash function is used to assess whether two files are identical. When a file is identical to one already archived, a log entry is made in the crawl log with a reference to where the file is already archived, instead of downloading and storing the file again. This allows replay software such as the Wayback Machine to include the file when replaying later captures of websites that use it, even though the file was archived at another point in time.

  2. It is very common to focus on third-party HTTP requests in studies of web tracking (e.g. Libert, 2015), since it is mostly the collection of data across websites (and devices) that is the cause for concern.

  3. The top 500 contains no third-party domains that are linked to from only one other domain, but it does include domains whose images are embedded in only a few domains. In the top 100, images from each third-party domain are embedded in 17 or more domains.

  4. Top 100 TLDs and ccSLDs in alphabetical order: co.uk, com, cz, de, dk, info, is, net, nl, nu, org, ru, se, us.

  5. Top 500 TLDs and ccSLDs in alphabetical order: al, at, bg, biz, ca, cc, ch, cn, co, co.jp, co.uk, com, com.au, com.tw, cx, cz, de, dk, ee, es, eu, fm, fo, fr, hr, hu, in, info, is, it, jp, li, lt, lv, me.uk, ms, net, nl, no, nu, org, pl, pp.ua, pro, ru, se, ua, us, ws.

  6. The corpus combines the first broad crawl of the year (except in 2011 and 2012, where the second broad crawl is used) with selective crawls from the same period (or closest to it) to get the best possible coverage (Brügger et al., 2020).

  7. As mentioned, the number of domains with beacons from googleadservices.com in 2012 is a substantial outlier, which we have not so far been able to explain.

  8. Ei.dk is a strange case. Clicking one of its captures either opens a login module or redirects to captures of danseek.dk (on my first attempt) or to sedoparking.com/ei.dk (on another attempt), even though I clicked what should have been the same capture. This again speaks to the difficulties of assessing materials in the web archive.

  9. See Brookman et al., 2017; Estrada-Jiménez et al., 2017; Falahrastegar et al., 2014; Gerlitz & Helmond, 2013; Gomez et al., 2009; Krishnamurthy & Wills, 2006, 2009; Macbeth, 2017; Malandrino et al., 2013; Mayer & Mitchell, 2012; Olejnik et al., 2013; Ruohonen & Leppänen, 2018; Schelter & Kunegis, 2016, 2018; Trevisan et al., 2019.
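The deduplication described in note 1 can be modelled in a few lines of Python. This is a toy sketch, not the implementation used by Netarkivet or its crawler. It makes the following assumptions:

- SHA-1 is assumed as the hash function. This is a common choice in web crawlers, but note 1 does not name one.
- The class, its methods and the sample URL are hypothetical.

```python
import hashlib


class DedupArchive:
    """Toy model of hash-based deduplication in a web crawl (hypothetical)."""

    def __init__(self):
        self.store = {}      # digest -> (url, timestamp) of the first stored copy
        self.crawl_log = []  # (timestamp, url, digest, note) for every fetch

    def archive(self, url: str, timestamp: str, payload: bytes) -> str:
        """Store the payload unless an identical one exists; log either way."""
        digest = hashlib.sha1(payload).hexdigest()  # assumed hash function
        if digest in self.store:
            first_url, first_ts = self.store[digest]
            # Identical file: write only a log entry pointing to the stored copy.
            self.crawl_log.append(
                (timestamp, url, digest, f"duplicate of {first_url} at {first_ts}")
            )
            return "duplicate"
        self.store[digest] = (url, timestamp)
        self.crawl_log.append((timestamp, url, digest, "stored"))
        return "stored"

    def lookup(self, digest: str):
        """Resolve a digest to the stored copy, as replay software would."""
        return self.store.get(digest)


if __name__ == "__main__":
    archive = DedupArchive()
    pixel = b"GIF89a\x01\x00\x01\x00"  # truncated stand-in for a 1x1 GIF
    print(archive.archive("http://zipstat.dk/p.gif", "20060103145121", pixel))  # stored
    print(archive.archive("http://zipstat.dk/p.gif", "20070302092854", pixel))  # duplicate
```

The second fetch leaves only a log entry. A replay tool resolving the 2007 capture would therefore follow the digest to the copy archived in 2006. This is why a replayed page can include a file captured at a different time.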

References

  • Acar, G. (2017). Online Tracking Technologies and Web Privacy. Doctoral Thesis, KU Leuven. Retrieved 13.09.2020 from: https://lirias.kuleuven.be/handle/123456789/580934.

  • Altaweel, I., Good, N., & Hoofnagle, C. (2015). Web Privacy Census. Technology Science. 2015121502. https://techscience.org/a/2015121502/.

  • Brookman, J., Rouge, P., Alva, A., & Yeung, C. (2017). Cross-device tracking: Measurement and disclosures. Proceedings on Privacy Enhancing Technologies, 2017(2), 133–148. https://doi.org/10.1515/popets-2017-0020.

  • Brügger, N. (2018). The archived web: Doing history in the digital age. MIT Press.

  • Brügger, N., Laursen, D., & Nielsen, J. (2019). Establishing a corpus of the archived web: The case of the Danish web from 2005 to 2015. In N. Brügger & D. Laursen (Eds.), The historical web and digital humanities: The case of national web domains (pp. 124–142). Routledge.

  • Brügger, N., Nielsen, J., & Laursen, D. (2020). Big data experiments with the archived Web: Methodological reflections on studying the development of a nation’s Web. First Monday, 25(3). https://doi.org/10.5210/fm.v25i3.10384.

  • Dougherty, M., & Schneider, S. M. (2011). Web historiography and the emergence of new archival forms. In D. W. Park, N. W. Jankowski, & S. Jones (Eds.), The long history of new media: Technology, historiography, and contextualizing newness (pp. 253–266). Peter Lang.

  • Englehardt, S., & Narayanan, A. (2016). Online Tracking: A 1-million-site Measurement and Analysis (pp. 1388–1401). Presented at the 2016 ACM SIGSAC Conference, ACM Press. https://doi.org/10.1145/2663716.2663719

  • Estrada-Jiménez, J., Parra-Arnau, J., Rodríguez-Hoyos, A., & Forné, J. (2017). Online advertising: Analysis of privacy threats and protection approaches. Computer Communications, 100, 32–51. https://doi.org/10.1016/j.comcom.2016.12.016.

  • Falahrastegar, M., Haddadi, H., Uhlig, S., & Mortier, R. (2014). Anatomy of the Third-Party Web Tracking Ecosystem. arXiv.org.

  • Gerlitz, C., & Helmond, A. (2013). The like economy: Social buttons and the data-intensive web. New Media & Society, 15(8), 1348–1365.

  • Gomez, J., Pinnick, T., & Soltani, A. (2009). KnowPrivacy. knowprivacy.org. UC Berkeley, School of Information.

  • Helmond, A. (2017). Historical website ecology: Analyzing past states of the web using archived source code. In N. Brügger (Ed.), Web 25: Histories from the first 25 years of the World Wide Web (pp. 139–155). Peter Lang.

  • Kimpton, M., & Dubois, J. (2006). Year-by-year: From an archive of the internet to an archive on the internet. In J. Masanes (Ed.), Web Archiving (pp. 201–212). Springer.

  • Krishnamurthy, B., & Wills, C. E. (2006). Generating a Privacy Footprint on the Internet. Presented at the IMC.

  • Krishnamurthy, B., & Wills, C. E. (2009). Privacy diffusion on the web - a longitudinal perspective. Presented at the World Wide Web Conference.

  • Laursen, D., & Møldrup-Dalum, P. (2017). Looking back, looking forward: 10 years of development to collect, preserve, and access the Danish web. In N. Brügger (Ed.), Web 25: Histories from the first 25 years of the world wide web (pp. 207–227). Peter Lang.

  • Lerner, A., Simpson, A. K., Kohno, T., & Roesner, F. (2016). Internet Jones and the Raiders of the Lost Trackers: An archaeological study of web tracking from 1996 to 2016. Presented at the 25th USENIX Security Symposium.

  • Li, T.-C., Hang, H., Faloutsos, M., & Efstathopoulos, P. (2015). TrackAdvisor - Taking Back Browsing Privacy from Third-Party Trackers. Presented at the Passive and Active Measurement Conference.

  • Libert, T. (2015). Exposing the invisible web: An analysis of third-party HTTP requests on 1 million websites. International Journal of Communication, 9, 18.

  • Macbeth, S. (2017). Tracking the Trackers: Analysing the global tracking landscape with GhostRank. Retrieved October 3, 2018, from https://www.ghostery.com/wp-content/themes/ghostery/images/campaigns/tracker-study/Ghostery_Study_-_Tracking_the_Trackers.pdf.

  • Malandrino, D., Petta, A., Scarano, V., Serra, L., Spinelli, R., & Krishnamurthy, B. (2013). Privacy awareness about information leakage - who knows what about me? Presented at WPES’13.

  • Masanes, J. (2006). Web archiving. Springer.

  • Mayer, J. R., & Mitchell, J. C. (2012). Third-Party Web Tracking: Policy and Technology (pp. 413–427). Presented at the 2012 IEEE Symposium on Security and Privacy (SP). https://doi.org/10.1109/SP.2012.47.

  • Moretti, F. (2000). Conjectures on world literature. New Left Review, 1(Jan.-Feb.), 56–58.

  • Nielsen, J. (2018). Recording the web. In A. Romele & E. Terrone (Eds.), Towards a philosophy of digital media (pp. 51–76). Palgrave Macmillan.

  • Nielsen, J. (2019). Experimenting with computational methods for large-scale studies of tracking technologies in web archives. Internet Histories: Digital Technology, Culture and Society, 3(3–4), 293–315. https://doi.org/10.1080/24701475.2019.1671074.

  • Nielsen, J. (forthcoming). Quantitative approaches to the Danish web archive. In D. Gomes, E. Demidova, J. Winters, & T. Risse (Eds.), The past web: Exploring web archives. Springer.

  • Olejnik, L., Minh-Dung, T., & Castelluccia, C. (2013). Selling Off Privacy at Auction. Hal.Inria.Fr.

  • Owens, T., & Thomas, G. H. (2019). The invention and dissemination of the spacer gif: Implications for the future of access and use of web archives. International Journal of Digital Humanities, 1(1), 71–84. https://doi.org/10.1007/s42803-019-00006-8.

  • PC Virus Cleaner (2015). Tips to Remove Ak2.imgaft.com- Removal Guide. Retrieved 14.09.2020 from http://pcvirusescleaner.blogspot.com/2015/10/tips-to-remove-ak2imgaftcom-removal.html.

  • Rogers, R. (2009). Google and the Politics of Tabs. Govcom.org, Amsterdam. Retrieved 14.09.2020 from: https://www.youtube.com/watch?v=nUrRz6Ejg-o.

  • Rogers, R. (2013). Digital methods. MIT Press.

  • Rogers, R. (2017). Doing web history with the internet archive: Screencast documentaries. Internet Histories, 1(1–2), 160–172. https://doi.org/10.1080/24701475.2017.1307542.

  • Ruohonen, J., & Leppänen, V. (2018). Invisible Pixels Are Dead, Long Live Invisible Pixels! (pp. 28–32). Presented at the Proceedings of the 2018 Workshop on Privacy in the Electronic Society, ACM. https://doi.org/10.1145/3267323.3268950.

  • Schelter, S., & Kunegis, J. (2016). Tracking the Trackers - A Large-Scale Analysis of Embedded Web Trackers. Presented at Tenth International AAAI Conference on Web and Social Media. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13024.

  • Schelter, S., & Kunegis, J. (2018). On the ubiquity of web tracking: Insights from a billion-page web crawl. Journal of Web Science, 4, 53–66.

  • Schneider, S. M., & Foot, K. A. (2004). The web as an object of study. New Media & Society, 6(1), 114–122. https://doi.org/10.1177/1461444804039912.

  • Schostag, S., & Fønss-Jørgensen, E. (2012). Webarchiving: Legal deposit of internet in Denmark. A curatorial perspective. Microform & Digitization Review, 41(3–4), 110–120. https://doi.org/10.1515/mir-2012-0018.

  • Trevisan, M., Traverso, S., Bassi, E., & Mellia, M. (2019). 4 years of EU cookie law - results and lessons learned. Proceedings on Privacy Enhancing Technologies, 2, 126–145.

Web archive references

    The following references list the domain, the date of harvest, and the unique identification number in Netarkivet:

    • adform.net, 29.07.2009, 52214–100–20090729185723-00135-kb-prod-har-004.kb.dk.arc.gz/16216639.

    • chart.dk, 01.01.2006, 2873–25–20060101013022-00110-kb-prod-har-001.kb.dk.arc.gz/17272613.

    • chart.dk, 02.07.2007, 13946–38–20070302092854-00148-sb-prod-har-002.statsbiblioteket.dk.arc.gz/540406.

    • cmsstats.com, 14.01.2014, 198220–203–2011401114110058-00000-sb-prod-har-005.statsbiblioteket.warc.gz/9186625.

    • extremetracking.com/?reg, 23.07.2009, 51645–100–20090723011652-001188-sb-prod-har-004.arc.gz/62161493.

    • quantcast.com, 20.01.2011, 106159–123–20110120095603-00096-kb-prod-har-004.kb.dk.arc.gz/352401.

    • zipstat.dk, 03.01.2006, 2905–25–20060103145121-00060-kb-prod-har-001.kb.dk.arc.gz/56467885.

    Acknowledgements

    The author wishes to thank Asger Askov Blekinge, The Royal Danish Library, and Ulrich Karstoft Have, NetLab/DIGHUMLAB, Aarhus University, for providing indispensable expertise in the use of Netarkivet and the DeiC National Cultural Heritage Cluster. Thanks also to The DeiC National Cultural Heritage Cluster, NetLab/DIGHUMLAB, Aarhus University, and The Center for Humanities Computing Aarhus, Aarhus University.

    Funding

    Part of this research was supported by the DeiC National Cultural Heritage Cluster, The Royal Danish Library and NetLab/DIGHUMLAB, Aarhus University.

    Author information

    Correspondence to Janne Nielsen.

    Ethics declarations

    Conflicts of interest/Competing interests

    Not applicable.

    About this article

    Cite this article

    Nielsen, J. Using mixed methods to study the historical use of web beacons in web tracking. Int J Digit Humanities 2, 65–88 (2021). https://doi.org/10.1007/s42803-021-00033-4

    • DOI: https://doi.org/10.1007/s42803-021-00033-4
