
Invisible market for online personal data: An examination


Although it is widely known that corporations collect users' online personal data (OPD) and exchange it among themselves in a market for OPD, there have been few attempts to systematically understand the nature and structure of this market or to answer basic questions about the behavior of the parties in it. This paper addresses these questions using records of data-sharing behavior by 218 websites across eight economic sectors. Two datasets, collected 4 years apart, are analyzed using social network analysis (SNA). The findings indicate that linear preferential attachment is the most likely coordinating mechanism in the OPD market. Further, this market has a much higher number of brokers (intermediary corporations that facilitate exchange between other corporations) than comparable markets. Building on these findings, implications for research and practice are presented along with future research directions.




Author information



Corresponding author

Correspondence to David Agogo.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

Dataset 1: Spring 2016

Lightbeam, originally called Collusion, was created by Atul Varma, a software developer at Mozilla. The application made it possible to visualize the network of websites that collect data about a user's browsing behavior on each page the user visits. In February 2012, Gary Kovacs, Mozilla's CEO at the time, spoke about Collusion in a TED talk, which led to the plugin going viral. In September 2012, Mozilla, along with faculty and student researchers at Emily Carr University of Art + Design, extended the plugin, relaunching it as Lightbeam in 2013. The application was supported by the Ford Foundation and the Natural Sciences and Engineering Research Council (NSERC). The full reference for this application and its source code can be accessed from:

The data retrieved from Lightbeam has the following layout:

[source, target, timestamp, contentType, cookie, sourceVisited, secure, sourcePathDepth, sourceQueryDepth, sourceSub, targetSub, method, status, cacheable]
For instance, [““, ““, 1456366106722, “text\/html”, true, false, true, 1, 0, “www.”, “cm.g.”, “GET”, 204, true, false]
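A minimal sketch of how one such row could be mapped onto the named fields is shown here. It is an illustration, not the author's processing code. The host names are hypothetical placeholders, and the timestamp is assumed to be in milliseconds since the Unix epoch. The sample row above appears to carry one more value than the layout names, so the sketch keeps any surplus values under a hypothetical `extra` key rather than guessing their meaning.

```python
from datetime import datetime, timezone

# Field names exactly as listed in the layout above.
FIELDS = [
    "source", "target", "timestamp", "contentType", "cookie",
    "sourceVisited", "secure", "sourcePathDepth", "sourceQueryDepth",
    "sourceSub", "targetSub", "method", "status", "cacheable",
]


def parse_record(row):
    """Pair each value in a row with its field name.

    Values beyond the named fields are kept under "extra"
    (a hypothetical key introduced only for this sketch).
    """
    record = dict(zip(FIELDS, row))
    record["extra"] = list(row[len(FIELDS):])
    return record


def visit_time(record):
    """Convert the timestamp to a UTC datetime.

    Assumption: the timestamp counts milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)


# Hypothetical row: placeholder hosts stand in for the blank source and target.
row = ["site.example", "tracker.example", 1456366106722, "text/html",
       True, False, True, 1, 0, "www.", "cm.g.", "GET", 204, True, False]
record = parse_record(row)
print(record["source"], "->", record["target"], "at", visit_time(record).isoformat())
```

Keeping the surplus values visible, instead of silently dropping them, makes any mismatch between the layout and the exported rows easy to spot during cleaning.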

WHOIS is a query-and-response protocol used to query internet registry databases that store the registered users or assignees of an internet resource, such as a domain name, an IP address block, or an autonomous system.

Steps in creating the dataset:

  1. Selected 8 different economic sectors.

  2. Identified the top 20–25 ranked websites in each sector using Alexa.

  3. Visited only the homepage of each top-ranked website.

  4. Saved the data on websites that ‘talked’ to the visited page using Lightbeam.

  5. Retrieved the name of the corporation that owns each website in the dataset using a WHOIS lookup tool.
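The last two steps above can be sketched as a small aggregation from website-level records to corporation-level ties. This is an illustrative sketch under stated assumptions, not the author's pipeline. The `owner_of` mapping stands in for manually collected WHOIS results, all site and corporation names are hypothetical, and the choice to drop ties between two sites owned by the same corporation is an assumption made for the example.

```python
from collections import Counter


def corporate_edges(records, owner_of):
    """Aggregate site-to-site records into corporation-to-corporation ties.

    records  : iterable of dicts with "source" and "target" host names
    owner_of : dict mapping a host name to its owning corporation
               (hypothetical stand-in for WHOIS lookup results)
    Returns a Counter keyed by (source_owner, target_owner).
    """
    edges = Counter()
    for rec in records:
        # Fall back to the host name itself when no owner is known.
        src = owner_of.get(rec["source"], rec["source"])
        dst = owner_of.get(rec["target"], rec["target"])
        # Assumption: ties within a single corporation are not market exchanges.
        if src != dst:
            edges[(src, dst)] += 1
    return edges


# Hypothetical example data.
records = [
    {"source": "news.example", "target": "ads.example"},
    {"source": "shop.example", "target": "ads.example"},
    {"source": "news.example", "target": "cdn-news.example"},
]
owner_of = {
    "news.example": "News Corp A",
    "cdn-news.example": "News Corp A",
    "ads.example": "Ad Corp B",
    "shop.example": "Shop Corp C",
}
for (src, dst), count in sorted(corporate_edges(records, owner_of).items()):
    print(src, "->", dst, count)
```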

Dataset 2: Spring 2020

OpenWPM is an automated web privacy measurement framework that makes it easy to collect data from thousands to millions of websites. It is built on top of Firefox and runs in either a windowed or a windowless (headless) mode, automatically crawling the provided list of websites according to the supplied configuration. This tool is still in active development. The full reference for this application and its source code can be accessed from:

The data retrieved from OpenWPM is in the form of an SQLite database with several tables. More information about the HTTP requests table can be found at this link:

Steps in creating the dataset:

  1. Crawled the websites from dataset 1 using an OpenWPM script.

  2. Extracted the same information as used for dataset 1.

  3. Retrieved the name of the corporation that owns each website in the dataset using the WHOIS records collected for dataset 1.

  4. Updated the WHOIS records for those websites that were new in this dataset.
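Step 2 above, extracting source and target pairs from the crawl database, might look roughly like the following sketch. It is hedged throughout. The table name `http_requests` and the columns `top_level_url` and `url` are assumptions about the crawl schema and should be checked against the actual database. The comparison by full host name is a simplification, and the example uses an in-memory database with hypothetical URLs.

```python
import sqlite3
from urllib.parse import urlsplit


def third_party_pairs(conn):
    """Return sorted (visited_host, contacted_host) pairs that differ.

    Assumed schema: a table http_requests with columns
    top_level_url (the visited page) and url (the request made).
    """
    rows = conn.execute("SELECT top_level_url, url FROM http_requests")
    pairs = set()
    for top_url, req_url in rows:
        site = urlsplit(top_url or "").hostname
        target = urlsplit(req_url or "").hostname
        # Simplification: hosts are compared as-is, so subdomains of the
        # visited site would also count as different hosts.
        if site and target and site != target:
            pairs.add((site, target))
    return sorted(pairs)


# Hypothetical in-memory database standing in for a real crawl file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE http_requests (top_level_url TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO http_requests VALUES (?, ?)",
    [
        ("https://news.example/", "https://news.example/style.css"),
        ("https://news.example/", "https://tracker.example/pixel.gif"),
        ("https://news.example/", "https://tracker.example/sync?id=1"),
    ],
)
print(third_party_pairs(conn))
```

A fuller version would compare registered domains instead of host names and then map each domain to its owning corporation, as in the later steps.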

Complete List of Websites Crawled.

Social Media

Appendix 2

A t-test was performed to compare the incidence of different forms of brokerage between the observed network and the commensurate random networks. The full results of this test, by brokerage type and for each observed network, are shown below in Tables 6 and 7. Table 8 contains details of the companies with the most brokerage positions at both times.

Table 6 T-tests comparing observed networks to random networks (Time 1)
Table 7 T-tests comparing observed networks to random networks (Time 2)
Table 8 Companies and the number of Brokerage positions occupied (Time 1)
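The comparison described in this appendix can be illustrated with a short sketch. It is not the author's analysis code, and the exact form of the t-test is not stated in the text. The sketch assumes a one-sample t statistic that treats the observed brokerage count as the reference value for counts from random networks. It also counts brokerage in the Gould and Fernandez sense, without distinguishing brokerage types: a node brokers an ordered pair (a, c) when ties a -> node and node -> c exist but a -> c does not. All node names and numbers are hypothetical.

```python
import math
import statistics


def brokerage_count(edges, node):
    """Count ordered pairs (a, c) brokered by node in a directed network.

    node brokers (a, c) when a -> node and node -> c exist, a != c,
    and there is no direct tie a -> c.
    """
    edge_set = set(edges)
    preds = {a for a, b in edge_set if b == node and a != node}
    succs = {c for b, c in edge_set if b == node and c != node}
    return sum(
        1 for a in preds for c in succs
        if a != c and (a, c) not in edge_set
    )


def one_sample_t(sample, reference):
    """t statistic for whether the sample mean differs from a reference value.

    Assumption: sample holds counts from random networks and reference
    is the count from the observed network.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    return (mean - reference) / (sd / math.sqrt(n))


# Hypothetical observed network and hypothetical random-network counts.
observed_edges = [("A", "B"), ("B", "C"), ("D", "B")]
observed = brokerage_count(observed_edges, "B")
random_counts = [0, 1, 0, 2, 1]
print("observed brokerage:", observed)
print("t statistic:", round(one_sample_t(random_counts, observed), 3))
```

The degrees of freedom for this statistic would be the number of random networks minus one; a p-value could then be read from a t distribution.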


About this article


Cite this article

Agogo, D. Invisible market for online personal data: An examination. Electron Markets (2020).



Keywords

  • Online personal data
  • Social network analysis
  • Cookie-syncing
  • Personal data markets
  • Third-party tracking

JEL classification

  • M15
  • L10