In the previous section I argued that big data companies must presuppose a ‘finders, keepers’ ethic to explain why their appropriation of the new, valuable insights they manage to extract from existing (personal) data can be seen as legitimate. In this section I describe, in a tentative manner, three assumptions of the ‘finders, keepers’ ethic that are especially problematic in big data contexts: (1) the presumed ‘divisibility’ of personal data; (2) the legitimacy of the original acquisition of personal data; and (3) the historical conception of justice that underlies finders–keepers. All three assumptions are problematic because they are insensitive to what kind of things personal data are and to how personal data function in big data contexts. As the discussion of these problematic assumptions shows, explicating the normative basis of big data entrepreneurship allows for new types of critique of the conduct of big data companies.
Divisibility of personal data
As I have argued, the ‘finders, keepers’ ethic depends on the idea that within the same goods, some of the properties can be owned by the original holder, while other properties, namely those allowing for applications the original holder is not explicitly aware of, are unheld at the very same time and can thus, after discovery, be appropriated by the finder-creator. This introduces a certain kind of divisibility to goods which is necessary for finders-keepers to function adequately. In the case of inanimate objects this theory may be plausible—although even in those cases the divisibility of objects might feel highly artificial. But even if we assume, for the sake of argument, that this divisibility is plausible and accepted by everyone in the case of inanimate objects, it still does not follow that it is, by extension, equally plausible to think of personal data in a similar fashion. Granted, we often do speak of personal data as something—a resource, a thing—that can be owned, but does that automatically mean that personal data are to be understood as nothing more than inanimate objects?
I believe that the relationship between a person and her data is not exactly the same as the relationship between a person and a quotidian object (a phone, an orange, etc.) she owns. Floridi expresses this suspicion very accurately:
[O]ne may still argue that an agent “owns” his or her information, […] in the precise sense in which an agent is her or his information. “My” in “my information” is not the same “my” as in “my car” but rather the same “my” as in “my body” or “my feelings”: it expresses a sense of constitutive belonging, not of external ownership, a sense in which my body, my feelings and my information are part of me but are not my (legal) possession (Floridi 2005: 195).
If we understand the relation between an individual and her personal data the way Floridi does, it becomes immediately clear that it is far from unproblematic to conceive of personal data as if they were like oranges and orange juice. If Floridi’s understanding of personal data is plausible, and I believe it is, then it can explain why the idea of divisibility—something finders–keepers needs—is much less convincing in relation to personal data than it is in relation to inanimate objects. Floridi notes that the ‘my’ in ‘my information’ “expresses a sense of constitutive belonging” (Floridi 2005: 195). This remark expresses the idea that your identity as a person is always necessarily constituted—at least partly—by your information (either information about you, or information that you happen to ‘possess’), seeing the person “as an informational entity” (Floridi 2005: 194) or “the nature of a person as being constituted by that person’s information” (Floridi 2005: 195). This, in effect, means that unwanted meddling with one’s personal data constitutes “changes in one’s own identity as an informational entity” (Floridi 2005: 195). Based on Floridi’s characterization of personal data, one could argue that thinking about personal data exactly like one thinks of oranges is to make a category mistake.
As a result, additional arguments are needed to extend this idea of divisibility from inanimate objects to personal data.
At this point, the objection might be raised that big data analyses do not even need personal data to be effective. Completely anonymized data can also do the trick in some instances. If this is the case, the objector could claim that my argument, which is based on the ‘specialness’ of personal data, fails. In response, I would like to draw attention to different ways to define and understand the term ‘personal data’. An often-used definition is the one found in the European Union’s Data Protection Directive (95/46/EC), namely “any information relating to an identified or identifiable natural person” (article 2 (a)). This definition hinges on the question whether a piece of information or data can be explicitly related back to a person. If this standard definition is adopted, my argument may indeed seem dubious. However, in light of Floridi’s remarks and big data’s ability to generate inherently unpredictable outcomes that can influence the standing of data subjects significantly, I would like to suggest that a broader notion of personal data is appropriate. Even data that cannot be directly related to natural persons can be used, in big data contexts, to generate insights that nonetheless have a significant impact on the lives and self-understanding of persons. Think, for instance, of discriminatory targeting practices as described by Turow (2011), which need not be based on personal data in the legal sense of the word to still have those discriminatory effects. I want to propose that in those cases where, legally speaking, anonymized and therefore non-personal data are used, there is still something personal about the data in a moral sense. Because these data can still have a significant influence on the lives and self-understandings of persons and are, seen from this perspective, still constitutive of personhood, I believe it makes sense to say that these data are still, in a moral sense, personal.
As a result, it is still unconvincing to assume, without argument, that these data can be treated as just any quotidian object.
Acquisition of personal data
The acquisition of personal data—understood in the broader sense advocated above—by big data companies has not been problematized thus far. It has simply been assumed that big data companies acquire personal data in a just manner on the market, by way of transactions based on mutual consent. The idea that personal data are usually acquired justly because individuals consent to their collection may seem plausible. In reality, however, this position is quite hard to maintain. The claim that these transactions of personal data rest on informed consent, and that this consent is truly informed, is not very convincing in the face of the apparent failures of the informed consent model.
Zuiderveen Borgesius (2014) investigates the actual functioning of the informed consent model for the placement of cookies on computers and concludes that informed consent mechanisms are not strong enough to protect individuals. In a similar vein, Hoofnagle and Urban (2014) contend that informed consent mechanisms assume man to be a pure homo economicus: “Companies, long encouraged by regulators, issue privacy policies for consumers to read and act upon. In theory, consumers read these notices and make decisions according to their overall preferences, including preferences about privacy, price, service offering, and other attributes” (Hoofnagle and Urban 2014: 261–262). But for informed consent to function properly, this model of man as a pure homo economicus must be at least somewhat adequate, and it is far from obvious that it is. Solove (2013: 1883) calls this informed-consent-based approach ‘privacy self-management’ and states that “empirical evidence and social science literature demonstrates that people’s actual ability to make such informed and rational decisions does not even come close to the vision contemplated by privacy self-management”.
The problems of the informed consent model can erode the legitimacy of the original acquisition of the personal data that are used by big data companies. This, in turn, raises the question whether the appropriation of newly mined insights can be just if the data that entrepreneurs work with to generate these insights were not acquired justly.
Historical conception of justice
Finders–keepers presupposes a historical conception of justice (Kirzner 1978: 9). A clear formulation of this historical conception can be found in Nozick (1974: 151–153). A historical conception of justice evaluates outcomes by focusing exclusively on two questions: (1) whether the original acquisition was just, and (2) whether all subsequent transfers were just. If both conditions are satisfied, then the outcome is necessarily just. In effect, outcomes cannot be evaluated in their own right.
This conception of justice is problematic in big data contexts since an exclusive focus on the original acquisition of data and the subsequent transfers of data does not allow us to deal adequately with the challenges big data presents us with. To see why, one should notice that the current model of “[p]rivacy self-management addresses privacy in a series of isolated transactions guided by particular individuals. Privacy costs and benefits, however, are more appropriately assessed cumulatively and holistically” (Solove 2013: 1881). As was already shown, one of the unique aspects of big data is that its outcomes are inherently unpredictable. Therefore, an exclusive focus on individual transactions, without attention to the aggregated outcomes these transactions can lead to, will necessarily miss something important. A historical conception of justice neglects data subjects’ structural inability—and the general impossibility—to foresee the future outcomes of data mining. Not being able to evaluate these unpredictable outcomes in their own right is a serious problem for any analysis of big data that wants to focus on the desirability of certain applications and their outcomes.