1 Introduction

One of the benefits of using big data in the health care sector is the prospect of improving the efficiency of service delivery.Footnote 1 The efficiency lies in the potential of analysing large volumes of data, which enables health care providers to monitor individuals or systems in real time.Footnote 2 However, there are concerns about sharing and reusing data in formats that include big data practices.Footnote 3 The barriers mostly stem from privacy and security regulations as well as legal concerns,Footnote 4 which can deepen the existing inequalities between peoples and countries.Footnote 5 Additionally, the main concern with regard to big data in biomedicine is how to overcome the barriers to sharing and reusing such data for health-related research.Footnote 6

To contextualise the above concerns, it is worth considering the nature of big data briefly. Although there is no precise definition of big data, its attributes are well documented in current literature. These comprise the large increase in the volume of data that can be generated and stored, the velocity with which data can be delivered to foster decision-making in real time, the variety of formats that data can take, the veracity or confidence level that is associated with certain types of dataFootnote 7 and the ability to extract value by “identifying what is valuable and then transforming and extracting [it] for analysis”.Footnote 8 Accordingly, the term is defined by reference to five attributes, namely volume, velocity, variety, veracity and value.Footnote 9

The attribute of variety, which highlights the diverse interests, is evident from the following sources of information that make up big data in health-related research: “electronic health care records, social media, patient summaries, genomic and pharmaceutical data, test results, claims, telemedicine, mobile apps, home monitoring, clinical trials, sensors and information on wellbeing, behaviour and socioeconomic indicators”.Footnote 10

The diverse sources of big data show how the line between health care and health research has become blurred.Footnote 11 The scope of data producers has equally expanded with the growing numbers of citizens and citizen scientists with access to mobile cellular networks.Footnote 12 These developments have led to proposals that advocate outright ownership of data by data subjectsFootnote 13 or monetisation of their access and control rights.Footnote 14 The proposals raise three fundamental questions: Are data capable of being owned? Who owns data? And what is the basis of such ownership?

Different approaches to resolving the above questions have been inconclusiveFootnote 15 and current literature has mostly focused on technical, privacy and security issues as the main challenges of implementing big data.Footnote 16 For example, Wyber and colleagues have observed that the field of big data “is fraught with ethical, regulatory and technological issues” and have accordingly called for a “move from a reactive model to a proactive, norm-forming approach” in global health governance.Footnote 17 Zwitter has also observed that “a rethinking in philosophy, professional ethics, policy-making, and research” is essential in the era of big data.Footnote 18 The European Commission has suggested steps that can be taken in this regard by calling for flexibility of stakeholders’ roles and responsibilities in order to avoid single actor responsibilities across the data value chain.Footnote 19 This paper focuses on how claims of data ownership are affecting data sharing and the implementation of big data in health-related research. The point of departure is UNESCO’s observation that the concept of ownership is no longer an adequate normative framework in the era of big data.Footnote 20

The landscape of big data is still taking shape and this paper attempts to contribute to the much-needed legal and policy guidance by examining the fuzzy contours of data ownership and related intellectual property rights (IPRs), which have been cited as the main obstacles or possible solutions to data sharing. Notably, issues of data ownership remain mostly unresolved amidst calls for regulating data as property.Footnote 21 The central argument in this paper is that ownership is a concept that is ill-suited for governing rights in big data. The dawn of big data calls for an alternative normative framework. This framework must be capable of reconciling competing societal, individual and industry interests in the data with a view to ensuring fair access while minimising legal and ethical risks, as recommended by the OECD.Footnote 22 The ultimate aim of the paper is to propose a paradigm shift from ownership to custodianship in the governance of big data, particularly in international health-related research. The focus on international health-related research is warranted by the fact that the digital nature of big data has made research more globalised and collaborative, yet few countries have developed suitable policies or strategies to govern the use of big data in the health sector.Footnote 23

Issues that are related to access to information and data-owning companies’ concerns about disclosure of IPRs in data that they own have not been considered sufficiently.Footnote 24 In contrast, a lot of valuable research has been done on stakeholders’ perspectives on the sharing of public health research data by low- and middle-income countries (LMICs).Footnote 25 This paper seeks to fill the gap in the literature by focusing on two specific regulatory issues, namely ownership of data and related IPRs that data holders rely on to impede sharing and reuse of data. The second part provides an overview of the concerns in health-related research using big data as well as ownership and IPR issues that impede data sharing. The third part discusses the proposed paradigm shift from ownership to custodianship and paves the way for an explanation, in the fourth part, of the attributes of the proposed alternative normative framework for governing competing rights in big data.

2 Concerns in Health-related Research Using Big Data

The increased use of mobile devices and wearable or implanted devices and biosensors, which produce, collate and facilitate access to data, has led to a wider circulation of digital health data.Footnote 26 The production and distribution of health data through these means effectively disrupt the conventional modes of data collection through established institutional channels such as hospitalsFootnote 27 and increase the number of people who are involved as research participants.Footnote 28 Health data may be accessed directly from devices or may be voluntarily shared by individuals on social media. The real-life data that are derived from these sources can be accessed and used for research in ways that raise ethical issues related to ownership and data access.Footnote 29

Before discussing the concerns in health-related research, it is important to clarify the different types of data sets and the interests that are at stake. The first set is real-life data, which can be obtained from wearable digital devices and social media platforms where individuals share their health data from these devices with their peers. In this context, device and social media users co-produce and circulate the data.Footnote 30 Three parties are typically interested in this type of data, namely device or platform users who share data with each other, researchers who use the data to support their theories, and businesses that have vested financial interests in analysing the data.Footnote 31 Notably, researchers and businesses that analyse the data can make considerable investments in generating a second type of data set of high quality. From the sources of health data that were mentioned in the introduction to this paper, this second type of data set can consist of genomic or pharmaceutical data or clinical trial results. An interesting observation is that these types of data sets may be derived from individual-level data, with implications for the rights of data subjects. Both data sets are the objects of contested access and claims of ownership, particularly when producers of high-quality data sets attempt to protect their interests through trade secrecy and mechanisms that can guarantee exclusive rights, as further explained in Sect. 2.3 of this paper.

Health-related research thrives on data sharing from diverse sources since it is highly interdisciplinary in nature.Footnote 32 Additionally, the research is mostly conducted in a globalised and collaborative context. In the era of big data, sharing of digital data occurs on a global scale. Data sharing is vital because data generators, analysts and researchers have to work together as a team for purposes of making appropriate use of big data.Footnote 33 The unique circumstances that have implications for the uptake of big data in health-related research, particularly in LMICs, are the large size of the population and the complexity of health care delivery, which have led to a gap between health care delivery and population health.Footnote 34 Wyber and colleagues argue that this gap may be bridged and health care outcomes can be improved using the big data approach.Footnote 35 This hope can only be realised if three concerns in health-related research using big data are addressed appropriately. These are: access to data, data subjects’ consent to data processing for research, and ownership claims that can impede data sharing.

2.1 Data Access

With the emergence of big data, the risk of “putting so much personal data in the hands of either companies or governments” is real and this can lead to misuse of such data.Footnote 36 One of the critical questions in this regard is access,Footnote 37 which is closely related to the data subjects’ control over who may access, use and share their information. Companies may also be interested in timely access and dissemination of the data in a manner that provides investment incentives to stakeholders such as firms that collect and process data.Footnote 38 Catering for these interests requires other mechanisms, such as legislative provisions, oversight mechanisms, and procedures for the use of health data,Footnote 39 to be in place to foster control. Accordingly, Pentland and colleagues have suggested ensuring data access as one of the means of supporting the development of big data health systems. In their view, this entails updating “privacy and data ownership policies to ensure that data are accessible to patients and their healthcare providers”.Footnote 40 The element of control should therefore be understood as a way of building trust in health-related research, thus encouraging data subjects to agree to their data being made accessible in a manner that safeguards their interests.

A qualitative study of five LMICsFootnote 41 established that while most stakeholders are open to sharing health research data, they have concerns about ownership and allowing free access to data. Additionally, data sharing is a challenge in LMICs and at a global level due to a lack of guidance and regulations.Footnote 42 It is worth noting that there are no “locally enforceable data protection rules and standards” in LMICs.Footnote 43 Nevertheless, stakeholders in these countries are comfortable sharing de-identified data for academic and public health purposes, but not beyond these limits, so long as the anonymity of the research participants’ personal information is guaranteed.Footnote 44 For instance, some stakeholders argue that “making data available [for re-use] actually demonstrates respect for the respondents, in that you care about what they’re saying, it’s not just something that you use and discard”.Footnote 45 This shows that making data accessible in a manner that respects the wishes of data subjects can build trust and encourage them to make their data accessible for health-related research. In this regard, Vayena and Blasimme have correctly argued that “the availability of data control – being a sign of respect for people’s interests – may promote rather than hinder the propensity to share data for health and health research related purposes”.Footnote 46

Data exportation and re-use for commercial purposes, on the other hand, are perceived as a threat to the local communities, since there is no guarantee of local benefits,Footnote 47 and consequently as a threat to the local researchers and participants.Footnote 48 The following statement from one of the stakeholders is very instructive on the issue of data sharing in a manner that is beneficial to the local researchers and communities:

there has to be a benefit sharing component that’s in the data sharing process and the benefit sharing has to be … done in a critical way where there is not just benefit for the investigator who is now going to have a patent and generating billions versus the community who’s still living in poverty.Footnote 49

An additional challenge that causes data exportation and re-use for commercial purposes to be perceived as a threat is a lack of legal capacity in LMIC-based research institutions to ensure that the agreements that they enter into are equitable enough to cover issues such as fair data ownership, IPRs and future benefit sharing.Footnote 50 This observation resonates with Indian stakeholders’ wish that recompense be expressed “more in terms of benefits to communities than in the form of acknowledgment or authorship”.Footnote 51

The urge to ensure benefits for data subjects can cause fear of loss of control, based on the inability to control the nature of use and beneficiation to the local communities by secondary end users. This concern has been expressed particularly when data are shared with developed country partners in circumstances where research data are handed over for rapid analysis in developed countries with technological and technical capacity.Footnote 52 In research involving big data, the concern also relates to the possibility of incidental findings, which may have limited clinical relevance due to the scale of the research.Footnote 53 The concern is essentially linked to the question of custodianship over data, which can be challenging when data are used on a global scale, thus making it difficult for data subjects to have control over their data.Footnote 54

Informed consent, as discussed in the next subsection of the paper, allows data subjects to maintain control over the use of their data. The conditions under which informed consent is given by data subjects usually enable them to determine whether or not their expectations and best interests are taken into account.Footnote 55 Furthermore, the digital world in which data are used presents threats of data subjects losing control over their data.Footnote 56 The era of big data thus makes it difficult for data subjects to foresee specific future uses and users, mostly due to the complex interrelationships between multiple and changing data sources.Footnote 57 Therefore, while consent may take care of concerns related to the nature of use to some extent, it does not sufficiently address the issue of beneficiation to the local communities by secondary end users. It is in this context that LMIC stakeholders have suggested that “additional regulations to protect the community’s interests should be applied to non-local data-access requests”.Footnote 58 In Vietnam, for instance, stakeholders have suggested that international data sharing policies, mostly developed by funders and publishers, “should not be imposed without consideration of local research culture, needs, and expectations”.Footnote 59

The above suggestions for averting a potential loss of control over data are difficult to implement outside the health care and research settings. For example, commercial health-related databases and data collected from social networking platforms or commercial apps that encourage data subjects to upload their data may be difficult to control.Footnote 60 Data subjects that upload data in this manner may include citizen scientists whose consent should determine the terms of using data.Footnote 61 However, the possibility of foreseeing and specifying terms of use through consent has become a challenge with advances in big data and citizen science,Footnote 62 which entails citizens becoming “experimenters, stakeholders and purveyors of data”.Footnote 63 These challenges have led to calls for a move beyond consent to a broader framework of accountability, which reckons with issues such as harm and risk assessment.Footnote 64

Recent developments have introduced a number of rights to ensure that data subjects maintain some control over their personal data. For example, the European Union’s (EU) General Data Protection Regulation (GDPR) provides for rights to access, rectification, erasure and data portability.Footnote 65 Notably, Recital 63 of the GDPR also protects third-party rights by specifying that data subjects’ rights to access should not adversely affect the rights of others to trade secrets or IP such as copyright that protects software. Recital 156 provides for derogations, inter alia, in the public interest, and for scientific or historical research purposes, provided that conditions and safeguards are in place to protect data subjects’ rights. In addition, data processing should be pursuant to proportionality and necessity principles. Data processing for scientific purposes should also comply with other legislation.

The right to portability entitles data subjects to receive their personal data “in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller”.Footnote 66 This has two important implications for access to and sharing of health data. Firstly, data subjects are entitled to receive their complete records and, secondly, they can freely share or transfer the data to any person that they wish, thus fostering data interoperability, competition and accessibility.Footnote 67 If these effects are achieved in practice, then this requirement can address concerns about loss of control over data, thus facilitating sharing of data with data subjects’ health care providers.Footnote 68 Indeed, there are settings that warrant imposing the duty to ensure data access and data interoperability.Footnote 69 Notably, health-related research warrants this approach due to the sensitivity of the data and the need to serve the common good as envisaged by the data subjects’ intention in consenting to the sharing of their data for research purposes.

The GDPR is an EU regulation without an automatic extraterritorial application. However, Art. 3 provides for three instances of extraterritorial application. Firstly, the regulation applies to controllers or processors that are established in the Union, irrespective of whether the processing takes place in the Union or not. Secondly, it applies to controllers or processors that are not established in the Union if the processing relates to offering goods or services to data subjects in the Union or to monitoring their behaviour within the Union. Thirdly, it applies to controllers that are not established in the Union but are in a place where Member State law applies by virtue of public international law. Article 50 also provides for international cooperation between the EU and third countries in the enforcement of privacy laws. In this case, the GDPR would apply if the research is funded by the EU or if appropriate mechanisms are in place for international cooperation with third countries.
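The three instances in Art. 3 can be read as a simple decision rule. The following Python sketch is offered only as an illustration of that reading: the class `ProcessingActivity`, its field names and the function `gdpr_territorial_grounds` are hypothetical, and the sketch reduces each limb of Art. 3 to a yes/no fact, which a legal assessment would not do. It follows the text of Art. 3, including the offering of goods or services under Art. 3(2).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessingActivity:
    """Hypothetical yes/no summary of facts relevant to Art. 3 GDPR."""
    established_in_union: bool
    offers_goods_or_services_in_union: bool
    monitors_behaviour_in_union: bool
    member_state_law_applies_by_international_law: bool


def gdpr_territorial_grounds(activity: ProcessingActivity) -> List[str]:
    """Return the limbs of Art. 3 that would bring the activity in scope.

    An empty list means that, on this simplified reading, none applies.
    """
    grounds: List[str] = []
    if activity.established_in_union:
        # Art. 3(1): place of processing is irrelevant.
        grounds.append("Art. 3(1): establishment in the Union")
        return grounds
    if (activity.offers_goods_or_services_in_union
            or activity.monitors_behaviour_in_union):
        grounds.append("Art. 3(2): targeting or monitoring data subjects in the Union")
    if activity.member_state_law_applies_by_international_law:
        grounds.append("Art. 3(3): Member State law applies by public international law")
    return grounds


# Example: a research sponsor outside the EU that tracks wearable-device
# data of participants located in the Union.
sponsor = ProcessingActivity(
    established_in_union=False,
    offers_goods_or_services_in_union=False,
    monitors_behaviour_in_union=True,
    member_state_law_applies_by_international_law=False,
)
print(gdpr_territorial_grounds(sponsor))
```

On this reading, a sponsor with no establishment in the Union that neither targets nor monitors data subjects there falls outside Art. 3 altogether, which is why the paragraph above looks to funding conditions and cooperation mechanisms instead.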

2.2 Data Subjects’ Informed Consent

Obtaining data subjects’ consent ensures the preservation of their autonomy to intervene in the decision-making process regarding the use of their data.Footnote 70 Consequently, attempts have been made to use informed consent in governing the use of data by specifying what should be done with individual research data and the terms of the contractual obligations that are considered “to guard privileged information on behalf of the research funder or sponsors”.Footnote 71

Using data from sources such as social media raises additional issues that underpin big data as a phenomenon. Boyd and Crawford have accurately observed that social media users create their data in highly context-sensitive spaces and they are unlikely to permit their data to be used elsewhere; many of these users “are not aware of the multiplicity of agents and algorithms currently gathering and storing their data for future use”; researchers are rarely part of the users’ imagined audience; and “users are not necessarily aware of all the multiple uses, profits, and other gains that come from information they have posted”.Footnote 72 The two authors have raised the following pertinent questionsFootnote 73:

Should someone be included as a part of a large aggregate of data? What if someone’s “public” blog post is taken out of context and analyzed in a way that the author never imagined? What does it mean for someone to be spotlighted or to be analyzed without knowing it? Who is responsible for making certain that individuals and communities are not hurt by the research process? What does informed consent look like?

The above questions demonstrate that using data for health-related research requires the issue of informed consent to be addressed appropriately as provided in Art. 6 of the Universal Declaration on Bioethics and Human Rights (UDBHR), which requires scientific research to be carried out “with the prior, free, express and informed consent of the person concerned”.Footnote 74 At the point of obtaining informed consent from data subjects, they have the right to know if access to the data that they are contributing will be open or limited due to commercial reasons.Footnote 75 Providing such relevant information to data subjects allays their fears regarding the possible misuse of their data in the course of commercialisation, particularly in view of the fact that “data sharing is not yet commonplace and trust in such processes is established slowly”.Footnote 76 In the context of big data, there are calls for a suitable and dynamic model of informed consent that can facilitate access and respect for the data subjects’ autonomy,Footnote 77 mostly because health information privacy laws are rather permissive and patient-generated information is not governed by privacy laws.Footnote 78 Additionally, using technical measures such as anonymisation cannot always avert possible re-identification.Footnote 79

Considering the diverse sources of big data and the fact that the future utility of big data is usually uncertain at the point of obtaining informed consent, Mittelstadt and Floridi have correctly argued that consent cannot be truly informed because of the difficulties of predicting and informing the data subjects of the future uses and consequences of the data.Footnote 80 Evidently, the rapid production, collection, use and sharing of health-relevant data in the era of big data are challenging the extent to which informed consent can be used to preserve data subjects’ autonomy, thus calling for new models of consent in health-related research.

Three models of consent that have emerged in the context of big data are broad consent, opt-out consent and dynamic consent.Footnote 81 In broad consent, data subjects consent to “a range of possible research that could be done with [their] information in relation to a specific area or line of investigation”.Footnote 82 For broad consent to be valid, relevant forms of governance and safeguards, such as relevant review committees that ensure the protection of data subjects’ rights, must be in place.Footnote 83 The opt-out model assumes that data can be used unless the data subject has explicitly opted out. The challenge with this model is that data subjects may not be adequately informed of the terms of use, particularly in commercial databases or the use of social media. The dynamic model allows data subjects to update their consent on an ongoing basis. Although this model has mostly been used in biobanking, it can also be used in circumstances that entail multiple and varied uses of data, such as big data, where different kinds of consent may be required over time.Footnote 84 It provides an open communication process between data subjects and researchers, thus ensuring that the evolving data subject preferences are taken into consideration in the adaptive process, which gives them more control over their data for the duration of the research.Footnote 85
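The practical difference between the three models can be illustrated with a short Python sketch of how a data custodian might record and check consent. This is a simplification under stated assumptions rather than a description of any existing system: the names `ConsentModel`, `ConsentRecord`, `permits` and `update_preference` are hypothetical, broad consent is reduced to a single named research area, and the governance safeguards that broad consent requires are not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class ConsentModel(Enum):
    BROAD = "broad"
    OPT_OUT = "opt-out"
    DYNAMIC = "dynamic"


@dataclass
class ConsentRecord:
    """Hypothetical record of one data subject's consent."""
    model: ConsentModel
    research_area: Optional[str] = None  # used by broad consent
    opted_out: bool = False              # used by opt-out consent
    approved_uses: Set[str] = field(default_factory=set)  # used by dynamic consent

    def permits(self, use: str, area: str) -> bool:
        """Check whether a proposed use in a given research area is covered."""
        if self.model is ConsentModel.BROAD:
            # Any research within the consented area or line of investigation.
            return area == self.research_area
        if self.model is ConsentModel.OPT_OUT:
            # Use is presumed permitted unless the subject has opted out.
            return not self.opted_out
        # Dynamic: only uses the subject has individually approved.
        return use in self.approved_uses

    def update_preference(self, use: str, allowed: bool) -> None:
        """Let the data subject revise consent for a specific use over time."""
        if self.model is not ConsentModel.DYNAMIC:
            raise ValueError("only dynamic consent supports per-use updates")
        if allowed:
            self.approved_uses.add(use)
        else:
            self.approved_uses.discard(use)


# Example: a dynamic record that is granted and later withdrawn.
record = ConsentRecord(model=ConsentModel.DYNAMIC)
record.update_preference("diabetes cohort study", allowed=True)
granted = record.permits("diabetes cohort study", area="endocrinology")
record.update_preference("diabetes cohort study", allowed=False)
withdrawn = record.permits("diabetes cohort study", area="endocrinology")
```

Only the dynamic record changes after the point of collection, which is the feature that gives data subjects continuing control for the duration of the research.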

Ideally, a suitable consent model should engage with the individuals beyond the point of data collection, thus allowing them to harness big data for their own personal use and to constrain unacceptable uses.Footnote 86 Such a model can only work within the context of custodianship, which recognises big data as a common good for the benefit of humankind. It thus entails granting individuals “meaningful rights to access their data in a usable, machine-readable format” while at the same time striking a delicate balance between providing insight into the decisional criteria of organisations that draw conclusions from personal information and the protection of IPRs.Footnote 87

The issue of consent, as highlighted above, shows that competing interests can be governed better if researchers and other stakeholders focus on acting ethically and are accountable in addressing each other’s concerns.Footnote 88 Therefore, issues of consent and accountability can be better governed through an alternative normative framework that takes into account the diverse stakeholders’ interests in the data.

2.3 Ownership of Big Data and Its Correlation with IPRs that Impede Data Sharing, Reuse and Accountability

The concept of ownership can refer to the right to “control” data or the right to “benefit from” data.Footnote 89 The right to control access to data directly affects data sharing and is closely linked to the conventional use of the term in IP law where claims to ownership of intellectual content must be recognised by law for the rights to be effective.Footnote 90 Notably, the terms “data controller” and “processor” are used in data protection law to denote, respectively, the person, entity or body that “determines the purposes and means of the processing of personal data” and the one that “processes personal data on behalf of the controller”.Footnote 91 This, however, leaves the issue of ownership ill-defined.Footnote 92 The right to benefit from data relates more to custodial rights such that data custodians are expected to allow data subjects to access and utilise the data for their own benefit.Footnote 93 The first formulation (control of access) is accordingly used in this section of the paper and it should be understood as a controversial concept in the context of big data. This is because the underlying information or data, over which IP law seeks to create related property rights, are the building blocks of science and the results of multiple producers’ efforts.Footnote 94

So far, the issue of who owns data generally or even in the context of big data has not been explored sufficiently in current literature.Footnote 95 The complexity of the issue lies in the public-good character of such data and the diverse sources of information that constitute the data. This is not surprising since the main legal barrier to sharing data, which has been identified in current literature, is based on ownership of data insofar as those who “collect public health data are also often responsible for the protection of individual and community privacy and may feel that a guardianship or ownership role is bestowed on them by the public”.Footnote 96 To further illustrate the complexity of the issue, Burtscher and Fritz have likened big data to a block of marble that a number of cutters are working on such that “various legal concepts including data privacy, database rights, IP rights, antitrust law as well as the basic civil rights of ownership and possession are playing a role when dealing with the legal alien big data but are each only addressing bits of it”.Footnote 97 The two authors have accordingly wondered whether the concept of ownership correctly captures big data in legal terms. Hoeren has also observed that the question of how a “new property right” in data fits into the existing property law framework remains to be solved.Footnote 98

The discussions in this section will demonstrate that granting property rights in data as such would be ill-advised and lacks a sound legal basis. Evidently, data are subject to access rights and restrictions but these are not property rights. For example, contracts or competition law may be used to regulate access to data but they do not create property rights.Footnote 99 These mechanisms of governing access to data are important because competitors are interested in using data to develop innovative services and products.Footnote 100

There have also been attempts to govern issues of ownership through informed consent in the mistaken belief that data subjects own their data as aptly stated below:

whether the informed consent form [authorises] the transfer of those rights from the participant to the investigator or the sponsor – if they [the participant] have not … agreed to transfer their rights of the data [then] neither the sponsor, nor the investigator, [nor] the collaborator outside of the institution can actually say that they own the data.Footnote 101

According to the above statement, the researchers “should not be viewed as owning the data but rather as having custodial responsibilities and rights over it”.Footnote 102 They are considered to have possession of the data but not the right of ownership. This point is developed further below, in discussing copyright protection of compilations and sui generis rights over non-original databases. A number of stakeholders in India have, for instance, suggested that ownership and authorship rights should be clearly stipulated in the data sharing agreements while others were of the view that the organisation that had collected the data, rather than the data subjects themselves, should own the data.Footnote 103

Burtscher and Fritz have correctly observed that “the legal discussion and legislation around allocation of ownership of anonymous data or at least of the right to use and to exclude others from using such data” is inconclusive.Footnote 104 Ekbia and colleagues have equally noted that the legal and ethical problems that changes in technology have brought to the fore have raised “new questions about the scope of individual privacy and the proper role of intellectual property protection”.Footnote 105

From the above highlights, it would appear that the prevailing view regarding ownership of data is that “the entity or individual controlling the production of anonymous data should be entitled to use them”.Footnote 106 Similarly, most LMIC stakeholders are of the view that funders “reserved the right to share data because [they] possessed the intellectual property rights for that data”.Footnote 107 This raises a fundamental question of what IP law protects in big data, particularly considering that the collection of data may not qualify for patent protection or even copyright protection if the information or data sets are presented in a factual manner. The compilation of big data would have to be original in the copyright law sense to qualify for protection. Ekbia and colleagues argue that “existing intellectual property laws may also need to be adapted in order to accommodate Big Data practices”.Footnote 108 In the paragraphs that follow, the contents of big data that may qualify for protection under IP law are discussed with a view to determining what is protected, whether such protection impedes data sharing in health-related research, and whether it should accordingly be adapted to accommodate big data practices as suggested by Ekbia and colleagues. Reference will mostly be made to the international IP instruments, which set the minimum standards, and regional or national laws will only be mentioned for illustrative purposes since this paper focuses on international health-related research.

Five of the sources of information that make up the contents of big data in health-related research, as mentioned in the introduction to this paper,Footnote 109 may be protected through intellectual property law if they meet the requisite requirements. These are: compilations of electronic health records, patient summaries, genomic and pharmaceutical data, test results and mobile applications. Three possible avenues of protecting these contents are copyright, sui generis database rights and trade secrets. The relevance of these IPRs in the context of big data and their impact on data sharing are discussed below.

2.3.1 Protection of Compilations Through Copyright

Copyright law is one means of protecting creative original compilations of data. At the international level, original structures of databases are protected under Art. 2 of the Berne Convention,Footnote 110 Art. 5 of the World Intellectual Property Organisation’s (WIPO) Copyright TreatyFootnote 111 and Art. 10(2) of the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS). These international instruments protect compilations of data or other material in either machine readable format or other form. For such compilations of data or material to qualify for copyright protection, Art. 10(2) of the TRIPS Agreement provides that “the selection or arrangement of their contents [must] constitute intellectual creations”. The Article further stipulates the scope of rights in such compilations by explicitly providing that “such protection, which shall not extend to the data or material itself, shall be without prejudice to any copyright subsisting in the data or material itself”. Article 5 of the WIPO Copyright Treaty equally provides for the protection of compilations if the selection or arrangement of their contents constitute intellectual creations. The “protection does not extend to the data or the material itself and is without prejudice to any copyright subsisting in the data or material contained in the compilation”.Footnote 112

The fact that data as such are not protected through copyright has mistakenly been viewed as a shortcoming in copyright law.Footnote 113 Proposals have therefore been made for a new construct called “datarights” that can “be available to applicants who disclose clear and complete descriptions of their data collection and preparation methods alongside the data shaped by those methods”.Footnote 114 Datarights are intended to protect data that are “collected or manipulated according to one or more methods not readily apparent to a person of ordinary skill in the art” from unauthorised use for a limited period.Footnote 115 This subject matter essentially resembles the originality or creative skills requirement as it currently exists in the protection of original compilations through copyright except that it extends to the protection of data from unauthorised downstream use such as analysis. Notably, the proposed new construct would still leave the underlying data free for reproduction and redistribution by other stakeholders unless barred through contracts.Footnote 116 Consequently, this does not solve the alleged shortcoming in copyright law. In addition, proponents of this construct have conceded that it would not be effective in encouraging disclosure of big data practices where there are concerns related to privacy and commercial interests.Footnote 117

Another source of confusion is the “without prejudice” clause as used in the latter parts of Art. 10(2) of the TRIPS Agreement and Art. 5 of the WIPO Copyright Treaty. The confusion arises from the fact that data or material itself, as already indicated in the first parts, is not protectable through copyright. Consequently, the clause “without prejudice to any copyright subsisting in the data or material contained in the compilation” may give the wrong impression that underlying data or material, as contained in the compilation, are protected by copyright.Footnote 118 Such misinterpretation contradicts the established copyright protection of literary and artistic works under the Berne Convention and is likely to be used to impede data sharing in the mistaken belief that data as such are capable of being subject to proprietary rights.

The protection of compilations has to be interpreted with reference to Art. 2(1) of the Berne Convention, which requires the exercise of skills in compiling the material. Accordingly, the clause should be interpreted as referring to copyright protection in literary and artistic works subsisting in the data that are included in the compilation.Footnote 119 This is, for example, clearly illustrated in the “without prejudice” wording of Art. 13 of the EU Database Protection Directive (discussed in more detail in the next section), which lists the possible rights that may subsist in the data that are included in the database.Footnote 120

Apart from originality, copyright law requires the subject matter to be expressed in material form. Article 10(2) of the TRIPS Agreement requires the data to be compiled in machine readable or other form. Notably, the format of big data involves dynamic data sets and uses cloud computing services, which technically makes it difficult to meet this requirement.

2.3.2 Sui Generis Protection of Non-original Databases

There is no obligation to protect non-original databases under Art. 10(2) of the TRIPS Agreement or Art. 5 of the WIPO Copyright Treaty. To date, there are no international norms on the protection of non-original databases.Footnote 121 This does not, however, imply that database rights are not valuable. Indeed, WIPO has acknowledged the importance of protecting databases for purposes of developing a global information infrastructure while at the same time ensuring the interests of users in having appropriate access.Footnote 122 The possibility of granting sui generis protection of databases that do not necessarily meet the threshold of originality in copyright law was introduced by the European Union during WIPO’s diplomatic conference.Footnote 123 The proposal was, however, not pursued further at WIPO.Footnote 124 The United States also considered enacting laws to protect non-original databases from misappropriation but due to the ensuing controversies during the congressional debates no laws were enacted.Footnote 125 Consequently, only EU Member States grant sui generis protection for non-original databases.

Sui generis database protection was created by the EU Database Protection Directive to protect the “substantial investment in either the obtaining, verification or presentation of the contents to prevent extraction and/or re-utilization of the whole or of a substantial part”.Footnote 126 This right is different from the copyright protection that the Directive grants for “databases which, by reason of the selection or arrangement of their contents, constitute the author’s own intellectual creation”.Footnote 127 This was emphasised in the case of Football Dataco Ltd and Others v Yahoo! UK Ltd and Others, where the European Court of Justice (ECJ) clarified that the purpose of the Database Directive is to “stimulate the creation of data storage and processing systems in order to contribute to the development of an information market … and not to protect the creation of materials capable of being collected in a database”.Footnote 128 The ECJ further clarified that the requirement of the author’s “own intellectual creation” for copyright protectionFootnote 129 refers to the criterion of originality.Footnote 130

The scope of the sui generis right in Art. 7(1) of the Directive and the meaning of “a substantial investment in either the obtaining, verification or presentation of the contents” of the database were decided by the ECJ in the British Horseracing Board and Fixtures Marketing cases.Footnote 131 The Court stated that: “Article 7(1) of the directive reserves the protection of the sui generis right to databases which meet a specific criterion, namely to those which show that there has been qualitatively and/or quantitatively a substantial investment in the obtaining, verification or presentation of their contents.”Footnote 132 This effectively excludes raw machine-generated databases and big data, which are typically drawn from multiple sources, from sui generis protection.Footnote 133 The ECJ also provided the following clarification regarding the expression “a substantial investment in either the obtaining, verification or presentation of the contents”:

[It has to] be understood to refer to the resources used to seek out existing independent materials and collect them in the database, and not to the resources used for the creation as such of independent materials. The purpose of the protection by the sui generis right provided for by the directive is to promote the establishment of storage and processing systems for existing information and not the creation of materials capable of being collected subsequently in a database.Footnote 134

Consequently, the holder of the sui generis right can prohibit the manufacture of competing parasitical products and any actions that can cause significant detriment to the investment.Footnote 135

The above clarifications evidently rule out any reliance on the sweat of the brow theory in granting sui generis rights in databases. As the European Commission correctly observed, sui generis rights are granted “to prevent misappropriation of the contents of a database in which there has been a substantial investment”.Footnote 136 A distinction must therefore be made between “the establishment of storage and processing systems for existing information” and “the creation of materials capable of being collected subsequently in a database”.Footnote 137 Investments in the establishment of the former are the object of sui generis rights, not the latter.Footnote 138 Evidently, sui generis rights do not give rise to new rights in the works, data or materials that are contained in the databases.Footnote 139 Accordingly, sui generis rights should not be equated to IPRs that can be relied on to impede data sharing, since data as such are not owned by the party who incurs expenses on the investments. The investments are incurred to ensure the reliability of the information contained in the database and to monitor the accuracy of the materials collected when creating the database and during its operation.Footnote 140 This essentially means that the scope of the investments is limited to the creation of the database.Footnote 141

The effects of sui generis database rights on data sharing in health-related research should be considered in the context of a recent call, by UNESCO,Footnote 142 for data to be framed as a common good of humankind in line with Art. 2 of the Universal Declaration on Bioethics and Human Rights.Footnote 143 The Article requires the promotion of “equitable access to medical, scientific and technological developments as well as the greatest possible flow and the rapid sharing of knowledge concerning those developments and the sharing of benefits, with particular attention to the needs of developing countries”.Footnote 144 Sui generis database rights may be used as a mechanism to impede the flow and rapid sharing of data that are contained in the protected database. This is the case because such rights are essentially used to control access to the data contained in the database such that any means of access that is considered to amount to misappropriation of the database is prohibited. This effect arises from the fact that sui generis database rights are modelled on laws that protect trade secrets or confidential information with a view to repressing any conduct that amounts to the “misappropriation” of an electronic database producer’s investment.Footnote 145

Notably, the European Commission has conceded that the wording that is used to describe the objectives of the Directive “suggests that the sui generis right may become a form of indirect property in data”.Footnote 146 Right holders can rely on such proprietary claims over databases to restrict access to data for anti-competitive reasons, thus restricting data flows artificially.Footnote 147 Consequently, the emerging trend of protecting non-original databases on the basis of substantial investment seems problematic due to over-protection in a research environment that is already facing challenges, particularly in LMICs where copyright protection of the database or the software that is required for the organisation, integration and analysis as well as production of data may require purchasing a licence, which is usually unaffordable for most LMICs.Footnote 148 Such protection essentially entails reliance on the substantial investment formula (protecting the value that is created in analysing the data),Footnote 149 which is admittedly very contentious in most jurisdictions since this leads to over-protection, thus restricting access to valuable information that is required for research and use by other interested stakeholders.Footnote 150 Additionally, such databases do not meet the threshold of originality, and factual information that is not original belongs to the intellectual commons, which should be accessed and used by interested stakeholders as appropriate.

A recent survey by the European Commission confirmed that sui generis rights have not achieved the intended purpose of incentivising the creation of databases; instead, they are mostly used in litigation when parties disagree.Footnote 151 Moreover, stakeholders from academia and research sectors indicated that the Directive did not achieve a balanced outcome in terms of safeguarding the legitimate interests of database makers and users.Footnote 152 The survey established that although there is no evidence that the sui generis regime itself leads to data lock-up, users found the licensing process complex due to additional layers of protection.Footnote 153 It also established that contract law is used to protect database owners’ rights in addition to sui generis rights, thus leading academics and researchers to experience contractual overrides to the exceptions that are provided for in the Directive.Footnote 154 Clearly, creating sui generis rights for non-original databases was unnecessary.Footnote 155 Unfortunately, sui generis database rights are bound to continue existing and being used to unreasonably impede data sharing.

2.3.3 Trade Secrets

The purpose of protecting trade secrets is not to encourage secrecy or create any intellectual property rights. They merely protect the data against unfair misappropriation.Footnote 156 The law therefore provides for the enforcement of trust relationships in this regard.Footnote 157 It is therefore not surprising that trade secret law has been labelled as “parasitic” because it relies on a host theory for normative support.Footnote 158 For example, it relies on other norms that are aimed at honouring contractual obligations and averting fraud for its existence.Footnote 159

Data holders have relied on secrecy to protect their interests in data.Footnote 160 Such secrecy is based on Art. 39 of the TRIPS Agreement, which protects undisclosed information against unfair competition. Admittedly, this approach is rather controversial in protecting data in the context of big data since it is akin to erecting digital barbed wire around data that many deserving stakeholders are entitled to access. Reliance on trade secrecy in this regard is contestable since a proper interpretation of Art. 39 of the TRIPS Agreement confirms that the protection only extends to competition that is contrary to honest commercial practices, which is not the case for health-related research stakeholders.Footnote 161 The Article protects undisclosed information that has commercial value due to its being kept secret, thus ensuring business integrity. Consequently, loss of secrecy automatically leads to non-protection.

Secrecy or keeping information confidential to avoid sharing it with other stakeholders in health-related research goes against the intention of data subjects who consent to their information being used for research. As already noted, under the discussion of consent, commercialisation of data is not viewed favourably by data subjects. Secrecy erodes trust and can lead to data subjects declining to give their consent for the use of their data, thus stifling research and innovation.

The rationale of protecting trade secrets lies in the fact that the underlying information is generally unknown.Footnote 162 In the context of big data, data holders share very limited information on how data are collected (the factors considered) and the inferences drawn from the data.Footnote 163 This is mostly because methods of data preparation are viewed as valuable trade secrets, which have competitive advantage.Footnote 164 Withholding such vital information impedes the prospects of reusing, sharing and repurposing the data in a meaningful way. Although, as already clarified above, secrecy does not create IPRs, it defeats the purpose of protecting IPRs, which is to encourage sharing and dissemination of information.Footnote 165 This has led authors to wonder whether the IP regime should be amended to address the issue of non-disclosure in big data.Footnote 166 Clearly the solution lies in an alternative framework, as proposed in the next section, rather than amending the IP regime.

Proponents of protection through trade secrecy have argued that “information-based processes that are not readily perceived by consumers are particularly well suited for trade secret protection”.Footnote 167 This simply reinforces the argument that trade secrets are not IPRs since processes are not creative or inventive content that is capable of being protected through IP law.Footnote 168 Additionally, in health-related research, processes are important for follow-on research. Consequently, failure to disclose such information hinders further research and makes the generated data worthless (without the processes and insights that are drawn in the course of data analysis).

Alternative mechanisms for incentivising data holders to share high quality data that may be the subject of considerable investments are already in use and should be considered instead of trade secrets. For example, medicine regulatory authorities may rely on the data for abridged approval of similar products submitted by competitors without disclosing the data.Footnote 169 Another option is ensuring that data holders’ policies align with the FAIR data principles, namely making the data findable, accessible, interoperable and reusable.Footnote 170 A large network of international collaborators has developed FAIRsharing, which is an informative and educational resource that has adopted this approach for data management.Footnote 171

3 Paradigm Shift from Ownership to Custodianship

The discussions in the preceding part of this paper have demonstrated that claiming ownership rights over data is misconceived because no such rights, over data as such, exist in the IP regime. It is clear that where individuals or companies claim to own data, such claims are either based on misinterpreting the scope of rights under copyright protection of compilations and sui generis rights over non-original databases or they use secrecy to avoid sharing data so that they can erroneously rely on Art. 39 of the TRIPS Agreement, or they use contractual terms to create proprietary rights that have no basis in the intellectual property regime. This confirms that ownership is a concept that is ill-suited for governing competing rights in big data. This fact finds fortification in the concerns that have been raised by authors such as Ekbia and colleagues that the law has not developed any “principle to balance the competing interests of individuals, industries, and society as a whole in the burgeoning age of Big Data”.Footnote 172 The World Medical Association (WMA) has also urged relevant authorities to formulate policies and law that protect health data on the basis of the principles set forth in the Declaration of Taipei.Footnote 173 Custodianship is one of the principles of governance, which is stipulated in the Declaration.Footnote 174

Due to claims of ownership over data, sharing and re-use of data may be restricted entirely or privileged access may be granted for a fee, or small data sets may be offered to university-based researchers.Footnote 175 Such practices deepen inequalities based on privileged access, mostly because data-owning companies have total control over data and no responsibility to make their data available, nor accountability to data subjects to ensure that their data are used in a manner that does not lead to harm. The ethical and governance challenges that beset Iceland in 1998 are very instructive in this regard. Serious issues arose from the declaration of health records, which included health, genetic and genealogical data, as a national resource that was owned by the Icelandic government and could be made available to private industry without the consent of the individuals.Footnote 176 As a result of national and international opposition to the inappropriate manner in which the Icelandic government handled the issue of ownership of data, the project collapsed in 2003.Footnote 177

The IP regime is intended to stimulate creativity and not to protect non-proprietary matters such as underlying data or investments in creating databases as the current trend shows. As established in the previous part of the paper, data are not the object of monopoly rights under the IP regime since their generation is not the result of any creative endeavour. As the Hague Declaration on knowledge and discovery in the digital age succinctly puts it:Footnote 178 “Intellectual property was not designed to regulate the free flow of facts and ideas, but has as a key objective the promotion of research activity.… Licenses and contract terms should not restrict individuals from using facts, data and ideas.”

This declaration is in line with Art. 9 of the TRIPS Agreement, which provides that copyright protection extends to expressions and not ideas. It essentially means that the urge to tap the full potential of big data must at the same time be accompanied by respect for other users’ rights to access the information.

The legal framework in place mainly governs structured databases, yet there is a massive amount of data that falls outside the scope of the current governance through ownership and IP. The reasons for this status quo are that current developments in big data have outpaced the existing legal frameworkFootnote 179 and big data practices do not fit within the frameworks of ownership and IP.Footnote 180 A paradigm shift from ownership to custodianship is warranted on two grounds: firstly, as already established in this paper, data are not the object of proprietary rights or ownership according to the international IP law regime. Secondly, the emerging trends that lead to claims of ownership over data are based on flawed models and on implausible arguments. The first point is extensively discussed in the preceding section of this paper. Therefore, this section focuses on advancing the second point.

One reason that is often advanced for claiming ownership rights over data is that a company or individual may have extracted new insights from original data, thus creating a new data set, which they should own. Sax observes that this argument is modelled on a “finders, keepers” ethic without due regard for the potential impact of the insights on the lives of data subjects.Footnote 181 The argument that the data in question may not be personal, thus warranting their appropriation and use without the data subject’s consent may not be justifiable.Footnote 182 Additionally, as established under the discussion of copyright in compilations and sui generis database rights, investments in creating data are not taken into account in granting these types of proprietary rights in the IP regime. This confirms that there is no basis for claiming ownership rights in the newly created data set just because a company or individual has generated new insights from the original data.

Notably, all five sources of big data that were highlighted in the introductory part of this paper are derived from personal information. Such information has accurately been described as expressing a sense of a person’s “constitutive belonging, not of external ownership”.Footnote 183 The criticism against viewing personal information through the lens of ownership further clarifies the constitutive nature of data such that it does not make sense to grant proprietary rights over it. The criticism, of relevance here, is the fact that one’s personal information can never be lost when it is acquired by someone else.Footnote 184 In the context of big data, Sax has observed that “data that cannot be directly related to natural persons can be used, in big data contexts, to generate insights that can nonetheless have a significant impact on the lives and self-understanding of persons”.Footnote 185 This observation is very instructive for appreciating that even anonymised or de-identified data may be re-identifiable owing to the varying de-identification practices of data holders,Footnote 186 so that data subjects can be re-identified through the insights that are drawn from the data.

Data subjects are entitled to informational privacy rights in their data. In essence, data subjects do not transfer their informational privacy rights to the parties who process their data. This approach can resolve the long-standing question of ownership of big data in health-related research. The position is that the data are not capable of being owned in the proprietary sense. The paradigm shift, which this paper advocates, entails recognising that researchers, commercial organisations and repositories that collect and process data have custodial rights and responsibilities in handling the data.Footnote 187 The only proprietary rights that are capable of being owned are original compilations and not the data as such. As already explained, related rights are protected through copyright in the original compilation of the data or sui generis rights over non-original databases and trade secrets. It then becomes clear why arguments to the effect that “clinical trials data … are the property of the sponsoring company”Footnote 188 are neither accurate nor sustainable.

Granting property rights over underlying data is incapable of resolving the concerns in health-related research using big data, which have been discussed in this paper. New property rights in data will further impede data sharing, thus leading all stakeholders to lay claims over data. Such new rights can even lead to the emergence of data trolls who demand ransoms and nuisance fees based on potential property rights in data.Footnote 189 It thus makes sense to advocate a normative framework that is based on custodianship to ensure better accountability among stakeholders.

Custodianship has accurately been defined as “the responsibility for the safety and well-being of someone or something and represents ethical values like care, custody, … protection and trust to the guardianship or the safekeeping”.Footnote 190 It is suitable for ensuring access to data and promoting fair data sharing practices while safeguarding data subjects’ informational privacy at the same time. Appropriate custodianship of big data is necessary to ensure that data subjects maintain some control over access and future uses of their data while delegating decision-making in some matters to the data custodians. Such delegated decision-making gives rise to custodial rights, not ownership of the data.

So far, custodianship has been used as an ethical framework for ensuring shared accountability among all stakeholders involved in biospecimen-based research.Footnote 191 Although biospecimen-based research is significantly different from big data research, they have a common attribute in terms of claims of ownership that impede sharing of biospecimens or data, respectively. The salient feature of the framework is that it is based on ethical instead of “strictly legal principles to govern the collection and use of biospecimens in research”.Footnote 192 As already established in the preceding discussions, reliance on the legal concept of ownership and even use of contractual terms have not resolved concerns in health-related research using big data. However, custodianship is much broader than the legal concept of ownership and has been used to ensure that all stakeholders recognise and honour their ethical obligations to serve the best interests of biomedical research.Footnote 193 The specific attributes of this proposed normative framework are discussed in the next part.

4 Attributes of the Alternative Normative Framework to Govern Rights in Big Data

The significance of health-related research lies in the fact that big data can be used for developing public health policies for disease surveillance and managing population health, hence the need for good governance. This requires properly governed access to data sets for research, including citizens’ access to personal information, to avoid misunderstandings or bypassing doctor–patient relationships since medical professionals have to provide an accurate interpretation of the information in the data sets. Other valuable approaches that have been proposed for ensuring data sharing, such as data cooperatives,Footnote 194 would need the ethical framework of custodianship to function optimally especially after the cooperatives have granted access to research groups and other stakeholders.

UNESCO’s recommendation that a framework with new approaches to ownership and custodianship of personal data be developedFootnote 195 as well as an appreciation of the fuzzy contours of data ownership and related IPRs are good starting points in highlighting two attributes of the alternative normative framework. Firstly, it makes a clear distinction between the underlying data and related IPRs in big data, thus ensuring respect for IPRs. Secondly, it is premised on ethical principles that can be used to manage diverse interests, thus effectively addressing concerns in health-related research using big data. These are explained below.

4.1 Distinguishing Between the Data and Related IPRs

Having established that there are components of big data that are protected through IPRs, it should be clear at this stage that the proposed normative framework should guide stakeholders in managing these IPRs in a manner that fosters data sharing. All stakeholders have custodial responsibilities over data since, even if they hold related IPRs, they need to fulfil custodial responsibilities over the underlying data that are not part of their monopoly rights. These responsibilities arise from their fiduciary relationship with the data subjects.Footnote 196 Consequently, it is essential to distinguish between the data and related IPRs.

The arguments that have been advanced in this paper essentially emphasise the fact that owners of related IPRs do not own the underlying data. They are custodians of the data, which should be made accessible to the interested stakeholders in accordance with the data subjects’ consent. Citizens and data subjects are also key stakeholders in this regard, and they should be directly involved in the governance of big data.Footnote 197 Therefore, related IPRs should be managed to preserve open access to data and promotion of downstream commercialisation of inventions in a manner that fosters future research.Footnote 198

4.2 Managing Interests in Data Through the Ethical Framework of Custodianship

This framework entails acknowledging data as a gift from data subjects to be used with their consent to advance science for the benefit of society and not to be owned by researchers, host institutions or funders/sponsors. The reason for this approach is that data subjects consider researchers who obtain their data to be custodians of the data. The custodial responsibilities entail compliance with rigorous ethical and regulatory requirements such as providing accurate and timely data and safeguarding data subjects’ privacy and confidentiality.Footnote 199

Using custodianship to govern rights in big data is premised on five ethical principles that are explained below:Footnote 200

  i. Respect for privacy and autonomy: This entails ensuring that measures are in place to protect data subjects’ privacy while at the same time being accountable to them by maintaining open communication.Footnote 201

  ii. Reciprocity: Data custodians should provide feedback of the general results to relevant institutions and data subjects.Footnote 202 This principle can guide data controllers and other responsible stakeholders in ensuring the timely and efficient dissemination of aggregate research findings for the benefit of research participants and the public.Footnote 203

  iii. Freedom of scientific enquiry: This principle resonates with UNESCO’s recommendation that stakeholders adopt an understanding of big data as a common good of humankind, hence the need to facilitate open access and use of data for the common good.Footnote 204

  iv. Attribution: As already noted, sui generis database rights can be used to restrict access to the underlying data. The principle of attribution can reduce such restrictive practices by ensuring that stakeholders acknowledge the substantial investments in creating the databases and mutually agree on the terms of use and access.

  v. Respect for intellectual property: Although the underlying data are not the object of IP protection, stakeholders should respect related IPRs.

If the above principles that are embedded in custodianship are applied, then the ideal framework that proponents of data ownership have suggested, namely one that can ensure better trust in the accuracy of the data and facilitate enhanced sharing,Footnote 205 can be established without granting ownership rights that risk reducing data to a commodity for profit, thereby restricting data sharing and reuse.

5 Conclusions

The key insights from the discussions in this paper are twofold. Firstly, data as such are not capable of being owned. However, this does not mean that they should not be protected through other mechanisms that are aimed at ensuring accountability instead of granting proprietary rights. Having established that the underlying data are not owned by anyone, it is safe to conclude that the IP regime does not need to adapt to current big data practices. What is urgently needed is an alternative normative framework that is based on the ethical principle of custodianship to ensure accountability and responsible data sharing by all stakeholders. Secondly, granting property rights over underlying data has no basis in IP law and is incapable of solving the concerns in health-related research using big data. An important lesson from the discussion of the scope of sui generis rights in databases is that the underlying raw data should not be protected through IPRs. Such property rights in data would further impede data sharing. The concerns can be addressed through the paradigm shift that is discussed in this paper, which entails recognising that researchers, commercial organisations and repositories that collect and process data have custodial rights and responsibilities in handling the data.