Background

Healthcare provides an opportune setting for increased data sharing and secondary data analysis. Secondary data analysis of existing data originally collected for other purposes [1] can provide insights into real-world clinical practice [2] and generate new clinical evidence [3]. There are many forms of data collected during an individual’s interactions with health services, including administrative and clinical trial data, which are the focus of this review. Administrative data are originally collected for administrative and billing purposes [4], but can also be used to identify systemic issues and service gaps and to inform improved health resourcing. Clinical trials are expensive and take approximately 17 years to complete, and less than 14% of the evidence is translated into practice [5]. Given these low rates of translation, the secondary use of these data arguably takes on greater importance. The secondary analysis of clinical trial data can further advance the medical community’s understanding of diseases and potentially limit the expenditure of funds on already tested hypotheses.

Increased access to data for secondary use is complex and continues to attract strong debate within the health and scientific communities as well as the general public. While researchers are now being encouraged to increase data accessibility for secondary research [6, 7], a range of stakeholder-perceived barriers and concerns remain, including issues such as trust, transparency, and privacy [8, 9]. Despite the impact of these issues on willingness to share data, there is a lack of synthesis of stakeholder views to guide policy and practice.

This paper presents the results of a subset of articles identified in our systematic literature review and focuses on healthcare consumer concerns relating to privacy, trust, and transparency in the setting of health administrative and clinical trial data reuse.

Methods

This systematic literature review presents the results of a subset of articles identified in a larger review of articles addressing data sharing and was undertaken in accordance with the PRISMA statement for systematic reviews and meta-analyses [10]. The protocol was prospectively registered on PROSPERO (www.crd.york.ac.uk/PROSPERO, CRD42018110559; updated June 2020).

The following databases were searched: EMBASE/MEDLINE, Cochrane Library, PubMed, CINAHL, Informit Health Collection, PROSPERO Database of Systematic Reviews, PsycINFO, and ProQuest. The search was conducted on 24 June 2020. No date restrictions were placed on the search; key search terms are listed in Table 1.

Table 1 Example search strategy

Our original goal was to focus on attitudes towards data reuse by breast cancer patients. However, due to a paucity of studies targeting this group, we re-ran the search without this limitation and present the results of all disease settings and noted specific cases where breast cancer or any cancers were included. Breast cancer is a disease that impacts older individuals; therefore, respondents under the age of 18 years were excluded from this analysis, as were attitudes towards biobanking and genetic research.

We noted that increasingly the delineation between data collected for administrative purposes and other forms of electronic documentation such as electronic health records (EHR) (or other terms for these) becomes less clear. These records can contain both administrative and clinical data. Where possible, EHRs were excluded from this literature review; however, we acknowledge that the lack of separation has made this a grey area.

Papers were considered eligible if they were published in English in a peer-reviewed journal; reported original research, either qualitative or quantitative with any study design, related to data sharing in any disease setting; and included subjects over 18 years of age. Reference list and hand searching were undertaken to identify additional papers. Systematic literature reviews were included in the wider search but were not included in the results. Papers were considered ineligible if they focused on electronic health records (including other terms for these), health information exchanges, or biobanking and genetics, or were review articles, opinion pieces, articles, letters, editorials, or non-peer-reviewed theses from masters and doctoral research. Duplicates were removed, and title and abstract screening and full text screening were undertaken using the Cochrane systematic literature review programme ‘Covidence’ [11]. One author screened articles for eligibility and two authors were involved in the full text review process; conflicts were resolved by consensus.

Quality and bias were assessed at a study level using the QualSyst system for quantitative and qualitative studies as described by Kmet et al. [12]; this is a validated tool that can be used to assess both qualitative and quantitative studies. No modifications were made to the QualSyst criteria prior to use. Quality and bias assessment was undertaken independently by two authors; conflicts were resolved by consensus. A maximum score of 20 is assigned to articles of high quality and low bias; the final QualSyst score is a proportion of the total, with a possible score ranging from 0.0 to 1.0 [12].
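As an illustration of how a proportional score of this kind can be computed, the following is a minimal sketch only. It assumes each checklist item is scored 2 (yes), 1 (partial), or 0 (no), and that items judged not applicable are left out of both the sum and the maximum; the function name and the input layout are hypothetical and are not taken from the review.

```python
from typing import Optional, Sequence


def qualsyst_summary_score(item_scores: Sequence[Optional[int]]) -> float:
    """Return the achieved score as a proportion of the maximum possible.

    Assumption: each item is 2 (yes), 1 (partial) or 0 (no);
    None marks an item treated as not applicable and is excluded.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("item scores must be 0, 1 or 2")
    # Maximum possible is 2 points per applicable item
    # (for example, 20 for a ten-item checklist).
    return sum(applicable) / (2 * len(applicable))


# Ten items all scored 'yes' reach the maximum of 20, i.e. a proportion of 1.0.
top_score = qualsyst_summary_score([2] * 10)
```

Expressing the result as a proportion keeps scores comparable between studies assessed against checklists with different numbers of applicable items.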

Data extraction was undertaken by one author using a pre-piloted form in Microsoft Office Excel; a second author confirmed the data extraction. Conflicts were resolved by consensus. Data points included author, country and year of study, study design and methodology, health setting, and key themes and results. Where available, detailed information on research participants was extracted including age, sex, employment status, highest level of education, and health status.

Quantitative data were summarised using descriptive statistics. Synthesis of qualitative findings used a meta-aggregative approach, in accordance with guidelines from Lockwood et al. [13]. The main themes of each qualitative study were first identified and then combined, if relevant, into categories of commonality. Using a constant comparative approach, higher-order themes and subthemes were developed. Quantitative data relevant to each theme were then incorporated. Using a framework analysis approach as described by Gale et al. [14], the perspectives of different groups towards data sharing were identified. Where differences occurred, they are highlighted in the results. Similarly, where systematic differences according to other characteristics (such as age or sex) occurred, these are highlighted.

Results

This search identified 10,499 articles, of which 323 underwent full text screening; 75 articles met the inclusion criteria for the larger review. The PRISMA diagram is presented in Fig. 1. This article presents a subset of the results of the wider search which explores attitudes of health consumers towards privacy, trust, and transparency. The results relating to attitudes towards data sharing and reuse by researchers and healthcare professionals, and attitudes towards consent in the context of data sharing and reuse by healthcare consumers are presented in subsequent publications.

Fig. 1 PRISMA flow diagram

A subset of 35 [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49] of the 75 articles addressed issues relating to privacy [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49], trust [16,17,18, 21, 23, 24, 26, 28,29,30, 32,33,34,35,36,37, 39,40,41,42, 44,45,46, 48], and/or transparency [15,16,17, 26, 30, 32, 33, 37, 40, 42,43,44, 48] and are included in this analysis (Fig. 1 and Table 2). A total of 56,365 respondents were included in the studies.

Table 2 Included studies

Study design, location, clinical focus, and study populations

Qualitative research methodologies included face-to-face interviews and/or focus groups [32,33,34, 36,37,38, 49]. Other designs included surveys [16,17,18,19,20,21, 23,24,25,26,27,28,29, 35, 39, 41, 44] and combinations of deliberative sessions with surveys [15, 40, 45, 46] and focus groups and interviews [43]. Two studies used a citizens’ jury model [47, 48] and another was a nested cohort within a randomised controlled trial [22]. Studies were conducted in several countries; a breakdown by country is presented in Table 3.

Table 3 Studies by country

Most articles focused on the general public’s attitudes towards secondary data usage, particularly in general medicine [18, 22, 25,26,27,28,29,30, 34, 37,38,39, 43,44,45, 48, 49], but also national cancer databases [31, 41], clinical trials [21, 32], fertility [33], pharmaco-epidemiological [47], and epidemiological [42] research. Other studies focused on health consumers’ attitudes to secondary data usage among individuals attending US Veterans Affairs (VA) facilities [40], recently discharged from tertiary care [15], or with arthritis and other chronic conditions [36]. Others were in the setting of human immunodeficiency virus (HIV) [49], breast cancer (BC), colon cancer (CC) [35], or heterogeneous cancers [19, 20], acquired immune deficiency syndrome (AIDS), or multiple sclerosis (MS), or mental health concerns [23], presenting with rare diseases [16, 17], in adults or parents of children with cystic fibrosis (CF), sickle cell disease (SCD), or diabetes mellitus (DM) [35], or in adults with potentially stigmatising health conditions (DM, hypertension, chronic depression, alcoholism, HIV, BC, or lung cancer) [46].

The majority of articles discussed general attitudes towards health data linkage and secondary use [16, 22, 27, 30, 37, 39, 43, 46], linking health administrative data to clinical trial data [20] or clinical trial data reuse [21, 32], linking administrative data to survey data [38], access to medical records [15, 19, 23, 25, 26, 28, 29, 34, 35, 40, 45, 47, 48], statistical databases [49], research registries [17, 18, 31, 33, 36, 41], and health data for epidemiological research [42]. Privacy as sociotechnical capital [24] and commercial access to health data [44] were considered in one article each.

Study quality

Results of the quality assessment are provided in Table 2. QualSyst [12] scores ranged from 0.5 to 1.0 (possible range 0.0 to 1.0). While none were blinded studies, most provided clear information on respondent selection and data analysis methods and used justifiable study designs and methodologies. No key themes stood out for studies which received poorer judgements. No data were from randomised studies, with the highest level of evidence from a nested cohort study. Other data were obtained from lower-quality studies such as surveys and interviews.

Themes

Trust

A total of 12,794 respondents provided a view on trust; results were from surveys, questionnaires, focus groups, and interviews. One study was a nested cohort in a randomised controlled trial and two used a citizens’ jury model. Participants emphasised that organisations must develop, maintain, and promote high levels of patient trust [24, 26, 42]. Developing this trust can be achieved through the maintenance of confidential records and by providing information on how the individual’s information is used and by whom [40]. The importance of trust in health organisations, clinicians, and university researchers was also noted [18, 26, 29, 40, 42], although generally respondents trusted that organisations would keep their data private and confidential and that this would not be intentionally violated [40]. If a personal connection with the research team is established, then it is easier for individuals to form a trusting relationship [23]. The highest levels of trust were placed in the doctor [16, 26, 46], the National Health Service (NHS) [26], and hospitals [16, 46], while the lowest trust was in commercial organisations [26], pharmaceutical companies and insurance companies [46], or for-profit organisations [16]. An individual’s trust in an organisation was a determinant of what level of control they preferred over their data [40] and their willingness to participate in research [42], with trust overcoming concerns about privacy and confidentiality [49]. Where an organisation showed clear and relevant connections between its research and the information contained in the records, respondents trusted that the organisation would maintain the data appropriately [48]. Ensuring researchers act in the patient’s best interest and clearly and transparently disclosing the research being undertaken also built trust [40]. Respondents were generally trusting of the original research team and trusted that the team would use their data appropriately [32].
In a study about the use of fertility data, many respondents believed that registry data was already used for research purposes thus showing an established trust in the clinic, hospital, and wider health institutions [33].

The ability to maintain data security, privacy, confidentiality, and accurate records, change or delete incorrect data, and ensure that data would not be used to discriminate against an individual, all contributed to levels of trust [26]. Granting access to a small number of named individuals was not seen as a solution to resolving privacy concerns, as these individuals themselves may not be trustworthy [44]. Any research undertaken using secondary data analysis must not undermine or compromise an individual’s trust in medical research [30]. The level of respondents’ education influenced their view of trust, with those with a higher level of education being more trusting of their government and research institutions compared to those who finished their education earlier [16]. In the setting of fertility, most respondents were willing to share their data, suggesting trust in the organisation and registry [33].

Distrust

In contrast, the theme of distrust was noted in several articles representing a total of 6830 respondents and included data from questionnaires, surveys, focus groups, and a citizens’ jury. A general distrust in the health system, research, and sharing of health information [30, 36] was noted, with some respondents not trusting any organisation with their data [26] or the organisational capacity to maintain records appropriately [48]. While there was a desire to support the use of anonymised health data for research purposes, concerns regarding trust in the systems and data security remained [34, 40]. The provision of information on the source of research funding [40, 42] and data management systems [45] can increase transparency and trust, but providing more information on data use does not necessarily increase public trust [48]. In a study from the UK, some individuals with a ‘pessimistic dystopian’ mindset had limited trust in commercial organisations accessing health data, believing it would create new harms [44], with some suggesting that organisations may use the data inappropriately (to exploit or manipulate individuals or populations, or to manipulate the data to support their own agenda) [48]. Access to information by pharmaceutical companies and insurance agencies had lower levels of support, suggesting a distrust in these organisations. Older respondents (≥ 65 years of age) showed less trust in these organisations compared to younger respondents (≤ 25 years of age) [16]. Respondents who believed that data sharing had more negative than positive effects were more likely to have a college education [21]. Generally, these respondents believed that people could not be trusted and were concerned about data reidentification and information theft [21]. These low levels of trust were associated with a decreased willingness to share data with both for-profit and non-profit organisations alike [21].
Sharing data with an ‘unknown’ researcher was also associated with distrust; further, some believed that the increased digitisation of healthcare would lead to a decrease in the traditional provision of care [32]. In the setting of fertility, respondents’ levels of trust decreased because some respondents saw the clinics as a business; it is essential that the information provided to people clearly state the purpose of data reuse and note that it would not be used for purposes such as marketing [33]. Lucero et al. noted that there is a psychological component to uncertainty and mistrust. This leads to a distrust in volunteering for research and the need for organisations and ethics review boards to engage with communities to build trust [37]. To decrease distrust, respondents wanted to have face-to-face contact with researchers during a study’s recruitment process [37].

Privacy and confidentiality: differences according to demographic and health characteristics

A total of 44,366 respondents provided a view on privacy and confidentiality. Responses were obtained through surveys, deliberative workshops, dialogues and interviews, and questionnaires.

General concerns about privacy

Concerns about privacy and confidentiality were one reason for not sharing health data [16, 28, 29, 36, 37, 40, 44]. One study noted that the respondents’ concerns about privacy had increased over the past 5 years [29]. Concerns about the sense of ‘big brother’ and the potential for data to be used to discriminate [43] were expressed, with some consumers expressing a belief in the natural right (not dependent on law or custom) to privacy [34]. Where safeguards were in place to protect the data, most respondents in one study were willing to share their data, irrespective of the proposed data use [21, 32], except in the setting of litigation [21].

Demographic characteristics

The inclusion of an individual’s postcode, name or address, and receiving a letter inviting them to participate in research from the cancer registry were not considered to be a breach of privacy [31]. In a study of UK respondents, no substantial differences in privacy concerns were found according to sex or age; however, small but significant variations were noted by factors such as education, ethnicity, socioeconomic status, and an experience of cancer in the immediate family [31].

In other studies, findings on the relationship between age, sex, and concerns regarding trust and privacy were contradictory. Younger respondents expressed higher levels of trust in researchers and were more willing to let their data be used for research, but they also had high levels of privacy concerns [18]. Conversely, other studies noted that older respondents were more likely to agree to data linkage [22], while respondents aged 18 to 19 years and over 60 years had lower levels of privacy concerns compared to other groups [49].

Levels of concern about privacy were also influenced by the respondent’s level of education and employment. Those with commercial or technical qualifications had more concerns regarding privacy compared to all other education groups and those with a post-graduate degree had the fewest privacy concerns [49]. One article considered privacy in the context of sociotechnical capital, composed of awareness of privacy, attitudes towards the importance of privacy and data sharing, and confidence in the ability to maintain privacy [24]. Individuals with higher levels of education and income had higher rates of health privacy capital [24]. In a second study, respondents who were employed in manual, routine, or intermediate work were more likely to share their data compared to those in professional roles [33]. Respondents employed by a government organisation were more concerned about privacy [49]. Respondents who did not respond to finance questions had lower rates of consent for data linkage [38]. Other influences on privacy included social networks [24]. In one study, differences in rates of privacy concern were noted between those who answered the survey online and those who answered via telephone; those who answered by telephone were less concerned about privacy [46]. Some differences in privacy concerns were noted by country. A study of European respondents found those based in Sweden, Slovenia, and Denmark were less concerned about privacy compared to respondents from Lithuania [25].

Health status

Health status also impacted privacy concerns. Respondents in good health were more likely to agree to the use of data in healthcare registries compared to those in poor health [18]. Nevertheless, in the setting of no additional digital security measures (restricted access, etc.) being applied, individuals with poor health were less concerned about privacy compared to those in good health [49].

Sensitivity of data

A total of 3347 respondents provided a view on sensitivity of data. Individuals may consider some forms of medical data to be more sensitive than others. Data related to sexually transmitted diseases, family medical history including genetic disorders, drug and alcohol use [49], and mental illness [43, 49] raised the most privacy concerns, particularly the possibility of inappropriate data access [23, 49]. A UK report noted that ‘new ways of collecting and sharing data, under new circumstances, can give rise to conflicting expectations around data privacy’ [44] and that different types of data came with different privacy expectations [44]. In one study, respondents believed that data was a similar resource to tissue samples, suggesting that data is as sensitive as biospecimens [32]. A study of respondents who were seeking fertility services found that while there was a willingness to share data, they were concerned about the potential for the data to cause harm (potential for stigma), not only to them but also to their children [33]. Further, some respondents were concerned that the data collected on them, while required by legislation, was not collected on fertile couples [33].

Control of data

A total of 6859 respondents provided a view on control of data; results were obtained from surveys, a nested cohort, focus groups, and a citizens’ jury. Individuals’ desire to maintain some control over their health data was evident across studies, with many seeing this as key to transparency [40, 48]. Respondents were selective about those with whom they were willing to share their data. Respondents to two UK studies preferred their data to stay within the NHS [34, 43], with some believing that once data left the organisation control would be lost [34]. Health data access by private/commercial organisations [21, 26, 37, 41,42,43,44], pharmaceutical companies [21, 42, 43], and health insurance companies [21, 43, 44] were all seen as inappropriate. Respondents were concerned that insurance companies would use health data to adjust premiums, which was considered inappropriate and without clear public benefit [44]. Not allowing third parties to access their data was based on a distrust of these organisations [42, 44], perceived lack of transparency from research conducted by pharmaceutical companies [42], concern about the companies’ motivations (e.g. profit, marketing) [40, 42, 44], doubts about data security, distrust in their capacity to put society before profit, and a belief that a commercial organisation may on-sell their data [44]. In one study, some respondents indicated they would prefer that research not be undertaken if it required allowing commercial access to health data; however, most respondents wanted third-party access to health data if disallowing this resulted in research not being undertaken [44]. In contrast, other studies found respondents were happy to allow pharmaceutical company access to their registry data if it was undertaken in a transparent manner [17], and where consent was sought, respondents in a second study accepted the principle of commercial access to health data [44].
Respondents to one survey suggested that increased sharing may be used for marketing purposes, stolen, used for profit, or to discriminate against an individual [21]. Further, some respondents wanted to be informed when their data was being used [32]. Digitisation of health data was seen by some as a mechanism to increase control and transparency over their data, and to increase participation in research [32]. Some believed that consent was required to access records for research, or to identify potential research participants, without which it was a violation of their privacy [37].

Benefit to society

The benefit of research to society was noted as important in several studies with a total of 7006 respondents. It was noted that society’s views on privacy may be changing, creating conflicting values between privacy protection and public benefit [39]. Generally, respondents were positive about sharing their data for research [18, 20, 49]. In some circumstances, societal benefit may outweigh concerns regarding privacy [29, 44, 47]; further, research using health data should have a societal impact and not be undertaken just for academic reasons [40]. Ensuring transparency about the public benefit of research and sharing of results and analysis at a study’s conclusion [44] were important. Where data was used for public benefit, such as improved medical care and treatments, improved public health, or management of public funds, and organisations made a clear and compelling case for access to the data, access should be granted as it could potentially benefit both the individual and the health service [48]. Public benefit was seen as a justification for access to health data and an individual’s right to privacy should not prevent research that could benefit the general public [48]. Altruism was also noted as a reason to support health research using existing data, with some wanting their data to be used to ‘maximum potential’ [32]. In the setting of fertility data, sharing for the greater good was important to some [33]; however, this was not universal as some believed that it increased the risk of harm (fraud, identity theft, targeted marketing) [33] and that the premise that public benefit outweighs privacy concerns was not supported [18]. In one survey, some respondents valued maintaining individual control over their data more than societal benefit and respondents expressed a lack of willingness to trade loss of privacy for public good [23].
This was echoed in a second study where the majority of respondents believed that the right to privacy should be respected over all else; however, respondents also believed that if the data were made anonymous and privacy was maintained, data should be used for research that benefits society [26].

Views about specific data sharing scenarios

Digitisation of health records

Data linkage

A total of 700 respondents provided a view on health data linkage. An Australian study identified concerns relating to confidentiality during the data linkage process, specifically the possibility that the individual making the linkage may know the person and find out confidential information about them, although this was not universal [39]. The use of de-identified data was not seen as a breach of privacy [39], and current data linkage best practice was seen as providing sufficient privacy protection [30]. Transparency about process and data usage was an important factor in individuals’ decisions to allow data linkage [43]. Ensuring privacy when undertaking the linkage of clinical trial data to health administrative data was seen as important [20]. A survey of UK respondents found that they were not concerned about health record linkage as long as the data was used to increase health knowledge, consistency between health services, and administrative efficiency [43].

Registries and patient-provided data

A total of 25,814 respondents provided a view on registries and patient-provided data. Studies indicated that use of health information exchanges and digital health platforms can improve care [25]. While these positives were noted, views on use of these technologies varied widely [36].

While respondents agreed with the principles of electronic data sharing, they desired transparency and a mechanism for independent scrutiny of data access and use [44]. Some respondents indicated a preference for more electronic health data sharing [25].

There was a high level of trust in using data from disease registries [17] and a willingness to share data; in a cancer setting, only a small number of respondents were opposed to data collection [41]. The most common concern about registry-based research was the protection of privacy [18]. One mechanism suggested to ensure maximum transparency in registry research was to involve patient organisations in the development of clinical trials and registries [17].

Strategies to address privacy and trust concerns

Data security

A total of 25,052 respondents provided a view on data security. Many health organisations already have well-established protocols to ensure patient privacy. Transparent information about data security and protection measures is important to maintain trust [26, 32], with some suggesting that systems to protect confidentiality should be more secure for shared data than those used in usual medical practice [40]. Some respondents were concerned about unauthorised access to their data [40], particularly the chance for data to be lost, stolen, or ‘hacked’, or shared without consent [43], although trust in researchers and health care providers to maintain data security [25] remained. Some respondents in one study were happy to ‘trade-off’ any potential risk to privacy and security for benefits, like improved treatment and services [41]. Breaches in data security significantly reduced the levels of trust in an organisation to keep the individual’s health data private [33, 46]. Interestingly, in one study, respondents were more willing to allow access to their electronic records, compared to paper-based records [37].

The role of legal and ethics bodies in protecting privacy

A total of 4219 respondents provided a view on the role that legal and ethical bodies have in protecting privacy. The use of laws, regulations, and policies to protect an individual’s privacy in the UK [31], the USA [40], and Australia [49] was noted. Without developing an understanding of individual privacy concerns and perceptions of privacy, King et al. note that it will be ‘impossible to provide adequate law as well as effective technical solutions for protecting privacy’ [49]. In the UK, laws allow for data use if the risk to privacy is proportionate [15]; the NHS Code of Practice on Confidentiality establishes rules for the protection of privacy [31]. In relation to new laws to improve data collection, one study noted that 81% of respondents (N = 2335) would support a law making a cancer registry statutory in the UK [31]. In the USA, mechanisms such as the National Institutes of Health (NIH) requirement for manuscripts from NIH-funded research to be made publicly available were considered beneficial in fostering public accountability and trust [40]. Further, the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) establishes a national standard for the protection of health information [40]. However, some believe that concerns about privacy are not fully addressed by HIPAA, which treats all health data, except psychotherapy, the same [40]. In one study, some respondents were unaware that under some circumstances their medical data could be used without their permission [40]. Respondents in two studies advocated for clear and consistently applied penalties for individuals who breach privacy, such as job termination, paying fines, and/or going to jail [40]; such measures may increase perceptions of trust and accountability [40]. The role of ethics and institutional review boards in protecting privacy was noted in two articles [17, 40].
Respondents supported the role of ethics committees to manage access to health data and trusted their decisions [32]. It is important that health consumers recognise the role of these bodies in regulating access to data for research [40] and in protecting patient rights [17]. Finally, the development of clear policies and procedures will allow for more support for the secondary use of data, while increasing transparency for the healthcare consumer [37].

Anonymisation

A total of 5302 respondents provided a view on data anonymisation. Data anonymisation was central to an individual’s decision to share health data for research or health and service improvement programmes [26, 28, 40, 49]. There was a lack of understanding of the distinction between anonymised and identifiable data [33]. In one study, many respondents were in favour of anonymous databases for research, noting that they were beneficial and would advance medical research without impacting on their privacy [35]. Parkin and Paul note that, in the setting of appropriate privacy and confidentiality frameworks and ethical oversight, an informed public are more likely to be receptive to research using potentially identifiable health information [47].

Even when data were de-identified, some respondents remained concerned despite extra security measures and data anonymisation [49], which were not seen as safeguards [44]. Some respondents believed that even if data had identifying features removed it was not completely de-identified [26] and were concerned about sharing de-identified data with non-healthcare professionals [28]. Respondents were asked about their preferences for either a computer system or human programmer to anonymise (extract and link) data; some expressed concern about the potential for identification of individuals and noted a need for trust in the people undertaking these tasks [34]. While respondents recognised the capacity of computers to undertake the anonymisation process, they suggested they would not trust a completely computerised system, citing concerns about data infrastructure and data accuracy [34].

Communication and education

A total of 8511 respondents provided a view on the importance of communication and the role of education in promoting data sharing. Providing increased information about data use and research more generally allowed individuals to feel that their privacy was being maintained while contributing to health research with societal benefits [45]. Providing education was also seen as a mechanism to improve transparency [15, 16, 33] and trust [23, 26]. Specific information on how and when the data will be used [15, 45] and on how and where their contact details were sourced [42] was important to individuals. Information on the data aggregation and anonymisation processes [44], and the systems used to protect data [40], should be provided. In a UK cancer registry study, cancer patients opposed to current data collection processes were more concerned about lack of information about the registry and consent processes than about privacy [41]. Information and education about database governance, including data storage, length of data accessibility, and use of data, should be clear at the time of consent, particularly for data held in disease registries [17].

Consent

Most of the articles included in this subset discussed the connection between issues of privacy, trust, and transparency, and consent. Specific issues of consent in relation to secondary data use and sharing are discussed in a separate publication. Broadly, seeking consent for the secondary analysis of health data, either anonymised or potentially identifiable, was seen as a way to build trust, respect, and transparency [26, 30] and address an individual’s privacy concerns [23, 27, 39].

Discussion

This systematic literature review highlights the ongoing complexity associated with secondary data analysis and linking health data. Data gaps identified included a paucity of information specifically related to our primary area of interest: the attitudes of breast cancer patients towards the secondary use and sharing of health administrative and clinical trial data. Interestingly, given the high rate of cancer more generally in society, this population was underrepresented in the results.

While respondents believed that the principles of data sharing were sound, significant concerns regarding privacy, information security, trust, and transparency remain. Further, the diversity of attitudes towards privacy suggests that there is little clarity on what predicts an individual’s attitudes towards privacy, highlighting an area for future study. Many respondents supported the use of health data for social benefit; however, this was not universal. The literature underscores the importance of communication between those who collect data or act as data custodians and health consumers. Health consumers should be provided with clear information on how their data privacy will be maintained, how the data will be secured, and how access to their data will be regulated. Providing increased information to health consumers about how, when, and where their health data may be used, and with whom it may be shared, is essential in the development and maintenance of transparent data sharing systems and policies. Concerns relating to privacy and the misuse of data may be, in part, mitigated by increased education of health consumers regarding their national privacy laws and regulations. Providing information on penalties for breaches of privacy and how an individual’s health data can and cannot be used is important. This may reduce some specific concerns regarding inappropriate use of data, ‘big brother’ sentiments, and any perceptions of discrimination based on data. While not specifically discussed in the articles, it is important to note that as the use of artificial intelligence increases in healthcare, ensuring penalties for discrimination based on data analysis will become more complex. Examples of discriminatory algorithms in society and in healthcare have been highlighted by researchers [50, 51, 52], and these need to be closely examined and tested as our reliance on data-driven healthcare increases.
Further, health consumers need to be provided with information about how any research is undertaken, including anonymisation and aggregation processes, and the requirement for ethics committee oversight.

Our results suggest that trust is an important component in the discussion regarding the secondary analysis of health data: trust in the organisations, clinicians, and infrastructures used to maintain data. Onora O’Neill has written extensively on issues of trust in modern society and in health and argues that, despite the sentiment expressed by some that trust more generally in society has decreased, it has not; rather, the culture of suspicion has increased [53]. Therefore, it is essential that organisations wishing to undertake secondary analysis on their datasets develop trust between themselves and health consumers.

Limitations

The papers included in this study were limited to those indexed on major databases; some literature on this topic may have been excluded if it was not identified during the ‘grey’ literature and hand searching phases. As the search was restricted to English language publications, some relevant literature may have been excluded from the search. Given that the initial focus of this research was attitudes towards data sharing and reuse in breast cancer, individuals under 18 years of age were excluded from the analysis. A final limitation of this research is that much of the data was from research methods (surveys, interviews) that are not considered to be level 1 evidence; however, a randomised controlled trial methodology is not necessarily appropriate to this research subject.

Implications

Results of this systematic literature review indicate that while respondents identified advantages in health information data sharing, including post-market medication surveillance and the potential to decrease medical errors, concerns relating to trust, transparency, and the protection of privacy remain. Additional work is therefore required within these areas during the conception, design, and implementation phases of any health data sharing programmes to ensure the balance between public benefit and individual privacy is maintained.

Conclusion

The literature confirms that while consumers understand the benefits of health data sharing for research purposes, issues of trust, transparency, and privacy remain central to acceptance of health data sharing policies and programmes in the general community. Researchers and those undertaking secondary data analysis should work with consumer organisations to ensure consumer concerns are addressed.