Introduction

Many countries throughout the Global North currently invest in data-intensive resourcing in healthcare (Hogle 2016; Hoeyer et al. 2019). Health data has become a kind of capital that is “collected, stored, and traded for the future benefits it is believed to bring” (Barilan and Brusa 2022, p. 2). The hope is that comprehensive health-data infrastructures will simultaneously improve medicine through strengthened research and capitalize on new data economies for pharmaceutical innovation in precision medicine (Tarkkala et al. 2019; Boniolo 2022). In the USA, for example, massive funding has been channeled into digitalization through the stimulus package enacted in the wake of the financial crisis. In the UK, there have been multiple initiatives to facilitate research, including the care.data initiative, the 100,000 Genomes Project, and collaborations with Google DeepMind and, more recently, Palantir. In Australia, investments have been made in an infrastructure called My Health Record, which gathers health data on a nationally integrated platform. Elsewhere in Europe, there are also many national initiatives, such as the French Health Data Hub and the Finnish Findata. Recently, the European Union (EU) has also initiated an infrastructure for reuse of pharmaceutical data called EU Darwin. An even more ambitious initiative is the planned European Health Data Space (EHDS), a data infrastructure envisioned to facilitate data sharing and reuse of health data for citizens, health professionals, administrators, researchers, and industry across the union’s member states. Common to these diverse initiatives is a shared policy vision: to use digitalization to pave the way for integration and repurposing of health data for administration, political governance, public health surveillance, research, innovation, and economic growth (European Commission 2020b, c). Political strategies thereby seem to assume a straightforward compatibility between multiple uses of health data.
But what does the political goal of repurposing of health data for non-clinical purposes entail for clinical work? Which costs and trade-offs may be involved for those who produce data in the first place?

The policy goal of data repurposing is fueled by a widespread discourse around health data being an untapped resource for bioeconomic policies (OECD 2013), as “repositories for data ready for statistical analysis” (Barilan and Brusa 2022, p. 2). In the Nordic countries, health data are often framed as an unexploited “goldmine”, the data being “gold” that could be extracted from the existing (and highly integrated) data sources to promote health and wealth (e.g., Nordforsk 2014; see also Tarkkala et al. 2019; Tupasela 2021). In the UK, the Department of Health has stressed the need to “liberate” national health data for additional purposes, via new infrastructures that will improve the “flow of information between organizations” (UK Department of Health 2011: 48). In the US, the National Academy of Sciences has suggested that drug development can be sped up by making clinical data immediately available for research via “digital commons” (NAS 2011). Similarly, the European Commission envisions that “a single market for data will allow it to flow freely within the EU and across sectors for the benefit of businesses, researchers and public administrators” (European Commission 2019: introduction; see also European Commission 2010, 2016, 2020a, b, c, p. 9). The European Health Data Space is, for example, framed in these documents as “unleashing the potential” of health data, as if the many objectives can be reached at no cost, simply by removing “barriers.”

The metaphor of “data flow,” often used in policy reports, suggests that integration and reinterpretation of data are about ensuring that nothing “stops” the flow, as if data were water moving in pipes. However, a growing literature in philosophy of science and social science demonstrates that data integration and repurposing are far from straightforward and require meticulous data work and expertise to succeed (e.g., Hogle 2016; Leonelli 2014; 2016; Bossen et al. 2019; Gabrielsen 2020; Pine et al. 2020; Hoeyer 2023). While ethical debates about data reuse have raised important points about privacy, autonomy, discrimination, and inequality (see below), the reframing of health data as “assets” for administration, research, and innovation can also entail costs and trade-offs in need of ethical attention (see also Hunt et al. 2017; Vezyridis et al. 2017; Birch et al. 2021; Pinel 2021). We contend that to minimize the friction between clinical needs and the aim of data repurposing, policy makers need to set priorities early on, in the planning stages of new infrastructural initiatives. The main aim of our paper is to explore what such costs and trade-offs consist of. We do so by unpacking the “invisible” data work (Star 1991; Bowker and Star 1999) in clinical practice that is associated with new strategies aimed at using health data for an increasing number of purposes.

The functions of health records have expanded significantly in the last decades. From being primarily a tool for clinical record keeping and communication, electronic health records increasingly also serve purposes such as quality analysis and management, financial administration, as well as research and innovation (Winthereik et al. 2007; Vezyridis and Timmons 2021). Political visions to reuse health data for multiple purposes are facilitated by digitalization of healthcare systems, but they also shape the digitalization process itself through choices of data infrastructure design. Infrastructure design, in turn, impacts working conditions for the users. As early as 1988, Zuboff documented how digitalization and automation via “smart machines” can improve information processing and work productivity but also negatively disrupt working conditions and collaborations among employees (Zuboff 1988). Zuboff stresses that digital technologies can “take many forms depending upon the social and economic logics that bring it to life” (Zuboff 2019, p. 15). As in Zuboff’s work, the target of our critical analysis is not technologies or digitalization as such but the “logic that imbues and commands it into action” (Ibid, p. 15). We are interested in understanding the consequences of using large-scale digitalization and data integration as means for various purposes. We therefore focus on strategies that are motivated and shaped by the political desire to create uniform and reusable data via a system that can simultaneously cater for multiple purposes.

The study builds on ethnographic studies in the Danish healthcare system. Denmark is a small welfare state with a population of 5.8 million citizens. The Danish healthcare system is particularly apt for studying what data repurposing involves because it is highly digitalized and integrated (Schmidt et al. 2019; United Nations 2020). Denmark is considered an international pacesetter in the use of electronic health records, which are mandatory in both primary and secondary care, and there is a high degree of public trust in data handling by health officials. Healthcare is universal and accessible to all registered citizens with a CPR-number, a personal identifier that is also used to link information on all citizens’ encounters with public and private health providers (Ministry of Health 2017). Since 1968, the personal identifier has been assigned to all Danes at birth and enables the establishment of lifelong data trajectories at the individual level and across sectors, a feature that has made Denmark internationally known as “the epidemiologist’s dream” (Frank 2000; Bauer 2014; Schmidt et al. 2019). Tupasela (2021) describes the political strategies in Nordic countries, including Denmark and Finland, as population branding that reframes healthcare systems, health data, and populations as assets for the development of and investment in genomic medicine. These assets include centralized health data registers, wide healthcare coverage, the relatively high genetic homogeneity of the populations, as well as the high public trust in data collection. These features make Nordic health databases significantly different from those of countries with more diverse populations and more fragmented and less inclusive healthcare systems, such as the US (Dawes 2020). Because of these features, Denmark has been promoted as a “digital frontrunner” for the European Health Data Space (Digitaliseringspartnerskabet 2021, p. 19).
Since other countries may pursue similar paths, the experience gained in Denmark is therefore likely to be of wider international relevance.

The paper is structured as follows. First, we briefly introduce some of the academic discussions about repurposing of health data before we present the Danish case and our methods. In the analysis, we outline five types of clinical data work that proliferate with increased emphasis on data repurposing. In the discussion and conclusion, we highlight the need for political priority-setting when planning large-scale infrastructures for data repurposing.

The practical ethics of repurposing health data

Ethical debates about the repurposing of health data focus prominently on principal values related to the rights of data subjects, like privacy and autonomy. For instance, in an influential review of ethical debates on Big Data practices, Mittelstadt and Floridi (2016) highlight the issues of informed consent, privacy, data ownership, epistemological challenges, and inequality in power as key concerns when data are used for profiling and surveillance. These are important themes. Yet, other kinds of ethical issues may also be at stake in the daily work of those who are to produce data. Rather than being a question of principle, such issues may be expressed as trade-offs or frictions that are to be handled as part of everyday practices; what other scholars have referred to as practical or empirical ethics (Hoffmaster 1992; Arribas-Ayllon et al. 2011; Pols 2015). By attending to practical ethical concerns related to the production and use of data in clinical practice, it is possible to point out challenges associated with, for instance, incompatible data formats and limited interoperability of IT solutions developed for different needs (Kruse et al. 2016). Scholarly attention has already been given to the repurposing of health data for research (Tempini 2020) and for commerce (Birch et al. 2021; Pinel 2020; Vezyridis et al. 2017); and the use of clinical data for administrative or juridical purposes has also been addressed (Hunt et al. 2017; Wiener 2000). These studies point out that the alleged benefits of data repurposing also come with potential costs. We show how frictions may arise when additional data work is required in clinical settings as a precondition to facilitate repurposing of health data.

The concept of data work refers to the skilled and distributed labor involved in producing, documenting, curating, storing, and disseminating data, as well as the efforts required to make sense of them (Bonde et al. 2019). Berg and Goorman’s (1999) seminal paper on the repurposing of health data highlighted how the amount of data work increases with the number of and distance between different uses of data:

The further information has to be able to circulate (i.e., the more diverse contexts it has to be usable in), the more work is required to disentangle the information from the context of its production (Berg and Goorman 1999, p. 51).

Berg and Goorman termed this statement the “law of medical information.” Similarly, Leonelli (2016) has highlighted the requirements of advanced infrastructures, long-term planning and skilled labor to “package data for travel” in the context of biological and biomedical research. Rather than the metaphor of “data flow,” Leonelli prefers the notion of “data journeys” to highlight that data integration is often delayed, disrupted, or retransferred due to a lack of resources for data curation (Leonelli 2014). Data curation requires expertise in the local context of data production, as well as the epistemic interests of new users, and thus involves tasks that are not easy to automate (see also Akrich 1992; Leonelli and Tempini 2020; Tempini et al. 2020).

Using electronic health information is considered an important component of evidence‐based medicine, and non-clinical purposes can benefit clinical work, e.g., through quality control and strengthened biomedical research and innovation for development of future treatments. At the same time, digitalization often does not improve patient care or clinical practices in a straightforward manner or without costs (Fiander et al. 2015). While the introduction of new data infrastructures is intended to improve information flow and reduce data work in the clinic, the experience in practice is often the opposite (Vikkelsø 2005; Downing et al. 2018). Because of existing discontinuities in data formats and local differences in reporting standards, new infrastructures often redistribute, rather than eliminate, data work. New forms of data work are required to collect, check, clean, store, and reformat data in ways that comply with new systems and additional users to make data meaningful. Data work in clinical practice following the introduction of new IT systems has been described as “invisible” or “hidden” (Star 1991), in the sense that it is often taken for granted, or not included, in rationalized models of how IT systems influence work tasks or measurements of hospital productivity (Bowker and Star 1999, p. 245; Timmermans and Berg 2003; Bonde et al. 2019; Fiske et al. 2019; McVey et al. 2021). When data work is experienced as draining resources from other tasks in scientific or medical practice, it can lead to what has been described as data friction (Edwards 2010; Edwards et al. 2011). The notion of data friction highlights that the transformation and movement of data always consume energy. Data friction, like physical friction, can be productive in the sense that the energy consumed can be converted into new possibilities (Bonde et al. 2019). 
Yet, in some cases, the frictional cost of data transformation may exceed the resources available and lead to a decline in productivity or system collapse. Disruptions in data work can therefore be considered as a practical ethical problem for the functioning of healthcare systems.

Methods

Individually and collectively, we have explored data integration aimed at repurposing data in the Danish healthcare system through observations of data work, interviews and informal discussions with clinicians, data analysts, and policy makers, as well as analysis of policy debates as they are expressed in strategy papers, health policy magazines, and the bulletins of health professionals’ associations. Wadmann has undertaken 36 hours of observation and conducted nine semi-structured interviews with clinical staff and managers in two psychiatric centers. Hillersdal has observed patient consultations and the daily activities of oncologists and nurses at the cancer research unit for experimental drug trials. In addition, Hillersdal has observed clinical practice and interviewed nine clinical staff, five research nurses, three data consultants, three industry partners, and 16 participating patients. Holt has carried out observation and semi-structured interviews with 23 infection prevention and control nurses, 10 clinical microbiologists and a doctor in 14 infection control units. Hoeyer has interviewed data analysts, administrators and policymakers working with data integration across municipal, regional, and state levels, which has complemented our understanding of the strategy papers, though we do not specifically quote these interviews here. Moreover, we have all participated in public and politically organized meetings, where the organization of and ambitions for the future Danish healthcare system have been discussed.

This paper draws on ethnographic studies carried out by the authors towards various individual research ends (Wadmann and Hoeyer 2018; Holt 2020; Hillersdal and Svendsen 2022; Hoeyer 2023). We found similar issues arising in different contexts and decided to begin comparing systematically across case studies. For example, a particular large-scale investment in a new electronic health record system from the American supplier EPIC came up in all our studies as presenting similar challenges. In analyzing our material, we categorized the types of data work involved through an iterative process of identifying themes (Madden 2010) and revisiting materials to look for differences and similarities across sites. All translations from Danish to English were made by the authors. In Denmark, qualitative research is not subject to approval from an ethics committee. The collection and use of empirical examples comply with the requirements of the EU’s General Data Protection Regulation (GDPR).

Context: policy visions and data work in a highly integrated data infrastructure

While Denmark is ahead of many other countries in terms of integrating infrastructures for health data, issues related to the prioritizing of financial investments into data repurposing are of relevance far beyond the Danish setting. Such investments relate, for instance, to the development and implementation of digital equipment and shared communication standards. As mentioned in the introduction, an ambitious example is the European Health Data Space initiative, which involves the harmonization of standards for electronic health records (European Commission 2020c). Harmonization is needed to enable automated data transfers, and automation is needed to ensure seamless data availability, completeness, and ease. Such efforts challenge the old distinction between “primary” and “secondary” data use (e.g., Markus 2001), because data are intended to take formats that work equally well for multiple purposes. The Danish national strategy for digitalization from 2018 illustrates this by highlighting that “with the new data-driven technologies, the [clinical and non-clinical] purposes increasingly supplement each other,” fostering a growing “reciprocity” in the use of health data (Sundheds- og Ældreministeriet, Finansministeriet, Danske Regioner and KL 2018, p. 4, our translation). Similar views are expressed in ambitions to develop infrastructures enabling health data to be used for research and innovation in addition to clinical purposes (Danske Regioner 2015; Danske Regioner og Dansk Industri 2019; Digitaliseringspartnerskabet 2021; Ministry of Health 2016; Regeringen 2021).
The most recent Danish digitalization strategy, from May 2022, highlights a vision for “Better use of health data for the benefit of Danish patients, as well as research and development of innovative life science solutions through, among other things, realizing a vision for better use of health data, [and for] one common access point to health data for research and innovation, etc.” (Regeringen 2022: 37, our translation). Thus, the clinical aims for producing and using data no longer hold primacy in defining data standards.

Political strategy papers brand Denmark as an ideal context for life science research and drug development (Danske Regioner 2015; Sundhedsministeriet og Danske Regioner 2021; Ministry of Foreign Affairs 2014). As in other Nordic countries, such as Finland, a key policy vision is to use health data to attract international commercial investments in research and innovation (Tupasela 2021). To attract biomedical companies, Denmark established an infrastructural project termed Trial Nation in 2018. It is a merger of previous initiatives to attract global investments in clinical trials in Denmark, by offering pharmaceutical companies a single-entry point to health data, thus making it easier to identify candidate patients for trials (see Footnote 1). Another milestone was the launching of the National Genome Center in 2019 to facilitate population-wide collection and integration of genomic and health data (Novo Nordisk Foundation 2018; Danish National Genome Center 2019). Such initiatives received further support with a new Life Science Strategy, published by the Danish government (Regeringen 2021). It is a common feature, globally, of data integration initiatives that countries compete against each other (Vezyridis and Timmons 2021) and increasingly use health data as assets aimed at “branding” (Tupasela 2021). The ambition to reuse health data for research, innovation, and administration imposes new demands on data quality and availability, as well as on the standardization and completeness of datasets (Petersen 2019).

Record keeping of health data has been mostly digitized for decades in Denmark, but ambitions to develop a nationwide platform for electronic health records have not yet materialized. Danish hospitals are managed in five regions with their own political levels of management. In 2016, the IT system Sundhedsplatformen (Danish for “the health platform”) was delivered by the American EPIC company in two of these five Danish regions. It was presented as a move to further integrate a range of different systems already in use, and to facilitate effective and fast data repurposing (Bentzon and Rosenberg 2021, p. 22). The system was also said to improve continuity and transparency in patient information and patient safety, as well as to optimize workflows and data reporting by requiring hospital doctors to write directly into the patient record (Drachman and Davidsen-Nielsen 2018). This was the largest IT investment in Danish healthcare (2.8 billion DKK, or 458M US$) and therefore it is worthwhile studying what it entailed in more detail. We use this IT-system to study the consequences of implementing a centralized data infrastructure, which was presented as a way to improve documentation and quality control for clinical purposes, but whose design is shaped by the desire to create uniform and reusable data for functions beyond these. A highlighted virtue of the new system was the availability of automation functions to ease the so-called documentation burden and time spent on data reporting for clinicians. This “automation” was simultaneously expected to introduce standards that could facilitate reuse. Yet, the practical ethical concerns experienced by healthcare staff were not adequately addressed via the suggested automation practices. To explain why, we now turn to the empirical analysis where we outline the five types of data work we identified and the frictions they entailed. 
The data work relates to the production, completion, validation, sorting, and recontextualization of data. We feature examples from the introduction of the EPIC system because of the high political expectations of, and the vast investments into, this IT solution.

Production: data work proliferates through parallel registrations

The first observation we made across our ethnographic studies was that the repurposing of clinical data, via new software, increases the time that healthcare staff spend on data work. IT systems that prioritize features for data repurposing often come with complicated procedures for data registration, and moreover, an increasing amount of data are to be registered. Yet, despite more data being registered, the systems do not always make it easier for healthcare staff to find the information needed for clinical care, because the system is not necessarily designed with the “primary” user in mind. We observed how parallel documentation systems tended to emerge, when new user interfaces contained too little or too much information, making it difficult for healthcare professionals to get an overview of the patient’s current health issues. For example, Wadmann observed how nurses in a psychiatric center not only registered the results of electroconvulsive therapy in the IT system (Sundhedsplatformen) but also printed and displayed the graphs on a paper card for each patient. Because the software platform did not provide a chronological overview of the patients’ responses to a series of treatments, the health professionals continued with their own analogue solution in parallel to the digital platform. Other examples of what the healthcare staff termed “handheld data” were manually registered systems of information needed to keep an overview of treatment capacity and patient transfers. Although these types of data are central to the workflow of the clinics, the IT system did not provide an accurate overview of data to inform clinical decisions. Despite being more centralized, Sundhedsplatformen provided a more fragmented picture for the clinical user.

Although parallel data work is often necessary for clinical purposes to fix a problem introduced by a new infrastructure, this work remains largely invisible for data users outside the clinical setting. They see the data they request, not the work it takes to produce handheld registrations. The observation of Wadmann in the psychiatric center resonates with experiences of clinicians in other psychiatric units, who also reported on difficulties of retrieving clinically relevant information and a substantial increase in data work after the implementation of Sundhedsplatformen (Overlægerådet i Region Hovedstadens Psykiatri, 2018). This type of “frictional cost” associated with implementing the EPIC system was substantial also beyond psychiatric care. It should here be mentioned that some of the challenges may relate to the lack of knowledge among some users about functions supported by the system, and that improvements to Sundhedsplatformen have been continuously made to optimize clinical functions. However, the need for additional training in how to report and use data in new IT-systems, as well as the development of and training in additional functions to meet user needs, can also be considered a type of data work that implies a “production cost”. In this case, according to a national audit, the implementation of the EPIC system increased data work and reportedly led to a decline in productivity, concerns about patient safety, and staff burnout (Rigsrevisionen 2018; Bentzon and Rosenberg 2021).

Completion: extending data work to support research and administration

Non-clinical purposes of data use, like research and administration, are often more dependent on data completeness than clinical work. A second form of data work observed in the clinical sites was therefore related to data completion. Clinical research has always depended on data. However, the increasing use of health data as assets to attract investments (Vezyridis and Timmons 2021) means that data production in clinical settings is taken to another level. Completeness of health datasets becomes an end in itself, because it is considered a resource for economic growth via pharmaceutical investments and innovation. Danish cancer treatment trials exemplify the attempt to brand Danish clinics as the ideal sites for investments by the life science industry. In return for sponsoring clinical trials, companies gain access to very detailed, high-quality datasets. This involves extensive questionnaires, repeated testing and sampling with increased precision, as well as reporting according to the standards relevant for research. This granularity and complexity of data go beyond traditional clinical trials. We illustrate this through observations by Hillersdal, who studied data work in a cancer clinic that specializes in early phase-one drug trials of targeted treatments.

Patients enrolled in one of the approximately 150 open trial protocols were meticulously monitored, such as through electrocardiograms (ECG), blood samples and the registration of performance status and symptoms. External data monitors were hired by the pharmaceutical company to control data quality while the generation of clinical data was undertaken by clinical research nurses in the unit. In interviews, the physicians and nurses in the clinical unit emphasized that a substantial amount of their time is spent on delivering complete data to qualify the unit for future trials. “Complete” datasets are crucial for securing upcoming “slots” in the competitive market of investments in cancer trials by big pharma corporations. This can give Danish cancer treatments new experimental treatment options. But this strategy also has a cost. The work to complete the datasets for research purposes was experienced as a drain on resources for both health personnel and the cancer patients. Clinical nurses spent considerable time translating the trial protocol into a clinical “work sheet” and retrieving data or test results that were required to fulfill the demands of the research protocol. Physicians commented on the drain of resources in response to regulatory demands for standardization procedures and for more and repeated testing on an increasingly narrow patient population. The physicians stated that the increasing resource requirements for running the clinical research trial meant that the unit could treat fewer patients at a time, that is, that fewer terminal cancer patients could be offered a place in the experimental treatment protocol. Moreover, the physicians found it ethically challenging to expose these patients to the increasing test demands. To underline the workload, trial participation was presented to patients as a part-time job!
The example highlights how the ambition of the so-called “reciprocal” use of health data to attract research investments involves a substantial increase in data work, imposed not only on healthcare staff but also on patients. If this data work is not accounted for, the branding of health data can put additional pressure on clinical units. This is particularly the case in a context of high-speed global competition to attract commercial investments, and where the clinic must adapt to the industrial research agenda (Hillersdal and Svendsen 2022).

In the handheld data production discussed in the previous section, additional data work is required to support clinical tasks, because some functions of the IT system do not prioritize the needs of clinical users. Data completion involves additional reporting in existing systems because the use of health data for secondary purposes comes with a call for more data and of higher quality. Our example illustrates how what counts as “improved data quality” is dependent on the context of the user, as complete datasets are not always clinically relevant. The demand for data completion may be particularly evident in clinical research units. Nonetheless, it is a common experience that additional data are required to ensure completeness when health data are needed for non-clinical purposes, including also quality assessment and administration (Petersen 2019). For example, standardized IT systems for electronic health records often require the registration of vital signs (body temperature, blood pressure, pulse), because “completeness” of data is considered important for comparative data analysis. In the Danish electronic health record system, it is possible for health personnel to choose the “not relevant” option if such measurements are not deemed clinically relevant. Nevertheless, it still takes time to fill in all the mandatory entries. When data work consumes considerable time and energy, without adding value to what clinicians see as their primary work, data work can be experienced as “meaningless” (Hoeyer and Wadmann 2020). This “frictional cost” can become an ethical challenge when data work needs to be prioritized over clinically relevant tasks.

Validation: data work proliferates to ensure authentication

By data validation we refer to data work meant to ensure that the right data are reported in the relevant places for both clinical and non-clinical purposes. Data validation has always been part of record keeping in healthcare systems, but the increasing complexity of IT-infrastructures and additional uses of data also increase this type of data work. While new IT solutions often come with automation functions intended to reduce the need for manual data work, we have observed how automation can also generate new tasks of data validation.

Sundhedsplatformen offers auto-generated text and suggestions for data to be included in medical notes and records, for instance test results retrieved from other parts of the health record. This type of automation is intended to help clinicians include relevant data in the electronic health records without having to search for the information elsewhere. However, the time saved on retrieving the relevant data is often countered by a need to validate the automatically retrieved data. A physician in a psychiatric unit explained that it required extra work to find out where the data came from and whether they were relevant and valid for the specific patient encounter. The physician referred to the auto-generated data as “noise” because their clinical relevance could be questionable, or it was unclear whether the data were up to date. In such cases, the physician had to spend additional time to find out where the data originated, when the data were registered, and to judge their relevance for the particular patient. Thus, the implementation of an automation strategy intended to reduce manual data work instead created a need for additional tasks of data validation. These issues have also led to concerns about patient safety, as discussed in several articles in a special theme on Sundhedsplatformen in the journal of the Danish Medical Association. [Footnote 2]

The need for data validation also arose due to the redistribution of data work. To ensure “real-time data,” new IT systems are often designed to foster “direct registration” by health professionals during or immediately after patient contacts. In Denmark, this was also the case for Sundhedsplatformen. While medical secretaries previously had the tasks of transcribing dictated recordings, making requisitions, entering disease-specific codes, and making the bookings necessary for patient transfers, these tasks were redistributed to physicians. This change in the distribution of data work was envisioned to minimize delays in data registration and reduce the risk of errors, while also making the data work of medical secretaries obsolete. For political-administrative decision-makers, the possibility of removing the “extra layer” of data work performed by secretaries was part of the business case of the IT-investment, and the costs of implementing Sundhedsplatformen were expected to be paid off over 10 years due to increased productivity (Drachman and Davidsen-Nielsen 2018). Hundreds of medical secretaries were now officially made redundant, corresponding to about two percent of the total number of employees in the hospital sector (Bentzon and Rosenberg 2021). However, the redistribution of the data work from secretaries to physicians was not as seamless as envisioned.

Reporting errors grew as the physicians did not have the time required for careful reporting and identification of missing information, nor the administrative expertise required to code data correctly, for instance to link diagnostic codes to reimbursement codes. [Footnote 3] As the financial consequences for the hospitals became clear (e.g., due to missing reimbursement), a re-hiring process of medical secretaries began. But the function of the secretaries changed: from a role as main data producers, the secretaries now had to verify the physicians’ data production. A secretary in a psychiatry unit described her new role as a “controller-function” to emphasize her primary task of data validation. Yet, she also commented that secretaries often took on administrative tasks assigned to the physicians (e.g., sending referrals or adding reimbursement codes), because physicians struggled to use the new system. Indeed, hospital administrators have recently highlighted that the need for employees to register and manage data is even higher than before the implementation of Sundhedsplatformen (Tiirikainen and Rasmussen 2021). [Footnote 4]

Sorting: data work proliferates to make data findable

The policy goal of repurposing health data comes with demands for the registration of increasing amounts of data, but also with suggestions for how to minimize data work through IT systems offering automation and easier access by multiple users. Integrated IT systems are intended to provide health personnel with the ability to access all data on a specific patient from one entry point. For example, Sundhedsplatformen was intended to provide more continuity in data access through a single entry, instead of having to log on several times to multiple systems. Yet, automation functions and access to more data now came with the trade-off of more data work related to sorting information.

Several of the physicians interviewed explained how Sundhedsplatformen provided a “tangle of notes” and resulted in “data overload.” A chief physician described the data as “unfiltered and unstructured” and explained that it took additional time to sort and find the relevant data to support clinical decision-making. The need for sorting arose when the complexity of user interfaces made it difficult for healthcare staff to find the relevant information. Moreover, some of the automation functions designed to ensure data completeness meant that healthcare staff were often presented with a volume of data that exceeded their needs. Ironically, this challenge was brought about by automation strategies that were envisioned to ease data reporting and improve information transfer across health units. For example, templates or “smart text” consisting of standard phrases (inserted via a shortcut key) were to be used in referral situations to ease data registration and reduce the loss of information across providers, such as between primary and secondary care. However, as the volume of data in referral letters increased, it took more and more time to get a “quick” overview of the patient’s current medical condition. For GPs to cope with the vast amount of data in referral letters from hospitals, a new algorithm had to be developed to highlight only the data of relevance to the GPs (Allen 2019). It is telling how this sorting algorithm had to be introduced to cope with the data overload produced by another automation strategy originally intended to reduce the need for data work.

The integration of multiple IT systems and the implementation of automation functions are envisioned to make more data available for clinical decision making, as well as for secondary users. However, automated data sharing and the increasing use of smart phrases and copy-paste functions have also been associated with a risk of note bloat, that is, user interfaces ending up containing too much (clinically irrelevant) information while the essential information gets buried in the details (Weis and Levy 2014; Wang et al. 2017; From et al. 2019). Thus, while automated solutions can save time on data work related to data production and transfer, they also risk generating new types of data work related to the sorting of information.

Recontextualization: the expertise needed to interpret health data

The political aim of repurposing health data presupposes that data can be analyzed, disseminated, and interpreted for use in new contexts. Leonelli (2014, 2016) has emphasized how the reuse of biological research data via large databases requires decontextualization and recontextualization of data. These processes involve data work such as reformatting data to comply with standardized annotation to minimize differences in data collected at different sites (decontextualization), as well as the compiling of additional information about a given dataset (metadata). These processes in turn enable the repurposing of data as evidence in different contexts (recontextualization). We include recontextualization as our fifth type of data work to highlight how medical expertise, contextual experience, and clinical resources are needed to ensure robust interpretation of health data beyond the original site of production.

From interviews, we have learned about the types of local expertise it takes to integrate and interpret seemingly simple data. Even something as straightforward as integrating test results from the measurement of blood cholesterol or blood pressure depends on clinical knowledge and awareness of local and historical contexts. Often, health data are not registered using the same digits or measured via the same instruments. Blood cholesterol has for instance been reported by some laboratories as being above or below a specific guideline level, rather than in absolute numbers, and guidelines for what is considered “normal” or “high” have changed over time. Similarly, disease and risk classification guidelines are continuously updated and changed, and diagnostic criteria for the diagnosis of, e.g., heart failure, type 2 diabetes, and hypertension, have changed over time (see also Ellingsen and Monteiro 2003).
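
The integration problem described above can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the record structure, the function names, and in particular the guideline levels and periods, which are hypothetical and do not reproduce any actual clinical guideline. The sketch only shows why a record cannot be classified, or merged with others, without knowing when and how it was produced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CholesterolRecord:
    year: int
    # Absolute measurement, if the laboratory reported one.
    value_mmol_l: Optional[float] = None
    # Threshold-style report ("above guideline level"), if that is all that was recorded.
    above_guideline: Optional[bool] = None


# HYPOTHETICAL guideline levels (first year, last year, level in mmol/L).
# Real values would have to be supplied by clinicians familiar with the period.
HYPOTHETICAL_GUIDELINES = [
    (1990, 2004, 6.0),
    (2005, 2029, 5.0),
]


def guideline_for(year: int) -> float:
    """Look up the (hypothetical) guideline level in force in a given year."""
    for first, last, level in HYPOTHETICAL_GUIDELINES:
        if first <= year <= last:
            return level
    raise ValueError(f"No guideline level known for {year}")


def classify(record: CholesterolRecord) -> bool:
    """Return True if the record counts as 'above guideline' at the time it was made."""
    if record.value_mmol_l is not None:
        return record.value_mmol_l > guideline_for(record.year)
    if record.above_guideline is not None:
        # A threshold-only record can be carried over, but never turned back into a number.
        return record.above_guideline
    raise ValueError("Record contains neither a value nor a threshold flag")


# The same measured value is classified differently depending on the year.
print(classify(CholesterolRecord(year=2000, value_mmol_l=5.5)))
print(classify(CholesterolRecord(year=2010, value_mmol_l=5.5)))
```

Under these assumed levels, an identical measurement falls below the guideline in one period and above it in another, while threshold-only records carry no recoverable absolute value. Deciding which table of guideline levels applies is exactly the kind of local and historical knowledge the interviews point to.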

Further, a clear illustration of the need for recontextualization arose in relation to Holt’s study of quality control in Danish hospitals. To prevent and control hospital-acquired infections, thereby both improving the cost-effectiveness and quality of care, and reducing patients’ suffering, a national automated incidence monitoring system was launched in 2015. The database called HAIBA (Hospital Acquired Infections dataBAse) is accessible online and exemplifies the political vision of ensuring greater transparency in quality improvement and patient safety for patients as well as professionals. Producing data on hospital-acquired infections was intended to document treatment trajectories for specific patients, as well as to monitor developments within specific units. However, administrators have also seen potentials for using HAIBA to compare the performance of different clinical units—a kind of benchmarking.

While infection monitoring may seem like a straightforward way to repurpose already available health data, Holt’s fieldwork reveals a more complicated picture. A physician specializing in clinical microbiology highlighted the risks of uncritically interpreting aggregated data without proper insight into how the data were produced. Reading a report that incorporated HAIBA data to evaluate infection control in different hospitals, the physician was surprised that a particular kind of infection seemed to be on the rise in his region. This was indicated with an alarming red arrow in the report, and the result was followed by political calls for immediate action to bring the numbers down. Surprised by the dramatic increase in infections, he decided to conduct his own analysis of the data. In this process, the physician discovered that data from the hospital he was employed at stood out with a rapid increase in infection rates. He noticed that the dates of the documented peaks were associated with two important changes in testing procedure and capacity at his hospital. During this period, the hospital had introduced a more sensitive testing procedure and increased the number of total tests, because they had taken over test analysis previously done by another lab. Without this contextual knowledge, however, the data misleadingly signaled that hospital-acquired infections were out of control.
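
The arithmetic behind this kind of artifact can be illustrated with a minimal sketch. The figures are hypothetical and are not taken from HAIBA or from the hospital in question, and the function name and parameters are assumptions made purely for illustration. The sketch holds the share of truly infected samples constant and changes only the sensitivity of the test and the number of tests run.

```python
def detected_infections(tests_run: int, true_positive_share: float, sensitivity: float) -> float:
    """Expected number of detected infections (false positives are ignored for simplicity)."""
    return tests_run * true_positive_share * sensitivity


# Hypothetical figures: the share of truly infected samples is the same in both periods.
before = detected_infections(tests_run=1000, true_positive_share=0.05, sensitivity=0.70)
after = detected_infections(tests_run=2000, true_positive_share=0.05, sensitivity=0.90)

print(f"Detected before the change: {before:.0f}")
print(f"Detected after the change:  {after:.0f}")
print(f"Apparent increase: {after / before - 1:.0%}")
```

With these assumed numbers, the count of detected infections more than doubles although the underlying share of infected samples has not changed at all. A report that shows only the detected counts cannot distinguish this situation from a genuine outbreak.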

It may be argued that the inclusion of metadata about changes in testing procedures could have avoided the misunderstanding in our example, and thus saved the resources needed to discuss calls for action and the subsequent data work undertaken by the physician to question the findings in the report. If this is indeed the case, it would, however, only underscore the point that the resources required for repurposing of health data are substantial, as this typically requires the production of metadata, a cost that is often not accounted for in rationalized models of the benefits of reusing health data. Moreover, the availability of high-quality metadata is not always sufficient to ensure robust recontextualization of data. In this specific case, the physician was generally skeptical of the ability of non-clinical users to make sense of clinical data and emphasized the multitude of contextual factors hidden in health data: “There are so many parameters, and we only have to change a few for the numbers to change. It is therefore very difficult to say if this even reflects the underlying reality.” If these challenges arise in the context of infection prevention and control, one should not underestimate the resources and contextual expertise required to recontextualize more complex health data.

Discussion: foreseeing the unintended consequences of hopeful policies

It is neither rare nor unique to healthcare that IT systems intended to increase productivity instead result in a productivity decline. The Solow Paradox, or the Productivity Paradox, refers to an observation made as early as 1987 by the economist Robert Solow that “You can see the computer age everywhere but in the productivity statistics” (Solow 1987, p. 36). Similarly, we have commented on how the IT system Sundhedsplatformen was intended to improve clinical information “flows” and “productivity” but was often experienced by healthcare staff as time consuming. Importantly, our aim is not to be critical of digitalization as such, nor to dismiss the possibility that an increase in data work can be justified. Indeed, our informants recognize many benefits of digitalization and integration of patient data, such as providing evidence-based strategies to improve quality of care and more cost-effective administration of healthcare systems for the future. The primary concern is rather the political expectation that IT systems facilitating multiple uses of data will be seamless and that data are already there, ready to be reused. We find that the documented challenges pose ethical concerns.

Some of the challenges we describe can be interpreted as trade-offs in the clinical usability of health data infrastructures designed to prioritize data repurposing (Hoeyer 2023). Optimizing the formatting and integration of data for one purpose often results in data friction elsewhere in the system. We have identified five types of data work related to the production, completion, validation, sorting, and recontextualization of health data. This typology partly overlaps with other studies of data work, some of which also examine data-related tasks conducted by patients and specialized data managers (Bossen et al. 2019; Fiske et al. 2019; Pine and Bossen 2020; Torenholt et al. 2020). Our ambition is not to establish an exhaustive list of types of data work, but rather to encourage more discussion of the consequences of the political aims of repurposing. We propose that the proliferation of data work in clinical practice be addressed not only as a practical problem but also as an ethical challenge, because it involves trade-offs in the prioritization of clinical resources, including the time spent with patients versus data documentation. Failing to acknowledge trade-offs and the need to prioritize can have important consequences for patients and health professionals, such as reduced resources for patient care and occupational burnout among health professionals (Rigsrevisionen 2018; Downing et al. 2018).

One type of trade-off concerns divergence in what different users may view as “good data.” What counts as good data for health professionals and secondary users can differ, as seen in the sections on Completion and Sorting of data. Commenting on the introduction and widespread use of the Epic system in the US, medical doctor Gawande (2018) criticizes design choices for being more politically than clinically motivated. He argues that doctors and administrators have different views on what functions and information should be prioritized. What is relevant for audit or research is not always clinically relevant, and vice versa (see also Hoeyer and Wadmann 2020). Moreover, the standardization required for data integration and reuse sometimes conflicts with the local needs of health professionals, such as when flexibility is required to account for the iterative and temporal aspects of disease diagnostics (Winthereik 2003) or information to account for the specific patient’s narrative (Hunt et al. 2017; Wachter 2017). New IT solutions, including the Epic system examined in this paper, often place constraints on free text spaces, because non-standardized terminology is not straightforwardly machine-readable and therefore conflicts with the aim of data repurposing (Pine and Bossen 2020). We must therefore acknowledge the trade-offs documented above and in other studies reporting how the use of predefined default options can negatively affect qualitative aspects of patient care, such as the inclusion of relevant information concerning the specific circumstances of the individual patient (Fogelberg et al. 2009; Petrovskaye et al. 2009; Robichaux 2019; Siegler and Adelman 2009).

Non-standardized information can also be essential for the reuse of data, as seen in the section on Recontextualization, because it is often required to validate structured data entries and to avoid misinterpretation when data are analyzed outside the context of data production (see also Schmidt et al. 2019; Weiskopf and Weng 2012). Health data cannot be interpreted without “human input to recontextualize knowledge” (Greenhalgh et al. 2009, p. 729), as the meaning and accuracy of data need to be understood in relation to the specific circumstances of production and use. That data gain meaning only when understood in their context of production is by no means a new insight nor unique to medicine (Latour and Woolgar 1979). Even seemingly standardized data, such as genomic data in biobanks, require a “learned intermediary” to become recontextualized in new settings (Reardon 2017, p. 135). It therefore takes additional work, medical expertise, and contextual knowledge to package health data for reuse (Leonelli 2016). With the COVID-19 pandemic, challenges of recontextualization became vividly clear, even to the public, through discussions of the limitations in the comparability of data from countries with different testing procedures, age distributions, containment measures, and levels of public trust (COVID-19 National Preparedness Collaborators 2022). Still, the resources needed for proper probing of health data are rarely addressed in political reports.

The policy of multiplication of purposes leads to multiplication of data work, which affects clinical practice. An evaluation of the benefits of data repurposing must therefore also include a consideration of the costs. While digitalization and data repurposing may be particularly comprehensive in Denmark, the issues of resource-demanding data work are not confined to this setting. Similar problems of fragmented patient information and needs for double registration and sorting of information have been described in other contexts (Ellingsen and Monteiro 2003; Sheikh et al. 2011; Gawande 2018; Pine and Bossen 2020), and many studies report on how the introduction of new IT systems to facilitate the reuse of health data increases the time spent by healthcare staff on data registration (e.g., Morrison et al. 2013; Kuhn et al. 2015; Downing et al. 2018; McVey et al. 2021). Nevertheless, the image of seamless data integration and repurposing keeps flourishing in policy reports. Though it is increasingly acknowledged by policy makers that repurposing requires resources for data curation, they typically focus on data work conducted at data repositories, thus leaving the data work conducted in clinical settings strikingly “invisible” (e.g., European Commission 2019, 2020c). What is more, some strategies envision how data curation at repositories can be minimized via the implementation of more comprehensive standardization strategies. An example is the proposal to develop pan-European standards for health data to facilitate easier data integration in the planned European Health Data Space (European Commission 2020c). Following this harmonization logic, standardized reporting is built into the infrastructure, and data do not need to be transformed or travel to be reused. However, given the challenges we have outlined, this strategy would have significant impact on data reporting in the clinic and is likely to involve substantial costs and trade-offs for the primary users in clinical settings.

We hope that our examples, and descriptions of the different types of data work, can help create awareness about possible trade-offs and resource demands to be considered in future analyses and business cases when developing strategies and infrastructures for digital healthcare systems. At a minimum, the requirement of skilled data work in clinical settings and counselling in reinterpretation must be considered in political strategies as a foreseeable cost. With this focus, we emphasize that the ethical questions to consider in relation to data repurposing should be expanded beyond the important issues of privacy, autonomy, and risk to also include issues of prioritization. Social science and medical humanities have an important role in making it possible for policy makers to balance costs and gains in a careful manner: only when the invisible work has been made visible, and brought into focus, can the costs be acknowledged and dealt with. This should be a key task for a practical ethics.

Conclusion

Political ambitions of data repurposing currently pull medicine in many different directions, because the benefits of integrated information technology come with costs in terms of extra data work and trade-offs in usability for some users. If the current modus operandi in healthcare digitalization ignores the need for data work in clinical settings, attention may be shifted away from patient needs and the validation of data may be undermined. We therefore propose that the trade-offs related to clinical data work should occupy a more prominent place in the ethical and political debates about the repurposing of health data. From the perspective of practical or empirical ethics (Hoffmaster 1992; Pols 2015), it is necessary to move close to the actual work practices and articulate the dilemmas at hand. In this article, we have pointed to examples where attempts to repurpose data drain resources from clinical care, where administrative needs consume clinical resources, and where automation strategies can undermine data quality and validity. The analysis also suggests that the challenges do not stem from lack of investment, but rather from lack of acknowledgement of existing practices of data use in the clinic. If data integration is not just a means to enhanced efficiency, the critical question for policy makers is: which purposes should take priority? Depending on how different user needs are prioritized, there is a risk that secondary uses overrule primary ones: when and on which grounds can this be justified? Ambitions to repurpose health data raise fundamental questions about what counts as relevant information, for whom, and why. We therefore encourage a practically engaged form of ethics that can address how to prioritize user needs and healthcare resources.