Introduction

Modern cardiovascular research generates rapidly increasing volumes of data, as data are collected at ever finer levels of structural and functional complexity. At the same time, governments, funders, and journals have begun to encourage or require the sharing of research data in a findable, accessible, interoperable, and reusable (‘FAIR’) way [1,2,3,4,5].

However, there are significant barriers to effective and responsible research data management (RDM) in cardiovascular science that must be overcome. First, there is a lack of standardization in how data and metadata are collected, processed, and shared. Second, researchers often lack sufficient time, funding, or incentives to share their data. Third, the volume and complexity of data being collected make the identification of meaningful and actionable data increasingly difficult. Finally, there are complex ethical and legal aspects of sharing data that researchers should understand, for example when dealing with sensitive data or sharing data across the borders of the European Union. Addressing these challenges will make it easier for scientists to use and understand their own data and the data of others.

The 3rd Joint German Cardiac Society (DGK) and German Centre for Cardiovascular Research (DZHK) Translational Workshop was held in Bonn, Germany in September 2022 to discuss the challenges and potential solutions associated with RDM in cardiovascular research; the topics, opinions, and findings discussed during the workshop are presented here. This position paper executive summary, written on behalf of and endorsed by the DGK and DZHK, identifies and describes challenges that scientists and clinicians currently face when collecting, using, and reusing data in the field of cardiovascular research and beyond. It then provides recommendations for improvements in RDM practices, informed by standardization efforts and guidelines from related domains. The full position paper can be found in the supplementary materials.

Data sharing and metadata

Effective RDM involves every step of the data lifecycle (Fig. 1) [6]. When planning a study, we recommend that researchers formulate a data management plan that considers how data and metadata will be collected, stored, annotated, analyzed, and shared [7,8,9,10]. Sharing data increases their impact [11] and allows researchers, peer reviewers, and journals to understand exactly how work was carried out and to assess its validity accurately, even long after the conclusion of a study [10].

Fig. 1

The data and metadata lifecycle. PID persistent identifier, QC quality control

Metadata (‘data about data’) provide essential information on the context, quality, structure, and condition of data [10, 12, 13]. Metadata and data should be linked together via unique and persistent identifiers, using standards such as FAIR digital objects (https://fairdo.org/) and the Research Object Crate (RO-Crate; https://www.researchobject.org) format [14], which aim to link not only data and metadata but also the associated analysis workflows, software, protocols, publications, presentations, and licensing information. Although metadata documentation is ultimately the responsibility of the investigators who collect the data, researchers may lack the time or expertise to generate good-quality metadata [13]. Training and education are needed, and the adoption or adaptation of community standards, such as the Recommended Metadata for Biological Images (REMBI) [15] and the Investigation Study Assay (ISA) framework (https://isa-tools.org/) [16], is recommended to ensure consistent use of metadata. Where community standards do not yet exist, research teams are encouraged to self-organize and create common reporting formats for data and metadata, and to document them publicly [17]. Research teams may also enlist the services of specialist ‘data stewards’ to assist with data management, including metadata documentation; however, these roles require funds that are usually not available.
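To illustrate how data, metadata, and persistent identifiers can be linked in practice, the following minimal sketch writes an RO-Crate metadata file describing a single data file, using only the Python standard library. The file names, dataset title, and identifiers (including the ORCID) are illustrative placeholders, and real crates will typically carry richer metadata; the RO-Crate specification should be consulted for details.

```python
import json
from pathlib import Path

# Minimal ro-crate-metadata.json describing one data file and its license.
# All names and identifiers below are illustrative placeholders.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # descriptor entity: states which RO-Crate version this file conforms to
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root data entity: the dataset as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": "Example cardiomyocyte calcium imaging dataset",
            "datePublished": "2024-01-01",
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "hasPart": [{"@id": "calcium_traces.csv"}],
            "author": {"@id": "https://orcid.org/0000-0000-0000-0000"},
        },
        {   # data entity: one file contained in the crate
            "@id": "calcium_traces.csv",
            "@type": "File",
            "name": "Calcium transient traces",
            "encodingFormat": "text/csv",
        },
        {   # contextual entity: the author, identified by a persistent identifier
            "@id": "https://orcid.org/0000-0000-0000-0000",
            "@type": "Person",
            "name": "Jane Doe",
        },
    ],
}

Path("ro-crate-metadata.json").write_text(json.dumps(crate, indent=2))
```

Placing such a file alongside the data it describes turns the directory into a self-describing research object that repositories and harvesting tools can interpret.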

The FAIR principles

The FAIR principles for scientific data management and stewardship provide comprehensive and practical guidelines for ensuring data and metadata are FAIR (http://www.go-fair.org/fair-principles) [18]. Funders, journals, and policymakers are increasingly requiring the implementation of the FAIR principles for all research data and other related digital objects. While the FAIR principles provide a framework, it remains the responsibility of researchers to decide how they will ensure their data and metadata are FAIR. Fortunately, there are materials that provide guidance on implementing the FAIR principles at the project, group, and institutional levels (Table 1). However, to ensure widespread adoption of the FAIR principles, the threshold for implementation needs to be reduced substantially, for example by providing simple-to-use tools for data deposition and access via dedicated repositories.

Table 1 Examples of standards and tools that researchers may use to make their data FAIR

Data sharing and FAIR research to reduce the costs of scientific research

The financial costs of non-FAIR data can be quantified: a study by the European Commission estimated that non-FAIR research data cost €10.2 billion per year in Europe, with an additional estimated €16 billion impact on innovation; notably, these figures do not include the non-quantifiable benefits of making data FAIR [19]. Making data FAIR will increase the value of the data obtained, potentially accelerate progress in the improvement of therapeutics and diagnostics, and maximize the return on investment for funders.

Research data management challenges in cardiovascular science

Sharing data via data repositories and encouraging data sharing

Data repositories are centralized storage spaces where datasets can be deposited for access and reuse by other users (although authorization and authentication may be required). There are several thousand data repositories, ranging from generalist repositories to specialist repositories for specific kinds of data. At a minimum, repositories should automatically provide a globally unique and persistent identifier for every element of each dataset [20]. Repositories should also require deposition of sufficient metadata to allow other users to understand, process, and compare the data in a meaningful way. Services such as re3data (https://www.re3data.org) or FAIRsharing (http://www.fairsharing.org, [16]) provide a means to discover relevant repositories that meet FAIR requirements.
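As an illustration of how low the technical threshold for deposition can be, the sketch below follows the general pattern of the REST workflow documented for the generalist repository Zenodo (https://zenodo.org): create a deposition, upload a file, attach minimal metadata, and publish to receive a DOI. The access token, file name, and metadata values are placeholders, and the exact endpoints and fields should be checked against the repository's current API documentation; other repositories expose similar, but not identical, interfaces.

```python
import json
import requests  # widely used HTTP client; any HTTP library would do

API = "https://zenodo.org/api/deposit/depositions"
params = {"access_token": "YOUR_TOKEN"}  # placeholder personal access token

# 1. Create an empty deposition (a draft record in the repository)
deposition = requests.post(API, params=params, json={}).json()
dep_id = deposition["id"]
bucket_url = deposition["links"]["bucket"]

# 2. Upload the data file into the deposition's file bucket
with open("calcium_traces.csv", "rb") as fp:  # placeholder file name
    requests.put(f"{bucket_url}/calcium_traces.csv", data=fp, params=params)

# 3. Attach minimal descriptive metadata
metadata = {
    "metadata": {
        "title": "Example cardiomyocyte calcium imaging dataset",  # placeholder
        "upload_type": "dataset",
        "description": "Calcium transient traces; see linked publication.",
        "creators": [{"name": "Doe, Jane", "affiliation": "Example University"}],
    }
}
requests.put(f"{API}/{dep_id}", params=params,
             data=json.dumps(metadata),
             headers={"Content-Type": "application/json"})

# 4. Publish: the repository mints a globally unique, persistent DOI
published = requests.post(f"{API}/{dep_id}/actions/publish", params=params).json()
print(published["doi"])
```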

If data repositories are to realize their potential, it will be necessary to further encourage contribution as outlined below. Publications that are linked to the underlying research data are already cited more often [11]. Researchers and journals should ensure that data are credited or cited wherever they are used or reused, for example by crediting the investigators who collected the original data and citing the original dataset [21,22,23]. Researchers who share well-annotated datasets via repositories should be recognized and rewarded by funding bodies and universities in a suitable manner [22, 23]. Citing data sources would allow some academic recognition and reward for data sharing, and help researchers satisfy funding obligations to share their data [24]. Researchers should work with their community to define expectations for the management, sharing, and reuse of data and associated metadata, for example by joining an already existing initiative such as one of the many working and interest groups within the Research Data Alliance (https://rd-alliance.org/).
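Crediting and citing datasets is made easier by the fact that dataset DOIs support the same content negotiation as article DOIs, so a formatted citation or machine-readable metadata can be retrieved automatically. The following sketch uses only the Python standard library; the DOI shown is a placeholder and should be replaced with the identifier of the dataset actually being reused.

```python
import urllib.request

def dataset_citation(doi: str, style: str = "apa") -> str:
    """Resolve a DOI via doi.org content negotiation and return a formatted citation."""
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )
    with urllib.request.urlopen(req) as response:
        return response.read().decode("utf-8").strip()

# Placeholder DOI; use the persistent identifier of the dataset being cited
print(dataset_citation("10.1234/example-dataset"))
```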

Managing data heterogeneity: standardization and harmonization

Sharing and combining datasets can be challenging due to the heterogeneity of the data involved, particularly if data are obtained using team-specific protocols and with limited standardization across laboratories. Greater standardization of terminology and better adherence to existing standards are needed across cardiovascular science. Standardized collection, processing, quality assessment, and analysis pipelines are also needed to ensure interoperability and comparability of data. In evolving fields, there may be a need to develop and adopt community-wide standards for the collection and preservation of data and metadata ‘on-the-job’. Community reporting guidelines (or minimum information standards) that describe how to report everything from sample quality to the data processing protocols used can facilitate data sharing, streamline workflows, and allow for the long-term preservation of and access to information [17, 25, 26]. Where reporting standards do not exist, research communities can self-organize and create community-centric reporting formats for data and metadata [17]. Large-scale collaborative initiatives like the National Sleep Research Resource (http://www.sleepdata.org) and the UK Biobank (https://www.ukbiobank.ac.uk) have shown it is possible for researchers to organize and collaborate on the collection and sharing of large volumes of health data for their mutual benefit, despite the challenges. Although it is important to recognize that there is unlikely to be a ‘one-size-fits-all’ solution, these initiatives may provide a model for similar efforts in cardiovascular science.

Identifying meaningful, actionable data

Given the volume of data that can be collected with modern high-throughput techniques and novel technologies like mobile health devices, it is important that researchers focus on data that are meaningful (e.g., relevant to the disease being studied) and actionable (e.g., useful for answering a specific research question or informing a specific treatment decision). Better guidance from manufacturers is needed to ensure researchers and clinicians can effectively use the most appropriate available technologies, better identify meaningful and actionable data, and improve treatment decision-making and risk assessment. Collaborations between scientific researchers, healthcare providers, manufacturers, software developers, and insurance companies may provide an opportunity to influence and guide the development of new technology to improve the quality and utility of the data collected.

Managing sensitive data

Sharing sensitive data is rightly subject to strict regulation, but levels of regulation differ internationally [27, 28]. Education is needed to ensure that researchers understand when and how data may be shared, how to comply with applicable laws (e.g., the GDPR), and what technology is available for secure data sharing.

In research involving patients, consent must be managed and documented appropriately. Patients often support sharing of their data if it will improve diagnostic and therapeutic options [29, 30], but reuse may be prevented when consent exists but its exact scope is not easily traceable. Digitization and automation of (remote) patient consent is increasing and may help to improve access to samples and data, because the details of patient consent can be traced more efficiently.

Who is responsible for RDM in cardiovascular science?

Researchers are ultimately responsible for ensuring that research data are suitably managed and shared according to the FAIR principles, including ensuring that data are adequately documented with metadata and made available for reuse as appropriate.

Journals and publishers should require authors to include links to all relevant raw or processed data, metadata, and other relevant materials in their submissions when publicly available. Where data are not publicly available, data sharing statements should indicate how the data can be accessed or requested, and authors or organizations should be expected to make data available upon request. Journals should also take greater responsibility for confirming that submitted materials include working links to the raw data, metadata, and other relevant materials, ideally via a persistent identifier (although the ultimate responsibility will continue to lie with authors to provide working links at submission).

Universities, funders, and government bodies should recognize and reward the collection and sharing of data in the same way that they recognize and reward publication activity. They should also support effective RDM via education and training programs, by defining and implementing data sharing policies, by employing data stewards, and by providing sufficient long-term funds for data storage and sharing. They should also establish data management training modules at the graduate level, with more advanced training at the post-doctoral level. In Germany, there is an opportunity to accelerate progress in educating physicians in the collection, use, quality assessment, and analysis of data via updates to the Approbationsordnung.

Institutions, publishers, and state and national government bodies should commit to improving and future-proofing digital infrastructures for data storage and sharing, including funding for relevant personnel. Researchers, institutions, and journals should work together to develop low-threshold tools for data and metadata sharing during data acquisition (electronic records), processing (automated metadata annotation), and publication (low-level access to key data, such as those contained in figures, via a ‘data container’). Coordination of RDM practices remains a challenge; currently, the German Research Foundation encourages RDM policy development by each ‘network grant’. Whether this is the most effective way forward remains to be seen, as parallel work, at times even within one university or faculty, would seem counterproductive. While the legal hurdles that must be overcome when sharing data between, for example, EU- and non-EU-based research teams may be alleviated with technical solutions, a definitive solution will require the involvement of national funding bodies and governmental entities.

Finally, the cardiovascular research community should work to make sharing raw data and metadata the norm at all levels via the creation or adoption of cardiovascular reporting guidelines (an example of a well-intentioned and broadly endorsed, yet under-utilized, reporting guideline is MICEE [31]). As much cardiovascular research data and accompanying metadata as possible should be made available via public repositories to ensure the long-term and sustainable storage and reuse of data.

Conclusion

Data reuse should be factored in at every stage of scientific research, and researchers should foster a culture of open, FAIR science by sharing good-quality, well-annotated data and metadata in repositories, defining and following agreed-upon standards, crediting and linking to the data of others, and publishing negative results. Community-driven standardization and harmonization at all stages of the data lifecycle are needed to reduce the heterogeneity of data and ensure good data quality. However, it is important to recognize that there is unlikely to be a one-size-fits-all solution for effective RDM in cardiovascular science; the development, adoption, and application of RDM practices will require careful consideration at all levels and in all areas of cardiovascular research and should be part of the new Ärztliche Approbationsordnung. Standards should be considered living documents that need to be adapted regularly to new technologies and methods. Education, training, and funding are essential for the widespread and enduring adoption of effective RDM.

It is not enough to simply recognize the importance of responsible and effective RDM: it must be put into practice. The authors encourage their professional societies and research organizations (including the DGK and DZHK), as well as funding and regulatory bodies, to spearhead a number of initiatives, including: (1) supporting initiatives and/or lobbying national funding bodies to aid a more concerted effort to develop relevant RDM processes, tools, and FAIR data sharing approaches across the life sciences, including the development of and adherence to minimum reporting guidelines; (2) developing a generic (PDF- or HTML-compatible), pragmatic (focusing on data used to create figures in peer-reviewed publications), low-threshold (‘data container’) tool as a first but definitive step toward data sharing that is independent of the research subject, methods used, and level of investigation involved; and (3) pushing for generalized ‘point-of-entry’ consenting of patients for the use of their data and of any biological materials acquired during diagnostic or therapeutic interventions that would otherwise be discarded, and probing the ethical acceptability of discarding healthy human donor tissue (the biological reference) that may not be used clinically (which must remain the primary aim of all donor organ utilization).
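As a purely illustrative sketch of the ‘data container’ idea in point (2), the snippet below bundles the numerical values underlying a figure panel into a small JSON file that can accompany the figure in an HTML page or PDF supplement. It is not a prescribed implementation; the file name, fields, and values are placeholder choices, and a production tool would need to agree on a community-wide schema.

```python
import json
from pathlib import Path

# Placeholder values standing in for the numbers plotted in one figure panel
figure_data = {
    "figure": "Figure 2a",                              # panel the data belong to
    "x_label": "Time (ms)",
    "y_label": "Ca2+ transient amplitude (a.u.)",
    "x": [0, 10, 20, 30, 40],
    "y": [0.1, 0.8, 0.6, 0.3, 0.1],
    "source_dataset_doi": "10.1234/example-dataset",    # placeholder persistent identifier
    "license": "CC-BY-4.0",
}

# Write the container next to the figure file so figure and data travel together
Path("figure2a_data.json").write_text(json.dumps(figure_data, indent=2))
```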