1 Introduction

The use of real-world data (RWD) and real-world evidence (RWE) derived from RWD has seen wide adoption by pharmaceutical developers and a variety of decision makers, including doctors, payers, health technology authorities, and regulatory agencies (Berger et al. 2015; Berger et al. 2017; Schneeweiss et al. 2016; Berger and Crown 2022; Daniel et al. 2018; Zou et al. 2021). Credible RWE can be created from good quality RWD from routine practice when investigated within well-designed and well-executed research studies (Schneeweiss et al. 2016; Berger and Crown 2022). Adoption and use of RWD are complicated by concerns regarding whether particular sources of RWD are of “good quality” and “fit-for-purpose”. These concerns have become more urgent as regulatory agencies increasingly use RWD as external comparators for single-arm clinical trials and explore whether non-interventional RWD studies can provide substantial supplementary evidence of treatment effectiveness. While the recent emphasis on data quality (DQ) has focused on the use of RWD for assessing disease burden and treatment effectiveness, evaluation of DQ and fitness-for-purpose is also required for safety studies. However, expanding the use of RWE in safety evaluation will probably require data sources beyond administrative claims (Dal Pan 2022).

The US Food and Drug Administration’s (FDA) guidance, “Assessing Electronic Health Record and Medical Claims Data to Support Regulatory Decision Making,” states that for all study designs, it is important to ensure the reliability and relevance of data used to help support a regulatory decision (FDA 2021a). Reliability includes data accuracy, completeness, provenance, and traceability; relevance includes key data elements (exposures, outcomes, covariates) and a sufficient number of representative patients for the study. The FDA guidance “Considerations for the Use of Real-World Data and Real-World Evidence to Support Regulatory Decision-Making for Drug and Biological Products” (FDA 2023) emphasizes the need for early consultation with the FDA to ensure the acceptability of study design and analytic plans. With respect to data sources, it states that feasibility of data access is critical: “such evaluations of data sources or databases for feasibility purposes serve as a way for the sponsor and FDA to (1) assess if the data source or database is fit for use to address the research question being posed and (2) estimate the statistical precision of a potential study without evaluating outcomes for treatment arms.”

The European Medicines Agency (EMA) in the European Union (EU) has also issued a draft “Data Quality Framework for EU medicines regulation” (EMA 2022). It defines DQ as fitness for purpose with respect to user needs in relation to health research, policy making, and regulation, and the extent to which the data reflect the reality they aim to represent (TEHDS EU 2022). It divides the determinants of DQ into foundational, intrinsic, and question-specific categories. Foundational determinants pertain to the processes and systems through which data are generated, collected, and made available. Intrinsic determinants pertain to aspects that are inherent to a specific dataset. Question-specific determinants pertain to aspects of DQ that cannot be defined independently of a specific question. It also distinguishes three levels of granularity of DQ: value level, column level, and dataset level. The dimensions (including subdimensions) and metrics of DQ are divided into the following categories: reliability, extensiveness, coherence, timeliness, and relevance.

  • Reliability (precision, accuracy, and plausibility) evaluates the degree to which the data correspond to reality.

  • Extensiveness (completeness and coverage) evaluates whether the data are sufficient for a particular study.

  • Coherence examines the extent to which different parts of a dataset are consistent in representation and meaning. This dimension is subdivided into format coherence, structural coherence, semantic coherence, uniqueness, conformance, and validity.

  • Timeliness is defined as the availability of data at the right time for regulatory decision making.

  • Relevance is defined as the extent to which a dataset presents the elements required to answer a research question.

TransCelerate has issued a simpler framework entitled “Real-World Data Audit Considerations” that is divided into pillars of relevance, accrual, provenance, completeness, and accuracy (TransCelerate 2022). These frameworks are part of an ongoing dialogue among stakeholders from which international standards for RWD will eventually emerge.

There are a number of efforts to assess the utility of real-world data sources for a variety of purposes and settings. For example, Observational Health Data Sciences and Informatics (OHDSI) has developed open-source tools, such as ACHILLES and the Data Quality Dashboard, which are being leveraged in the development of the DARWIN (Data Analytics and Real-World Interrogation Network) database in the EU. Other efforts have focused on the quality of prospective registries, which present different DQ issues than the reuse of existing data sources; these include the Registry Evaluation and Quality Standards Tool (REQueST) developed by EUnetHTA (EUnetHTA 2019) and “Registries for Evaluating Patient Outcomes: A User’s Guide: Fourth Edition,” developed by the U.S. Agency for Healthcare Research and Quality (Glicklich et al. 2020).

In a systematic assessment of DQ evaluation, Bian et al. (2020) identified a broad set of DQ dimensions (currency, correctness/accuracy, plausibility, completeness, concordance, comparability, conformance, flexibility, relevance, usability/ease-of-use, security, information loss and degradation, consistency, and understandability/interpretability). They concluded that definitions of DQ dimensions and methods were not consistent in the literature and called for further work to generate understandable, executable, and reusable DQ measures. To that end, we have developed a user-friendly set of screening criteria to help researchers of varying experience assess whether existing reusable RWD sources may be fit-for-purpose when their objective is to answer questions from regulatory agencies or to support claims regarding the benefits and risks of therapies.

We took our cue on the definition of “fit-for-purpose” from the FDA draft guidance on selecting, developing, or modifying fit-for-purpose clinical outcome assessments (COAs) for patient-focused drug development, which is intended to help sponsors use high-quality measures of patients’ health in medical product development programs. That guidance states that fit-for-purpose in the regulatory context means the same thing as valid within modern validity theory, i.e., validity is “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests,” and a clinical outcome assessment is considered fit-for-purpose when “the level of validation associated with a medical product development tool is sufficient to support its context of use” (FDA 2022). While in epidemiology the term validity comprises internal and external validity relating to study design and execution, we designed the RWD screening tool to focus on evaluation of the RWD itself within the larger framework of modern validity theory (Royal 2017).

After all, as Wilkinson notes, “good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process” (Wilkinson 2016). Wilkinson proposed the FAIR principles for the management of RWD generated by public funds (although they are also applicable to datasets created in the private sector) (Wilkinson 2016); data sources should be findable, accessible, interoperable, and reusable. These recommendations are complemented by the recommendations of the Duke-Margolis white paper “Determining Real-World Data’s Fitness for Use and the Role of Reliability” (Mahendraratnam et al. 2019) that explored whether RWD are fit-for-purpose by the application of rigorous verification checks of data integrity.

While experts in modern validity theory have not reached consensus on the attributes of validity, there are basic tenets that most such theorists have adopted (Royal 2017). Validity pertains to the inferences or interpretations made about a set of scores, measures, or, in this case, data sources, as opposed to their intrinsic properties. As applied to the evaluation of RWD sources, this means that they must be considered fit-for-purpose for generating credible RWE through well-designed and well-executed study protocols to inform decision making. Modern validity theory would suggest that the accumulation of evidence should be employed to determine whether this inference regarding RWD quality is adequately supported. Hence, the validity of a data source is a judgement on a continuum onto which new evidence is added; it is assessed as part of a cumulative process because knowledge of multiple factors (e.g., new populations/samples of participants, differing contexts, new knowledge) is gained over time. This element of RWD source evaluation is not specifically recognized in the current recommendations by the FDA and the EMA.

As noted earlier, an obstacle to developing a consensus regarding evaluation of DQ is that many terms have been used to describe its dimensions and elements, and the terminology has been used inconsistently, despite efforts at harmonization (Kahn et al. 2016; Bian et al. 2020). Regardless, RWE derived from RWD that focuses on the natural history of disease and adverse effects of treatment has long been considered “valid” by decision-makers. Recently, data validity has become an urgent focus of regulatory initiatives as the use of RWE derived from RWD is being expanded to inform decisions about treatment effectiveness and comparative effectiveness. Such decisions demand a greater level of certainty for the study results to be trusted.

A crucial dimension in assessing the validity of data is transparency, achieved through traceability and accessibility. The FDA has reinforced this point in several recent guidance documents, as has the HMA-EMA (European Union’s Heads of Medicines Agencies-European Medicines Agency) Joint Big Data Taskforce Report (HMA-EMA 2019). The FDA guidance on “Considerations for the Use of Real-World Data and Real-World Evidence to Support Regulatory Decision-Making for Drug and Biological Products” states that “If certain RWD are owned and controlled by other entities, sponsors should have agreements in place with those entities to ensure that relevant patient-level data can be provided to FDA and that source data necessary to verify the RWD are made available for inspection as applicable” (FDA 2023). The FDA noted in its “Data Standards for Drug and Biological Product Submissions Containing Real-World Data Guidance for Industry” that “during data curation and data transformation, adequate processes should be in place to increase confidence in the resultant data. Documentation of these processes may include but are not limited to electronic documentation (i.e., metadata-driven audit trails, quality control procedures, etc.) of data additions, deletions, or alterations from the source data system to the final study analytic data set(s)” (FDA 2021b).

Interest in the creation of “regulatory-grade” RWD and RWE in the US was spurred by the 21st Century Cures Act. In the EU, analogous efforts include the Innovative Medicines Initiative (IMI) Get Real and the HMA-EMA Big Data Joint Taskforce. As noted in the Framework for FDA’s Advancing Real-World Evidence Program, which provides for early regulatory engagement to evaluate the potential use of RWE to support a new indication for an already approved drug or to help satisfy post-approval study requirements, the strength of RWE submitted in support of a regulatory decision will depend on its reliability, which encompasses not only transparency in data accrual and quality control but also clinical study methodology and the relevance of the underlying data (FDA 2018, 2023).

The National Institute for Health and Care Excellence (NICE) in the United Kingdom issued a real-world evidence framework in 2022 (NICE 2022). Among the elements addressing data suitability were data provenance and governance, DQ (including completeness and accuracy), and data relevance (data content, differences in patients and care settings, sample size, length of follow-up). NICE developed the DataSAT tool, which seeks information on data sources, data linkages, purpose of data collection, description of data collected, time period of data collection, data curation, data specification (e.g., data dictionary), and data management/quality assurance.

The European Medicines Regulatory Network strategy to 2025 includes the creation of DARWIN [Data Analytics and Real-World Interrogation Network] (Arlett 2020; Arlett et al. 2021). It builds on the HMA-EMA Big Data Joint Taskforce Report (HMA-EMA 2019), which found that RWD is challenged by a lack of standardization, sometimes limited precision and robustness of measurements, missing data, variability in content and measurement processes, unknown quality, and constantly changing datasets. Citing Pacurariu et al. (2018), the report described the number of EU databases that currently meet minimum regulatory requirements for content and are readily accessible as “disappointingly low”. The International Coalition of Medicines Regulatory Authorities (ICMRA) has called for global regulators to collaborate on standards for incorporating real-world evidence in decision making (ICMRA 2023).

In developing a user-friendly set of criteria, we attempted to find the right balance between the granularity of requested information and the response burden. We defined the dimensions of data suitability (e.g., sufficiency of quality and fitness-for-purpose) in plain English terms consistent with the existing frameworks discussed above. Although we primarily focus on the US and EU, the tool may have relevance to other jurisdictions as well, with the understanding that local data privacy protection requirements for data access vary.

2 Materials and methods

Our literature review did not focus on tools or frameworks that primarily addressed study design, execution, and/or statistical analysis (e.g., Wang et al. 2021, 2023; Gatto et al. 2022; Gebrye et al. 2023; Campbell et al. 2023). Rather, we focused on frameworks and proposals that addressed the foundational and intrinsic qualities (as defined above by the EMA) of real-world data. Our starting point was widely cited articles such as Kahn et al. (2016) and the well-executed systematic review reported by Bian et al. (2020), which included relevant articles published through February 2020. We supplemented this with a search of PubMed-listed articles through August 31, 2023, using the following inclusion criteria: any language; article type restricted to reviews in one search and to reviews or systematic reviews in the other. There were no exclusion criteria. Boolean operators were used to combine texts or text strings such as “quality”, “real world data”, and “real world evidence”. The PubMed search engine was used to conduct the search twice, once for reviews only and once for either reviews or systematic reviews. In addition, we separately accessed the websites of the FDA and the EMA.

The literature was summarized in tables to delineate the criteria employed to assess the potential reliability and relevance of RWD. To make our tool user-friendly, we grouped the criteria into five broad dimensions that, based upon our experience, conform to the general categories of considerations that are top-of-mind for most researchers in evaluating potential RWD sources. Because of the large number of criteria and the variability in terminology used, we limited the number of criteria within the tool to manage its response burden. We did not view this as a problem because the tool is intended for screening existing RWD sources; researchers will need to do additional investigation to determine whether an RWD source is fit for their intended purposes. The criteria included in this tool are consistent with those in the NICE DataSAT tool, with the exception of our inclusion of criteria regarding the track-record of RWD source use.

3 Results

The framework of the screening criteria included the following dimensions: authenticity, transparency, relevance, accuracy, and track-record. We defined these dimensions as follows:

  A. Authenticity: A data source is considered authentic when its provenance is well-documented and its authenticity can be verified against its component data sources as needed.

  B. Transparency: A data source is considered transparent when the processes used in data acquisition, curation, editing, and linkage are adequately described.

  C. Relevance: A data source is considered relevant when it contains the population of interest in adequate numbers with sufficient length of follow-up, and when it contains the data elements required to implement the analysis plan of a real-world protocol.

  D. Accuracy: A data source is operationally considered accurate when it can be documented that the data depict the concepts they are intended to represent, insofar as can be assured by DQ checks (i.e., the data are of sufficient quality to investigate specific research questions). Further, key data elements can be audited as required.

  E. Track-record: A data source is considered more trustworthy in general when it can document a track-record of use cases that resulted in the creation of credible real-world evidence.

Using this framework, we developed the ATRAcTR (Authentic Transparent Relevant Accurate Track-Record) screening tool to evaluate real-world data sources (Table 1).

Table 1 ATRAcTR (Authentic Transparent Relevant Accurate Track-Record)

Authenticity is a foundational and necessary requirement for considering the use of an RWD source in a real-world evidence clinical study. This is consistent with FDA and EMA criteria requesting confirmation of data provenance and data traceability. While this may be obvious, recent experience during the COVID pandemic indicates that it cannot be taken for granted (Ledford and Van Noorden 2020). For some commercially available data sources, documentation alone may not be considered adequate; verification via access to the data may be expected by regulators. Provenance documentation should provide information on what data were collected, why they were collected, the sources of the data, how the data were collected, and the timeframe over which they were (or continue to be) collected. Any changes to the database over time should be disclosed (e.g., addition of laboratory results, biomarkers). Disclosure is important to demonstrate that data were appropriately de-identified and/or to describe the procedures that are in place to protect patient confidentiality. Of course, the provenance of publicly supported datasets may be directly known by regulatory agencies through their participation in their creation (e.g., SENTINEL and the FDA, DARWIN and the EMA) (Arlett 2020; FDA 2020; EMA 2021).

Transparency in the description of the processes and procedures employed to acquire, curate, transform, edit, and link data is essential to assessing whether planned analyses will be performed on reliable, good quality data. Documentation should describe what extract, transform, and load (ETL) procedures were used, whether the data were transformed according to specified standards into specified formats (e.g., a common data model), how the data dictionary can be accessed, and whether there were any imputations or adjustments for incorrect or missing data. If more than one data source was combined to create the dataset, the processes for linking them should be described (e.g., unique patient ID, tokenization). If the source data contained free-text content, the processes for extracting information from the free text should be described (e.g., manually, natural language processing). How changes in coding conventions over time (e.g., from ICD-9 to ICD-10 via the Centers for Medicare and Medicaid Services (CMS) General Equivalence Mappings) were handled should also be described. For data sources that are still adding new data, the latency of data accrual and the refresh cadence should be described. Furthermore, whether the data source contains any synthetic data should be disclosed, including the reason for its inclusion.
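As an illustration of the kind of transformation documentation requested here, the following Python sketch shows how a coding-convention change might be applied while writing a row-level audit trail of alterations. The column names, file name, and simplified one-to-one mapping are hypothetical (actual General Equivalence Mappings can be one-to-many), so this should be read as a sketch of the documentation principle rather than any data provider’s actual ETL procedure.

    import pandas as pd

    def apply_icd9_to_icd10_crosswalk(claims: pd.DataFrame, gem: pd.DataFrame) -> pd.DataFrame:
        """Recode ICD-9 diagnoses to ICD-10 with a GEM-style crosswalk, logging every alteration."""
        # A left join keeps rows with no mapping so information loss is visible rather than silent.
        mapped = claims.merge(gem, how="left", on="icd9_code")

        # Metadata-driven audit trail: which rows were recoded and which could not be mapped.
        audit = pd.DataFrame({
            "claim_id": claims["claim_id"],
            "original_code": claims["icd9_code"],
            "new_code": mapped["icd10_code"],
            "action": ["unmapped" if pd.isna(c) else "recoded" for c in mapped["icd10_code"]],
        })
        audit.to_csv("etl_audit_trail.csv", index=False)  # retained as part of the ETL documentation

        return mapped

    if __name__ == "__main__":
        claims = pd.DataFrame({"claim_id": [1, 2, 3],
                               "icd9_code": ["250.00", "401.9", "V99.9"]})
        gem = pd.DataFrame({"icd9_code": ["250.00", "401.9"],
                            "icd10_code": ["E11.9", "I10"]})
        print(apply_icd9_to_icd10_crosswalk(claims, gem))

A data provider able to produce this kind of per-row record of additions, deletions, and alterations is, in effect, satisfying the electronic documentation expectation quoted from the FDA guidance above.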

Assessing the relevance of a data source is essential in determining whether it is fit-for-purpose and is a bespoke process for a particular real-world clinical study. It includes assessing the sample size of the population of interest, the observation period, and the length of follow-up. Documentation must be provided to allow researchers to assess whether the data elements required for analysis are captured (e.g., population, intervention, comparator, outcomes, time, and confounders). Additional useful information is provided by the demographic characteristics of the data, including age, gender, ethnicity, and geography. How this request is interpreted (e.g., overall database, study population) will depend on how the data source provider understands the needs of the researcher. The interpretation of data source representativeness will depend on its intended use. It is useful to disclose whether the data source was designed to be representative of the population or is a convenience sample, and to state why it can represent the population of interest.
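A relevance screen of this kind can often be expressed as simple counts, consistent with the FDA’s note (quoted earlier) that feasibility evaluations can proceed without evaluating outcomes by treatment arm. The Python sketch below is a minimal illustration with hypothetical column names and thresholds, not a prescribed procedure.

    import pandas as pd

    def relevance_screen(patients: pd.DataFrame, min_age: int = 18, min_followup_days: int = 365) -> dict:
        """Summarize whether a candidate RWD source contains enough of the population of
        interest with sufficient follow-up (simple feasibility counts only)."""
        followup_days = (patients["last_activity_date"] - patients["first_activity_date"]).dt.days
        eligible = patients[(patients["age"] >= min_age) & (followup_days >= min_followup_days)]
        return {
            "total_patients": len(patients),
            "eligible_patients": len(eligible),
            "median_followup_days": float(followup_days.median()),
            "pct_female": float((patients["sex"] == "F").mean() * 100),
        }

    if __name__ == "__main__":
        patients = pd.DataFrame({
            "age": [45, 67, 12, 80],
            "sex": ["F", "M", "F", "F"],
            "first_activity_date": pd.to_datetime(["2018-01-01", "2019-06-01", "2020-02-01", "2017-03-15"]),
            "last_activity_date": pd.to_datetime(["2021-01-01", "2019-12-01", "2022-02-01", "2023-03-15"]),
        })
        print(relevance_screen(patients))

In practice, the inclusion criteria would mirror the protocol’s population, intervention, comparator, outcome, and time elements, and the summary would be produced by the data source provider at whatever level (overall database or study population) has been agreed with the researcher.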

We define accuracy broadly here as an assessment of the integrity of the data, most frequently characterized by conducting DQ checks for conformance to a common data model, plausibility, and completeness; it also includes the concepts of reliability, extensiveness, and coherence as discussed in the EU framework. We separated accuracy from transparency because accuracy requires its own distinct review procedures, while recognizing that both are components of reliability and both bear on whether a data source is fit-for-purpose. A substantial literature (Kahn et al. 2016; Blacketer et al. 2021; Dreyer et al. 2010; Dreyer 2018; Girman et al. 2019; Hall et al. 2012; Kahn et al. 2015; Liaw et al. 2021; Miksad and Abernathy 2018; Qualls et al. 2018; Razzaghi et al. 2022; Reynolds et al. 2020; Schmidt et al. 2021; Simon et al. 2022; Zozus et al. 2015) describes the need for DQ checks and their specifics. Individual data curators generally set their own standards, making disclosure important. Because of the longer experience with administrative claims data, there are more generally accepted standards for such data than currently exist for EHR data. Software tools (e.g., open-source R or Python packages and functions, open-source and commercial DQ assessment tools, and commercial platforms such as the Aetion Evidence Platform (Aetion.com) and Instant Health Data (Panalgo.com)) have been developed to make the process operationally feasible (Liaw et al. 2021; Schmidt et al. 2021). Plausibility checks include an assessment of external and internal consistency (e.g., temporal value violations, unexpected distributions or combinations, out-of-range value anomalies, data contradictions, and how null values were handled). It should also be disclosed whether DQ checks were performed to evaluate whether any unique patient contributed data under more than one identifier. Moreover, it should be disclosed whether there were any interruptions or significant changes in data collection and/or processing over time that impacted data continuity (e.g., who contributed data, what data were collected, or how they were processed).
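To make the nature of such checks concrete, the Python sketch below implements a few of the completeness, plausibility, and uniqueness checks named above. The column names are hypothetical and the duplicate-identifier heuristic is a simplification, so this is an illustrative sketch rather than any specific tool’s check suite.

    import pandas as pd

    def basic_dq_checks(df: pd.DataFrame) -> pd.DataFrame:
        """Run a few illustrative completeness, plausibility, and uniqueness checks
        and return one row of results per check."""
        results = []

        # Completeness: proportion of null values in each column.
        for col in df.columns:
            results.append(("completeness", f"null rate in {col}", float(df[col].isna().mean())))

        # Plausibility (out-of-range): ages outside 0-120 years.
        results.append(("plausibility", "ages outside 0-120",
                        int(((df["age"] < 0) | (df["age"] > 120)).sum())))

        # Plausibility (temporal): death recorded before birth.
        results.append(("plausibility", "death_date before birth_date",
                        int((df["death_date"] < df["birth_date"]).sum())))

        # Uniqueness: possible duplicate patients appearing under more than one identifier,
        # approximated here by identical birth date, sex, and 3-digit ZIP.
        dup_rows = df.duplicated(subset=["birth_date", "sex", "zip3"], keep=False).sum()
        results.append(("uniqueness", "possible duplicate patients", int(dup_rows)))

        return pd.DataFrame(results, columns=["dimension", "check", "value"])

In practice, established suites such as the OHDSI Data Quality Dashboard (noted earlier) implement far more extensive checks of this kind against a common data model; the point of the sketch is the kind of disclosure a data source provider should be able to make about which checks were run and with what results.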

As discussed earlier, and consistent with modern validity theory, one gains insight into the general quality of a data source from its performance track-record, i.e., use cases in which the data source either succeeded or failed in yielding credible RWE. The performance track-record would include publications that have employed the data source (especially for research questions similar to the study of interest) as well as a listing of unpublished use cases. If the data source was successfully used in a study designed to emulate a randomized controlled trial (Franklin et al. 2021), this may provide additional confidence in the trustworthiness of the dataset. It may not provide greater confidence across all situations, but it increases the likelihood that the data source will be fit-for-purpose for a variety of scenarios or study designs. Use cases and/or publications that were incorporated in responses to regulatory questions or to support regulatory decisions should be highlighted. For newly available data sources, previous performance history may be meager; however, such datasets can be provisionally viewed as suitable if they provide evidence that they satisfy the criteria of authenticity, transparency, relevance, and accuracy. As noted earlier, examining the track-record of RWD sources has not been explicitly identified in current frameworks; we suspect that this reflects the fact that evaluating relevance or fitness-for-purpose is a bespoke judgement relating to the particular decision facing a regulator.

3.1 Pilot-testing on hypothetical data

With the goal of a user-friendly set of criteria, we tested the response burden. A highly experienced author (WHC), who has held leadership positions for US RWD sources (e.g., MarketScan and Optum Clinformatics Data Mart), completed the tool for a hypothetical data source in two to three hours (Appendix A). This suggests that data source providers will not face an onerous response burden. Note that, for a given data source, the information requested remains the same across different studies. Therefore, the response burden for a data source provider will likely decrease across multiple requests, as familiarity with the data and with the list of DQ criteria grows. Appendix A offers an example of a reasonable granularity of detail that provides a good picture of DQ for screening purposes. We expect that whoever responds to the tool will determine what is reasonable in discussion with researchers seeking to obtain access to the data source.

4 Discussion

Regulatory bodies such as the FDA and EMA, as well as others, have turned their attention to RWE to complement the information from randomized controlled trials to assess the benefit-risk profiles of therapeutics. They have long used evidence derived from real-world data sources to assess safety profiles and have begun moving from passive to active surveillance through the use of networks such as SENTINEL. Increasingly, standing cohorts are being leveraged for signal detection and confirmation (Huybrechts et al. 2021).

On a limited basis, RWE has been considered adequate to support labelling changes when randomized controlled trials were not considered feasible (FDA 2023). Recently, RWE was accepted by the FDA as the sole supplementary information to support labeling changes (FDA 2021c). Tacrolimus was approved for prophylaxis of organ rejection in patients receiving lung transplants based upon a Supplemental New Drug Application supported by a non-interventional RWE study; tacrolimus had previously been approved for use in kidney, liver, and heart transplants. The data source used in the RWE study was the U.S. Scientific Registry of Transplant Recipients. There were minimal concerns regarding relevance since the registry contained data on all lung transplants in the U.S., death and graft failures were adjudicated, and most necessary information was captured. Concerns regarding reliability were low because RWD management was transparent, the percentage of missing values was low, few implausible lengths of hospitalization were observed, and most variables were coded accurately. The FDA determined this non-interventional study with historical controls to be adequate and well-controlled, noting that the outcomes of organ rejection and death are virtually certain without therapy; the dramatic effect of treatment helped to preclude bias as an explanation of the results.

With the widespread digitization of medical information, expanding the use of RWD to assess clinical effectiveness has become a priority both in the U.S. and in Europe. Indeed, the FDA has utilized RWE on a limited basis to evaluate the effectiveness of treatments for rare diseases and cancers. However, RWD submitted to support effectiveness claims have frequently been found deficient due to issues of relevance (e.g., representativeness of population, small sample size) and accuracy (e.g., missing data) (Mahendraratnam et al. 2022; Bakker et al. 2022). Government-funded efforts to create “fit-for-purpose” RWD have begun in both the U.S. and Europe, including SENTINEL, PCORnet, and DARWIN. Use of SENTINEL has thus far been restricted to safety issues, but its extension to treatment effectiveness studies is on the table as part of the FDA’s Advancing Real-World Evidence Program, as well as the FDA’s Real-World Evidence Program more broadly. Progress in this endeavor will depend on changing the culture, behaviors, and standards applied by researchers and data source providers; in turn, the impetus for such change will come, in large part, from changing requirements of regulatory bodies as they expand the use of RWE to address perceived weaknesses of real-world observational studies in comparison to randomized controlled trials (Berger and Crown 2022; Simon et al. 2022).

Currently, for commercially available data sources other than Medicare or Medicaid, there is limited disclosure (i.e., transparency) describing how datasets are assembled, and little disclosure or published literature on assessments of their DQ. Regulators and researchers have had to rely on their experience with specific data sources to judge their reliability rather than on a formal set of criteria. For data sources that are new to regulators and researchers, or for researchers with less experience or knowledge of RWD, it may be difficult to judge the quality of the data sources without sufficient disclosure from the data providers.

The screening tool described here will be immediately helpful to researchers in evaluating the quality of existing RWD sources that they intend to reuse to generate new RWE; it will help them understand the data provenance, how patient confidentiality is protected, how data are extracted and curated, what DQ checks have been performed, and the performance history of the data source. It is not expected that all data source providers will necessarily be willing or able to provide all of the information requested in ATRAcTR, due in part to concerns regarding protection of intellectual property. However, the more numerous and granular the information they are able to provide, the greater the confidence of those requesting access to the dataset that it has the potential to generate regulatory-grade RWE. We note that the RCT-DUPLICATE Initiative (Wang et al. 2022) was able to emulate the results of a large number of randomized controlled trials of comparative safety or effectiveness employing several commonly accessed real-world data sources (e.g., Medicare, MarketScan, Optum Clinformatics Data Mart, and the Clinical Practice Research Datalink); this suggests that the providers of these data sources are likely to be able to provide complete and adequate responses to the ATRAcTR.

We would also caution that using ATRAcTR is not sufficient in and of itself to assess whether a particular RWD source is fit-for-purpose. Fit-for-purpose judgements will still require further careful consideration of the context and the specific scientific question of interest. Moreover, the set of criteria does not address issues of study design and analysis that are critical to regulatory agencies in evaluating the robustness and credibility of the real-world evidence generated. Unlike prior data quality frameworks, the track-record dimension of the tool adds the consideration of accumulated experience with RWD sources, consistent with modern validity theory.

There are some limitations in our efforts to develop the ATRAcTR. Unlike the EMA’s DQF approach, we did not use a formal process to group quality criteria into dimensions, nor did we further identify subdimensions. We used the EMA DQF as a starting point but sought to reduce the number of dimensions by broadening their scope to make the tool more user-friendly; moreover, the criteria included in the five dimensions cover most of the issues raised by Bian and colleagues. Indeed, the tool format is compatible with existing frameworks, with the exception of our addition of reviewing the track-record of a particular RWD source. Nor did we perform a formal validation study, since this is intended to be a screening instrument; even so, researchers will clearly be in a better position with this information than without it. We did not create a scoring manual with cut-off points as benchmark values for high, medium, or low DQ, since there are no existing benchmarks that lend credibility to the idea of a uniform threshold for DQ. Moreover, the RWE generated will be assessed by regulators not only on source DQ and its fitness-for-purpose but also on the appropriateness and rigor of the study design and statistical analysis. In the end, researchers still need to judge how to interpret the information received using the totality of DQ assessment results. In addition, we did not try to establish minimum standards for concluding that a specific RWD source is of “good quality” and “fit-for-purpose”. Nevertheless, we are confident that the ATRAcTR will assist researchers in identifying potential existing reusable RWD sources for regulatory purposes.