ATRAcTR (Authentic Transparent Relevant Accurate Track-Record): A Screening Tool to Assess the Potential for Real-World Data Sources to Support Creation of Credible Real-World Evidence for Regulatory Decision-Making

Background: Adoption and use of RWD for decision-making has been complicated by concerns regarding whether RWD is fit-for-purpose or of sufficient validity to support the creation of credible RWE. These concerns have taken on greater urgency as regulatory agencies begin to use real-world evidence (RWE) to inform decisions about treatment effectiveness. Methods: We developed a practical screening tool to assess the quality of RWD sources using the framework of Modern Validity Theory. While there has been some convergence of conceptual frameworks, consensus has yet to emerge regarding how to specifically evaluate whether RWD is reliable and fit-for-purpose. We developed a screening tool consistent with the current frameworks and with how researchers generally evaluate existing RWD sources for research that they intend to submit to regulatory agencies. Results: The tool has five dimensions: authenticity, transparency, relevance, accuracy, and track-record. Based upon these dimensions, we delineated the more detailed information that researchers should seek when screening potential RWD sources. Conclusions: Using a hypothetical example of a medical claims data source, we show that responding to the tool requires neither an extraordinary burden nor a lengthy document. This RWD screening tool, which is ready for immediate use, is consistent with current conceptual frameworks for assessing whether RWD is fit-for-purpose and adds the additional consideration of experience with RWD sources, consistent with Modern Validity Theory.


Introduction
The use of real-world data (RWD) and the real-world evidence (RWE) derived from it has been widely adopted by pharmaceutical developers and a variety of decision makers including providers, payers, health technology authorities, and regulatory agencies (Berger et al. 2015; Berger et al. 2016; Berger and Crown 2022; Daniel et al. 2018; Zou et al. 2021). Credible RWE can be created from good quality RWD when investigated within well-designed and well-executed research studies (Berger and Crown 2022).
Adoption and use of RWD has been complicated by concerns regarding whether particular sources of RWD are of "good quality" and "fit-for-purpose". These concerns have become more urgent as regulatory agencies are increasingly using RWD as external comparators for randomized clinical trials and are exploring whether non-interventional RWD studies can provide substantial supplementary evidence of treatment effectiveness.
The US Food and Drug Administration (FDA) draft guidance "Assessing Electronic Health Record and Medical Claims Data to Support Regulatory Decision Making" states that, for all study designs, it is important to ensure the reliability and relevance of data used to help support a regulatory decision (FDA 2021a). Reliability includes data accuracy, completeness, provenance, and traceability; relevance includes key data elements (exposures, outcomes, covariates) and a sufficient number of representative patients for the study.
The European Medicines Agency (EMA) has issued a draft framework for data quality for EU medicines regulation (EMA 2022). It defines data quality as fitness for purpose relative to user needs in health research, policy making, and regulation, such that the data reflect the reality they aim to represent (TEHDS EU 2022). It divides the determinants of data quality into foundational, intrinsic, and question-specific categories. Foundational determinants pertain to the processes and systems through which data are generated, collected, and made available. Intrinsic determinants pertain to aspects that are inherent to a specific dataset. Question-specific determinants pertain to aspects of data quality that cannot be defined independent of a specific question. The framework also distinguishes three levels of granularity of data quality: value level, column level, and dataset level. The dimensions and metrics of data quality are divided into the following categories: reliability, extensiveness, coherence, timeliness, and relevance. Reliability (precision, accuracy, plausibility) evaluates the degree to which the data correspond to reality.
Extensiveness (completeness and coverage) evaluates whether the data are sufficient for a particular study.
Coherence examines the extent to which different parts of a dataset are consistent in representation and meaning. This dimension is subdivided into format coherence, structural coherence, semantic coherence, uniqueness, conformance, and validity.
Timeliness is defined as the availability of data at the right time for regulatory decision making.
Relevance is defined as the extent to which a dataset presents the elements required to answer a research question.
TransCelerate has issued a simpler framework, entitled "Real-World Data Audit Considerations", that is divided into pillars of relevance, accrual, provenance, completeness, and accuracy (TransCelerate 2022). These frameworks are part of an ongoing dialogue among stakeholders from which international standards for "regulatory-grade RWD" will eventually emerge.
In the meantime, there is an immediate need for researchers with varying levels of RWD experience to have a screening tool to help them assess whether potential RWD sources are fit-for-purpose when designing studies intended to answer questions from regulatory agencies or to support claims regarding the benefits and risks of therapies. To this end, we developed such a tool, consistent with the above frameworks, that conforms to how researchers generally approach this critical issue and also incorporates concepts from Modern Validity Theory.
We took our cue on the definition of "fit-for-purpose" from the FDA draft guidance on selecting, developing, or modifying fit-for-purpose clinical outcome assessments (COAs) for patient-focused drug development (intended to help sponsors use high-quality measures of patients' health in medical product development programs). It states that fit-for-purpose in the regulatory context means the same thing as valid within modern validity theory, i.e., validity is "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests," and that a clinical outcome assessment is considered fit-for-purpose when "the level of validation associated with a medical product development tool is sufficient to support its context of use" (FDA 2022). The term validity has been defined in epidemiology as comprising internal and external validity relating to study design and execution. We designed the RWD screening tool to focus on evaluation of the real-world data itself within the larger framework of modern validity theory (Royal 2017). After all, as Wilkinson notes, "good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process" (Wilkinson 2016). Wilkinson proposed the FAIR principles for the management of RWD generated by public funds (although they are also applicable to datasets created in the private sector): data sources should be Findable, Accessible, Interoperable, and Reusable (Wilkinson 2016). These recommendations are complemented by those of the Duke-Margolis white paper "Determining Real-World Data's Fitness for Use and the Role of Reliability" (Mahendraratnam et al. 2019), which explored whether RWD are fit-for-purpose through the application of rigorous verification checks of data integrity.
While experts in modern validity theory have not reached consensus on the attributes of validity, there are basic tenets that most modern validity theorists have adopted (Royal 2017). Validity pertains to the inferences or interpretations made about a set of scores, measures, or, in this case, data sources, as opposed to their intrinsic properties. As applied to the evaluation of RWD sources, this means that they must be considered fit-for-purpose for generating credible RWE through well-designed and well-executed study protocols to inform decision making. Modern validity theory suggests that an accumulation of evidence should be employed to determine whether this inference regarding RWD quality is adequately supported. Hence, the validity of a data source is a judgement on a continuum onto which new evidence is added, assessed as part of a cumulative process, because knowledge of multiple factors (e.g., new populations/samples of participants, differing contexts, new knowledge) is gained over time. This element of RWD source evaluation is not specifically recognized in the current recommendations of the FDA and the EMA.
One obstacle to developing a consensus regarding evaluation of data quality is that many terms have been used to describe its dimensions and elements, and the terminology has been used inconsistently, despite efforts at harmonization (Kahn et al. 2016). Despite this, RWE derived from RWD that focuses on the natural history of disease and adverse effects of treatment has long been considered "valid" by decision-makers. Recently, the issue of data validity has become more urgent and a focus for regulatory initiatives, as RWE derived from RWD is being expanded in its use to inform decisions about treatment effectiveness and comparative effectiveness. These decisions demand a greater level of confidence and certainty in study results.
A crucial dimension in assessing the validity of data is the need for transparency and traceability/accessibility. This has been reinforced by the FDA in several recent draft guidance documents and by the HMA-EMA (European Union's Heads of Medicines Agencies-European Medicines Agency) Joint Big Data Taskforce Report (HMA-EMA 2019). The FDA guidance on Considerations for the Use of Real-World Data and Real-World Evidence to Support Regulatory Decision-Making for Drug and Biological Products states: "If certain RWD are owned and controlled by third parties, sponsors should have agreements in place with those parties to ensure that all relevant patient-level data can be provided to FDA and that source data necessary to verify the RWD are made available for inspection as applicable" (FDA 2021b). The FDA noted in its Data Standards for Drug and Biological Product Submissions Containing Real-World Data Guidance for Industry that "during data curation and data transformation, adequate processes should be in place to increase confidence in the resultant data. Documentation of these processes may include but are not limited to electronic documentation (i.e., metadata-driven audit trails, quality control procedures, etc.) of data additions, deletions, or alterations from the source data system to the final study analytic data set(s)" (FDA 2021c).
Interest in the creation of "regulatory-grade" RWD/RWE has been spurred by the 21st Century Cures Act in the US and by ongoing initiatives in Europe, including the Innovative Medicines Initiative (IMI) GetReal and the HMA-EMA Big Data Joint Taskforce. As noted in the Framework for FDA's Real-World Evidence Program, which evaluates the potential use of RWE to support a new indication for an already-approved drug or to help satisfy post-approval study requirements, the strength of RWE submitted in support of a regulatory decision will depend on its reliability, which encompasses not only transparency in data accrual and quality control but also clinical study methodology and the relevance of the underlying data (FDA 2018). In developing the screening tool, we attempted to find the right balance between the granularity of requested information and the response burden. We defined the dimensions of data suitability (e.g., sufficiency of quality and fitness-for-purpose) in plain English terms consistent with the existing frameworks discussed above. Although we focus on the US and EU, the tool may have relevance to other jurisdictions as well.

Materials and Methods
Since the authors have extensive experience with a wide variety of real-world data sources, the literature was assessed through a "snowballing" approach, starting with the identification of key references. This was supplemented by English-language Google searches applying the terms "real-world data" and "real-world evidence", and searches that combined these terms with variations on the terms "regulatory authorities", "FDA", and "EMA". Additional searches combined, via Boolean operators, the term "real-world data" with terms including "quality", "validity", and "verification". FDA and EMA websites were also accessed.
The literature was summarized in tables to delineate the terms and concepts employed to assess reliability (or quality) and relevance (whether a source is fit-for-purpose for a proposed RWD study). We consolidated the concepts into five dimensions that, based upon our experience, conform to the general issues that must be clarified for most researchers to evaluate potential RWD sources.

Results
The framework of the screening tool includes the following dimensions: authenticity, transparency, relevance, accuracy, and track-record. We defined these dimensions as follows:

A. Authenticity: A data source is considered authentic when its provenance is well-documented and its authenticity can be verified with its component data sources as needed.
B. Transparency: A data source is considered transparent when the processes used in data acquisition, curation, editing, and linkage are described adequately.
C. Relevance: A data source is considered relevant when it contains the population of interest in adequate numbers and with sufficient length of follow-up, and it contains the data elements required to implement the analysis plan of a real-world protocol.
D. Accuracy: A data source is operationally considered accurate when it can be documented that it depicts the concepts it is intended to represent, insofar as can be assured by data quality checks (i.e., the data are of sufficient quality to investigate specific research questions). Further, key data elements can be audited as deemed necessary.
E. Track-Record: A data source will be considered more trustworthy in general when it can document its track-record of use cases that resulted in the creation of credible real-world evidence.
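The five dimensions above lend themselves to a simple checklist structure. The sketch below is purely illustrative: the class names and example questions are our own assumptions, not the actual ATRAcTR items, which appear in Appendix A.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: a screening checklist keyed to the five
# ATRAcTR dimensions. The example questions are hypothetical.
@dataclass
class Dimension:
    name: str
    questions: list = field(default_factory=list)
    answers: dict = field(default_factory=dict)

    def unanswered(self):
        # Questions with no recorded response remain open screening items.
        return [q for q in self.questions if q not in self.answers]

atractr = [
    Dimension("Authenticity", ["Is data provenance documented?"]),
    Dimension("Transparency", ["Are ETL and curation processes described?"]),
    Dimension("Relevance", ["Does the source contain the population of interest?"]),
    Dimension("Accuracy", ["Have data quality checks been performed?"]),
    Dimension("Track-Record", ["Has the source supported prior credible RWE?"]),
]

# Recording a response closes the corresponding screening item.
atractr[0].answers["Is data provenance documented?"] = "Yes, via audit trail"
open_items = {d.name: d.unanswered() for d in atractr}
```

Representing the tool this way makes the screening process auditable: any dimension with open items indicates information still to be requested from the data source provider.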
Using this framework, we developed the ATRAcTR (Authentic Transparent Relevant Accurate Track-Record) screening tool to evaluate real-world data sources (Appendix A). The component concepts of each of these dimensions were delineated in preparation for the creation of ATRAcTR (Table 1). Authenticity is a foundational and necessary requirement for considering the use of an RWD source in a real-world evidence clinical study. This is consistent with FDA and EMA criteria requesting confirmation of data provenance and data traceability. While this may seem obvious, recent experience during the COVID pandemic indicates that it cannot be taken for granted (Ledford and Van Noorden 2020). For some commercially available data sources, documentation alone may not be considered adequate; verification via access to the data may be expected by regulators. Provenance documentation should provide information on what data were collected, why they were collected, the sources of the data, how the data were collected, and the timeframe over which they were (or continue to be) collected. Any changes to the database over time should be disclosed (e.g., addition of laboratory results, biomarkers, etc.). Disclosure is important to demonstrate that data have been appropriately de-identified and/or to describe the procedures that are in place to protect patient confidentiality. Of course, the provenance of publicly supported datasets may be directly known to regulatory agencies via their participation in their creation (e.g., SENTINEL and FDA, DARWIN and EMA) (Arlett 2020; FDA 2020; EMA 2021).
Transparency in the description of the processes and procedures employed to acquire, curate, transform, edit, and link data is essential to assessing whether planned analyses will be performed on reliable, good quality data. Documentation should describe what ETL (extract, transform, load) procedures were used, whether the data were transformed according to specified standards into specified formats (e.g., a common data model), how the data dictionary can be accessed, and whether there were any imputations or adjustments for incorrect or missing data. If more than one data source was combined to create the dataset, the processes for linking them should be described (e.g., unique patient ID, tokenization). If the source data contained free-text content, the processes for extracting information from the free text should be described (e.g., manually, natural language processing). How changes in coding conventions over time (e.g., ICD-9 to ICD-10) were handled should also be described. For data sources that are still adding new data, the latency of data accrual and the refresh cadence should be described. Furthermore, whether the data source contains any synthetic data should be disclosed, including the reason for its inclusion.
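The tokenization approach to linkage mentioned above can be illustrated with a minimal sketch: each source derives a keyed hash of normalized patient identifiers, so records can be joined on matching tokens without exchanging the identifiers themselves. The salt value, field choices, and normalization rules here are hypothetical assumptions for illustration; commercial tokenization services use more elaborate, certified schemes.

```python
import hashlib

SALT = "example-project-salt"  # hypothetical shared secret for this linkage

def tokenize(first: str, last: str, dob: str) -> str:
    """Derive a linkage token from normalized identifiers via salted SHA-256."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hashlib.sha256((SALT + normalized).encode("utf-8")).hexdigest()

# Two sources tokenize independently; matching tokens allow linkage
# without either side exposing raw identifiers.
claims_token = tokenize("Ada", "Lovelace", "1815-12-10")
ehr_token = tokenize(" ada ", "LOVELACE", "1815-12-10")  # formatting differs
assert claims_token == ehr_token
```

The design point for transparency is that the normalization rules and hashing scheme must be documented: without them, a reviewer cannot assess how often true matches are missed or false matches created.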
The relevance of a data source is essential in determining whether it is fit-for-purpose and is assessed through a bespoke process for a particular real-world clinical study. It includes assessing the sample size of the population of interest the source contains, the observation period, and the length of follow-up. Documentation must be provided to allow researchers to assess whether the data elements required for analysis are captured. Additional useful information is provided by the demographic characteristics of the data, including age, gender, ethnicity, and geography in the dataset. How this request is interpreted (e.g., overall database, study population) will depend on how the data source provider understands the needs of the researcher. The interpretation of data source representativeness will depend on its intended use. It is useful to disclose whether the data source was designed to be representative of the population or is a convenience sample, and to state why it can represent the population of interest.
We define accuracy broadly here as an assessment of the integrity of the data, most frequently characterized by conducting data quality checks for conformance to a common data model, plausibility, and completeness; it also includes the concepts of reliability, extensiveness, and coherence as discussed in the EU framework. We separated accuracy from transparency since it requires its own distinct review procedures, while recognizing that both are components of reliability and of deciding whether a data source is fit-for-purpose. A substantial literature (Kahn et al. 2016; Blacketer et al. 2021; Dreyer et al. 2010; Dreyer 2018; Girman et al. 2019; Hall et al. 2012; Kahn et al. 2015; Liaw et al. 2021; Miksad and Abernathy 2018; Qualls et al. 2018; Razzaghi et al. 2022; Reynolds et al. 2020; Schmidt et al. 2021; Simon et al. 2022; Zozus et al. 2015) describes the need for data quality checks and their specifics. Individual data curators generally set their own standards, making disclosure important. Because of the longer experience with administrative claims data, more generally accepted standards exist for it than currently exist for EHR data. Software programs (e.g., open-source R or Python packages or functions, and open-source and commercial data quality assessment tools) have been developed to make the process operationally feasible (Liaw et al. 2021; Schmidt et al. 2021). Plausibility checks include an assessment of external consistency and internal consistency (e.g., temporal value violations, unexpected distributions or combinations, out-of-range value anomalies, data contradictions, and how null values were handled). It should also be disclosed whether data quality checks were performed to evaluate whether any unique patient contributed data under more than one identifier. Moreover, it should be disclosed whether there was any interruption or significant change in data collection and/or processing over time that impacted data continuity (e.g., who contributed data, what data were collected, or how they were processed).
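A few of the plausibility checks described above (out-of-range values, temporal violations, identifier reuse) can be sketched as simple passes over a record set. The field names, example records, and flagging rules below are illustrative assumptions, not the checks of any particular data quality tool.

```python
from datetime import date

# Hypothetical claims records for illustration.
records = [
    {"patient_id": "A1", "birth_date": date(1950, 4, 2),
     "admit": date(2020, 1, 3), "discharge": date(2020, 1, 9)},
    {"patient_id": "A2", "birth_date": date(2090, 1, 1),  # implausible
     "admit": date(2020, 5, 28), "discharge": date(2020, 6, 1)},
    {"patient_id": "A1", "birth_date": date(1950, 4, 2),
     "admit": date(2021, 3, 1), "discharge": date(2021, 3, 2)},
]

# Out-of-range check: birth dates in the future are implausible.
future_births = [r["patient_id"] for r in records
                 if r["birth_date"] > date.today()]

# Temporal-violation check: discharge must not precede admission.
temporal_violations = [r["patient_id"] for r in records
                       if r["discharge"] < r["admit"]]

# Identifier check: the same patient may legitimately contribute several
# encounters, so flag only identifier reuse with conflicting birth dates.
by_id, conflicting_ids = {}, set()
for r in records:
    prior = by_id.setdefault(r["patient_id"], r["birth_date"])
    if prior != r["birth_date"]:
        conflicting_ids.add(r["patient_id"])
```

Here the second record would be flagged for a future birth date while the repeated patient A1 is not flagged, since the repeated identifier carries a consistent birth date; disclosure of exactly which such rules were applied is what the accuracy dimension asks for.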
As discussed earlier, and consistent with Modern Validity Theory, one gains insight into the general quality of a data source from its performance track-record of use cases in which it succeeded or failed to yield credible RWE. The performance track-record would include the publications that have employed the data source (especially for research questions similar to the study of interest) as well as a listing of unpublished use cases. If the data source has been used successfully in a study designed to emulate a randomized controlled trial (Franklin et al. 2021), this may provide additional confidence in the trustworthiness of the dataset. Use cases and/or publications that were incorporated in responses to regulatory questions or to support regulatory decisions should be highlighted. For newly available data sources, previous performance history may be meager; however, such datasets can be provisionally viewed as suitable if they provide evidence that they satisfy the criteria of authenticity, transparency, relevance, and accuracy. As noted earlier, examining the track records of RWD sources has not been explicitly identified in current frameworks; we suspect this reflects the fact that evaluating relevance or fitness-for-purpose is a bespoke judgement relating to the particular decision facing a regulator.
At first glance, the response burden for this tool may appear somewhat overwhelming. However, we assessed the burden by creating a set of prototype responses to the tool for a hypothetical claims data source (Appendix B); the burden was not undue for a respondent with the appropriate knowledge. Note that, for a given data source, the information requested remains the same across different studies. Appendix B offers an example of a reasonable granularity of detail that provides a good picture of data quality without requiring an exhaustive response burden. We expect that whoever responds to the tool will determine what is reasonable in discussion with researchers seeking access to the data source.

Discussion
Regulatory bodies such as the FDA and EMA, as well as others, have turned their attention to real-world evidence to complement the information from randomized controlled trials in assessing the benefit-risk profiles of therapeutics. They have long used evidence derived from real-world data sources to assess safety profiles and have begun moving from passive to active surveillance through the use of networks such as SENTINEL. Increasingly, standing cohorts are being leveraged for signal detection and confirmation (Huybrechts et al. 2021).
On a limited basis, RWE has been considered adequate to support labelling changes when randomized controlled trials have not been considered feasible (FDA 2021a). Recently, RWE was accepted by the FDA as the sole supplementary information to support a labeling change (FDA 2021d): tacrolimus was approved for prophylaxis of organ rejection in patients receiving lung transplants based upon a Supplemental New Drug Application supported by a non-interventional RWE study. Tacrolimus had previously been approved for use in liver, kidney, and heart transplants. The data source used in the RWE study was the U.S. Scientific Registry of Transplant Recipients. There were minimal concerns regarding relevance, since the registry contained data on all lung transplants in the U.S., death and graft failures were adjudicated, and most necessary information was captured. There were low concerns regarding reliability, since RWD management was transparent, the percentage of missing variables was low, there were few non-plausible lengths of hospitalization, and most variables were coded accurately. The FDA determined this non-interventional study with historical controls to be adequate and well-controlled, and noted that outcomes of organ rejection and death are virtually certain without therapy; the dramatic effect of treatment helped to preclude bias as an explanation of the results.
With the widespread digitization of medical information, expanding the use of real-world data to assess clinical effectiveness has become a priority both in the U.S. and in Europe. Indeed, the FDA has utilized RWE on a limited basis to evaluate the effectiveness of treatments for rare diseases and cancers. However, RWD submitted to support effectiveness claims have frequently been found deficient due to issues of relevance (e.g., representativeness of the population, small sample size) and accuracy (e.g., missing data) (Mahendraratnam et al. 2022; Bakker et al. 2022).

Currently, for commercially available data sources, there is limited disclosure (i.e., transparency) describing how datasets are assembled, nor is there much disclosure and/or published literature on assessments of their data quality. Regulators and researchers have had to rely on their experience with specific data sources to judge their reliability rather than on a formal set of criteria. For data sources that are new to regulators and researchers, or for researchers with less experience or knowledge of RWD, it may be difficult to judge the quality of the data sources without sufficient disclosure from the data providers.
The screening tool described here will be immediately helpful for researchers evaluating RWD sources to understand data provenance, how patient confidentiality is protected, how data are extracted and curated, what data quality checks have been performed, and the performance history of the data source. It is not expected that all data source providers will necessarily be willing or able to provide all of the information requested in ATRAcTR, due to concerns regarding protection of intellectual property. However, the more complete and granular the information they are able to provide, the greater the confidence of those requesting access to the dataset that it has the potential to generate regulatory-grade RWE. We note that the REPEAT Initiative (Wang et al. 2022) has been able to reproduce the results of a large number of published comparative safety or effectiveness studies employing several commonly accessed real-world data sources (e.g., Medicare, MarketScan, Optum, and CPRD); this suggests that the providers of these data sources are likely to be able to provide complete and adequate responses to ATRAcTR.
We believe that the five dimensions of ATRAcTR represent a logical and intuitive approach for researchers with a range of experience in using RWD sources, and that they are consistent with the current advice being provided by the FDA and EMA. We expect that, as criteria for evaluating RWD move toward international harmonization, the information sought in our tool will be captured in any future recommendations.
There are some limitations to our effort to develop ATRAcTR. We did not use a formal process to delineate its dimensions, since our framework is compatible with existing conceptual frameworks, with the exception of our addition of reviewing the track record of a particular RWD source. Nor did we perform a formal validation study, since this is intended to be a screening instrument: researchers will clearly be in a better position with this information than without it, and in the end they will have to judge for themselves how to interpret the information received. In addition, we did not try to establish minimum standards for concluding that a specific RWD source is of "good quality" and "fit-for-purpose". Nevertheless, we are confident that ATRAcTR can be helpful to researchers in identifying potential regulatory-grade RWD sources.

The European medicines agencies' network strategy to 2025 includes the creation of DARWIN [Data Analytics and Real-World Interrogation Network] (Arlett 2020; Arlett et al. 2021); it builds on the observation in the HMA-EMA Big Data Joint Taskforce Report (HMA-EMA 2019) that RWD is challenged by a lack of standardization, sometimes limited precision and robustness of measurements, missing data, variability in content and measurement processes, unknown quality, and constantly changing datasets. The report viewed the number of European databases that currently meet minimum regulatory requirements for content and are readily accessible, citing Pacurariu et al. (2018), as "disappointingly low". The International Coalition of Medicines Regulatory Authorities (ICMRA) has called for global regulators to collaborate on standards for incorporating real-world evidence in decision making (ICMRA 2022).

Declarations

Table 1: Dimensions and Concepts Captured in ATRAcTR
[Table content only partially recovered; the Track-Record row includes the historical performance of the dataset in creating credible real-world evidence and whether the data source has been used to answer regulatory questions or to support regulatory decision making.]
Publicly funded efforts to create "fit-for-purpose" RWD have begun in both the U.S. and Europe, including SENTINEL, PCORNet, and DARWIN. Use of SENTINEL has thus far been restricted to safety issues, but its extension to treatment effectiveness studies is on the table as part of the FDA Real-World Evidence Program. Progress in this endeavor will depend on changing the culture, behaviors, and standards applied (Berger and Crown 2022; Simon et al. 2022); in turn, the impetus for such change will come, in large part, from the changing requirements of regulatory bodies as they expand the use of RWE to address perceived weaknesses of real-world observational studies in comparison to randomized controlled trials (Berger and Crown 2022; Simon et al. 2022).