Research in pediatric kidney disease is fraught with challenges that delay the implementation of medical discoveries. It is often difficult to recruit, study, and follow patients on the scale needed to draw meaningful conclusions, particularly in interventional studies. The age-specific complexities of clinical trials and other interventional studies in pediatrics can further limit knowledge generation for a community that strives to advance care for its patients.

The use of real-world data, defined by the US Food and Drug Administration as “data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources,” has gained substantial traction in meeting the need for data as regulatory agencies across the world have started to accept these data in decision-making [1, 2]. The potential benefits of evidence derived from real-world data are substantial and include capture of populations usually under-represented in research, larger sample sizes in rare disease populations, evaluation of off-label use of medication, the ability to evaluate effectiveness (vs. the efficacy evaluated in clinical trials), and lower cost compared to traditional cohort studies and trials.

In their publication, Kohlhas et al. [3] use data from the International Pediatric Dialysis Network to further the use of real-world data in assessing both the safety and effectiveness of continuous erythropoietin receptor activators (C.E.R.A.). Their report addresses some of the value propositions of real-world data. Specifically, the authors evaluate post-market data for hemodialysis patients and both off-label and post-market data for peritoneal dialysis patients.

While these data can provide additional insights into the way C.E.R.A. is used in pediatric dialysis patients, their strengths, limitations, and nuances must be considered when comparing or integrating data across sources and methods. Most often, data cannot be compared directly and require context and analysis for interpretation.

Sources of real-world data

Sources of real-world data vary in the data elements available, the completeness of the data, and how quickly the data become available. For example, electronic health record data could include comprehensive data on clinic visits, medications prescribed, and laboratory results but may be limited to a single institution, while the majority of children with kidney disease likely receive their care at multiple institutions or clinics. Alternatively, administrative claims data, which are often available in national registries such as the United States Renal Data System, can capture patient data across multiple institutions or providers but are limited to the claims made to insurance [4]. This type of claims data can include clinic visits, procedures completed, and medications filled at pharmacies but will not capture laboratory data, clinic notes, or vital signs. Traditional cohort studies rely on study visits focused on a specific longitudinal evaluation of a disease state; they are limited by the initial scope of the study and capture the patient’s health status at research visits only. The data used by Kohlhas et al. [3] are a hybrid of chart abstraction and administrative data, which could improve data quality but also raises the challenge of ensuring that the data elements are integrated accurately.

Limitations of real-world data

The challenges of developing evidence with real-world data must be addressed to provide actionable data in a regulatory or clinical capacity. In general, real-world data are routinely collected for other uses such as billing, clinical care, and research unrelated to the current questions. While challenges can vary based on the specific data utilized, common issues include missing or incomplete data, discrepancies in similar data elements, multiple types of bias (such as measurement and selection bias), and methodologic issues (such as multiple hypothesis testing).

In the study by Kohlhas et al. [3], data missingness played a key role in how the authors were able to analyze the data and to what extent conclusions could be drawn. The authors needed to account for the possible bias arising from the number of serum hemoglobin measurements available per patient (corresponding to the number of visits available for each subject). Although hemoglobin was a central outcome of the analysis, the authors note that a proportion of patients had no more than one hemoglobin measurement in their data. Issues with unequal follow-up duration and data missingness must be accounted for to address potential causes of bias.
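To illustrate why the number of measurements per patient matters, the following minimal sketch compares two ways of summarizing hemoglobin. The records, identifiers, and function names are hypothetical and are not taken from Kohlhas et al. [3]; the sketch only shows how a pooled average can drift toward patients with more visits.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical visit-level records: (patient_id, hemoglobin in g/dL).
# Patient "A" contributes four visits; "B" and "C" contribute one each.
visits = [
    ("A", 10.0), ("A", 10.0), ("A", 10.0), ("A", 10.0),
    ("B", 12.0),
    ("C", 12.0),
]


def measurements_per_patient(records):
    """Count how many measurements each patient contributes."""
    counts = defaultdict(int)
    for patient_id, _ in records:
        counts[patient_id] += 1
    return dict(counts)


def pooled_mean(records):
    """Mean over all measurements; patients with more visits weigh more."""
    return mean(value for _, value in records)


def mean_of_patient_means(records):
    """Mean of per-patient means; every patient weighs the same."""
    by_patient = defaultdict(list)
    for patient_id, value in records:
        by_patient[patient_id].append(value)
    return mean(mean(values) for values in by_patient.values())


if __name__ == "__main__":
    print(measurements_per_patient(visits))
    print(round(pooled_mean(visits), 2))
    print(round(mean_of_patient_means(visits), 2))
```

In this toy example the pooled mean is pulled toward the frequently measured patient, whereas the mean of patient means is not. Which summary is appropriate depends on the question being asked; formal approaches such as mixed-effects models are not shown here.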

Addressing the limitations of real-world data

As the limitations of using real-world data are multifactorial, so are the ways to mitigate them. These efforts can include evaluation of data provenance, examination of data quality and completeness, accurate data integration and management, and application of appropriate statistical methodologies.

Kohlhas et al. [3] used data from the International Pediatric Dialysis Network, which aggregates dialysis information from institutions worldwide. Evaluating data provenance requires understanding where each data element came from and why it is there. In large registries, this is vital because data are often collected from different locations and health systems. These differences can lead to variability in how the data elements are measured, collected, and entered into the database that is finally used. Regardless of the data source, robust reporting standards are required to ensure real-world data can be utilized for clinical research and regulatory decisions.

When issues with data quality, such as missing data, are noted, they can be addressed in a few different ways. First, efforts can be made to find missing data elements. For many studies, this can become too resource-intensive to be feasible. Depending on the nature of the data collected, statistical methodologies such as multiple imputation can be used to complete data elements that remain missing. In Kohlhas et al. [3], these statistical methods were limited by low sample size and high levels of missingness. The authors addressed this by performing a sensitivity analysis limited to patients with at least 2 observations. The impact of missing data in this study seems to be minimal; however, interpretation is difficult given the descriptive nature of the report.
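A sensitivity analysis of this kind can be sketched as follows. Only the threshold of at least 2 observations comes from the text; the records, the choice of the median of per-patient means as the summary, and all function names are illustrative assumptions rather than the authors' actual method.

```python
from collections import defaultdict
from statistics import median

# Threshold described in the text; everything else below is illustrative.
MIN_OBSERVATIONS = 2

# Hypothetical records: (patient_id, hemoglobin in g/dL).
records = [
    ("P1", 9.5), ("P1", 10.5),
    ("P2", 11.0), ("P2", 11.4), ("P2", 11.2),
    ("P3", 8.0),  # only one measurement available
    ("P4", 12.0), ("P4", 12.4),
]


def group_by_patient(rows):
    """Collect all measurements for each patient."""
    groups = defaultdict(list)
    for patient_id, value in rows:
        groups[patient_id].append(value)
    return dict(groups)


def summarize(groups):
    """Summarize a cohort by the median of per-patient mean hemoglobin."""
    patient_means = [sum(values) / len(values) for values in groups.values()]
    return {"n_patients": len(groups), "median_hb": median(patient_means)}


def sensitivity_analysis(rows, min_obs=MIN_OBSERVATIONS):
    """Return summaries for the full cohort and the restricted cohort."""
    groups = group_by_patient(rows)
    restricted = {
        pid: values for pid, values in groups.items() if len(values) >= min_obs
    }
    return summarize(groups), summarize(restricted)


if __name__ == "__main__":
    full, restricted = sensitivity_analysis(records)
    print("full cohort:", full)
    print("restricted cohort:", restricted)
```

If the two summaries are close, the sparsely measured patients are unlikely to be driving the result; a large difference would suggest that the conclusions depend on how those patients are handled.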

Data linkages are another method of handling gaps in real-world data. While such linkages are becoming more commonplace, additional hurdles of data privacy, institutional policy, and national (and international) law exist. Furthermore, the cost of such linkages varies greatly. In the USA, the United States Renal Data System provides linkages to kidney failure registries at minimal or no cost, giving studies based on electronic health records or cohorts access to longer-term outcomes than is usually feasible, as well as to outcomes for subjects lost to follow-up. Multiple other linkage systems are run by for-profit entities and require substantial funds.
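As a rough illustration of what a linkage involves, the sketch below joins a hypothetical cohort to a hypothetical registry using a salted hash of normalized identifiers. It does not describe how the United States Renal Data System or any real linkage service works; the field names, the salt, and the matching rule are all assumptions, and real linkages operate under the privacy and governance constraints noted above.

```python
import hashlib

# Hypothetical shared secret; a real project would manage this securely.
SALT = "example-salt"


def link_key(name, birth_date):
    """Build a salted SHA-256 key from normalized identifiers."""
    normalized = f"{name.strip().lower()}|{birth_date}"
    return hashlib.sha256((SALT + normalized).encode("utf-8")).hexdigest()


def link(cohort, registry):
    """Attach registry outcomes to cohort rows; unmatched rows get None."""
    registry_by_key = {row["key"]: row for row in registry}
    linked = []
    for row in cohort:
        match = registry_by_key.get(row["key"])
        outcome = match["outcome"] if match else None
        linked.append({**row, "outcome": outcome})
    return linked


# Hypothetical cohort records, including one subject lost to follow-up.
cohort = [
    {"key": link_key("Jane Doe ", "2012-03-04"), "last_visit": "2020-01-10"},
    {"key": link_key("John Roe", "2010-07-21"), "last_visit": "2019-06-02"},
]

# Hypothetical registry holding longer-term outcomes.
registry = [
    {"key": link_key("jane doe", "2012-03-04"), "outcome": "transplant"},
]

if __name__ == "__main__":
    for row in link(cohort, registry):
        print(row["last_visit"], row["outcome"])
```

Exact matching on a hashed key is the simplest possible rule. It misses records with typographical differences in the identifiers, which is one reason linkage quality itself needs to be evaluated and reported.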

Comparing real-world and clinical trial data

Real-world data are becoming important components of evaluating new therapies and care. Sources such as electronic health record data can play a role in evaluating off-label use and post-marketing safety. Increasingly, electronic health record data, often linked with administrative claims data, are used in the design and conduct of trials to screen and recruit patients as well as to capture follow-up data. All these uses require assurances that the limitations of the real-world data source used can be overcome [5, 6]. Finally, real-world data can be incredibly useful in evaluating how new therapies and discoveries are implemented in clinical practice. Once discoveries are known to be effective, real-world evidence can help investigators determine whether those discoveries are leading to improved patient outcomes, the overall goal of all clinical investigations. Despite this promise, caution must be taken when directly comparing the results of such real-world evidence to those of clinical trials. The direct comparison of data collected and analyzed in different circumstances may lead to the wrong conclusions being drawn. The use of real-world data in clinical trials must be designed a priori to allow more confident comparisons, which most current registries and other real-world data sources are not organized to do.

Looking forward

Researchers are working to incorporate real-world data to provide actionable evidence for regulatory and clinical decision-making. For broader uptake of these data sources and methodologies within pediatric nephrology research, further work is needed to establish standards for reporting real-world data and evidence, both general and disease-specific. Some standards are emerging in multiple networks; however, to build the infrastructure needed, the reporting of data and evidence needs to be uniform across networks and groups [7, 8]. Such standards will also need to cover data integration among data sources. Finally, pediatric-specific issues to overcome include standardization of derived variables for consistent analysis across studies and data privacy concerns for minors. With continued innovation in these domains, real-world data and evidence can increasingly realize their potential to accelerate progress in kidney disease research.