Key Points

This systematic review identified 29 studies published in 2016 using real-world evidence from observational studies to examine pediatric medication safety or effectiveness.

Studies varied in population, age groups, diseases or conditions, medications, study designs, data source, and methodologic rigor, and most relied on research-driven data collection rather than leveraging electronic health records or administrative claims databases.

Real-world evidence has not been fully applied to questions of pediatric medication safety and effectiveness.

1 Introduction

Real-world evidence (RWE) has the potential to supplement information derived from traditional clinical trials (those submitted to regulatory bodies), providing generalizable data in a shorter time frame and at lower cost [1, 2]. This is especially relevant to pediatrics, where the evidence base to guide medication use in children may be insufficient; however, the use of RWE in pediatrics has not been described [3,4,5]. Real-world data (RWD) are data relating to patient health status and/or the delivery of health care that are routinely collected from a variety of sources, including electronic health records (EHRs), claims and billing activities, product and disease registries, patient-generated data (including data from home-use settings), and other sources that can inform on health status, such as mobile devices [6]. RWE is clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of RWD. RWE may be generated through randomized clinical trials or observational studies [6].

Efforts to increase the number of clinical trials enrolling children have expanded the evidence base by leading to pediatric drug labeling; however, the challenges of traditional clinical trials with respect to time, cost, sample sizes, generalizability, and ethical considerations heighten the desirability of fully exploiting the potential of RWE [7,8,9,10]. A growing literature and methodology addressing the uses of RWE have emerged, increasing confidence in the evidence generated [11,12,13,14,15,16,17]. Until now, the degree to which these methods have been applied in pediatric studies has been unclear. The aim of this review was to characterize the state of RWE derived from observational studies of medication safety or effectiveness in children published during a single year (2016), chosen to capture the most current work available to us when we began the project. Because the field is evolving rapidly, we did not intend to compare findings with earlier years.

2 Materials and Methods

We conducted a systematic literature review to describe observational studies that used RWD to assess medication safety or effectiveness in pediatric populations, and to describe the studies by country, disease, medication, pediatric age group, safety and effectiveness endpoints, study design, and data source. Our review followed a prespecified protocol (available in the Electronic supplementary material, ESM). The term “pediatric” was defined as under 18 years of age. No funding was received for this work.

2.1 Search Strategy

The electronic search combined two strategies: a search of all PubMed journals for terms relating to the concepts of RWE and pediatrics, and an additional search through three prominent, high-impact general pediatric journals (Pediatrics, JAMA Pediatrics, and the Journal of Pediatrics) using the search term “medications.” This second strategy was designed to capture publications that met our general definition of RWE, but may not have referred to RWE or related concepts in their title, abstract, or keywords (both search strategies were combined into one, as shown in eAppendix 1 in eSupplement 1 of the ESM). The initial strategies cast a wide net, and the screening process narrowed the search to articles that met our specific criteria (see below). The PubMed search was augmented by an extended search of references within systematic reviews and expert suggestions from within the working group. The electronic search was conducted on July 5, 2017.

2.2 Study Selection

Individual titles and abstracts were screened by two team members (all nine team members participated in screening). If either screener coded the citation as eligible, the citation proceeded to full-text screening. Two team members screened each full-text article and reconciled their results to reach agreement on inclusion (eight team members participated in full-text screening). When necessary, the working group co-chair (TL) was available to adjudicate and ensure consistency. Records were kept of individual screening, reconciled results, and adjudications where applicable. Studies were included if they reported primary research; reported on pediatric populations (all participants were under 18 years of age, results were reported separately for participants under 18 years, or the study population had a mean or median age of < 18 years); had medications as the exposure in the infant or child; assessed safety and/or effectiveness; specified a comparison or control group (including historical controls); and were published in English in 2016. These criteria excluded pragmatic and explanatory clinical trials; letters, guidelines, and case reports or case series; studies with vaccines, devices, or procedures as the exposure; exposure during pregnancy or lactation; studies with drug adherence as the outcome variable; cost studies; and animal studies.

2.3 Data Extraction and Quality Assessment

Data were extracted into standardized forms by one team member and verified for accuracy by a second team member. Discrepancies and questions were resolved through consensus, as was done for screening. These processes conform to the recommendations of the University of York’s Centre for Reviews and Dissemination [18]. All team members participated in data extraction and review. Study quality was assessed using the Good Research for Comparative Effectiveness (GRACE) Checklist for Rating the Quality of Observational Studies, a validated assessment tool for observational studies of comparative effectiveness [19, 20]. The GRACE checklist was applied by one team member and reviewed by another. Differences were recorded and reviewed by team pairs and adjudicated by TL. All team members participated in applying the GRACE checklist.

2.4 Statistical Analysis and Reporting of Findings

Summary statistics were used to describe the studies and were calculated using Stata 12.1 and Excel. No tests for heterogeneity were performed, as it was not the purpose of the review to combine estimates in a meta-analysis. Findings are reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [21].

3 Results

3.1 Description of Included Studies

The electronic search of PubMed yielded 900 citations. Hand searching of systematic reviews and expert recommendations yielded 16 additional citations. After removing one duplicate, 915 citations were screened, and 29 studies met the eligibility criteria [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] (PRISMA flow chart provided in eAppendix 2 in eSupplement 1 of the ESM). For a description of the studies included, see eTable 1 in eSupplement 2 of the ESM.

Most studies were conducted in North America or Europe (24; 83%) (Table 1). In 23 (79%) studies, the entire study population was under 18 years old at baseline or at first exposure to the studied drug. The six remaining studies included patients 18 years and over; these studies met the eligibility criteria because their mean or median age was under 18 years, or because they reported data on a subgroup under 18 years old (Table 2). In the 23 studies restricted to participants under 18 years old, sample sizes ranged from 25 to 734,114, with a median of 367; the smallest study addressed traumatic brain injury and the largest addressed asthma [40, 44]. Age groupings varied across the studies. Of these 23 studies, 20 (69% of all 29 studies) reported age ranges (Fig. 1). Seven studies focused on narrowly defined age groups of neonates or preterm infants, whereas others reported on age groups spanning infancy to adolescence.

Table 1 Characteristics of the 29 studies that met the eligibility criteria
Table 2 Studies with participants 18 years and over and reasons for inclusion in the review
Fig. 1 Age ranges in 20 studies with participants under 18

The studies varied with respect to the disease or condition defining the patient population. Psychiatric conditions were the most common group (5; 17%) and included diverse psychiatric diagnoses such as attention deficit hyperactivity disorder (ADHD), autism, and depression. Two of the five studies enrolled patients with psychiatric conditions without specifying the condition(s). Four studies reported on children with juvenile idiopathic arthritis, and four studies reported on preterm/low birth weight infants. The remaining 16 studies reported on conditions such as asthma, congenital heart disease, human immunodeficiency virus (HIV), infantile spasms, migraine headache, nephrotic syndrome, and nocturnal enuresis (see eSupplement 2, eTable 1 in the ESM).

3.2 Data Sources and Statistical Methods

Nineteen studies relied on manual medical record review and/or primary data collection at single or multiple institutions: medical record review and extraction was the most common method (12; 41%), followed by primary data collection on study forms (7; 24%). Four studies presented analyses from administrative claims databases. Four studies used EHR databases, two drawing on multiple institutions and two on a single institution. Two studies used data from registries. Two studies reported power calculations. To control for confounding, 19 studies (66%) used multivariable methods and two relied on stratification; the remaining eight reported no statistical method to control for confounding.

3.3 Study Design

The most frequently used study design was the prospective cohort study (14; 48%), followed by the retrospective cohort study (9; 31%), the case–control study (5; 17%), and the self-controlled design (1; 3%). Thirteen (45%) studies compared a group receiving the treatment of interest with an unexposed/untreated group, and 12 (41%) studies compared one treatment with another. Three studies compared different doses of the same drug. One study used a self-controlled comparison of time on treatment with time unexposed to treatment.
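Because each study falls into exactly one design category and one comparator category, the counts for each classification should sum to 29, and the percentages are shares of all 29 studies. The following is a minimal Python sketch of that bookkeeping, using only the counts reported in this section. The function name `tabulate` and the category labels are our own illustrative choices and do not reproduce the Stata/Excel workflow described in Section 2.4.

```python
TOTAL_STUDIES = 29  # included studies reported in Section 3.1


def tabulate(counts: dict[str, int], total: int = TOTAL_STUDIES) -> dict[str, tuple[int, int]]:
    """Return {category: (n, whole-number percent of total)}.

    Raises ValueError if the categories do not account for every study,
    i.e. if the classification is not mutually exclusive and exhaustive.
    """
    if sum(counts.values()) != total:
        raise ValueError(f"counts sum to {sum(counts.values())}, expected {total}")
    return {category: (n, round(100 * n / total)) for category, n in counts.items()}


# Counts as reported in this section; the labels are ours.
designs = {
    "prospective cohort": 14,
    "retrospective cohort": 9,
    "case-control": 5,
    "self-controlled": 1,
}
comparators = {
    "untreated or unexposed": 13,
    "active comparator": 12,
    "other doses of same drug": 3,
    "self-controlled (unexposed time)": 1,
}

for name, table in [("design", designs), ("comparator", comparators)]:
    for category, (n, pct) in tabulate(table).items():
        print(f"{name}: {category}: {n} ({pct}%)")
```

The explicit sum check catches the most common error in descriptive tables of this kind, where a study is double-counted or omitted and the percentages silently stop adding up to 100%.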

3.4 Medication Exposure

Two approaches to the categorization of medication exposure were used. Sixteen (55%) studies reported on a specific medication or medication combination(s). The remaining 13 (45%) studies categorized exposure more broadly by drug class or other grouping. In studies using the first approach, no specific medication predominated. Three studies had methylphenidate as the medication exposure; others reported on combinations, including artemether/lumefantrine, oral desmopressin/oral oxybutynin, and histidine/tryptophan/ketoglutarate combinations. There was one study each of digoxin, midazolam, methotrexate, vigabatrin, aminophylline, and etanercept as the exposure. The studies of drug classes or groups included the following: antibiotics (2 studies; 7%), asthma medications (2; 7%), antirheumatic drugs (2; 7%), antipsychotics (1; 3%), and antidepressants (1; 3%).

3.5 Safety and Effectiveness Endpoints

Thirteen studies (45%) reported on safety endpoints that included serious adverse events (mortality, neurodevelopmental impairment, malignant tumors, cardiovascular events, and emergency department visits for adverse events) and other important medical events (hippocampal growth, fracture, blood acetaldehyde concentrations, visual field loss, necrotizing enterocolitis, pulmonary embolism, psoriasis, and obesity). These studies did not report on effectiveness outcomes and none reported on pharmacogenomic markers of safety.

Thirteen studies (45%) reported on effectiveness endpoints. Two measured effectiveness as a reduction in mortality, and most other studies reported on measures of improvement (e.g., behavior change, remission of infantile spasms, reduction in body mass index, reduction in hospital length of stay, drug impact on cognition/behavior/quality of life, and impact on disease after treatment) or prevention of a condition (e.g., prevention of nocturia for a minimum of 14 nights, prevention of uveitis). These studies did not report on safety outcomes and none reported on pharmacogenomic markers of effectiveness.

Three studies reported on both safety and effectiveness endpoints. One reported on treatment for primary nocturnal enuresis and had nights of dryness as an effectiveness endpoint; a second reported on treatment for juvenile idiopathic arthritis with disease progression as an effectiveness endpoint; and a third reported on nephrotic syndrome with prevention of relapse as an effectiveness endpoint [26, 48, 50]. These three studies did not specify primary safety endpoints.

3.6 Quality Assessment

The GRACE checklist was used to assess the quality of the 29 studies [19, 20]. The 11-item checklist assesses data attributes (items D1–D6) and methods (items M1–M5), with each item scored as “sufficient” or “insufficient” based on the assessors’ qualitative judgment. The number of items scored as “sufficient” per study ranged from 4 to 11, and half of the studies scored “sufficient” on at least eight items (eTable 2 in eSupplement 2 of the ESM). Over 90% of the studies scored “sufficient” on items D2–D5, which assess data attributes of outcomes: recording, objective measurement of clinical outcomes, validation or adjudication of outcomes, and consistent measurement of outcomes across treatment and comparison groups (Table 3). Fewer studies (21; 72%) were judged to have provided adequate information about exposure (item D1), and fewer still (18; 62%) were judged to have adequately recorded known confounders or effect modifiers (item D6). Twenty-four studies (83%) were assessed as sufficient in the use of concurrent controls or the justification of historical control groups (item M2). Seventeen studies (59%) were assessed as sufficient in accounting for confounding and/or effect-modifying variables in design and/or analysis (item M3), and 15 (52%) in avoiding immortal time bias (item M4). Eleven studies (38%) were assessed as sufficient in restricting the study population to new initiators of treatment (item M1), and 11 in conducting sensitivity analyses to test key assumptions of the primary results (item M5).
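The item-level counts above are proportions of the 29 studies. For example, 11 of 29 (about 38%) is the “slightly more than a third” referred to in the Discussion. The following minimal Python sketch reproduces these shares from the reported counts and shows how a per-study score (the number of the 11 items rated “sufficient”) could be computed. The `study_score` helper and the single-study ratings are hypothetical illustrations, not data from any reviewed study. Items D2–D5 are left out of the per-item table because only “over 90%” is reported for them.

```python
TOTAL_STUDIES = 29

# The 11 GRACE items: data attributes D1-D6 and methods M1-M5.
GRACE_ITEMS = [f"D{i}" for i in range(1, 7)] + [f"M{i}" for i in range(1, 6)]

# Studies rated "sufficient" per item, as reported in Section 3.6.
# D2-D5 are omitted because only "over 90%" is reported for them.
reported_sufficient = {"D1": 21, "D6": 18, "M1": 11, "M2": 24, "M3": 17, "M4": 15, "M5": 11}


def percent_of_studies(n: int, total: int = TOTAL_STUDIES) -> float:
    """Share of all included studies, in percent, rounded to one decimal place."""
    return round(100 * n / total, 1)


def study_score(ratings: dict[str, bool]) -> int:
    """Number of GRACE items rated 'sufficient' (True) for one study; all 11 must be rated."""
    missing = set(GRACE_ITEMS) - set(ratings)
    if missing:
        raise ValueError(f"unrated items: {sorted(missing)}")
    return sum(ratings[item] for item in GRACE_ITEMS)


for item, n in reported_sufficient.items():
    print(f"{item}: {n}/{TOTAL_STUDIES} = {percent_of_studies(n)}%")

# Hypothetical ratings for a single study (not one of the 29 reviewed):
# sufficient on every item except new-initiator restriction (M1) and sensitivity analyses (M5).
example = {item: item not in {"M1", "M5"} for item in GRACE_ITEMS}
print("example study score:", study_score(example), "of", len(GRACE_ITEMS))
```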

Table 3 GRACE assessment scores by item

4 Discussion

Our review of 29 observational studies published in 2016 reveals variation in the types and quality of RWE used to assess medication safety or effectiveness in children. The studies addressed diverse diseases, medications, and safety and effectiveness endpoints. While several studies reported on conditions that occur frequently (e.g., asthma, ADHD, depression), others addressed less common conditions (e.g., juvenile idiopathic arthritis, congenital heart disease, and nephrotic syndrome). One explanation of this finding might be that the use of RWD/RWE provides the opportunity to study rare conditions, although none of the studies included in this review reported that they were conducted to satisfy regulatory requirements. This issue remains to be explored more fully [51,52,53,54]. The medications studied included widely used groups such as antibiotics and antidepressants, but also included an array of less frequently used medications, corresponding to the less common diseases or conditions covered by these studies. In almost half of the studies, medication exposure was reported at the class level, with no details given about the specific drug entities used.

The studies varied in their data sources and data collection methods. Notably, fewer than a third (8; 28%) used administrative claims or EHR databases, whereas the majority used manual chart review and primary data collected at single institutions. The trade-offs between data drawn from large databases compiled during the routine delivery of health care and data collected in research settings have been the subject of much discussion [55, 56]. Although most concede the higher accuracy of data collected in a research setting, many accept that databases compiled for non-research purposes may offer larger sample sizes and a more generalizable study population. Others have noted limitations of databases that include pediatric populations, citing a lack of clinical detail (including basic information such as birth weight and gestational age) and a lack of validated pediatric outcomes [57,58,59,60,61,62]. Some scientists have noted that RWE studies can have sample sizes orders of magnitude larger than those of randomized controlled trials (RCTs) [63]. In our review, half the studies had sample sizes of less than 365, which may be considered small for RWE studies; however, typical sample sizes for RWE studies have not been documented, so this assessment is subjective. These smaller sample sizes are consistent with the reliance on manual medical record review, primary data collection, and single-institution studies. This suggests that some RWE studies of medication safety and effectiveness in children have not fully realized one of the potential strengths of RWD: larger sample sizes.

About one-quarter of the studies (8; 28%) reported no statistical methods to control for confounding. This is concerning given the well-understood biases introduced into observational studies by the lack of random allocation to treatment groups [64]. Recent decades have seen intensive efforts to develop statistical methods that adjust for measured confounding and assess the potential influence of unmeasured confounding [11, 65,66,67]. These methods include regression-based approaches, propensity score-based approaches, and other methods that have become well established and widely used.
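Stratification, which two of the reviewed studies relied on, is the simplest of these adjustment approaches. The following Python sketch illustrates it with the Mantel–Haenszel pooled odds ratio. We chose this standard stratified estimator for illustration; the reviewed studies did not necessarily use it. The 2 × 2 counts are hypothetical. They are constructed so that a confounder (for example, disease severity) both raises the risk of the outcome and makes treatment more likely, while the odds ratio within each stratum is exactly 1.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    """2x2 counts within one level of the confounder."""
    a: int  # exposed, outcome present
    b: int  # exposed, outcome absent
    c: int  # unexposed, outcome present
    d: int  # unexposed, outcome absent


def crude_odds_ratio(strata: list[Stratum]) -> float:
    """Odds ratio after collapsing all strata into one 2x2 table (no adjustment)."""
    a = sum(s.a for s in strata)
    b = sum(s.b for s in strata)
    c = sum(s.c for s in strata)
    d = sum(s.d for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(s.a * s.d / (s.a + s.b + s.c + s.d) for s in strata)
    denominator = sum(s.b * s.c / (s.a + s.b + s.c + s.d) for s in strata)
    return numerator / denominator


# Hypothetical counts: outcome risk is 50% in the high-risk stratum and 10% in the
# low-risk stratum regardless of treatment, but 90% vs 10% of patients are treated.
strata = [
    Stratum(a=45, b=45, c=5, d=5),  # high-risk stratum
    Stratum(a=1, b=9, c=9, d=81),   # low-risk stratum
]

print(f"crude OR = {crude_odds_ratio(strata):.2f}")           # distorted by confounding
print(f"MH OR    = {mantel_haenszel_odds_ratio(strata):.2f}")  # adjusted for the stratifying variable
```

The crude odds ratio (about 5.2) suggests a strong treatment effect, whereas the stratified estimate recovers the null. A study that reports no adjustment at all is exposed to exactly this kind of distortion. Stratification becomes impractical with many confounders, which is one reason regression-based and propensity score-based methods are more widely used.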

We evaluated study quality using the GRACE checklist because it has “been developed for noninterventional studies of comparative effectiveness to determine which studies are sufficiently rigorous to be reliable enough for use in health technology assessments” [19]. The checklist was designed to evaluate individual observational studies. Its developers identified the presence of sensitivity analyses as the feature most predictive of high quality. They speculated that this may be because “sensitivity analyses allow quantitative or semiquantitative estimates of how much a study’s results are dependent on any key assumptions” [19]. They noted two other attributes predictive of quality: use of concurrent comparators and restriction to new initiators of treatment. It is therefore of some concern that only 11 (38%) of the studies we reviewed, slightly more than a third, conducted sensitivity analyses, and the same number restricted study populations to new initiators.
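One widely used quantitative sensitivity analysis for unmeasured confounding is the E-value of VanderWeele and Ding. We offer it here only as an illustration; neither GRACE nor the reviewed studies prescribe it. For an observed risk ratio RR ≥ 1, the E-value is RR + sqrt(RR × (RR − 1)). A protective estimate (RR < 1) is inverted first. The E-value is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away the observed association. The following minimal Python sketch computes it for a point estimate only; the confidence-limit version is not shown.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a risk-ratio point estimate.

    Minimum risk-ratio association an unmeasured confounder would need with both
    treatment and outcome to fully explain away the observed association.
    Protective estimates (rr < 1) are inverted before applying the formula.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


print(f"E-value for RR = 2.0: {e_value(2.0):.2f}")
print(f"E-value for RR = 0.5: {e_value(0.5):.2f}")
```

An observed RR of 2.0 yields an E-value of about 3.41, so only a fairly strong unmeasured confounder could fully account for it. The formula requires nothing beyond the reported estimate, which makes this kind of analysis inexpensive to add to an observational study.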

Age is an especially important variable in pediatrics. In almost one-quarter of the studies in this review, we could not ascertain the number of study participants under 18 years old or the age range of the study participants. A lack of information about the age of a study population limits our ability to apply study results in a clinical or regulatory context and hampers reproducibility. The recent change in National Institutes of Health (NIH) policy requiring researchers to report the ages of participants is a step toward correcting this deficiency in NIH-funded studies [68].

Strengths of this systematic review include an a priori protocol, an electronic search of PubMed combined with an extended search of references cited in systematic reviews and expert suggestions, screening by two reviewers, and a review of data extraction by a second reviewer. The electronic search strategy included terms related to RWE such as comparative effectiveness, safety, and pharmacoepidemiology applied to all journals, and an additional search of three high-impact pediatric journals to identify studies not found by the first strategy. An additional strength was use of the GRACE checklist, an instrument that was developed for use in pharmacoepidemiology and emphasizes regulatory and policy decision-making.

The purpose of this systematic review was limited to describing the published studies; we did not assess their impact on policy or clinical practice. Limitations of the search strategy include the restriction to one calendar year (2016), one electronic database (PubMed), and articles published in English. While the restriction to one year reduced the number of studies included in this review, it ensured assessment of the most recent studies at the time the project was planned. We recognize that, because our review was time-intensive to conduct and report, our results do not reflect the state of the literature in the most recent calendar year. The restriction to articles published in English may also have caused us to overlook RWE studies. Because RWE is a relatively new term, the indexing of publications on RWE is not yet consistent, and search strategies for RWE need continual refinement as the field evolves. For example, our search strategy omitted the MeSH term “patient generated data,” a term that was introduced in 2018 (after our search was conducted) and that might be useful in future studies. The search strategy included the term “medications” but did not include terms for specific drug classes or specific drugs and may therefore have missed some relevant studies. We did not use terms for specific observational study designs, such as cohort or case–control studies, or for methods such as chart review, so, again, we may have missed some relevant studies. Our search strategy concept for RWE included the term “medical records,” which includes the terms “medical record linkage,” “medical records systems, computerized,” and “electronic health records.” We did not include the terms “observational research” or “outcomes research,” so we may, again, have missed relevant studies. Harmonized definitions and MeSH terms would make search strategies for identifying RWE studies more efficient and complete in the future.

Despite the limitations of our search strategy, our yield (29 eligible studies out of 915 unique citations; 3.2%) was comparable to that obtained in a recent systematic review on a similar topic. Dukanovic et al., in a review of comparative effectiveness studies in children, found that 4.2% of the studies were eligible (164/3926; the large denominator reflects their inclusion of studies from the inception of Embase and Medline) [69]. While comparative effectiveness is not the same topic as RWE, it may be similar enough to indicate a reasonable yield for search strategies in this area. We also searched the Sentinel website. It lists studies describing disease prevalence or drug utilization in children, but no publications from 2016 that used the Sentinel system to study drug effectiveness or safety in children.
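The yield comparison is simple arithmetic on the citation flow reported in Section 3.1 and the figures quoted for Dukanovic et al. The following minimal Python sketch makes it explicit. The helper name `search_yield` is ours.

```python
def search_yield(eligible: int, screened: int) -> float:
    """Percentage of unique screened citations that met the eligibility criteria."""
    return 100 * eligible / screened


# Citation flow reported in Section 3.1: 900 PubMed records + 16 from
# hand searching and expert suggestions, minus 1 duplicate.
screened = 900 + 16 - 1
assert screened == 915

this_review = search_yield(29, screened)  # about 3.2%
dukanovic = search_yield(164, 3926)       # about 4.2%
print(f"this review: {this_review:.1f}%  Dukanovic et al.: {dukanovic:.1f}%")
```

Yield is only a rough benchmark. A broader search window, like the inception-to-date search of Dukanovic et al., enlarges both numerator and denominator, so the two percentages indicate comparable search precision rather than comparable coverage.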

We did not specifically search for studies that were conducted as part of US Food and Drug Administration (FDA) or European Medicines Agency (EMA) post-marketing commitments.

Data extraction regarding exposure was limited to describing the drugs of interest and did not record whether exposure was defined by prescribing information, dispensing data, administration, or other measures. Future studies could contribute by characterizing the types of exposure data collected in samples of RWE studies. Further limitations were the exclusion of pragmatic trials and of studies without control groups or comparators. Pragmatic trials were excluded because of the complexity involved in distinguishing pragmatic from explanatory trials. However, a clinical trial can generate RWE without having all pragmatic design features, by capturing RWD rather than relying solely on research-generated data. Studies without control groups or comparators were excluded to identify higher-quality studies. Both groups of studies meet the definition of RWE and may need to be characterized in future reviews; reviews that include a larger number of studies might also investigate differences in these areas. We did not summarize data on follow-up times because of heterogeneity in study designs, disease groups, and patient populations. Finally, the quality assessment was performed by the nine members of our working group. Although this may have introduced some variability in applying the checklist, all members of the working group are trained pharmacoepidemiologists and/or pediatricians and bring expertise in pediatric pharmacoepidemiology to the GRACE checklist.

5 Conclusions

The study team judged a small body of observational studies published in 2016 to have used RWD to assess medication safety or effectiveness in children. These studies varied in age groups, diseases or conditions, and methods, and may not have fully met the FDA definition of RWE. Most relied on data collected at single institutions and did not use the growing number of administrative or electronic health record databases available for analysis; about one-quarter did not use well-established statistical methods to control for confounding. This indicates that the use of RWE is not fully developed in pediatrics. It suggests an opportunity to build capabilities and to leverage administrative and electronic health record databases more fully to study medication safety and effectiveness in children. As far as we are aware, regulatory guidance addressing the use of RWE in pediatrics has not yet been issued. Our systematic review appears generalizable to pediatrics broadly. It documents that the high level of activity in RWE in general has had less impact on pediatrics. To our knowledge, this is the first study of its kind, and it may form a basis for comparison going forward.

The practice of pediatrics has long been hampered by the lack of a strong evidence base providing dose, efficacy, and safety information about the use of drugs in children, and this gap persists across medical specialties to the present day [4, 70,71,72,73,74,75]. As noted in the American Academy of Pediatrics (AAP) policy statement, “The performance of research studies to evaluate drugs in children is critical for determining the safety and efficacy of medications in children. Without this type of research, medication use in children will be limited to extrapolation from adult studies or off-label use for indications that have not been studied in children, thereby putting children at increased risk of adverse effects” [70]. Studies using RWD offer the opportunity to assess the long-term effects and safety of medications used to treat chronic conditions (especially rare adverse events), as well as the chance to study rare conditions [76]. Given the ethical and logistic constraints on the conduct of traditional pediatric clinical trials, it is incumbent upon us to foster the optimal use of RWD/RWE for the benefit of children. Reviews such as ours may help identify needs and areas for future development.