Key Summary Points

Why carry out this study?

The identification of cancer progression events (i.e., disease worsening) is important for assessing therapeutic benefit. Measuring cancer progression using real-world data (RWD) requires distinct methodology from measuring progression in a clinical trial setting

We previously developed a novel method to reliably ascertain real-world cancer progression (rwP) in a cohort of patients with advanced non-small cell lung cancer (aNSCLC) using de-identified data from an electronic health record (EHR)-derived database (base case). We conducted this methodological study to determine whether the same method could identify cancer progression in five additional solid tumor types with a range of disease characteristics: metastatic breast cancer (mBC), advanced melanoma (aMel), small cell lung cancer (SCLC), metastatic renal cell carcinoma (mRCC), and advanced gastric/esophageal cancer (aGEC)

What was learned from this study?

Our results show that, with disease-specific additions to the base case for mBC, aMel, and SCLC, derivation of rwP from EHR documentation is feasible across the five additional cancers, despite differences in tumor biology

In addition, rwP can be used in endpoint analyses to produce clinically meaningful information that may be valuable in research

We believe our approach for reliably measuring rwP in multiple tumor types is a key metric to assess interventions to improve outcomes and enhance survival and quality of life for patients with cancer


Real-world data (RWD) are data collected as part of routine health assessment and care delivery from sources outside of a typical clinical trial setting, such as electronic health records (EHRs), medical claims and billing data, or disease registries [1, 2]. While not gathered for research purposes, RWD can generate real-world evidence (RWE) that may be leveraged to help address issues of external validity and generalizability for drug development programs [1,2,3,4]. RWE has also been used in regulatory and clinical decision-making, outcomes research, and safety surveillance [4,5,6,7,8,9]. Insights gained from RWE can be especially useful in scenarios where conducting a randomized controlled trial (RCT) is difficult or unethical, such as in rare, genomically defined cancers [4, 6, 10]. In order to support research efforts that use RWE, the US Food and Drug Administration (FDA) has developed a framework that outlines current use cases for RWE as well as opportunities in which RWE may have the potential to supplement or complement clinical trial data [5, 11]. In these contexts, robust and well-characterized outcome measures are crucial.

While improved overall survival (OS) remains a standard threshold to demonstrate the therapeutic benefit of an anticancer intervention, the identification of cancer progression events (and the evaluation of associated endpoints) is also important, especially to understand the durability of treatment effect [12,13,14,15]. In clinical trials of solid tumors, tumor burden changes are typically measured by applying Response Evaluation Criteria in Solid Tumors (RECIST) to imaging in order to determine response or progression events based on changes in tumor size [16, 17]. However, in a previous study, we found that it was not feasible to routinely use RECIST to retrospectively assess tumor size changes based on community oncology EHR-derived documents because of the lack of consistent documentation in imaging reports [18]. Accordingly, the analysis of non-randomized RWD to ascertain real-world progression (rwP) requires specific methodological considerations and abstraction approaches that account for issues such as data sourcing, quality, and completeness in order to generate meaningful and actionable insights [10, 19, 20].

Previously, we demonstrated that rwP can be identified from unstructured EHR-derived documents (e.g., clinician notes) using a novel abstraction approach that was applied to a cohort of patients with advanced non-small cell lung cancer (aNSCLC) [18]. This approach was found to be reliable, clinically meaningful, and scalable to a cohort of more than 30,000 patients; rwP-based endpoints (e.g., rwPFS, real-world time to progression [rwTTP]) were associated with downstream clinical events and were correlated with treatment-based endpoints (e.g., real-world time to next treatment [rwTTNT]) and with rwOS [21, 22].

Curating rwP from EHR documents for other tumor types would enable new uses of RWD, such as evaluating new therapies in a rapidly changing treatment landscape. Given that different cancer types have different biologies as well as progression patterns and dynamics, the overall objective of this methodological study was to assess the performance of the previously developed rwP abstraction approach in aNSCLC (the base case) for five additional solid tumor types. First, we evaluated the feasibility of the base-case approach for each tumor type and whether disease-specific additions to the base case were needed. We hypothesized that different source evidence may be referenced in routine clinical care in various disease settings. For the resulting abstraction approach for each tumor type, we repeated the characterization analysis performed for the aNSCLC base case; we assessed the feasibility, reliability, and performance (e.g., likelihood of a downstream clinical event relative to rwP and association of rwP-based endpoints with rwOS). Comparisons to treatment-based endpoints provided additional clinical context. Finally, we determined the suitability of implementing each abstraction approach in very large cohorts through a holistic, qualitative assessment of the rwP variable’s characteristics. We hypothesized that ascertaining rwP from EHR documentation using a previously developed approach, with disease-specific additions as needed, would be feasible for five additional tumor types that demonstrate a range of disease characteristics.


Data Source

This retrospective study used data from the Flatiron Health database, a nationwide longitudinal, de-identified database derived from EHR data containing patient-level structured and unstructured data curated via technology-enabled abstraction [23, 24]. The analyses utilized data available at the time each experiment was conducted; by the end of the observation period used for the last set of analyses (December 31, 2018), the database originated from approximately 280 cancer clinics (ca. 800 sites of care) in the USA and Puerto Rico. The majority of patients in the database were drawn from community oncology settings and the rest from academic medical centers. All data were drawn from EHR documentation in these care settings (i.e., care received elsewhere not documented in the source EHR is not available).

The previously published rwP abstraction approach studied in aNSCLC was defined as the “base case”. Five additional tumor types were evaluated in this study: metastatic breast cancer (mBC), advanced melanoma (aMel), small cell lung cancer (SCLC), metastatic renal cell carcinoma (mRCC), and advanced gastric/esophageal cancer (aGEC). These tumor types were selected on the basis of the availability of a curated dataset and because assessment of disease burden was typically based on imaging obtained at a regular cadence, similar to aNSCLC in the base case. The specific inclusion and exclusion criteria for the individually analyzed cohorts varied by tumor type owing to disease-specific diagnostic and treatment nuances; however, all patients were required to have confirmation of diagnosis with the relevant tumor type through review of unstructured EHR documents and structured visit data. All diagnoses were confirmed by trained abstractors following standard procedures and protocols [23]. In addition, at the time of study entry all patients had initiated systemic therapy as indicated by oncologist-defined, rule-based line of therapy comprised of structured antineoplastic medication orders, structured antineoplastic medication administrations, and/or oral therapy data from unstructured EHR documents, depending on the tumor type (Table 1).

This observational study was performed in accordance with the Helsinki Declaration of 1964 and its later amendments. Institutional review board (IRB) approval of the study protocol, with a waiver of informed consent, was obtained prior to study conduct, and covers the data from all sites represented. Approval was granted by the WCG IRB. (“The Flatiron Health Real-World Evidence Parent Protocol”; IRB registration number IRB00000533; Protocol approval ID tracking number 420180044.)

Table 1 Definition of rwP and inclusion/exclusion criteria by disease

Feasibility and Assessment for Disease-Specific Additions to the Base Case

The base-case approach for aNSCLC leverages documentation in the EHR of the clinician’s interpretation of source evidence (i.e., radiographic, pathologic, or clinical assessment only) [18]. For initial feasibility assessment of the base-case approach, rwP events were abstracted for 20–30 randomly selected patients for each of the five additional solid tumor types, starting at the cohort entry date (diagnosis date, advanced diagnosis date, or metastatic diagnosis date, depending on the tumor type). This feasibility step also explored whether additions to the base case to incorporate disease-specific clinical and documentation factors were necessary. Research scientists with relevant oncology expertise performed a qualitative assessment of the evidence sources referenced in EHR documentation to support determination of worsening disease that are not captured by the base-case approach (e.g., physical exam, tumor markers, etc.). The disease-specific approaches were modified as relevant in an iterative fashion (Fig. 1).

Fig. 1

Patient capture and rwP abstraction approach. EHR, electronic health record; rwP, real-world progression

Inter-Abstractor Agreement for and Availability of rwP Variable

The resulting disease-specific rwP abstraction approaches were applied to a larger cohort of randomly selected patients who had received treatment. Descriptive statistics were used to characterize baseline demographic and clinical characteristics for each cohort. We calculated median time to the patient’s first, second, and third imaging assessment, indexed to the initiation of first-line therapy unless otherwise specified, by identifying radiology document titles for individual unstructured notes. We also assessed median time to the patient’s first, second, and third clinician note to understand if patients were being followed as well as evaluated at anticipated time points consistent with expert clinical opinion of real-world practice patterns. For these median time-to-event analyses (e.g., time to first, second, third (1) imaging assessment or (2) clinician note), patients were also required to have started first-line systemic therapy for the tumor type at least 1 year prior to the end of the observation period, with the exception of aMel which was anchored to second-line systemic therapy. In addition, the imaging assessment/clinician note date was considered the event date and the censor date was the date of death (or end of the observation period for patients without a date of death).
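The event and censoring rule described above for the time-to-assessment analyses can be illustrated with a minimal sketch. This is not the study's actual code: the function name, the days-to-months conversion factor, and the choice to count only assessments dated on or after the index date are assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

# Assumed conversion factor; the source does not state how days map to months.
DAYS_PER_MONTH = 30.4375


def time_to_first_assessment(
    index_date: date,
    assessment_dates: Sequence[date],
    death_date: Optional[date],
    observation_end: date,
) -> Tuple[float, bool]:
    """Return (months from index date, event_observed) for one patient.

    The event date is the first imaging assessment (or clinician note) date.
    Patients without an event are censored at the date of death, or at the
    end of the observation period when no date of death is available.
    """
    censor_date = death_date if death_date is not None else observation_end
    # Assumption: only assessments between the index date and the censor date count.
    eligible = sorted(d for d in assessment_dates if index_date <= d <= censor_date)
    if eligible:
        return (eligible[0] - index_date).days / DAYS_PER_MONTH, True
    return (censor_date - index_date).days / DAYS_PER_MONTH, False
```

The resulting (duration, event flag) pairs are the usual input to a time-to-event estimator; the same helper could be reused for the second and third assessments by selecting a later element of the sorted list.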

In order to evaluate whether the capture of rwP aligned with clinical expectations, we calculated the proportion of patients by cancer type for the mBC, SCLC, mRCC, and aGEC cohorts with at least one rwP event occurring at any point during the observation period and at least 14 days after the start of the patient’s index systemic therapy. We also assessed whether a rwP event was further corroborated by the occurrence of a clinically relevant downstream event in the curated dataset (i.e., new antineoplastic systemic therapy start, antineoplastic systemic therapy end, or death) within 15 days prior to or up to 60 days after the rwP event. We hypothesized that patients who started second-line therapy are likely to have progressed on prior therapy; therefore, we evaluated the percentage of patients with at least one rwP event in a subgroup of patients with evidence of second-line therapy. In the case of aMel, this evaluation was conducted on the full analytic cohort since all patients were required to have evidence of second-line systemic therapy.
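The two date rules in this paragraph (a rwP event at least 14 days after index therapy start, and a downstream event within 15 days before to 60 days after the rwP event) can be expressed as simple date comparisons. The sketch below is illustrative only; the function names are hypothetical, and treating both window boundaries as inclusive is an assumption that the text does not specify.

```python
from datetime import date, timedelta
from typing import Iterable


def is_qualifying_rwp(rwp_date: date, index_date: date, min_days: int = 14) -> bool:
    """True if the rwP event falls at least `min_days` after index therapy start."""
    return (rwp_date - index_date).days >= min_days


def has_downstream_event(
    rwp_date: date,
    downstream_dates: Iterable[date],
    days_before: int = 15,
    days_after: int = 60,
) -> bool:
    """True if any downstream event (therapy start, therapy end, or death)
    falls within the window around the rwP event.

    Assumption: both ends of the window are inclusive.
    """
    window_start = rwp_date - timedelta(days=days_before)
    window_end = rwp_date + timedelta(days=days_after)
    return any(window_start <= d <= window_end for d in downstream_dates)
```

The proportion of corroborated rwP events for a cohort would then be the share of qualifying rwP events for which `has_downstream_event` returns true.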

Duplicate abstraction by two independent abstractors was conducted on a random subsample of at least 100 patients (ranging from 20.8% to 66.4% of the analyzed cohorts) for each tumor type to assess inter-abstractor agreement. Final sample sizes reflected study-specific factors at the time of each study. In the subset of patients abstracted in duplicate, the reliability of the capture of the first rwP event observed was evaluated by calculating (1) event agreement (i.e., the proportion of instances where both abstractors agreed on whether a rwP event did or did not occur, regardless of date documented for the event) and (2) date agreement (i.e., among patients for whom both abstractors agreed that at least one rwP event occurred, the proportion of instances where there was agreement on the date documented for the first event: exact date, within a ± 15-day window, and within a ± 30-day window).
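As an illustration of the two agreement measures defined above, the following sketch computes event agreement and date agreement (exact, ± 15 days, ± 30 days) from paired abstractions. The data layout (one pair of first-rwP dates per patient, with `None` meaning that the abstractor recorded no rwP event) and the function name are assumptions for illustration, not the study's implementation.

```python
from datetime import date
from typing import Dict, Optional, Sequence, Tuple

# One entry per patient: first rwP date from abstractor A and from abstractor B.
Pair = Tuple[Optional[date], Optional[date]]


def agreement_summary(
    pairs: Sequence[Pair], windows: Sequence[int] = (0, 15, 30)
) -> Dict[str, float]:
    """Compute event agreement and windowed date agreement for duplicate abstraction."""
    if not pairs:
        raise ValueError("at least one duplicate-abstracted patient is required")
    # Event agreement: both abstractors agree on presence or absence of rwP.
    event_agree = sum((a is None) == (b is None) for a, b in pairs)
    summary = {"event_agreement": event_agree / len(pairs)}
    # Date agreement is evaluated only where both abstractors recorded an event.
    both = [(a, b) for a, b in pairs if a is not None and b is not None]
    for w in windows:
        within = sum(abs((a - b).days) <= w for a, b in both)
        summary[f"date_agreement_within_{w}d"] = (
            within / len(both) if both else float("nan")
        )
    return summary
```

A window of 0 days corresponds to exact date agreement; by construction, agreement can only stay the same or increase as the window widens.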

Time-to-Event Analyses and Correlation with rwOS

Consistent with the base-case abstraction approach evaluation, the use of rwP as an outcome in time-to-event analyses (rwPFS and rwTTP) was compared to rwOS and to treatment-based real-world endpoints (rwTTNT and real-world time to treatment discontinuation [rwTTD]) (Supplementary Material Table 1) in order to pressure test performance of the rwP-based endpoints (i.e., do they behave as would be expected?). Dates of death for rwPFS and rwOS calculations were based on a previously described composite mortality variable [25]. Endpoint estimates were assessed using the Kaplan–Meier method and indexed to the start of first-line systemic therapy for mBC, SCLC, mRCC, and aGEC. In contrast, patients with aMel were required to have at least two lines of therapy to be included in the analysis, and analyses were indexed to the start of second-line systemic therapy. Kaplan–Meier curves and median time-to-event estimates were reported. The correlation between each real-world endpoint and rwOS was calculated using Spearman’s ρ and restricted to patients with a death and the real-world endpoint event of interest.
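For readers less familiar with these estimators, the sketch below shows a Kaplan–Meier median and a Spearman correlation restricted to patients with both a death and the endpoint event of interest. It is a dependency-free illustration under stated assumptions: the function names and the record layout are hypothetical, confidence intervals are omitted, and a statistical package would normally be used in practice.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def km_median(durations: Sequence[float], events: Sequence[bool]) -> Optional[float]:
    """Kaplan-Meier median: earliest event time at which S(t) falls to 0.5 or below."""
    at_risk = len(durations)
    survival = Fraction(1)  # exact arithmetic avoids rounding at the 0.5 threshold
    for t in sorted(set(durations)):
        n_events = sum(1 for d, e in zip(durations, events) if d == t and e)
        n_leaving = sum(1 for d in durations if d == t)
        if n_events:
            survival *= Fraction(at_risk - n_events, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return None  # median not reached


def _average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Assumed record layout: (endpoint_months, endpoint_event, os_months, death_observed)
Record = Tuple[float, bool, float, bool]


def rho_with_rwos(records: Sequence[Record]) -> float:
    """Correlate an endpoint with rwOS among patients with both events observed."""
    kept = [(e, o) for e, ev, o, dead in records if ev and dead]
    return spearman_rho([e for e, _ in kept], [o for _, o in kept])
```

Restricting the correlation to patients with both events avoids ranking censored times, at the cost of analyzing a selected subset; this trade-off is worth keeping in mind when interpreting the coefficients.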

Assessment of Suitability for Implementation in Large Cohorts

For the final step of the evaluation, we determined the suitability of the rwP abstraction approach for each tumor type to be implemented in large cohorts (i.e., thousands or tens of thousands of patients) through a qualitative, holistic assessment. Research scientists with clinical and quantitative expertise considered the variable’s feasibility, inter-abstractor agreement, availability, and performance in time-to-event analyses, along with correlation results. Researchers with the relevant expertise were assembled for each disease evaluated for a cross-functional scientific team approach. First, the source evidence for progression (e.g., imaging, clinical exam) had to be sufficiently documented in the EHR to allow abstraction of rwP events consistent with the disease-specific expectations. Second, the capture of rwP had to be sufficiently reliable based on the duplicate abstraction output. Third, the presence of clinically relevant downstream events had to corroborate that the rwP event aligned with related measures (antineoplastic systemic therapy stop, antineoplastic systemic therapy start, or death) as expected. Supplemental in-depth chart reviews were conducted on a random subset of patients who did not appear to have a downstream event in curated data for confirmation and hypothesis generation. Finally, the observed correlations and time-to-event estimates needed to align with clinical expectations as well as with the published literature as applicable given the difference in populations of the largely clinical trial-based literature.


Across the five tumor types evaluated, the cohort sizes ranged from 152 (aMel) to 884 (mBC) and the observation period end dates spanned from August 31, 2017 (mBC) to December 31, 2018 (aGEC; Table 1). Cohorts (see key patient characteristics, Supplementary Material Table 2a–e) predominantly comprised patients from community sites (range 80.9–100.0%) and proportions of patients with later-stage disease (stage III/IV) at initial diagnosis ranged from 46.7% to 71.6%.

Assessment of Disease-Specific Additions

Disease-specific additions to the base-case abstraction approach were necessary for evaluating rwP in mBC, aMel, and SCLC, but not in mRCC or aGEC (Table 1).

Inter-Abstractor Agreement for and Availability of rwP Variable

The first imaging assessment was observed within a median of 2.0 months after the start of first-line systemic therapy (range of medians across tumor types 1.1–2.0 months). For the analyses indexed to first-line systemic therapy (i.e., excluding aMel), at least one rwP event following the patient’s first-line systemic therapy start date was identified in the majority of patients (range 55–72%). When the analyses were restricted to patients with evidence of having started second-line systemic therapy, the proportion with a rwP event identified at some point following the patient’s first-line systemic therapy start increased (range 89–94%) (Table 2).

Table 2 Assessments of feasibility and performance by disease for a random sample of patients meeting inclusion/exclusion criteria

For all tumor types, inter-abstractor agreement as to whether the patient had at least one rwP event or not, irrespective of date, ranged from 88% to 97% (Table 2). When both abstractors identified at least one rwP event, date agreement was lower for the exact date (agreement range 60–73%) compared with expanded time windows of ± 15 days (agreement range 75–81%) and ± 30 days (agreement range 77–86%).

Time-to-Event Analyses and Correlations with rwOS

We observed variability by tumor type in the occurrence of clinically relevant downstream events corresponding with a rwP event in the curated database. For example, 72% of patients with mBC who had a rwP event had a new antineoplastic systemic therapy start, antineoplastic systemic therapy end, or death occur within the specified time window relative to the rwP event, while this was observed in only 59% of patients with SCLC who had a rwP event (Table 2).

Median rwPFS ranged from 3.7 (aMel) to 7.7 (mBC) months while median rwTTP ranged from 4.6 (aMel) to 8.3 (mRCC) months (Fig. 2a–e). Correlations between rwOS and rwPFS ranged from a moderate association of 0.52 (aMel) to a strong association of 0.82 (SCLC) (Fig. 3a–e) [26]. In contrast, the range of correlations between rwTTD and rwOS (0.40–0.62) consisted, overall, of lower values.

Fig. 2

Forest plot of medians (95% CI) for rwOS, rwPFS, rwTTP, rwTTD, rwTTNT by disease. A mBC. B aMel^a. C SCLC. D mRCC. E aGEC. ^a Indexed to second-line start date. aGEC, advanced gastric/esophageal cancer; aMel, advanced melanoma; mBC, metastatic breast cancer; mRCC, metastatic renal cell carcinoma; rwOS, real-world overall survival; rwP, real-world progression; rwPFS, real-world progression-free survival; rwTTD, real-world time to discontinuation; rwTTP, real-world time to progression; rwTTNT, real-world time to next treatment; SCLC, small cell lung cancer

Fig. 3

Forest plot of Spearman’s correlations (95% CI) with rwOS by disease. A mBC. B aMel^a. C SCLC. D mRCC. E aGEC. ^a Indexed to second-line start date. Note: Only patients who had both an event for death and the endpoint of interest were included in the correlation analysis. aGEC, advanced gastric/esophageal cancer; aMel, advanced melanoma; mBC, metastatic breast cancer; mRCC, metastatic renal cell carcinoma; rwOS, real-world overall survival; rwP, real-world progression; rwPFS, real-world progression-free survival; rwTTD, real-world time to discontinuation; rwTTP, real-world time to progression; rwTTNT, real-world time to next treatment; SCLC, small cell lung cancer

Assessment of Suitability for Implementation in Large Cohorts

The final rwP abstraction approach for each of the five solid tumor types was determined to be suitable to implement across a large cohort of patients (i.e., at scale) on the basis of the researchers’ qualitative, holistic assessment of the variable. Additional chart reviews provided supplemental information. For example, independent hand-review of randomly selected charts of patients who had an abstracted rwP event without a downstream event captured in the curated database identified four primary reasons: (1) subsequent treatment was not a systemic therapy captured in the curated dataset (e.g., radiation therapy for brain metastases); (2) clinical decision was made to continue on the same treatment; (3) patient choice to not change therapy; and (4) loss to follow-up (e.g., hospice referral).


The current methodological study demonstrates that deriving clinically meaningful rwP events from EHR documentation is feasible and reliable across multiple tumor types, despite differing underlying cancer biology [22]. Consistent with the prior evaluation of variable characteristics using the base-case approach in aNSCLC, results from the rwP-based time-to-event analyses (using the same approach with some disease-specific additions) were aligned with clinical expectations for each tumor type when considered in the context of rwOS. Although direct comparisons were not possible because of differences in patient populations, the association observed between rwOS and rwPFS by disease was directionally similar to the clinical trial literature for RECIST-based PFS [27,28,29,30,31,32,33,34,35,36,37,38,39,40].

Contextualization of rwP-based endpoint characteristics relative to treatment-based endpoints demonstrated that correlations with rwOS for rwTTD and rwPFS differed by tumor type, as generally expected on the basis of the literature [27, 29,30,31,32,33,34,35,36,37,38,39,40,41]. For example, in SCLC the correlation between rwOS and rwTTD was lower than that between rwOS and rwPFS because, at the time of the analysis, first-line therapy for SCLC consisted of a fixed number of therapy cycles [42]. In these types of situations, the relationship between TTD and OS is most interpretable in the subset of patients who progress before the planned finish of the fixed treatment time course. In other settings, patients may switch therapies for reasons other than worsening of disease (e.g., treatment side effects); conversely, patients may appear to continue on the same therapy despite evidence of disease progression, such as in cases of suspected immunotherapy-related pseudoprogression. These observations of nuanced clinical treatment decisions suggest potential challenges in using changes in treatment as direct indicators of the presence or absence of progression and, therefore, as an efficacy endpoint (e.g., time to next treatment).

Overall, rwPFS and rwTTP results were directionally similar to reported clinical trial results; the largest gap was observed in mBC, with shorter PFS for real-world patients. Potential hypotheses include selection bias (greater presence of aggressive disease subtypes [visceral disease or triple-negative] in the real-world cohort compared to clinical trials) and documentation of progression based on clinical assessment (e.g., worsening pain from bone metastasis) at an earlier time point than when images demonstrating changes in target lesion sizes consistent with RECIST-based progression become available [43,44,45,46,47,48,49,50,51,52,53,54,55].

As with all RWD, several considerations may apply to this work. First, disease-specific rwOS estimates observed in these real-world patient cohorts were generally shorter than in relevant clinical trials. This finding is unsurprising given that trial eligibility criteria tend to select for healthier patients, for example by excluding patients with organ dysfunction [56]. Second, rwPFS estimates can be subject to heterogeneity in the frequency and type of outcome assessments in routine practice as compared with protocol-dictated clinical trial assessments. Third, while inclusion and exclusion criteria for this study were minimal, those implemented may have introduced bias. For example, in order to identify progression in the context of treatment, the analytic cohort was limited to patients who lived at least 14 days after starting index treatment, which may have resulted in survival bias. Fourth, results in this study provide researchers critical context for knowing whether the data are sufficient for potential future use cases. For example, inter-abstractor date agreement for a rwP event was lower for exact date agreement than for the ± 15-day and ± 30-day windows. Accordingly, researchers should consider what date agreement thresholds are necessary for answering a particular research question. The level of acceptable uncertainty (and differential between comparator groups) for the date of the event and the potential effects of censoring should be considered for time-to-event analysis. Fifth, as with all EHR-derived RWD, miscategorization (e.g., of advanced/metastatic diagnosis) is possible as a result of inaccurate documentation or abstractor error.

One of the strengths of our approach to ascertain rwP is the ability to revisit the source EHR evidence to identify gaps. This access to source data unlocks opportunities for quality check procedures and may facilitate enhancement of explanatory portions of abstraction policies. For example, abstractor disagreements emerging from differential interpretation of unstructured documents (or because of a lack of explicit documentation) triggered improvements in quality assurance/control procedures in how documents were reviewed. This standard quality approach across disease databases results in clearer, more streamlined and robust workflow practices. In addition, chart review provided insights into how the concept of rwP fits into individual patient journeys, including reasons why a downstream event was not observed in curated data around the time of a rwP event.

Clinicians routinely assess and document progression of cancer on the basis of multiple factors, which may or may not include imaging results. As previously shown [18], limitations in EHR data (e.g., radiology report documentation patterns lack the level of tumor measurement detail required for RECIST) constrain the ability to consistently derive RECIST-defined progression from routine written EHR documentation, regardless of tumor type. Access to raw files of imaging captured during routine care would enable concordance analyses at the patient level between the application of RECIST to radiographic images and the derivation of rwP from EHR documentation. The comparability of endpoints captured in the real-world versus the trial setting is a key scientific question, particularly in the context of using RWD in external comparator cohorts.

This study focuses on solid tumors; derivation of progression events for liquid/hematologic malignancies may require distinct processes, given differences in how disease burden is ascertained (i.e., often by labs or pathology, rather than by radiographic imaging) [57,58,59,60]. Even within the realm of solid tumors, this study highlights the need for tumor-specific considerations regarding cited source evidence, which in turn can inform the methodology behind the approach for ascertaining rwP for specific cancers. Therefore, cohort characteristics and idiosyncrasies in disease-specific standards of care warrant particular attention when the intention is to produce generalizable results.

Future steps to support development of robust and reliable variables and endpoints for RWE include standardization of approaches across various RWD sources. Appropriate deployment of novel endpoints also requires alignment in the field on the minimum set of metrics needed for characterization frameworks and on the criteria and standards for quality evaluation (e.g., completeness). Additionally, a system that enables continuous variable and endpoint performance reappraisal over time is needed to account for the evolution in documentation or data capture in real-world settings, changes in data sources, and changes in patterns of care. Finally, future work could define the dimensions for a checklist, which may include pre-specified thresholds, to determine “fitness for use” of a given variable (e.g., minimum date agreement at the cohort level) for specific research questions. Defining thresholds for real-world endpoint assessment requires further study because published literature may not always be relevant or available.


Derivation of a rwP variable from EHR documentation is feasible and reliable across the five solid tumors evaluated. Endpoint analyses show that rwP reflects clinically meaningful information, as demonstrated by the correlations of rwP with downstream events and other endpoints. On the basis of these findings, we determined that our rwP abstraction approach is suitable for implementation in large, real-world EHR datasets and may enable outcome analyses.