Background

Significant advances have been made in the treatment of early-stage breast cancer through the performance of large, randomised controlled trials (RCTs). However, traditional, large RCTs comparing adjuvant interventions are challenging and expensive to conduct, with the average per-study cost of a Phase III US oncology trial estimated at US$22.1 million [1]. Historically in breast cancer trials, the gold standard for collecting well-established efficacy endpoints [2] related to disease progression, including recurrence, disease-free survival (DFS), invasive disease-free survival (iDFS), overall survival (OS), progression-free survival (PFS), and breast-cancer-free survival (BCFS), has been prospective, individual patient follow-up. However, low occurrence rates of these outcomes often result in limited numbers of events over long follow-up periods, high financial costs of study staff and testing regimens, and significant burdens on trial participants. Novel trial designs may reduce some of these challenges and the expense of intensive individual patient follow-up. One potential strategy is the exploitation of routinely collected healthcare data (e.g., administrative data) from one or more source databases [3,4,5,6,7].

Routinely collected healthcare data (RCHD; or real-world data) are data that have been systematically collected for reasons other than research or without specific research questions. Examples of RCHD include information from electronic health records (EHRs), health administration data, disease registries, and epidemiologic surveillance systems [6]. Administrative data have been shown to be a compelling source of long-term comparative effectiveness data in registry-based RCTs, demonstrating minimal losses to follow-up for rare outcomes that require long follow-up periods [8]. In cardiovascular research, investigators have used RCHD by integrating administrative data and EHRs to create a cardiovascular-specific database that supports data analytics in their field [9]. Similarly, harmonized data sets such as the administrative datasets housed at the Institute for Clinical Evaluative Sciences (ICES) in Ontario, Canada, have been shown to have the potential to improve the economy and quality of data collected in clinical trials, while minimizing data collection burdens on patients [10, 11]. Furthermore, with patient consent, prospective linkage of personal data with health administrative records has proven both feasible and accurate [11, 12]. Another study of the utility of health administrative data to identify breast cancer recurrence in reproductive-aged women found that recurrence could be detected with moderate validity using a case definition requiring an interval of at least 10 months between the original diagnosis date and the subsequent appearance of two or more cancer diagnosis codes [13]. The validity of detection of breast cancer recurrence in administrative datasets may be further improved using computer-coded algorithms of high sensitivity and specificity.
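To make the logic of such a case definition concrete, the rule reported in [13] can be sketched in code. This is an illustrative simplification under assumed data structures (a diagnosis date and a list of dates on which cancer diagnosis codes appeared), not the validated implementation from that study; the 10-month interval is approximated in days.

```python
from datetime import date, timedelta

# Approximate the >= 10-month interval from the case definition
# (10 months x ~30.4 days/month); an assumption for illustration.
MIN_GAP = timedelta(days=10 * 30.4)

def flags_recurrence(original_dx: date, later_code_dates: list[date],
                     min_codes: int = 2) -> bool:
    """Flag possible recurrence when at least `min_codes` subsequent
    cancer diagnosis codes appear >= ~10 months after the original
    breast cancer diagnosis date."""
    late_codes = [d for d in later_code_dates if d - original_dx >= MIN_GAP]
    return len(late_codes) >= min_codes
```

A patient with two cancer diagnosis codes a year after diagnosis would be flagged for study-team follow-up, while codes clustered within the first months after diagnosis (likely related to initial treatment) would not.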

Breast cancer recurrence is not directly captured in RCHD sources with a diagnostic code [14], such as an ICD-9/10 code, and instead must be inferred from the accumulation of other diagnostic codes and health system contacts made by the patient during the investigation of potential disease progression. The presence of diagnostic codes—including additional diagnoses, laboratory tests, imaging evaluations, and drug prescriptions—in a patient’s electronic health information and their timing relative to the initial breast cancer diagnosis may be analysed to identify patterns indicative of incident disease progression. Detection of such codes or patterns can identify a patient for follow-up by the study team to confirm whether a disease progression event has occurred. The targeted follow-up of only those trial participants who have likely experienced disease progression reduces the high costs and burdens of the scheduled follow-up at regular intervals required of all RCT participants to collect data on both disease progression and survival. Additionally, case definitions for disease progression based upon diagnostic codes in RCHD can be used to inform outcome identification for the purposes of data analysis. This strategy can also reduce costs and provide an advantageous source of long-term follow-up information.

The use of RCHD also has limitations [10, 11, 15, 16]. Generally, only quantitative data are available for specific outcomes, such as survival or hospital visits, which limits the scope of research objectives that can be addressed. Qualitative health behaviours and other endpoints of importance in oncology studies, such as the occurrence and date of disease recurrence needed to calculate DFS or PFS, cannot be routinely analysed. Systematic reviews of the use of administrative data for non-cancer conditions such as sepsis, heart failure, and neurologic conditions have shown that the diagnostic performance of endpoint detection algorithms can vary notably in relation to the number of codes used [17,18,19]. We will extend previous review strategies to breast cancer by conducting a scoping review to map the features and diagnostic performance of existing case definitions and algorithms using RCHD that have been used to define recurrence, DFS, iDFS, OS, and breast-cancer-free survival (BCFS) in early-stage breast cancer patients (i.e., neo/adjuvant patients), as well as PFS and OS in metastatic breast cancer patients.

Methods

This scoping review will be performed in consideration of methods guidance from JBI (formerly the Joanna Briggs Institute) [20, 21]. This protocol has been registered with the Open Science Framework (https://doi.org/10.17605/OSF.IO/6D9RS); given the iterative nature of scoping reviews, protocol amendments with their rationale will be documented in the completed review. This protocol has been reported with consideration of the Preferred Reporting Items for Systematic Review and Meta-Analysis Extension Statement for Protocols (PRISMA-P) [22, 23] and reporting of the final review will be guided by the PRISMA Extension Statement for Scoping Reviews (PRISMA-ScR) [24]. We will address the following review questions:

  1. What existing case definitions (in terms of diagnostic and billing codes within RCHD sources, their timing relative to the original breast cancer diagnosis, or other features) have been studied to identify disease progression and survival events in breast cancer patients?

  2. What was the diagnostic performance of these case definitions compared to a reference standard, as measured by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and/or other measures?
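For readers less familiar with these measures, they all derive from a 2x2 table cross-classifying the algorithm's result against the reference standard. A minimal sketch of their calculation (standard definitions, not specific to any included study):

```python
def diagnostic_performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic accuracy measures from a 2x2 table:
    tp/fp/fn/tn = true/false positives and negatives of the algorithm
    relative to the reference standard (e.g., chart review)."""
    return {
        "sensitivity": tp / (tp + fn),  # proportion of true events detected
        "specificity": tn / (tn + fp),  # proportion of non-events correctly excluded
        "ppv": tp / (tp + fp),          # probability a flagged event is real
        "npv": tn / (tn + fn),          # probability an unflagged patient is event-free
    }
```

For example, an algorithm detecting 80 of 100 true recurrences with 10 false positives among 100 event-free patients has sensitivity 0.80, specificity 0.90, PPV 0.89, and NPV 0.82.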

Study eligibility criteria

We will use the following selection criteria to identify relevant studies for the planned review, guided by the Population – Concept – Context (PCC) framework.

Population

Women with either early-stage or metastatic breast cancer managed with established breast cancer therapies (e.g., chemotherapy, radiation, repeat surgery). Samples of mixed cancer populations (e.g., breast cancer and colon cancer) will be excluded unless separate findings have been reported specific to breast cancer.

Concept

Primary studies examining the diagnostic accuracy of one or more case definitions or algorithms of disease progression (i.e., recurrence, PFS, DFS, or iDFS) or survival (i.e., BCFS or OS) compared with a reference standard measure (e.g., chart review, clinical trial dataset). Diagnostic accuracy is anticipated to be reported as sensitivity, specificity, PPV, NPV, or an estimate of area under the curve (AUC); however, other measures of agreement will also be considered. Case definitions and algorithms must have been applied to RCHD/administrative data sources or EHRs, with the goal of detecting, or estimating the time of occurrence of, one or more of the progression or survival events above. Studies involving the use of machine learning methods such as natural language processing to process unstructured data (e.g., clinician notes from electronic health records) for use in case definitions will also be of interest. Algorithms or models developed to predict future survival or another endpoint will be excluded. Studies focusing on differences in algorithm diagnostic accuracy with different data sources will be excluded.

Context

Studies from any geographic region will be of interest. Only studies published in English will be sought, without restriction on date of publication.

Information sources and searching the literature

Literature search strategies will be developed for MEDLINE, EMBASE, and CINAHL using controlled vocabulary (e.g., MEDLINE subject headings) and free-text words by an experienced information specialist with input from the project team (see Appendix). A second information specialist will peer review the strategies using the Peer Review of Electronic Search Strategies (PRESS) Checklist [25]. Searches will be restricted to the English language and animal records will be removed.

Processes for study selection

Records will be downloaded and deduplicated using EndNote version 9.3.3 (Clarivate Analytics) and uploaded to the online systematic review software DistillerSR® (Evidence Partners Inc, Ottawa, Canada). Screening of citations will be conducted by two independent reviewers, first using titles and abstracts (Stage 1 screen). The full texts of the potentially relevant citations identified at Stage 1 will be further screened by two independent reviewers (Stage 2 screen). A calibration exercise will precede both stages of screening to ensure consistency in the application of eligibility criteria by reviewers (batches of 50 to 100 citations at Stage 1 and batches of five full texts at Stage 2, until conflicts occur in less than 5% of decisions and all reviewers are comfortable with the screening criteria). Conflicts during screening will be resolved by discussion until consensus is reached or by consultation with a third review team member. In the final review report, we will document the study selection process using a PRISMA flow diagram and include a list of studies excluded at Stage 2 screening, with reasons for exclusion [26].

Use of artificial intelligence – stage 1 screening

The artificial intelligence/machine learning (AI/ML) feature of DistillerSR® will be used to perform prioritized Stage 1 screening [27]. The AI/ML algorithm will be trained by the reviewers, who will begin by screening a small number of known relevant citations along with an additional random sample of citations from the search results to a total of 200 citations. This will expose the AI/ML tool to both relevant and non-relevant citations. The AI/ML tool will subsequently generate relevance scores for the remaining citations (i.e., an estimate of the probability of relevance), which will be used to order the citations from high to low potential relevance as they are presented to the review team for screening. The tool will continue to learn and re-order citations throughout Stage 1 screening. The study team will monitor and resolve conflicts frequently throughout Stage 1 screening to ensure the AI/ML tool continues to be trained on accurate selection decisions. The study team will monitor the proportion of predicted relevant references that have been found (a measure approximated by the AI/ML tool) as well as the decline in new relevant citations identified over time. Once 95% of predicted relevant references have been identified and the yield of new relevant citations is minimal, the AI/ML tool will be used as a single reviewer to exclude all remaining unscreened citations. A single human reviewer will continue to screen all citations and will re-engage a second human team member at any time there is a disagreement with the AI/ML screener. This process will allow for efficiencies in Stage 1 screening, while ensuring two reviewers can still be involved as needed to minimize the risk of omissions related to use of the AI/ML tool. Members of the study team (BH, DW) have several years of experience in the use of DistillerSR’s AI/ML tool and will lead its implementation in this scoping review.
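The general idea of prioritized screening can be illustrated with a toy relevance scorer: citations are ranked so that those most similar to already-included records are presented first. The word-frequency scoring below is a deliberately simple stand-in for illustration only; it is not DistillerSR's proprietary model, and the example titles are hypothetical.

```python
from collections import Counter

def relevance_scores(labeled: list[tuple[str, bool]],
                     unscreened: list[str]) -> list[str]:
    """Toy prioritized-screening ranker: order unscreened titles/abstracts
    by the net frequency of their words among citations already judged
    relevant versus irrelevant by human reviewers."""
    rel = Counter(w for text, ok in labeled if ok for w in text.lower().split())
    irr = Counter(w for text, ok in labeled if not ok for w in text.lower().split())

    def score(text: str) -> float:
        words = text.lower().split()
        # Average per-word evidence; positive means "looks relevant".
        return sum(rel[w] - irr[w] for w in words) / (len(words) or 1)

    # Present the highest-scoring citations to reviewers first.
    return sorted(unscreened, key=score, reverse=True)
```

As reviewers label more citations, the counts (and hence the ranking) are refreshed, mirroring how the AI/ML tool "continues to learn and re-order citations" during Stage 1 screening.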

Data collection

Once all relevant studies have been identified, data extraction will be performed by two reviewers using a standardized extraction form in DistillerSR® software. A pilot extraction exercise will first be performed on a selection of three studies to ensure consistency between reviewers. Data collection will consist of gathering the following information from each included publication: study characteristics (e.g., authors, year/journal of publication, country where the study was performed, breast cancer population [early-stage versus metastatic]), treatment characteristics (e.g., endocrine therapy, chemotherapy, biological-targeted therapies), data source characteristics (e.g., name, location, type), study methods (e.g., description of algorithms/case definitions assessed, data linkage information, type of reference standard group, years of data studied, description of enrolment criteria of study population, statistical methods used to assess performance characteristics), data summaries related to diagnostic accuracy of each algorithm evaluated (e.g., sensitivity, specificity, PPV, NPV, or related data if these measures are not reported but information to inform their calculation is available), and a summary of authors’ cited limitations and conclusions. No risk of bias appraisals will be performed, in alignment with common practice for scoping reviews [21].

Data analysis

Given the chosen scoping review design and data types extracted, we will employ a descriptive approach to synthesis to summarize the methods and findings of the included studies, supplemented by use of tables and figures to convey key data. Presentation of results will be stratified by outcome. We will use tables to present information regarding study populations (i.e., key clinical features, years of study data, and geographic setting), study design characteristics (e.g., nature of reference standard, study size, relevant information regarding data linkages), and details regarding the algorithm employed to assess the outcome(s) of interest (e.g., ICD codes used and other pertinent information). Diagnostic accuracy for each of the algorithms or case definitions used per outcome (i.e., sensitivity, specificity, PPV, NPV) will be summarized both descriptively and in structured figures/tables as determined to be most intuitive by the research team. Comparisons with clinical trial data (if discussed) and study limitations will be similarly summarized.

Discussion

Findings from this scoping review will be clinically meaningful for breast cancer researchers. Our research team, the RE-thinking Clinical Trials Program (REaCT; https://react.ohri.ca), based in Ontario, is Canada’s largest pragmatic oncology trials program. In addition to performing pragmatic trials, the REaCT mandate is to identify feasible and accurate strategies to measure patient-important outcomes in ways that lessen burden for patients. To date, this has included strategies such as implementing oral consent, avoiding trial-mandated clinic visits, and using virtual visit techniques to make trial participation available to patients irrespective of how far they live from a cancer centre [28]. Despite these strategies, long-term follow-up of patients remains a costly component of performing clinical trials and can be cost prohibitive for obtaining peer-reviewed funding for innovative studies that could significantly improve the care of cancer patients. Hence, if the use of real-world data enables patients to be reliably followed for various clinical trial endpoints, it could provide a paradigm shift that reduces study budgets and makes study participation easier for patients and their families. Such a benefit would be a major improvement both in Canada and globally.