What is new

Key findings

We established a comprehensive database on novel cancer drugs and therapeutic products approved by the US Food and Drug Administration (FDA) between 2000 and 2016. The current database will be used to describe the clinical trial evidence generated in the pre-marketing period, but the database can and will be updated and expanded for future meta-epidemiological analyses.

What this adds to what is known

Publicly available drug approval documents offer valuable information for evidence syntheses and research-on-research projects. The Comparative Effectiveness of Innovative Treatments for Cancer (CEIT-Cancer) database transparently describes and characterizes such information.

What is the implication, what should change now

This database allows systematic analysis and assessment of early evidence on the benefits and harms of novel drug treatments and provides a solid basis for continuous meta-epidemiological analyses.

Background

Cancer drug development is characterized by a perceived urgency to find novel treatments that improve patients’ survival and quality of life. Timely access to such beneficial treatments is considered paramount for patients with cancer. Before granting approval and market access, health authorities such as the FDA review the available evidence on benefits and harms from clinical trials and the claims made by the pharmaceutical companies and sponsors of the trials. The FDA examines the submitted clinical trial results, re-analyzes the trial’s patient-level data, and evaluates whether the trials were conducted and analyzed in accordance with the original study protocols [1, 2]. For drugs and therapeutic biologics that receive approval, the FDA reviews are made publicly available in the Drugs@FDA database as “approval packages” [3]. These packages provide a wealth of information on the evidence on benefits and harms of innovative treatments at the time of approval.

With the introduction of new incentives and approval pathways, the FDA aimed to facilitate the development and approval process of drugs intended to treat serious or life-threatening conditions, including cancer [4]. For example, some policies focus specifically on orphan drugs for rare diseases [4]. Between 2000 and 2012, 46 out of 47 oncology drugs approved by the FDA underwent expedited approval [5]. In 2012, a further policy for so-called “breakthrough” therapies was introduced for drugs with highly promising clinical evidence [5].

However, there is increasing discussion about the impact of these regulations. Because expedited and orphan drug approvals are often based on smaller studies than those used in traditional approvals, they may leave evidence gaps regarding efficacy and safety and increase uncertainty in clinical decision making [6]. At the time of approval, there may be a dearth of evidence on hard clinical outcomes, and subsequent follow-up evaluations suggest that such evidence either may never become available or may end up showing limited or no benefits [7,8,9]. Oncology and hematology are probably the medical fields currently most affected by such developments.

Numerous meta-epidemiological studies have aimed to better understand the evidence at the time of approval of novel cancer drugs and therapeutic biologics using data from the FDA and the European Medicines Agency (EMA). We give an overview of these studies and the research in context in Table 1 (details of the underlying search strategy are provided in Additional file 1). The first related investigation that we are aware of was published in 2009 [10], and the number of publications peaked in 2017 with 10 articles. Nonetheless, a major limitation is that many of these studies cover only certain types of cancer (for example, solid tumors). Overall, four studies [10,11,12,13] describe regulatory characteristics and clinical trials and assess endpoints and effect sizes used for approval across all cancer drugs, but none of them covers the most recently approved drugs (for example, those approved after 2013). This precludes assessment of newer policies such as the breakthrough program introduced in 2012. Thus, the current knowledge on approval evidence for cancer drugs is marked not only by a limited scope but also by a great diversity in methods and approaches, which reduces the interpretability of the findings.

Table 1 Publications with similar or overlapping research questions

To address such limitations, we intended to establish a comprehensive database allowing a continuous analysis of such regulatory developments in meta-epidemiological research. The ongoing “Comparative Effectiveness of Innovative Treatments for Cancer” (CEIT-Cancer) project aims to transparently describe and characterize the clinical trial evidence of novel cancer drugs. Our goal is to capture the relevant information required to systematically analyze and assess early evidence on benefits and harms of novel cancer drug treatments.

As a first step, we collected the pre-marketing clinical trial evidence using FDA approval documents with a specific focus on cancer drugs, randomized controlled trials (RCTs) and single-arm trials (SATs), and treatment effects on overall survival (OS), progression-free survival (PFS), and objective response rate (RR). However, the database has a modular structure, which allows continuous updating of the list of drugs; the addition of new variables; expansion of the number of topics, health authorities, and outcomes; and linkage with other related datasets (for example, from post-approval evidence including non-randomized real-world studies).

Herein, we describe the rationale and design of the data collection process for the pre-approval evidence, including the organization of the data capture, the identification of clinical trial information, the assessment of trials for eligibility, and the data extraction.

Methods

Data collection

Project organization and database structure

The data collection consisted of three steps. In step 1, we made an inventory of novel FDA-approved drug products and acquired the corresponding FDA approval packages. In step 2, we made an inventory of RCTs and SATs reported in FDA approval documents, assessed their eligibility, and extracted trial design characteristics. In step 3, we extracted treatment effects on OS, PFS, and RR.

Steps 2 and 3 started with a planning and organizing phase (operationalization of concepts, drafting of an instruction manual for standardized data selection and extraction, setting up the extraction platform, pilot testing of the instruction manual and extraction platform, and training of reviewers) followed by an execution phase (independent data extraction and verification) and ended with a closing phase (documentation of activities). Specific project activities are described in greater detail in the following sections.

The clinical trial data were managed in a single database. The database consists of five data tables (with information about the drug, indication, trial, study groups and treatment comparisons, and treatment effects) that are linked in one-to-many (1:n) relationships (Fig. 1). The relational structure is indispensable because of the nature of the data (for example, multiple indications approved for a single drug, multiple clinical trials supporting approval of a single indication, and multiple comparisons within a single multi-arm clinical trial). We used both Microsoft Access as a local data extraction and management platform and Ragic [14] as a cloud-based equivalent.

Fig. 1 Database structure used in the Comparative Effectiveness of Innovative Treatments for Cancer (CEIT-Cancer) project for data collection and management
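The chain of one-to-many relationships described above can be sketched as a small relational schema. This is a minimal illustration only: the project itself used Microsoft Access and Ragic, whereas the sketch uses an in-memory SQLite database, and all table names, column names, and example rows are hypothetical rather than taken from the CEIT-Cancer database.

```python
import sqlite3

# Hypothetical table and column names, chosen for illustration only.
SCHEMA = """
CREATE TABLE drug (drug_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE indication (
    indication_id INTEGER PRIMARY KEY,
    drug_id INTEGER NOT NULL REFERENCES drug(drug_id),
    label_text TEXT NOT NULL);
CREATE TABLE trial (
    trial_id INTEGER PRIMARY KEY,
    indication_id INTEGER NOT NULL REFERENCES indication(indication_id),
    identifier TEXT NOT NULL,
    randomized INTEGER NOT NULL);
CREATE TABLE comparison (
    comparison_id INTEGER PRIMARY KEY,
    trial_id INTEGER NOT NULL REFERENCES trial(trial_id),
    experimental_arm TEXT NOT NULL,
    control_arm TEXT NOT NULL);
CREATE TABLE effect (
    effect_id INTEGER PRIMARY KEY,
    comparison_id INTEGER NOT NULL REFERENCES comparison(comparison_id),
    endpoint TEXT NOT NULL CHECK (endpoint IN ('OS', 'PFS', 'RR')),
    estimate REAL);
"""


def build_db():
    """Create an empty in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_example(conn):
    """Insert one made-up drug with two indications (not real data)."""
    conn.execute("INSERT INTO drug VALUES (1, 'drug_a')")
    conn.executemany("INSERT INTO indication VALUES (?, 1, ?)",
                     [(1, "indication x"), (2, "indication y")])
    conn.executemany("INSERT INTO trial VALUES (?, ?, ?, ?)",
                     [(1, 1, "TRIAL-001", 1), (2, 2, "TRIAL-002", 1)])
    # TRIAL-001 is multi-arm: two comparisons against the same control.
    conn.executemany("INSERT INTO comparison VALUES (?, ?, ?, ?)",
                     [(1, 1, "low dose", "placebo"),
                      (2, 1, "high dose", "placebo"),
                      (3, 2, "drug_a", "standard care")])
    conn.executemany("INSERT INTO effect VALUES (?, ?, ?, ?)",
                     [(1, 1, "OS", 0.80), (2, 1, "PFS", 0.70),
                      (3, 2, "OS", 0.75), (4, 2, "PFS", 0.65),
                      (5, 3, "RR", 1.40)])


def effects_per_drug(conn):
    """Walk the 1:n chain from drug down to effect and count effects."""
    rows = conn.execute("""
        SELECT d.name, COUNT(e.effect_id)
        FROM drug d
        JOIN indication i ON i.drug_id = d.drug_id
        JOIN trial t ON t.indication_id = i.indication_id
        JOIN comparison c ON c.trial_id = t.trial_id
        JOIN effect e ON e.comparison_id = c.comparison_id
        GROUP BY d.drug_id
        ORDER BY d.name""").fetchall()
    return dict(rows)
```

Each child table carries a foreign key to its parent, so a single drug can fan out into several indications, trials, comparisons, and effect estimates without duplicating the drug-level information.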

Step 1: Inventory of FDA-approved drugs and acquisition of approval packages

The aims of this step were to identify and characterize all drugs licensed by the FDA for the treatment of cancer diseases and to download as well as prepare FDA approval documents for subsequent activities. This step was performed by a single reviewer (AL).

Inventory of FDA-approved drugs

In a first stage, we created a list of novel drugs and therapeutic biologics (referred to in this article as “drugs”) that were granted their first FDA marketing authorization between 1 January 2000 and 31 December 2016. (Technically speaking, we included so-called “new molecular entities” and “new therapeutic biologics” approved via either a “New Drug Application” or a “Biologics License Application”.) The drug names were collected from the “Annual drug and biologic approval activity” reports for new molecular and biological entities (2000 to 2016) [15] as well as the “FDA reports on drug innovation” (2011 to 2016) [16]. Information on therapeutic biologics approved before 2004 is not available in these documents and therefore we reviewed the drug approval reports by month for the period of January 2000 to December 2003 obtained from the Drugs@FDA database [3].

Selection of cancer indications

In a second stage, drugs were considered for inclusion in the CEIT-Cancer database if the original approval (that is, the first-ever approved use of a novel drug) was for the treatment of a solid tumor or hematological malignancy. Drugs without presumed cancer activity, such as supportive care drugs (for example, anti-emetics and hematopoietic stem cell mobilizing agents) or imaging drugs (for example, diagnostic radiopharmaceutical agents), were excluded. A medical oncologist (BK) was consulted in case of any doubts about eligibility.
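The two inclusion stages described so far amount to a simple filter over approval records. The sketch below assumes a hypothetical record layout (the `Approval` fields and the category labels are illustrative and not part of the source) and encodes only the rules stated in the text: a novel product, a first approval within the 2000 to 2016 window, and an original indication that is a solid tumor or hematological malignancy.

```python
from dataclasses import dataclass
from datetime import date

WINDOW_START = date(2000, 1, 1)
WINDOW_END = date(2016, 12, 31)

# Hypothetical coding of the original indication.
CANCER_CLASSES = {"solid tumor", "hematological malignancy"}


@dataclass(frozen=True)
class Approval:
    drug: str                       # placeholder name, not a real product
    first_approval: date            # date of first FDA marketing authorization
    novel: bool                     # new molecular entity or new therapeutic biologic
    original_indication_class: str  # e.g. "solid tumor", "supportive care"


def is_eligible(approval: Approval) -> bool:
    """Apply the date window, novelty, and cancer-indication rules."""
    in_window = WINDOW_START <= approval.first_approval <= WINDOW_END
    is_cancer = approval.original_indication_class in CANCER_CLASSES
    return approval.novel and in_window and is_cancer


def select_eligible(approvals):
    """Return the names of drugs that pass all inclusion rules."""
    return [a.drug for a in approvals if is_eligible(a)]
```

Supportive care and imaging drugs fall outside `CANCER_CLASSES` and are therefore excluded; borderline cases, which the authors referred to a medical oncologist, are not something a rule like this can settle.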

Extraction of information on drug, indication, and regulatory characteristics

In the third stage, we collated information on drug, indication, and regulatory characteristics for each eligible drug and cancer indication (“drug-indication pair”; Table 2). The line of treatment was determined by a medical oncologist (BK). The remaining information was retrieved from various information sources as follows.

Table 2 Variables collected in step 1 for each cancer drug-indication pair

For drug-indication pair characteristics:

  • “Annual drug and biologic approval activity” reports for new molecular and biological entities (2000 to 2016) [15], “FDA reports on drug innovation” (2011 to 2016) [16], and a peer-reviewed publication [17] for drug and regulatory characteristics, and

  • the first-ever available FDA drug label from the Drugs@FDA database [3] for information about the FDA-approved indication(s).

For information on additional expedited programs and orphan status, we perused the following:

  • “FDA reports on accelerated approvals” to identify accelerated approved indications [18]; that is, indications approved on the basis of preliminary evidence that does not meet regulatory standards for traditional (full) approval [4];

  • “Breakthrough designation approval” reports [19] to identify indications that received a breakthrough therapy designation in the pre-approval period; that is, drugs that are expected to advance the treatment of certain diseases [4]; and

  • FDA database of orphan drug product designations to identify indications that received an orphan status [20]; that is, drugs intended for the treatment of rare diseases affecting fewer than 200,000 people in the US [21].

All documents were downloaded or accessed on 2 November 2015 (for the 2000 to 2012 approvals) and 2 March 2017 (for the 2013 to 2016 approvals). We relied on the information from the Drugs@FDA database in the case of discrepant information between information sources (for example, if there were different approval dates presented). We categorized the drug innovation class (first-in-class, advance-in-class, and addition-to-class) in accordance with the algorithm of Lanthier et al. [17]. Accordingly, first-in-class drugs can be seen as “true” therapeutic innovation and define a new drug class. Advance-in-class drugs may offer an important therapeutic advance (that is, they were granted priority review by the FDA) over existing drugs in the same class. Drugs that do not fall under either of these two categories are categorized as addition-to-class.
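The three innovation classes, as summarized in this paragraph, can be written as a short decision rule. This is a simplified reading of the description given here, not a reproduction of the full algorithm of Lanthier et al. [17]; the function and parameter names are hypothetical.

```python
def innovation_class(defines_new_class: bool, priority_review: bool) -> str:
    """Assign an innovation class from two yes/no characteristics.

    defines_new_class: the drug is the first of a new drug class.
    priority_review:   the FDA granted priority review.
    """
    if defines_new_class:
        return "first-in-class"
    if priority_review:
        # Not first in its class, but judged an important advance.
        return "advance-in-class"
    return "addition-to-class"
```

For example, `innovation_class(False, True)` returns `"advance-in-class"`, whereas a drug that neither defines a new class nor received priority review is an addition-to-class.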

Approval packages

The FDA’s review of the pre-clinical and clinical information generated by a biopharmaceutical company during the course of drug development is summarized in FDA “approval packages” published in the Drugs@FDA database. We used a similar approach to retrieve the approval documents as described recently [22], and we provided practical details on how we navigated the documents elsewhere [23]. The following documents served as source documents throughout this project and were made suitable for text searching using Adobe Acrobat’s Optical Character Recognition (OCR) function:

  • Medical review (sometimes referred to as clinical review)

  • Statistical review

  • Drug label

  • Cross-discipline team leader review

  • Summary review

  • Multi-discipline review.

Step 2: Trial selection and characterization

The aims of this step were to identify clinical trials in the medical review, assess their eligibility, and characterize their design. These activities were performed by teams of two independent reviewers. Trials include randomized and non-randomized studies (the latter within the category of SATs), and for each trial the database explicitly indicates whether a randomized design was used.

Identification of trials, eligibility assessment, and data extraction

Each reviewer was provided with a set of indications to identify potentially eligible trials. Reviewers independently searched the medical review document for randomized trials as well as for trials that were indicated as pivotal for approval (that is, the trial was described as “approval”, “registration”, “major”, “pivotal”, or similar) regardless of whether they were randomized or not. For each trial, the reviewers recorded variables presented in Table 3. In particular, they extracted the study identifier, name, or acronym and determined whether the following criteria were met (each criterion was assessed separately):

  (1) the trial was explicitly described as pivotal to approval,

  (2) the patients were randomly assigned to treatment arms,

  (3) the patients matched broadly in their disease characteristics with the approved target population,

  (4) the patients were randomly assigned to at least one control arm that did not contain the drug under review (regardless of dose or administration schedule),

  (5) as per the judgment of the reviewer, a trial could still be relevant even if none of the abovementioned criteria was met; for example, if the trial is extensively discussed or the only trial evaluated in the medical review (which is sometimes the case in accelerated approval settings, where such trials are often not explicitly labeled as “pivotal” but extensively discussed in the documents).

Table 3 Variables collected in step 2 for trials that were randomized or explicitly labeled as pivotal

After completion, the two independently generated datasets were compared and disagreements resolved by consensus. The inter-rater reliability for trial identification (as assessed with the Kappa statistic [24]) was good (74%). Ultimately, trials that met any of the following sets of criteria were deemed eligible:

  • the trial was described as pivotal (criterion 1 alone is met; categorized as “explicitly pivotal”)

  • the trial was not described as pivotal but was randomized (criterion 2), enrolled a population that matched the approved target population (criterion 3), and had a control arm that did not contain the intervention under review (criterion 4) (categorized as “likely pivotal RCT”)

  • the trial was not “explicitly pivotal” or a “likely pivotal RCT” but considered otherwise essential (criterion 5) for the approval decision (categorized as “other pivotal”). Such trials were typically single-arm studies in accelerated approval settings.
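The eligibility categories listed above map directly onto the five criteria recorded for each trial. A minimal sketch of that mapping follows; the function and argument names are hypothetical, and the order of the checks mirrors the order of the bullet points.

```python
from typing import Optional


def classify_trial(
    described_as_pivotal: bool,      # criterion 1
    randomized: bool,                # criterion 2
    matches_target_population: bool, # criterion 3
    control_without_drug: bool,      # criterion 4
    otherwise_essential: bool,       # criterion 5 (reviewer judgment)
) -> Optional[str]:
    """Return the eligibility category, or None if the trial is ineligible."""
    if described_as_pivotal:
        return "explicitly pivotal"
    if randomized and matches_target_population and control_without_drug:
        return "likely pivotal RCT"
    if otherwise_essential:
        return "other pivotal"
    return None
```

A randomized trial that is not labeled pivotal and lacks a control arm free of the drug under review is therefore eligible only if a reviewer judged it otherwise essential.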

For each eligible trial, teams of two independent reviewers extracted information on variables presented in Table 4.

Table 4 Variables collected in step 2 for eligible trials only
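Agreement between two independent reviewers, as reported above for trial identification, is commonly quantified with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below assumes this variant; the source states only that a Kappa statistic was used, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e the agreement expected by chance from each
    rater's marginal frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

With ten trials of which reviewer A marks five as eligible and reviewer B marks four of those same five, the observed agreement is 0.9, the chance agreement is 0.5, and kappa is 0.8.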

Step 3: Treatment effect estimates on overall survival, progression-free survival, and response rate

The aim of this step was to retrieve treatment effect estimates on OS, PFS, and RR for each treatment comparison. This information was collected only for RCTs. This activity was performed by teams of two independent reviewers.

Data extraction

We preferred trial analyses conducted by the FDA over sponsors’ analyses, whenever both were available. Similarly, more recent data cutoff dates were preferred over older cutoff dates if there were multiple analysis results on the same endpoint available. We used the statistical review document (or any other FDA approval documents) if the medical review document was not available or was incomplete or not legible.
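The two preference rules above can be expressed as a sort key over the analyses available for one endpoint. The sketch assumes that the FDA-over-sponsor rule takes precedence over the recency rule when the two conflict; the source does not state an order, so this is an illustrative choice. The record layout and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Analysis:
    endpoint: str        # "OS", "PFS", or "RR"
    analyst: str         # "FDA" or "sponsor"
    cutoff: date         # data cutoff date of the analysis
    hazard_ratio: float  # made-up values in the examples


def preferred_analysis(analyses):
    """Pick one analysis: FDA before sponsor, then the latest cutoff.

    Assumption: the analyst rule outranks the cutoff rule.
    """
    if not analyses:
        raise ValueError("no analyses available")
    return max(analyses, key=lambda a: (a.analyst == "FDA", a.cutoff))
```

Under this ordering, a sponsor analysis is used only when no FDA analysis of the same endpoint is available.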

For each treatment comparison, two reviewers independently searched the FDA review documents for treatment effect estimates on OS, PFS, and RR and extracted information on variables presented in Table 5. For OS and PFS endpoints with incomplete or missing information (for example, no confidence interval), we approximated treatment effect estimates following the methods described by Parmar et al. [25] and Tierney et al. [26]. At the end of the data collection activities in this step, the datasets of the two reviewers evaluating the same set of treatment comparisons were compared, and disagreements were resolved by consensus.
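When a hazard ratio is reported without its confidence interval, or with only a P value, the missing quantity can often be recovered on the log scale under a normal approximation. The sketch below shows standard identities of the kind that Parmar et al. [25] and Tierney et al. [26] build on; it is not a reproduction of the calculations performed for the database, and the function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_value(level: float = 0.95) -> float:
    """Two-sided critical value, e.g. about 1.96 for a 95% interval."""
    return _STD_NORMAL.inv_cdf(0.5 + level / 2)


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error of ln(HR) from the limits of a confidence interval."""
    return (log(upper) - log(lower)) / (2 * z_value(level))


def se_from_p(hr: float, p_two_sided: float) -> float:
    """Standard error of ln(HR) from the HR and its two-sided P value."""
    if hr == 1 or not 0 < p_two_sided < 1:
        raise ValueError("needs HR != 1 and 0 < p < 1")
    z = _STD_NORMAL.inv_cdf(1 - p_two_sided / 2)
    return abs(log(hr)) / z


def ci_from_se(hr: float, se: float, level: float = 0.95):
    """Confidence interval for the HR from ln(HR) and its standard error."""
    half_width = z_value(level) * se
    return exp(log(hr) - half_width), exp(log(hr) + half_width)
```

For example, `ci_from_se(hr, se_from_p(hr, p))` rebuilds an approximate 95% confidence interval from a reported hazard ratio and two-sided P value; the result is only as precise as the rounding of the reported numbers.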

Table 5 Variables collected in step 3 for eligible randomized controlled trials retrieved on comparison level

Discussion

We have successfully developed the CEIT-Cancer database, which transparently describes and characterizes information on the clinical trial evidence of novel cancer drugs at the time of their approval by the FDA.

Exploring characteristics of the evidence of novel cancer drugs at the time of their approval could greatly improve our understanding of the real-world clinical benefit and safety of such treatments. Importantly, it may also open new avenues of future research and regulation, leading to better-designed studies, reduced waste in research, and more rigorous criteria for health authorities and health systems to consider incorporating new interventions into the current cancer armamentarium.

The CEIT-Cancer database is a comprehensive, manually curated platform that captures regulatory, drug, indication, and clinical trial data from FDA approvals of novel cancer drugs. This database differs from previous investigations in three important ways. First, the CEIT-Cancer database covers a time frame of 17 years, substantially longer than that of most previous studies. Second, it assesses all types of cancers, including both solid tumors and hematologic malignancies. Third, the database encompasses the most recent FDA drug approvals. In addition, this database can be expanded to other medical fields and linked with other databases. It can be augmented with post-approval evidence and can also be expanded to include data extracted from approval documents of other health authorities, such as the EMA [11, 27].

We set up the database and realized the project in a multidisciplinary team including experts in clinical trial methodology and conduct, clinical epidemiology, health technology assessment, biostatistics, clinical research, information management, public health, and medical oncology. The initial dataset covers 17 years, which allows us to investigate regulatory developments over time and changes in the focus of drug development, such as the development of targeted agents and immunotherapy in contrast to classic cytotoxic chemotherapy. Following standardized and established data extraction procedures as used in systematic reviews, we created a large evidence base on treatment effects and trial quality. This lays the foundation for our planned continuous meta-epidemiological analysis of novel drugs and therapeutic biologics within the CEIT-Cancer project. We are currently developing the infrastructure to make the database available and aim to obtain structural funding and support to provide a sustainable solution. In collaboration with other investigators, we aim to establish a data-sharing process to provide access to the database and foster further research.

Conclusions

Publicly available drug approval documents offer valuable information for evidence syntheses and research-on-research projects. The CEIT-Cancer database transparently describes and characterizes this information on the clinical trial evidence of novel cancer drugs. It allows systematic analysis and assessment of early evidence on benefits and harms of novel drug treatments in meta-epidemiological research. The modular structure of the database and the data collection processes permit continuous updates and expansions. Overall, the database provides a solid basis for meta-epidemiological research of the evidence on novel treatments in cancer.