Key Points

Testing for EGFR mutations is important for the selection of appropriate therapy.

Herein, EGFR overall and actionable submutation prevalence was high, with regional differences.

These data support testing for EGFR gene mutations to inform treatment decisions.

1 Introduction

Lung cancer, of which the non-small cell type accounts for almost 85% of cases, is the most commonly diagnosed cancer and the leading cause of cancer-related deaths worldwide [1, 2]. Overall, it was predicted that in 2018 (the year for which the latest statistics are available), there would be 2.1 million new diagnoses of lung cancer and 1.8 million associated deaths [1]. Non-small cell lung cancer (NSCLC) is heterogeneous and can be further classified on the basis of histology as adenocarcinoma, which makes up about 40% of cases [3], squamous cell carcinoma, and large cell type, among other rare types [4]. Prognosis is primarily linked to the stage of disease. The 5-year survival is highest (61%) in those diagnosed with localized disease, which accounts for only about 30% of adenocarcinoma cases at diagnosis, and falls to only 6% in those with distant metastatic disease, which encompasses about 50% of cases [5, 6]. In addition, a variety of tumor-specific genomic abnormalities have been identified that provide insight into prognosis and predict response to specific targeted therapies, particularly for adenocarcinoma [7].

The epidermal growth factor receptor (EGFR) is a transmembrane protein that serves as a tyrosine kinase receptor for a variety of ligands involved in regulating cell proliferation, differentiation, and survival [8]. Mutations in EGFR were the first targetable alterations discovered in lung cancer and are among the most common driver mutations in NSCLC [9]. Before the introduction of targeted therapies, NSCLC with overexpression of EGFR was associated with a greater risk of metastasis, poor tumor differentiation, and a high rate of tumor growth [8, 10]. The first drugs that targeted EGFR were approved without a complete understanding of the genomic mutations associated with EGFR positivity. These tyrosine kinase inhibitors (TKIs) function by competitively inhibiting the binding of adenosine triphosphate to the active site of the EGFR kinase. Since then, mutations associated with sensitivity to EGFR TKIs have been identified, the most common being in-frame deletions of exon 19 and L858R substitutions in exon 21 [11]. Tumor genotyping is now considered essential to guide treatment decisions for patients with NSCLC, and EGFR mutations are among several mutations that should be routinely screened for in patients with lung cancer with an adenocarcinoma component [12, 13]. Newer non-invasive analytical options, such as the analysis of circulating tumor DNA, offer high specificity and allow testing of patients for whom biopsy sampling is not feasible [13].

Patients with advanced (regional and distant) disease, which accounts for 70% of cases, have few therapeutic options [5]. Historically, the standard of care has been systemic therapy involving platinum-based regimens; however, this modality is associated with an overall survival of less than 2 years in patients with advanced NSCLC [14, 15]. Clinical trial results have supported the advances in genomics, showing significantly higher response rates and longer progression-free survival with EGFR TKIs compared with chemotherapy in patients whose tumors harbored activating mutations in EGFR. These results prompted the approval of these agents for first-line treatment of patients with EGFR-positive NSCLC and universal testing of tumors for EGFR mutations [16, 17]. As confirmed in clinical studies, epidemiologic and retrospective database investigations have found that testing for genetic mutations and the use of appropriate targeted therapies have led to better therapeutic outcomes in advanced NSCLC [18, 19]. Thus, the identification of geographically different EGFR gene mutation patterns in NSCLC is important for the selection of appropriate targeted therapies. However, current studies give an incomplete picture of regional differences in EGFR mutation and submutation prevalence. To address this clinical data gap, this meta-analysis was conducted to provide a robust and comprehensive overview of EGFR mutation and submutation (specifically exon 19 deletions, exon 21 L858R substitutions, and others) prevalence, and to identify important covariates that influence EGFR mutation status in patients with advanced NSCLC worldwide.

2 Materials and Methods

This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement guidelines [20]. A predefined protocol was followed.

2.1 Criteria for Study Inclusion

Studies included in this meta-analysis comprised phase II and III randomized controlled trials, real-world datasets, health record datasets, cohort studies, case-control studies, and cross-sectional studies. Case reports, preclinical studies, opinion pieces, letters, other systematic reviews, and phase I randomized controlled trials were excluded. Studies must have enrolled ≥ 50 adult patients with advanced NSCLC (stage IIIB/IV; locally [T3–T4] and/or regionally [N2–N3] advanced or distant metastatic [M1] disease [12, 21]) who tested positive for an EGFR mutation. After an initial search, an allowance was added that up to 20% of patients in a study could have stage I/II disease. Studies that did not explicitly state the stage were included if there were other indications suggesting that patients with advanced/metastasized disease were almost exclusively enrolled.

Studies must have had EGFR mutational data available, with clear distinctions between exon 19 deletions, exon 21 L858R substitutions, and other submutations. Studies where mutational analyses were performed on tissue were included, but studies in which only test results from blood or malignant pleural effusion were provided were excluded. Also excluded were studies that did not include patients with adenocarcinoma and studies that specifically looked at the T790M resistance mutation in patients who had undergone TKI therapy.

2.2 Search Strategy

Embase® and MEDLINE® in Ovid were searched for studies published between 2004 and 2019. A title and abstract screen was performed independently by a pair of the authors (JB, MM). An additional screen was performed by two reviewers (JB, AK), with disagreements resolved by consensus. Duplicates were removed, and then a full-text screen was performed by one of the authors (AK), with disagreements resolved by consensus (AK, JB, and MM).

2.3 Data Synthesis

Study-level EGFR mutation endpoints (All EGFR, exon 19 deletions, and exon 21 L858R substitutions) reported as percentages were converted into binomial probabilities prior to the meta-analysis. Missing study-level mutation counts were converted from percentages and vice versa. Where there was no study-level mutation information, baseline arm values were used to calculate study-level information. Covariate values were converted in a similar manner, with weighted averages employed for mean age. Where required, All EGFR mutation percentages were calculated using the number of patients evaluated for EGFR mutations as the denominator. Submutation percentages were calculated using the number of patients with any EGFR mutation as the denominator.
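
The conversions described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analysis was run in R); the function names and the example study are hypothetical, and rounding a back-calculated count to the nearest integer is an assumption. The sketch shows the two denominators in use: tested patients for All EGFR, and EGFR-positive patients for the submutations.

```python
def count_from_percent(percent: float, denominator: int) -> int:
    """Recover a mutation count from a reported percentage (assumed rounding)."""
    return round(percent / 100.0 * denominator)


def percent_from_count(count: int, denominator: int) -> float:
    """Recover a percentage from a reported count."""
    return 100.0 * count / denominator


def binomial_probability(count: int, denominator: int) -> float:
    """Proportion passed to the model as a binomial probability."""
    return count / denominator


# Hypothetical study: 400 patients tested, 30% with any EGFR mutation,
# and 50% / 40% of the EGFR-positive patients with exon 19 / L858R.
n_tested = 400
n_egfr = count_from_percent(30.0, n_tested)      # denominator: tested patients
n_exon19 = count_from_percent(50.0, n_egfr)      # denominator: EGFR-positive
n_l858r = count_from_percent(40.0, n_egfr)       # denominator: EGFR-positive

p_all_egfr = binomial_probability(n_egfr, n_tested)
p_exon19 = binomial_probability(n_exon19, n_egfr)
p_l858r = binomial_probability(n_l858r, n_egfr)
```

In this example the submutation probabilities do not sum to 1 because the remaining EGFR-positive patients carry other submutations.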

Associated EGFR mutation standard errors, used to weight each study, were derived using the log-odds approximation where “p” was the probability of EGFR mutation and “n” was either the number of tested subjects in the study (for the All EGFR analysis) or the number of patients with All EGFR mutations (for the submutation analyses).

$$\text{SE} \approx \sqrt{\frac{1}{np} + \frac{1}{n(1 - p)}}$$
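
As a worked illustration of this approximation, the following Python sketch evaluates the standard error for a given probability and sample size. The function name `logit_se` and the example inputs are hypothetical and are not taken from the included studies.

```python
import math


def logit_se(p: float, n: int) -> float:
    """Approximate standard error of the log-odds for a proportion p out of n."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(1.0 / (n * p) + 1.0 / (n * (1.0 - p)))


# Hypothetical inputs: the same probability observed in a small and a large study.
se_small = logit_se(0.5, 100)
se_large = logit_se(0.5, 10_000)
```

The standard error shrinks as n grows, so larger studies receive more weight. For the submutation analyses n is the number of EGFR-positive patients rather than the number tested, which is smaller by construction.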

Linear mixed-effects models were fitted to EGFR mutation endpoints using logistic transformation (logit) and assuming a binomial distribution (EGFR mutation ~ binomial [n_i, p_i], where n_i is the number of tested subjects in the study or the number with All EGFR mutations depending on the endpoint and p_i is the probability of the specific EGFR mutation endpoint in the study). The model included terms for an intercept reflecting European studies, further additive terms C1_i–C6_i for other study continents (categorical = 0 or 1), a between-trial random effect (η_i ~ N[0, τ²]), and a residual random error term (ε_i ~ N[0, σ²/n_i]), where i is the study and θ is the model estimate:

$$\text{logit}(p_{i}) = \text{intercept} + \theta_{1} C1_{i} + \theta_{2} C2_{i} + \theta_{3} C3_{i} + \theta_{4} C4_{i} + \theta_{5} C5_{i} + \theta_{6} C6_{i} + \eta_{i} + \varepsilon_{i}$$
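
Because the model is linear on the logit scale, a continent-level prevalence is obtained by adding that continent's term to the intercept and applying the inverse logit. The Python sketch below illustrates this back-transformation under stated assumptions. All numeric inputs are hypothetical rather than fitted values. `se_linear` is assumed to be the standard error of the summed linear predictor, which in a real fit would need the covariance between the intercept and the continent term. The random effects are set to their mean of zero.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def continent_prevalence(intercept: float, theta: float,
                         se_linear: float, z: float = 1.96):
    """Return (estimate, lower, upper) on the probability scale.

    intercept: logit-scale estimate for the reference continent (Europe)
    theta:     additive logit-scale term for the continent of interest
    se_linear: assumed standard error of intercept + theta
    """
    linear = intercept + theta
    return (inv_logit(linear),
            inv_logit(linear - z * se_linear),
            inv_logit(linear + z * se_linear))


# Hypothetical values: reference prevalence of 15%, continent prevalence of 50%.
b0 = logit(0.15)
theta_k = logit(0.50) - b0
estimate, lower, upper = continent_prevalence(b0, theta_k, se_linear=0.05)
```

The interval is symmetric on the logit scale but generally not on the probability scale, which is why wide intervals around low or high prevalences look lopsided.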

Five potential covariates (age, percent male, percent Caucasian, percent adenocarcinoma, and percent stage I/II) were assessed visually for their relationship to the response. Only covariates with values for at least 70% of the studies and the majority of those values covering more than one level were included. Missing covariates were imputed as median percentages. Three covariates (age, percent male, and percent adenocarcinoma) were tested as additive terms in the model, each added as a single term. The covariates were centered on the mean for the logistic regression model; therefore, model estimates were assessed at the mean value of the covariate. Analysis was conducted in R [22], with the lme4 package [23], and figures produced using the package ggplot2 [24].
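
The covariate preparation described above (a 70% coverage rule, median imputation, and mean-centering) can be sketched in Python as follows. This is an illustration, not the authors' R code: the function name is hypothetical, the check that values cover more than one level is omitted, and centering on the mean of the imputed values (rather than the observed values only) is an assumption.

```python
from statistics import mean, median
from typing import List, Optional


def impute_and_center(values: List[Optional[float]],
                      min_coverage: float = 0.70) -> Optional[List[float]]:
    """Prepare one study-level covariate; None marks a missing value.

    Returns None when fewer than min_coverage of studies report the covariate,
    otherwise the median-imputed, mean-centered values.
    """
    observed = [v for v in values if v is not None]
    if not values or len(observed) / len(values) < min_coverage:
        return None  # covariate not eligible for testing
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    center = mean(filled)  # assumption: center after imputation
    return [v - center for v in filled]


# Hypothetical percent-male values for five studies, one of them missing.
percent_male = [40.0, 60.0, None, 50.0, 70.0]
centered = impute_and_center(percent_male)

# Hypothetical covariate reported by only one of three studies.
sparse = impute_and_center([50.0, None, None])
```

With a centered covariate, the intercept and continent terms describe a study whose covariate sits at the mean, which matches the statement that model estimates were assessed at the mean value of the covariate.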

3 Results

3.1 Study Identification and Selection

Upon the initial title and abstract screen, 3969 potential studies were identified, of which 2974 were eliminated because they were duplicates or it was clear that they did not meet the prespecified criteria upon visual review. Of the remaining 995 studies reviewed in more detail, 914 were excluded because they did not meet inclusion criteria. Data extraction of the remaining 81 studies eliminated an additional 11 studies, including two studies that did not differentiate between exon 19 deletions and exon 21 L858R substitutions, four that did not examine any rare mutations, one that enrolled < 50 patients with an EGFR mutation, one that had only malignant pleural effusion specimens, and three that included > 20% of patients with stage I/II disease. Five additional studies were added: one that was not listed in the primary literature search and found by chance, and four that were initially incorrectly excluded. This left 75 studies, of which one had submutation population overlap that did not allow for individual percentages of patients with each submutation to be calculated and was therefore excluded. Of the final 74 studies that were included, 17 comprised populations that were non-representative of the typical overall NSCLC population (e.g., because of specific selection criteria) and were removed from the All EGFR mutation analysis, leaving 57 studies. The selection process of studies is shown in Fig. 1.
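
The study counts in this paragraph can be checked with a short arithmetic sketch in Python. All numbers come from the text above; only the variable names and dictionary labels are introduced here.

```python
identified = 3969
after_title_abstract = identified - 2974            # duplicates or clearly ineligible
after_detailed_review = after_title_abstract - 914  # did not meet inclusion criteria

# Exclusions made during data extraction, as itemized in the text.
extraction_exclusions = {
    "no exon 19 vs L858R distinction": 2,
    "no rare mutations examined": 4,
    "fewer than 50 EGFR-positive patients": 1,
    "malignant pleural effusion specimens only": 1,
    "more than 20% stage I/II": 3,
}
after_extraction = after_detailed_review - sum(extraction_exclusions.values())

added_back = 1 + 4                                   # found by chance + wrongly excluded
before_overlap_check = after_extraction + added_back
final_studies = before_overlap_check - 1             # submutation population overlap
all_egfr_studies = final_studies - 17                # non-representative populations

print(after_title_abstract, after_detailed_review, after_extraction,
      before_overlap_check, final_studies, all_egfr_studies)
```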

Fig. 1

Flow diagram of the selection of studies included in this meta-analysis. EGFR epidermal growth factor receptor

3.2 Characteristics of the Study Populations

The 74 studies enrolled a total of 59,707 patients who were tested for EGFR mutations, with 16,746 patients in the European studies, 37,594 patients from Asia, 3332 patients from North America, and 1298 patients from studies that encompassed more than one global region. There was a paucity of data from some continents. No South American studies were included in the All EGFR mutation analysis, and the one study included in the submutation analysis had only 72 patients. Only one study was identified from Central America (Mexico), encompassing 165 tested patients, and one study from Oceania (New Zealand) of 500 tested patients.

The unweighted mean age across all studies where age was recorded ranged from 53.0 to 71.4 years, with 25.0–75.1% male and 40.4–100% of patients having adenocarcinoma. Table 1 provides a summary of all of the studies included in this analysis [25–98].

Table 1 Characteristics of included studies

3.3 All EGFR Mutation Analysis

The final model for the All EGFR mutation analysis included one covariate term for percentage of male patients (significant at the 0.001% level). The percentage of adenocarcinomas was investigated in the model as an additive covariate but was not statistically significant, and thus was not included in the final model. Estimates for the prevalence of All EGFR mutations ranged from 11.9% (95% confidence interval [CI] 6.7–20.5) for Global to 49.1% (95% CI 46.5–51.7) for Asia (Fig. 2). The model was a good fit for the data, as evidenced by the minimal difference between observed values and predicted estimates (Electronic Supplementary Material [ESM]). An informal assessment of the effect of study percentage of male patients as a covariate found that as the percentage of male patients increased, the percentage of All EGFR mutations decreased for all continents (Table 2).

Fig. 2

Final model estimates for the All EGFR mutation analysis with percent male covariate. Blue symbols indicate observed data. CI confidence interval, EGFR epidermal growth factor receptor

Table 2 Effect of study percentage of male patients on All EGFR mutation estimates

3.4 Exon 19 Deletions

There were no significant covariates for the exon 19 deletion analysis; thus, the base model was the final model. Estimates for the prevalence of the exon 19 deletion submutation, which were relative to the overall EGFR mutation population, ranged from 40.3% (95% CI 28.1–53.9) for Oceania to 66.8% (95% CI 51.7–79.0) for South America (Fig. 3). The CIs for the model estimates were not as precise as those for the All EGFR mutation model because the study populations were smaller for this analysis, as only the number of patients with EGFR mutations was included (ESM).

Fig. 3

Final model estimates for the exon 19 deletion EGFR submutation analysis (relative to the overall EGFR mutation population). Blue symbols indicate observed data. CI confidence interval, EGFR epidermal growth factor receptor

3.5 Exon 21 L858R Substitutions

Similar to the exon 19 deletion analysis, there were no significant covariates for the exon 21 L858R substitution analysis; thus, the base model was the final model. Estimates for the prevalence of the exon 21 L858R substitutions, which were relative to the overall EGFR mutation population, ranged from 27.7% (95% CI 17.3–41.2) for South America to 41.1% (95% CI 39.6–42.7) for Asia (Fig. 4). The CIs for the model estimates were not as precise as those for the All EGFR mutation model because the study populations were smaller for this analysis, as only the number of patients with EGFR mutations was included (ESM).

Fig. 4

Final model estimates for the exon 21 L858R substitution EGFR submutation analysis (relative to the overall EGFR mutation population). Blue symbols indicate observed data. CI confidence interval, EGFR epidermal growth factor receptor

4 Discussion and Conclusions

This systematic review and meta-analysis showed that the prevalence of EGFR mutations in patients with advanced NSCLC differed with geographic region. The highest prevalence for All EGFR mutations was observed in Asian patients (49.1%) compared with other continents (11.9–33.0%). These results are similar to those of another systematic review that found the overall rate of EGFR mutations was lowest for Europe (14.1%) and highest for Asia (38.4%), with a combined North and South America region in the middle (24.4%) [99]. However, that review did not restrict the population to patients with advanced NSCLC, did not distinguish between specific EGFR submutations, included a population in which only 73% of patients had adenocarcinoma, and characterized regions more broadly. Other systematic reviews have also been published; however, these studies did not analyze EGFR mutation incidence according to the same criteria as in the present study [100,101,102,103,104]. Our study was unique in that it also examined the prevalence of the most prominent TKI-sensitizing submutations. Although there were regional differences in the distribution of submutations (exon 19 deletions and exon 21 L858R substitutions), these differences were less pronounced than for the overall EGFR mutation analysis.

For the overall EGFR mutation analysis, the percentage of male patients in the study population was identified as a significant covariate. Percent adenocarcinoma and age were not determined to be significant covariates. As the percentage of male patients increased, the percentage of overall EGFR mutations decreased. It is well recognized that not only do female patients with NSCLC have a decreased risk of progression and death, they also have a greater incidence of EGFR mutations and respond better to EGFR TKI therapy than male patients [51, 99, 101,102,103, 105,106,107]. Importantly, our study did not find any covariates, including percentage of male patients in the study population, that were meaningful in terms of individual submutations. It is concluded, therefore, that testing for mutations is crucial regardless of sex and other patient characteristics. However, our study did not investigate the influence of other covariates, such as smoking status, that have been shown to be associated with an increased incidence of EGFR mutations [108]. The studies included in our analysis used very different forms of categorization for smoking behaviors (e.g., some studies used “yes/no” only, while others used “heavy/light/former/never”), which made it difficult to standardize; furthermore, we believe an influence of smoking status on submutations was unlikely.

Although a strength of this analysis was that it investigated EGFR mutation and submutation status in a large meta-analysis on a worldwide basis, the number of patients in certain geographic regions was limited. The majority of studies came from Europe and Asia; there was only one study from South America included in the submutation analyses and this study was not included in the overall EGFR mutation analysis. This low number of studies from Central and South America may be because EGFR mutation testing is low in Latin American countries, potentially as a result of lack of access [109]. A recent analysis of 4389 patients has shown that molecular testing is requested in only 76% of lung-cancer cases in Latin America, compared with 97%, 79%, and 90% in the USA, Europe, and Japan, respectively [110]. Moreover, specific regions may have high diversity in EGFR mutation prevalence, which was not captured in our analysis because data on race and ethnicity were scarce in many publications; thus, geographical region was used. This has been seen in Asia, for example, where EGFR mutation frequency has been shown to range from 22% in those of Vietnamese ethnicity to 64% in those of Indian ethnicity [51]. Another potential limitation is that the patients in each of the included studies may have been selected for EGFR mutation testing based on demographic and/or clinical characteristics or availability of specimens for testing, or they may be from areas where testing is more common [111, 112]. Nevertheless, our model was a good fit for the data for the overall EGFR mutation analysis, as evidenced by the minimal difference between observed values and predicted estimates, so the differences in the proportions of patients positive for EGFR mutations among the various regions should hold even if the exact rates are uncertain.

Linear mixed-effects logistic regression was used because it is an established meta-analysis methodology that uses the totality of the data in a unified framework for more precise mean estimates and easier estimation of covariate effects. This approach allowed for continent and covariate effects for each EGFR mutation endpoint to be analyzed simultaneously.

Our analysis focused on exon 19 deletions and exon 21 L858R substitutions. We did not examine other submutations (e.g., exon 20) because of a lack of available data. Additionally, because the most common mutations are in-frame deletions of exon 19 and L858R substitutions in exon 21, we thought that these would be the most clinically relevant [8, 11]. Less common EGFR mutations and complex mutations represent a heterogeneous subgroup of patients, and differences in testing methods used for different studies may also introduce a bias, such as a false-negative result, when analyzing rarer mutations [12]. Although our analysis was based on NSCLC overall, generally most of the patients in the included studies had adenocarcinoma histology. Because of infrequent reporting of actionable mutations in other histologies, current guidelines focus mainly on testing patients with adenocarcinoma and advise that molecular testing is appropriate in NSCLC with nonadenocarcinoma histology when clinical features are atypical or there is an increased likelihood of a targetable mutation [12]. The prevalence of EGFR mutations and submutations may therefore differ between histological subtypes and data availability may be affected by differences in testing patterns and clinical features.

Understanding EGFR mutation prevalence in different geographic regions is important for physicians who need to make informed decisions for their patients that are based on sound medical evidence of benefit. This information is also critical so that policy and guidelines can be optimally developed to account for the EGFR genetic profile of local populations, which is not only important in resource-limited settings, but also around the globe where there is an increasing emphasis on personalized yet cost-effective practice of care [113,114,115]. This meta-analysis provided a robust and comprehensive overview of EGFR mutation and submutation prevalence, and identified an important covariate (percentage male) that influenced EGFR mutation status in patients with advanced NSCLC worldwide. These data show that despite differences among geographic regions, there is a considerable percentage of patients with either of the main types of actionable mutations (exon 19 deletions and exon 21 L858R substitutions) who could potentially benefit from targeted therapies. Thus, these data support the adoption of widespread routine testing in the advanced setting to improve therapeutic outcomes for these patients.