Background

A number of studies have reported a strong relationship between forced expiratory volume in one second (FEV1) and risk of lung cancer (e.g. [1–10]). However, apart from a review in 2005 by Wasswa-Kintu et al.[11], we are unaware of any previous attempt to meta-analyse the available data. That review restricted its meta-analysis to the four studies which reported results by quintiles of FEV1, although it noted the existence of data from a larger number of studies. In order to obtain a more precise estimate of the relationship of FEV1 to lung cancer risk, and to study factors which might affect the strength of this relationship, this systematic review and meta-analysis combines separate quantitative estimates of the relationship from studies which have presented their findings in a variety of ways. For each available set of data we estimate the slope (β), and its standard error (SE β), of the relationship RR(diff) = exp(β diff), where diff is the reduction in FEV1 expressed as a percentage of its predicted value (FEV1%P), and RR(diff) is the relative risk associated with this reduction. Our procedures allow us to incorporate results reported as quintiles, by other grouped levels or as regression coefficients, and also to include results reported not only in terms of FEV1%P but also in terms of associated measures such as FEV1 or the ratio of FEV1 to forced vital capacity (FEV1/FVC).

Methods

Inclusion and exclusion criteria

Attention was restricted to epidemiological studies of cohort design involving a follow-up period of at least three years, in which FEV1 was recorded at baseline, and which presented the results of analyses relating FEV1 (or related measures) to subsequent risk of lung cancer.

The following exclusion criteria were applied:

Patients

Studies of patients who had undergone, or were selected for, surgery; of patients with cancer or serious diseases other than COPD; publications describing case reports or reviews concerning treatment for cancer or surgical procedures.

Not cohort

Clinical studies; studies of cross-sectional design; studies involving a follow-up period shorter than three years.

Not lung cancer

Lung cancer not an endpoint; no lung cancer cases seen during follow-up.

Reviews not of interest

Review papers where the relationship of FEV1 to lung cancer was not considered, the papers typically only describing the relationship of an exposure (e.g. smoking) with FEV1 and separately with lung cancer.

Note that the four sets of exclusion criteria were applied in turn, and once one criterion was satisfied no attempt was made to consider the others.

Literature searching

A Medline search was first carried out using the search term ((“Forced expiratory volume” [Mesh Terms] OR FEV1 [All fields] OR “Forced expiratory volume” [All Fields]) AND Lung cancer) with no limits. An Embase search was then carried out using the same search terms. Reviews of interest, including the earlier systematic review of Wasswa-Kintu et al.[11], were then examined to see if they cited additional relevant references. Finally, reference lists of the papers obtained were examined.

Identification of studies

Relevant papers were allocated to studies, noting multiple papers on the same study, and papers reporting on multiple studies. Each study was given a unique reference code (REF) of up to six characters (e.g. MANNIN or MRFIT), usually based on the principal author’s name. Possible overlaps between study populations were considered.

Data recorded

Relevant information was entered onto a study database and a linked relative risk (RR) database. The study database contained a record for each study describing the following aspects: relevant publications; study title; study design; sexes considered; age range; details of the population studied; location; timing; length of follow-up; definition of lung cancer, and whether mortality or incidence. It also contains details of the individual components making up the Newcastle-Ottawa study quality score [12], described in detail in Additional file 1: Quality.

The RR database holds the detailed results, typically containing multiple records for each study. Each record is linked to the relevant study and refers to a specific RR, recording the comparison made and the results. This record includes the following: sex; age range; race; smoking status; adjustment factors; type of lung cancer; source publication and length of follow-up. For studies which provided a block of results by level of FEV1%P (or by an associated measure, such as FEV1/FVC, FEV1 unnormalised or SDs of FEV1/height³ below average), the record also included the measure reported, the range (or mean if provided) of values for the comparison group, and for each level the range (or mean) of values, and the reported or estimated RR and 95% confidence interval (CI) relative to the comparison group. Also recorded was an estimate of the ratio of the number at risk in the comparison group to the overall number at risk, and the ratio of the number at risk to the number of lung cancer cases for the block, and information to distinguish between multiple blocks within the same study (e.g. for different sexes or smoking groups). For studies which only provided summary statistics for a block (such as the RR for a 1% decrease in the measure), the record contained details of the summary statistic and also the information to distinguish between multiple blocks. Although our main analyses are restricted to the most relevant estimates recorded in the RR database (e.g. data for FEV1%P if available, direct estimates of β rather than estimates derived from RRs by level, data for longest follow-up, or whole population data rather than data for small subsets of the population), all data were entered as available. However, most studies did not allow any choice.

Statistical methods

The basic model

The underlying model is that proposed by Berlin et al.[13], which we previously used to study the relationship of dose of environmental tobacco smoke exposure to lung cancer [14]. In this model, the absolute risk of lung cancer, R, in someone exposed to a given dose is expressed as

$$R = \alpha \exp(\beta d)$$

where α and β are constants. This implies that the relative risk RR(d2,d1) comparing dose d2 to dose d1 is given by

$$\mathrm{RR}(d_2, d_1) = \exp\bigl(\beta (d_2 - d_1)\bigr) \quad \text{or} \quad \mathrm{RR}(\mathit{diff}) = \exp(\beta \, \mathit{diff})$$

where diff is the difference in dose. This model implies that a fixed difference in dose increases risk by a fixed multiplicative factor.

When applying this model the dose, d, is the estimated mean level of FEV1%P, and the difference in doses, diff, is taken to be the reduction in FEV1%P compared to the highest level studied. As RRs tend to increase with decreasing level of FEV1%P, expressing diff in terms of reductions in FEV1%P ensures that estimates of β tend to be positive. Note that no attempt is made to estimate absolute risks or the parameter α, only the slope parameter, β, being estimated.

To use this method it was required to estimate β, and its standard error (SE β), for each block to be analysed. Three main situations were found in the blocks examined:

  (a) Some studies actually presented estimates of β together with its SE or 95% CI that could be used directly. Others presented estimates in a form that could readily be converted, e.g. increase in risk per 1% decrease in FEV1%P.

  (b) Other studies presented data by grouped values of FEV1%P either directly as RRs and 95% CIs or in other ways that allowed RRs and 95% CIs to be calculated using standard methods [15]. Berlin et al. [13] described a method for estimating β, and its standard error (SE β), that requires data for a study to consist of dose and number of cases and controls (or subjects at risk) at each level of exposure. The method is not a straightforward regression, as it has to take into account the fact that the level-specific RR estimates for a block are correlated, as they all depend on the same comparison group. It can also be applied to studies with data in the form of confounder-corrected RRs and 95% CIs, provided that such data are first converted into counts (“pseudo-numbers”). We used the method of Hamling et al. [16] to estimate the pseudo-numbers.

  (c) A final group of studies had RRs that were not expressed in terms of FEV1%P, but in terms of an associated measure, such as uncorrected FEV or FEV1/FVC. To ensure consistency in the estimation process for β, we converted values of the associated measure into values in terms of FEV1%P. To do this we made use of the publicly available data in the NHANES III study.
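The conversion mentioned in situation (a) can be sketched as follows. This is only an illustration under stated assumptions: the paper does not spell out its conversion, so the sketch assumes the reported 95% CI is symmetric on the log scale, and the function name `beta_from_rr` and the example inputs are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def beta_from_rr(rr, ci_low, ci_high, units=1.0):
    """Convert an RR (with 95% CI) reported per `units` percentage-point
    reduction in FEV1%P into a slope beta and its standard error.

    Assumes the CI is symmetric about log(RR), which is an assumption of
    this sketch rather than something stated in the paper.
    """
    beta = math.log(rr) / units
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95 * units)
    return beta, se


# Illustrative input: an RR of 1.20 (1.17 to 1.23) per 10-unit reduction.
beta, se = beta_from_rr(1.20, 1.17, 1.23, units=10)
print(f"beta = {beta:.4f}, SE = {se:.4f}")
```

Dividing by `units` puts every estimate on the same per-1% scale, which is what allows estimates from differently reported studies to be pooled.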

The NHANES III dataset

The National Health and Nutrition Examination Surveys (NHANES) were conducted on nationwide probability samples of approximately 32,000 persons 1–74 years of age. The NHANES III survey [17], conducted from 1988 to 1994, was the seventh in a series of these surveys based on a complex, multi-stage plan, designed to provide national estimates for the US of the health and nutritional status of the civilian, non-institutionalised population aged two months and older. Inter alia, the NHANES III study makes available data on age, sex, race, height, smoking habits, FEV1 and FVC on an individual-person basis.

Based on the NHANES data, Hankinson et al. (1999) [18] provides widely-used equations to predict FEV1 for an individual which are of the form:

$$\mathrm{FEV1}_{\text{predicted}} = b_0 + b_1 \, \text{age (years)} + b_2 \, \text{age (years)}^2 + b_3 \, \text{height (cm)}^2$$

where the coefficients b0, b1 and b2 vary by sex, race and age, as shown in Table 1. The observed value of FEV1 for an individual can then be divided by the predicted value based on the individual’s characteristics, and then multiplied by 100, to give the estimated value of FEV1%P for that individual.
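As a rough illustration of this calculation, the sketch below evaluates the prediction equation and converts an observed FEV1 into FEV1%P. The coefficient values are placeholders chosen only to give a plausible magnitude; they are not taken from Table 1, and the function names are hypothetical.

```python
def predicted_fev1(age_years, height_cm, b0, b1, b2, b3):
    """Predicted FEV1 (litres) from the quadratic-in-age, height-squared form."""
    return b0 + b1 * age_years + b2 * age_years ** 2 + b3 * height_cm ** 2


def fev1_percent_predicted(observed_fev1, age_years, height_cm, coeffs):
    """Observed FEV1 as a percentage of the predicted value (FEV1%P)."""
    return 100.0 * observed_fev1 / predicted_fev1(age_years, height_cm, *coeffs)


# PLACEHOLDER coefficients (b0, b1, b2, b3) for illustration only; the real
# values depend on sex, race and age group and should be read from Table 1.
example_coeffs = (0.55, -0.013, -0.00017, 0.00014)

pct = fev1_percent_predicted(3.01, age_years=50, height_cm=175, coeffs=example_coeffs)
print(f"FEV1%P = {pct:.1f}")
```

With these placeholder values the predicted FEV1 is about 3.76 litres, so an observed FEV1 of 3.01 litres corresponds to an FEV1%P of about 80.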

Table 1 Age, sex and race specific coefficients used to predict FEV1 for the equations of Hankinson et al.[18]

For each result not expressed in terms of FEV1%P, we selected those NHANES III subjects who had the range of characteristics relevant to that result. These characteristics included the range of the lung function measure provided, age and sex (and in some cases smoking habit or an additional lung function specification). We then applied the FEV1 prediction equations to each of the selected subjects and thus estimated the mean value of FEV1%P. For example, one study [19] was of males aged 16–74 and gave relative risks for categories of FEV1/FVC (<80%, 80-89% and 90%+ of predicted). From the NHANES data we looked within males aged 16–74 and, for each category of FEV1/FVC, calculated the mean value of FEV1%P. The calculated mean was then used as the dose value for our calculations of β.

One study [20] was a particular problem as the groupings were in terms of residuals from a regression analysis including age, smoking status and current cigarettes smoked. This model was fitted to the NHANES III data, and mean values of FEV1%P were calculated for different quartiles of the residuals.

Only one publication [21] provided mean levels for each category when the original measure was FEV1%P. Where means were not available, we used the NHANES III dataset to calculate them. This was of particular benefit when dealing with open-ended categories.

Predictions and goodness-of-fit of the fitted model

For data presented by grouped levels of FEV1%P (or associated measures) the estimate of β was used to calculate predicted RRs and numbers of lung cancer cases at each level corresponding to the observed RRs and numbers. The observed (O) and predicted (P) numbers were then used to derive a chisquared test of goodness-of-fit by summing (O−P)²/P, taking the degrees of freedom (d.f.) as one less than the number of levels. For defined ranges of diff (0, 0.01-10, 10.01-20, 20.01-30, 30.01-40, >40), O and P were summed over blocks to similarly derive an overall goodness-of-fit chisquared statistic on 5 d.f. Blocks involving only two levels were ignored for the chisquared tests as providing no useful information on goodness-of-fit.
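The per-block statistic described above can be sketched in a few lines. The observed and predicted counts used here are made-up numbers for illustration, and the function name `goodness_of_fit` is hypothetical.

```python
def goodness_of_fit(observed, predicted):
    """Return the chi-squared statistic sum((O - P)^2 / P) and its d.f.

    Degrees of freedom are taken as one less than the number of levels,
    following the description in the text.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    chi2 = sum((o - p) ** 2 / p for o, p in zip(observed, predicted))
    df = len(observed) - 1
    return chi2, df


# Made-up counts of lung cancer cases at three levels of FEV1%P.
chi2, df = goodness_of_fit([20, 30, 50], [25, 25, 50])
print(f"chi-squared = {chi2:.2f} on {df} d.f.")
```

A p-value would then be read from the chi-squared distribution with `df` degrees of freedom, for instance with `scipy.stats.chi2.sf(chi2, df)` if SciPy is available.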

Meta-analysis and meta-regression

Individual study estimates of β and SE β were combined to give overall estimates using inverse-variance weighted regression analysis, equivalent to fixed-effect meta-analysis. Random-effects meta-analyses were also conducted, but are not reported here as the results were virtually identical. Heterogeneity was investigated by testing for significant variation in β, considering the following factors: sex (male, female, combined), publication year (<1990, 1990–1994, 1995+), age at baseline (<50, 50–59, 60+ years), Newcastle-Ottawa quality score (5–7, 8–9), continent (North America, other), mortality or incidence (deaths, incidence, both), population type (general population, other), exposed population (exposed to known lung carcinogens, other), length of follow up (≤15, 16–23, 24+ years), smoking adjustment (yes, no), measure of FEV1 reported (FEV1%P, other), effect as originally reported (regression coefficient, RR and CI, SMR/SIR) and inverse-variance weight of β (<1000, 1000–2999, 3000+). Simple one factor at a time regressions were carried out first, with the significance of each factor tested by a likelihood-ratio test compared to the null model. A stepwise multiple regression analysis was then carried out to determine which of the factors predicted risk independently.
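A minimal sketch of inverse-variance fixed-effect pooling, with the heterogeneity statistic and I², is given below. The input values are invented for illustration, the function name `fixed_effect` is hypothetical, and the sketch does not reproduce the meta-regression or stepwise steps.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance weighted pooled estimate with heterogeneity measures.

    Returns (pooled beta, SE of pooled beta, Q, d.f., I-squared in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Heterogeneity chi-squared: weighted squared deviations from the pooled value
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se_pooled, q, df, i2


# Invented estimates from three hypothetical blocks.
pooled, se_pooled, q, df, i2 = fixed_effect([0.01, 0.02, 0.03], [0.01, 0.01, 0.01])
print(f"beta = {pooled:.4f} (SE {se_pooled:.4f}), Q = {q:.2f} on {df} d.f., I2 = {i2:.1f}%")
```

The weight 1/SE² used here is the quantity grouped as "inverse-variance weight of β" in the list of factors. Applying the same I² formula, 100 × (Q − d.f.)/Q, to the deviance of 44.01 on 31 d.f. reported in the Results gives approximately 29.6%.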

Forest plots

Exp(β) is an estimate of the RR associated with a decrease of 1% in FEV1%P. For each such RR included, referenced by the study REF and associated block details such as sex, the RR is shown as a rectangle, the area of which is proportional to its weight. The CI is indicated by a horizontal line. The RRs and CIs are plotted on a logarithmic scale so that the RR is centred in the CI. Also shown are the values of each RR and CI and the weight as a percentage of the total. Results from the meta-analysis are shown at the bottom of the plot. The combined estimate is presented as a diamond, with the width corresponding to the CI and the RR as the centre of the diamond.

Publication bias

Publication bias was investigated using Egger’s test [22] and using funnel plots. In the funnel plots, β is plotted against its precision (=1/SE). A dotted vertical line corresponds to the overall estimate.
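One common formulation of Egger's test regresses the standardised estimate (β/SE) on precision (1/SE) and asks whether the intercept differs from zero. The text does not state which formulation was used, so the sketch below is an assumption, and the function name `egger_intercept` and the example data are hypothetical.

```python
import math


def egger_intercept(betas, ses):
    """Ordinary least squares of beta/SE on 1/SE.

    Returns (intercept, SE of intercept, slope). An intercept far from zero
    relative to its SE suggests funnel-plot asymmetry.
    """
    x = [1.0 / se for se in ses]                 # precision
    y = [b / se for b, se in zip(betas, ses)]    # standardised estimate
    n = len(x)
    if n < 3:
        raise ValueError("at least three estimates are needed")
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                           # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return intercept, se_intercept, slope


# Hypothetical case: identical beta in every block, differing precision.
intercept, se_int, slope = egger_intercept([0.02] * 4, [0.005, 0.01, 0.02, 0.04])
print(f"intercept = {intercept:.3g}, slope = {slope:.3g}")
```

In this constructed example every block has the same β, so the fitted line passes through the origin and the intercept is zero to within rounding; the slope recovers the common β.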

Software

All data entry and most statistical analyses were carried out using ROELEE version 3.1 (available from P.N.Lee Statistics and Computing Ltd, 17 Cedar Road, Sutton, Surrey SM2 5DA, UK). Some analyses were conducted using SAS or Excel 2003.

Results

Publications and studies identified

Thirty-three publications [1–5, 7, 9, 10, 19–21, 23–44] satisfying the inclusion and exclusion criteria were identified from the searches carried out in October 2011. Details of these searches are given in Figure 1. Subsequently, at the analysis stage, seven of these publications were rejected. Two [41, 42] described a study in Denmark which presented its results in a way that did not allow estimation of β. Two [24, 36] described a study in France of iron miners which only provided results for decreased FEV1 without giving the ranges of FEV1 being compared. One [29] described a nested case–control study in the USA of heavily asbestos-exposed shipyard workers, which reported only the mean difference in FEV1 between cases and controls. Two [33, 34] described results from the Italian rural cohorts of the Seven Countries Study, which reported results only for forced expiratory volume in ¾ second. A brief summary of the findings from these is reported in Additional file 2: Others, which demonstrates that these were consistent in showing an association of reduced FEV1 with increased lung cancer risk.

Table 2 Selected details of the 22 studies of FEV1 and lung cancer

The remaining 26 publications were then subdivided into 22 distinct studies, some details of which are summarized in Table 2. Of the 22 studies, 12 were conducted in the USA, 3 in Scandinavia, 2 in Italy, 2 in the UK, 2 in Canada and 1 in South Africa. Many of the studies were quite old, with 16 starting before 1980. Twelve involved follow-up of 20 years or more, with a further six involving at least 10 years of follow-up. Numbers of lung cancers analysed ranged from 11 in study SKILLR to 1514 in study VANDEN. Ten studies involved over 100 cases. Three studies involved subjects exposed to known lung carcinogens other than smoking (CARET: asbestos, CARTA: silica, FINKEL: radon) and a further study (WILES) was of gold miners. Newcastle-Ottawa quality scores ranged from 5 to 9, with 10 studies scored as 8 or 9. The 22 studies provided data for 32 independent data blocks, with CARET giving results separately for those with FEV1/FVC above or below 0.70, RENFRE, SPEIZE and TAMMEM giving results separately for men and women, ISLAM giving results separately for current and non-current smokers, and VANDEN, the study involving the largest number of lung cancer cases, giving six sets of results, separately for all combinations of sex and smoking status (never, former, current).

Table 3 Results for the five blocks already expressed as regression coefficients

Fitted β estimates and goodness-of-fit

Table 3 summarizes the results for those five blocks where regression estimates for the lung cancer/FEV1 relationship were provided by the authors. For two blocks, β was directly available, and for the other three β could readily be calculated from the odds ratio for a given percentage increase or decrease in FEV1%P.

Table 4 Fit of the model to the data for the 27 blocks with grouped data

Table 4 summarizes the results for the remaining 27 blocks where results were given by level of FEV1%P or an associated measure. The table shows the measure the data were originally presented in, the estimated mean reduction in FEV1%P compared to the base group with the highest value of FEV1%P, the observed RRs and 95% CIs and those fitted using the estimate of β, which is also shown. Also shown are the observed pseudo-numbers of lung cancer cases at each level and those fitted using the estimate of β, and the goodness-of-fit chisquared. Additional file 3: Fit gives plots comparing the observed and fitted RRs.

Table 5 Testing for significance of variation in β by various factors considered one at a time

Where only two levels of FEV1%P were available, the fitted numbers of cases necessarily equalled the numbers observed. Where there were more than two levels being compared, the goodness-of-fit to the model was generally satisfactory. The significant (p<0.05) misfits to the model were for: block 5 (CARTA), where there was almost a 4-fold difference in risk between the highest and middle groups (90+ and 80 to <90 FEV1/FVC) but virtually the same estimated FEV1%P; block 13 (NOMURA) and block 29 (VANDEN female former smokers), where the pattern of increasing risk with declining FEV1%P was non-monotonic; and block 14 (PETO), block 17 (RENFRE females) and block 30 (VANDEN female current smokers), where the increase in risk was similar but marked in all the groups with reduced FEV1%P. Only for block 13 (NOMURA) was the p value for the fit <0.01. Table 4 also includes the results from an overall goodness-of-fit test for those blocks involving more than two levels. While there is some tendency for fitted numbers of lung cancer cases to be somewhat higher than the observed numbers at the extremes (the comparison group and differences in FEV1%P greater than 40), and lower in the four intermediate groups (differences of 0.01 to 10, 10.01 to 20, 20.01 to 30 and 30.01 to 40) the goodness-of-fit chisquared statistic of 8.43 on 5 d.f. is not significant (p=0.13).

Meta-analysis and meta-regressions

Exp(β) is the RR associated with a decrease in FEV1%P by one unit, and Figure 2 presents a forest plot showing the estimated values with 95% CI for each of the 32 blocks. These range from 0.972 to 1.075, with a combined estimate of 1.019 (95% CI 1.016 to 1.021, p<0.001). It is evident from Figure 2 that the estimates are reasonably consistent. As shown in Table 5, the deviance (chisquared) of the 32 results is 44.01 on 31 d.f., equivalent to an I² of 29.6%.

Figure 1

Flow diagram for literature searching. The diagram gives details of the four stages of the search; the Medline search, the Embase search, the search based on reviews of interest, and the search based on secondary references. The four criteria for rejecting papers during these four stages are described further in the Methods section under the headings “patients”, “not cohort”, “not lung cancer” and “reviews not of interest”. Note that one of the three papers accepted from the search based on secondary references cited a paper that was also examined but provided no lung cancer results. The four stages produced a total of 33 accepted papers (22 Medline, 5 Embase, 3 reviews of interest, 3 secondary references). Subsequently 7 of these were rejected for reasons described in the first paragraph of the Results section.

Figure 2

Forest plot of the 32 estimates of exp(β). Estimates of β and SE(β) are presented in Table 3 for results presented originally as regression coefficients and in Table 4 for results presented by grouped level of FEV1 or associated measures. For each of the 32 estimates Figure 2 shows the associated values of exp(β) with their 95% CIs. These estimates are shown both numerically and graphically on a logarithmic scale. The studies are sorted in order of block number, and are referenced by study reference (REF). Multiple blocks within the same study are distinguished by the following codes (M = males, F = females, N = never smokers, X = ex smokers, C = current smokers, LO = FEV1/FVC ≥ 0.70, and HI = FEV1/FVC < 0.70). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight (inverse-variance of log RR).

Table 5 also presents estimates of β by level of a range of different factors. For 10 of the 13 factors considered, including sex, publication year, study quality, continent, exposed to lung carcinogens, follow-up period, smoking adjustment, measure of FEV1 reported, inverse-variance weight of β, and how the data were originally recorded, there was no significant evidence of variation by level. However, there was significant evidence of variation by mean age at baseline (p<0.01), disease fatality (p<0.01) and population type (p<0.05), with estimates of β being somewhat higher in younger populations, in studies involving lung cancer deaths rather than incidence, and in studies not of the general population. In stepwise regression, however, only mean age at baseline remained in the model as an independent predictor of lung cancer risk.

Publication bias

Based on the 32 estimates of β there was no evidence of publication bias using Egger’s test. This is consistent with the funnel plot shown as Figure 3, and with the lack of relationship between β and its weight shown in Table 5.

Discussion

Based on 32 independent data sets from 22 studies we estimate β as 0.018 (95%CI 0.016-0.021). This relationship is highly significant (p<0.001) and is equivalent to saying that, compared to someone with an average FEV1%P of 100%, someone with an FEV1%P of 90% would have a 20% increase in lung cancer risk, and someone with an FEV1%P of 50% would have a 151% increase.
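The percentage increases quoted above follow directly from RR(diff) = exp(β diff). The short sketch below reproduces them using the pooled slope of 0.0184 quoted later in the Discussion; the function name `relative_risk` is hypothetical.

```python
import math

BETA = 0.0184  # pooled slope per 1-unit reduction in FEV1%P, as quoted in the text


def relative_risk(beta, diff):
    """RR for a reduction of `diff` units in FEV1%P under the log-linear model."""
    return math.exp(beta * diff)


for fev1_pp in (90, 50):
    diff = 100 - fev1_pp  # reduction relative to an FEV1%P of 100
    increase = 100 * (relative_risk(BETA, diff) - 1)
    print(f"FEV1%P {fev1_pp}: about {increase:.0f}% increase in risk")
```

Because the model is multiplicative, the increase for a 50-unit reduction is the 10-unit factor raised to the fifth power, not five times the 10-unit increase.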

There is little evidence of heterogeneity over study (I² = 29.6%), or that estimates vary by specific factors including sex, study location, length of follow-up, adjustment for smoking, the measure of FEV1 reported, or how the results were originally reported. Nor was there any evidence of publication bias. There was, however, some evidence that estimates varied by age of the population at baseline, but even then clear reductions were seen in all three age groups studied, with β varying only between 0.015 and 0.024. We discuss below various aspects of our methods, which might attract criticism.

One is the use of the data from NHANES III which, though nationally representative of the USA, would not be representative of the populations involved in the 22 studies we considered. We used NHANES III for two reasons. First, we needed mean FEV1%P values corresponding to the groups used; only one study actually reported such means, and NHANES III was a large and available database. Our feeling is that any errors for intervals that are not open-ended are likely to be minor, and that even for open-ended intervals any errors are unlikely to have affected our main conclusions. In this we are fortified by the general consistency of the estimates of β and also by the observation that for the one study (STAVEM) that did supply means, the estimates reported (121.9, 106.6, 95.3 and 75.7) were similar to those that could be estimated from NHANES III (122.1, 106.2, 94.8 and 71.9). The other reason was that we needed some method of incorporating studies reporting results, not by FEV1%P directly, but by associated measures. Had we restricted attention to results reported by FEV1%P we would have reduced the number of available blocks from 32 to 20, and we wished to avoid such loss of power. Here it is reassuring that the overall estimate for the 12 blocks where β was estimated using data for associated measures, 0.019 (0.014-0.024), was very close to that for the other 20 blocks, 0.018 (0.015-0.021).

We should also comment on the fact that the method of estimation of β required pseudo-numbers of cases and numbers at risk for each level of FEV1%P corresponding to the adjusted RRs, as using simple numbers would have removed the effects of adjustment. We used the method of Hamling et al.[16] here to estimate the pseudo-numbers, and note that Orsini et al.[45] recently reported that they arrived at very similar results using this method as they obtained based on the available individual person data, although this was in a somewhat different context. Our experience too is that the method provides a very robust way of estimating the magnitude and significance of functions of relative risks.

Figure 3

Funnel plot. Funnel plot of the 32 estimates of β against their precision (1/SE). The dotted vertical line indicates the meta-analysis estimate. Estimates based on data originally presented as FEV1%P are distinguished from other estimates by different symbols.

Another issue is the use of a simple model in which the logarithm of the RR is linearly related to the difference in FEV1%P. As always, one could postulate more complex relationships, but we have found that the model fits the data quite well, as judged by the goodness-of-fit tests conducted. We have not explored whether more complex models fit materially better, nor attempted to estimate risks for a given level of FEV1%P, but note that a simple model has advantages in expressing the relationship to the reader. Clearly our model may not fit perfectly at the extremes (e.g. comparing someone with a value of FEV1%P of 150 and one of 30) but data here are limited. One would really need individual person data to get a more precise answer, but we have not attempted to obtain such data, particularly as many of the studies were conducted many years ago.

Based on those studies where we could estimate β we found no evidence of publication bias. However, we should point out that we had to reject seven publications, describing four studies, as the data were not presented in a way that allowed estimation of β. These studies, which each involved fewer than 40 lung cancer cases, were consistent in demonstrating a positive association of reduced FEV1 with increased lung cancer risk, and it seems unlikely that this omission has caused material bias.

While our β estimates were quite consistent over study, we did observe somewhat higher values in younger populations. This may reflect variations in the rate of FEV1 decline associated with susceptibility to smoking [46]. Subjects in younger populations who already have reduced FEV1 may have even more reduced FEV1 later in life and therefore an even greater risk of lung cancer during follow-up. None of the studies we reviewed relate FEV1 recorded on two occasions to subsequent risk of lung cancer, to allow direct testing of the relationship of rapidity of FEV1 decline to lung cancer risk.

In their review Wasswa-Kintu et al.[11] concluded that “reduced FEV1 is strongly associated with lung cancer” and that “even a relatively modest reduction in FEV1 is a significant predictor of lung cancer, especially among women.” Their meta-analyses were based on four studies that reported FEV1 in quintiles, with their estimated relative risks for the lowest to the highest quintile being 2.23 (95%CI 1.73-2.86) for men and 3.97 (95%CI 1.93-8.25) for women. While our meta-analyses, which are based on far more studies, confirmed the strong association of reduced FEV1 with increased lung cancer risk, we found no significant difference between the sexes. It is not possible to compare our estimates precisely but, taking the difference in FEV1%P between the lowest and highest quintiles to be 60 (approximately the value for the NHANES III population for both sexes), our estimate of β of 0.0184 predicts a lowest to highest quintile relative risk of 3.02, which is not very different from the estimates of Wasswa-Kintu et al.[11].

Conclusions

Our review confirms the strong association between reduced FEV1 and increased risk of lung cancer. The strength of the association is very consistent, with our 32 estimates of β showing remarkably little variation, given the variety of ways in which the source papers presented their results. Based on our results, we estimate that each 10% decrease in FEV1%P is associated with a 20% (95% CI 17%-23%) increase in lung cancer risk.