Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive fibrotic interstitial lung disease (ILD) of unknown etiology. The disease occurs predominantly in middle-aged and elderly adults, is more frequent in men, and has been linked with cigarette smoking and occupational exposure [1, 2]. Prognosis for patients with IPF is poor, with median survival estimated to be around only 3–4 years [3,4,5,6,7]. Historically, treatment options for patients with IPF have been limited; however, the UK has seen a number of important developments in the treatment of IPF, which it is hoped will translate into improved patient outcomes. These include the development of ILD specialist services [8]; the licensing and reimbursement approval of antifibrotic therapy, firstly pirfenidone in 2013 and nintedanib in 2015 [9, 10]; and publication of National Institute for Health and Care Excellence (NICE) guidelines on the diagnosis and management of suspected IPF [11]. If the impact of these important developments in IPF care is to be measured, it is vital that the underlying IPF disease burden is understood. Moreover, the limited available data on the epidemiology of IPF, both in the UK and in the USA [6, 12], suggest that the incidence [4, 5] and mortality [5, 13, 14] rates, as well as hospitalizations [15], for IPF are increasing. Establishing robust estimates of the underlying and secular trends in the epidemiological burden of IPF is therefore of increasing public health significance.


We carried out a population-based study using data from 2000 to 2012, which aimed to describe the incidence, prevalence, and survival of patients with IPF in the UK, both overall and stratified by calendar year, age, gender, and strategic health authority. We use the term IPF–clinical syndrome (IPF–CS) to describe patients with IPF in our study in order to reflect the imprecision of current coding terms. Owing to the relatively recent introduction of recommended standard diagnostic criteria for IPF [16], and the potential for variability in diagnostic coding, we used both broad and narrow definitions for IPF–CS to explore the impact of coding specificity on the chosen outcome measures.

Data Source

Data were obtained from the Clinical Practice Research Datalink (CPRD) GOLD primary care database (formerly the General Practice Research Database, GPRD). The CPRD is a research and data services provider in the UK, which brings together data from across the UK National Health Service and is jointly funded by the Medicines and Healthcare products Regulatory Agency (MHRA) and the National Institute for Health Research [17]. CPRD GOLD is one of the largest computerized databases of anonymized and longitudinal electronic medical records in the world. At the time of analysis, data were held for approximately 14 million patients, around 5.4 million of whom were alive and registered in 660 contributing general practices across the UK. The data recorded include information on patient demographics, clinical events and diagnoses, details of specialist referrals, hospital admissions, results of laboratory tests, and prescriptions issued. Clinical events and diagnoses are recorded via Read codes [18], and practices are assigned an “up-to-standard” date, which indicates when the data recorded by that practice met pre-specified completeness criteria. Data in CPRD GOLD are broadly representative of the UK population [19] and the validity of the recorded clinical data is considered to be high [20]. The database has also been validated previously for use in respiratory epidemiology [21]. The study protocol was reviewed and approved by the CPRD Independent Scientific Advisory Committee (reference number 13_083).

Study Population and Follow-up to Identify Cases of IPF–CS

The study population comprised all patients registered for at least 1 day in practices contributing data deemed “up-to-standard” by CPRD GOLD between 1 January 2000 and 31 December 2012 (N = 9,748,108). Patients were followed from their index date until the earliest of the following: date of death, date of transfer out of the practice, or date of last practice data collection. Prevalent cases of IPF–CS were identified on the basis of the presence of a relevant Read code using our broad or narrow definition of IPF–CS. Read codes included in our narrow IPF–CS case definition were H563.00 (Idiopathic fibrosing alveolitis), H563.12 (Cryptogenic fibrosing alveolitis), H563z00 (Idiopathic fibrosing alveolitis NOS), H563300 (Usual interstitial pneumonitis), and H563.13 (Idiopathic pulmonary fibrosis). Our broad IPF–CS definition included the following three additional Read codes: H563100 (Diffuse pulmonary fibrosis), H563200 (Pulmonary fibrosis), and H563.11 (Hamman–Rich syndrome). Patients with Read codes for connective tissue disease, extrinsic allergic alveolitis, sarcoidosis, pneumoconiosis, or asbestosis (Table S1) at any time in their medical records were not included as cases. Incident cases of IPF–CS were identified by applying the following eligibility criteria to the prevalent cases: at least 1 year of registration with their general practitioner, at least 1 year of up-to-standard data in their patient records before the index date, and no Read code for IPF–CS prior to the start of the study period (1 January 2000).
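The narrow and broad case definitions above can be expressed as simple set membership on Read codes. The following is a minimal sketch, not the study's actual extraction code: the function name `classify_patient` and the exclusion code shown are hypothetical placeholders (the real exclusion codes are listed in Table S1, which is not reproduced here), and the input is assumed to be the full list of Read codes in a patient's record.

```python
# Read codes taken from the narrow and broad IPF-CS definitions in the text.
NARROW_CODES = {"H563.00", "H563.12", "H563z00", "H563300", "H563.13"}
BROAD_ONLY_CODES = {"H563100", "H563200", "H563.11"}
BROAD_CODES = NARROW_CODES | BROAD_ONLY_CODES

# Hypothetical placeholder: the real exclusion codes are in Table S1.
EXAMPLE_EXCLUSION_CODES = {"HYPOTHETICAL_EXCLUSION_CODE"}


def classify_patient(read_codes, exclusion_codes):
    """Return whether a patient meets the narrow and/or broad case definition.

    A patient with any exclusion code at any time in the record is not a case
    under either definition.
    """
    codes = set(read_codes)
    if codes & set(exclusion_codes):
        return {"narrow": False, "broad": False}
    return {
        "narrow": bool(codes & NARROW_CODES),
        "broad": bool(codes & BROAD_CODES),
    }


if __name__ == "__main__":
    # A patient coded only with "Pulmonary fibrosis" is a broad case only.
    print(classify_patient(["H563200"], EXAMPLE_EXCLUSION_CODES))
```

Because the broad set is a superset of the narrow set, every narrow case is also a broad case, which is why case counts under the broad definition can only be equal or larger.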

Statistical Analysis

Incidence of IPF–CS per 100,000 patient-years and prevalence per 100,000 patients were estimated, along with 95% confidence intervals (CIs) calculated using the Poisson distribution. Incidence rates were calculated by dividing the number of incident cases by the total person-time contribution, and were calculated both overall and stratified by calendar year, gender, 5-year age group, and strategic health authority. Poisson regression modelling was used to estimate annual incidence rate ratios (IRRs), controlling for age, sex, and strategic health authority. Multiplicative interaction terms were applied to test for potential effect modification by age and sex. Point prevalence of IPF–CS was calculated for each calendar year in the study period. The numerator was the number of patients with a specified Read code for IPF–CS prior to the midpoint of the year of interest, and the denominator was the number of patients in the study population at the midpoint of the year of interest. Prevalence estimates for 2012 were additionally stratified by age group, gender, and strategic health authority.
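The crude incidence rate calculation described above (cases divided by person-time, with a Poisson-based interval) can be sketched as follows. This is an illustration only: the text does not state which Poisson interval was used in Stata, so Byar's approximation to the exact Poisson limits is assumed here, and the function name `poisson_rate_ci` is hypothetical. The case counts and person-years in the usage example are those reported in the Results for the narrow and broad definitions.

```python
import math


def poisson_rate_ci(cases, person_years, per=100_000, z=1.96):
    """Crude rate per `per` person-years with an approximate Poisson CI.

    Uses Byar's approximation to the exact Poisson limits (an assumption;
    the study's exact method is not specified in the text).
    """
    if cases <= 0 or person_years <= 0:
        raise ValueError("cases and person_years must be positive")
    rate = cases / person_years * per
    # Lower limit on the expected count.
    lower_count = cases * (1 - 1 / (9 * cases) - z / (3 * math.sqrt(cases))) ** 3
    # Upper limit on the expected count uses cases + 1.
    up = cases + 1
    upper_count = up * (1 - 1 / (9 * up) + z / (3 * math.sqrt(up))) ** 3
    return rate, lower_count / person_years * per, upper_count / person_years * per


if __name__ == "__main__":
    for label, cases, pyears in [
        ("narrow", 1491, 52_355_644),
        ("broad", 4527, 52_341_029),
    ]:
        rate, lower, upper = poisson_rate_ci(cases, pyears)
        print(f"{label}: {rate:.2f} per 100,000 patient-years ({lower:.2f} to {upper:.2f})")
```

Under this assumption the sketch gives values close to the reported overall rates of 2.85 (95% CI 2.71–3.00) and 8.65 (95% CI 8.40–8.90) per 100,000 patient-years.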

Life table analysis was carried out to estimate cumulative mortality (with 95% CIs) at 48 weeks, 52 weeks, 60 weeks, 72 weeks, 3 years, 5 years, 10 years, and more than 10 years after incident diagnosis. Survival was analyzed using Kaplan–Meier methods, both overall and stratified by year of diagnosis, gender, 5-year age group, and strategic health authority. Cox regression analysis was performed to calculate hazard ratios adjusted for all other covariates, and to test for interactions between mortality rates and year of diagnosis, age group, gender, and strategic health authority. The following reference categories were used in these analyses, respectively: (1) patients diagnosed in 2000; (2) patients aged 65–69; (3) male patients; (4) patients based in the North West Strategic Health Authority. Patients were censored in the Kaplan–Meier and Cox regression analyses according to the date they left the practice or the last date that the practice contributed data to CPRD prior to analysis. Statistical analyses were performed using Stata version 12.0 (StataCorp LP, College Station, TX, USA).
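To illustrate the Kaplan–Meier approach with censoring described above, the following is a minimal sketch of the product-limit estimator and the resulting median survival. It is not the Stata code used in the study; the function names and the toy follow-up data are hypothetical and chosen only to show how censored patients (those who left the practice or whose practice stopped contributing data) leave the risk set without counting as deaths.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient (e.g. years from diagnosis)
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival falls to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


if __name__ == "__main__":
    # Hypothetical toy data: six patients, two censored (event = 0).
    times = [1, 2, 2, 3, 4, 5]
    events = [1, 1, 0, 1, 0, 1]
    curve = kaplan_meier(times, events)
    print(curve)
    print("median survival:", median_survival(curve))
```

If censoring is related to outcome, for example when patients change practice shortly before death, the estimator will tend to overstate survival, which is the concern raised in the limitations.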

Compliance with Ethics Guidelines

This article reports a population-based cohort study and does not contain any studies with human participants or animals performed by any of the authors.


Incidence and Prevalence of IPF–CS

The total number of incident cases of IPF–CS identified between 2000 and 2012 was 1491 using our narrow case definition, and 4527 using our broad case definition. These cases were identified over a total follow-up period of 52,355,644 and 52,341,029 years, respectively. Overall and stratified incidence rates of IPF–CS and IRRs are shown in Figs. 1 and 2. The overall incidence rate per 100,000 patient-years was 2.85 (95% CI 2.71–3.00) using the narrow case definition and 8.65 (95% CI 8.40–8.90) using the broad case definition. Incidence of IPF–CS increased with age, was higher in male patients than in female patients, and varied across regions (as assessed by strategic health authority), being highest in the North West region and Northern Ireland, and lowest in the South East. When using a broad IPF–CS case definition, mutually adjusted IRRs significantly increased over the study period. Compared with the year 2000, a 78% increase in the incidence of IPF–CS was observed in 2012 (IRR 1.78, 95% CI 1.50–2.11). No statistically significant interactions (0.55 < p < 0.74) were observed to suggest that the annual increase in incidence varied by age or gender for either IPF–CS definition. Annual prevalence is shown in Fig. 3. When the broad case definition was used, prevalence of IPF–CS approximately doubled over the study period, starting at 19.94 per 100,000 patients (95% CI 18.48–21.47) in 2000 and rising to 38.82 per 100,000 patients (95% CI 37.04–40.66) in 2012. Prevalence in 2012 was higher in male patients, increased with age, and varied across strategic health authority, with highest rates in Scotland and lowest rates in the South East Coast (Fig. 3).
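As a brief worked check of how the relative changes quoted above follow from the reported estimates (broad case definition), the percentage change in incidence is obtained from the IRR and its confidence limits, and the change in prevalence from the ratio of the 2012 and 2000 estimates:

```latex
\text{Percentage change in incidence} = (\mathrm{IRR} - 1) \times 100\%
  = (1.78 - 1) \times 100\% = 78\%

\text{95\% CI: } (1.50 - 1) \times 100\% = 50\% \quad \text{to} \quad (2.11 - 1) \times 100\% = 111\%

\frac{\text{Prevalence}_{2012}}{\text{Prevalence}_{2000}} = \frac{38.82}{19.94} \approx 1.95
```

The prevalence ratio of approximately 1.95 is a crude comparison of the two annual estimates and is not adjusted for age, gender, or region.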

Fig. 1

Adjusted incidence rate ratios of IPF–CS stratified by calendar year using (a) narrow definition and (b) broad definition. CPRD, Clinical Practice Research Datalink; IPF–CS, idiopathic pulmonary fibrosis clinical syndrome. All incidence rate ratios are mutually adjusted for other variables: gender, age group, and strategic health authority

Fig. 2

Crude incidence rates of IPF–CS stratified by (a) calendar year, (b) gender, (c) age, and (d) strategic health authority. CPRD, Clinical Practice Research Datalink; IPF–CS, idiopathic pulmonary fibrosis clinical syndrome; Yorks, Yorkshire

Fig. 3

Prevalence rates of IPF–CS stratified by (a) calendar year, (b) gender, (c) age, and (d) strategic health authority. CPRD, Clinical Practice Research Datalink; IPF–CS, idiopathic pulmonary fibrosis clinical syndrome; Yorks, Yorkshire. For this analysis, the lower age groups (0–39 and 40–44) and two regions (Yorkshire & The Humber and North East) have been combined in line with CPRD policy on low cell counts

Mortality and Survival of Patients with IPF–CS

Among incident cases, a total of 2618 and 996 deaths occurred over 11,468 and 4167 years of follow-up from diagnosis in the broad and narrow case sets, respectively. Similar patterns of survival were observed with both narrow and broad case definitions (Table 1). Median survival was 3.0 years (95% CI 2.8–3.1) in the broad case group and 2.7 years (95% CI 2.5–3.0) in the narrow case group. Half of patients were alive at 3 years following IPF–CS diagnosis using our broad case definition; 5- and 10-year survival rates were 34% and 19%, respectively. There was no statistically significant survival difference between the broad and narrow case groups (log-rank test, p = 0.06; Fig. 4a). There was also no statistically significant difference in the survival function by year of diagnosis (log-rank test, p = 0.17; broad case definition; Fig. 4b). Models adjusted for demographic variables showed that patients diagnosed in 2000 had a higher mortality rate than patients diagnosed in later years (Table 2). However, there was no clear evidence of improvement over time. There was no statistically significant interaction between age and gender following further adjustments (0.07 < p < 0.75).

Table 1 Mortality rates using narrow and broad IPF–CS case definitions
Fig. 4

Kaplan–Meier survival analysis for all incident cases of IPF–CS (a) using broad and narrow case definitions, (b) by year of diagnosis using the broad case definition

Table 2 Relative effect of calendar year of diagnosis, gender, age group, and strategic health authority on risk of mortality using narrow and broad IPF–CS case definitions


Our large population-based study provides national estimates of the epidemiological burden of IPF in the UK. We did not observe any improvements in survival from diagnosis in patients with IPF–CS over our relatively recent study period (2000–2012). Overall, median survival was found to be 3 years, consistent with rates from previous studies [3, 5] and similar to, albeit slightly lower than, that reported by other groups [4, 6, 7]. Navaratnam et al. [5] found that median survival did not change significantly in the UK between 2000 and 2008 using data from The Health Improvement Network (THIN), a primary care database similar to CPRD GOLD. They also reported 5-year survival to be 37%, similar to our value of 34%. Gribbin et al. [4] also investigated mortality of IPF using data from THIN, and reported a 5-year survival rate of 43% over their earlier study period.

When a broad case definition was used, the incidence of IPF–CS increased over time and was found to have risen almost 80% over the course of our study period. This trend of rising incidence is in line with reports from previous studies carried out in the UK using data from primary care [4, 5] and hospital admissions [15]. However, when a narrow case definition was used, an overall decrease in incidence over time was observed. This observation is likely due to changes in the availability of Read codes and recording practices over time. Alternatively, this contrasting trend could be explained by the fact that the overall number of incident cases recorded using the narrow definition is considerably lower than that recorded using the broad definition (1491 and 4527 cases, respectively). There is also a possibility that the diagnostic precision used to distinguish IPF from other types of fibrotic lung disease may have increased over the course of the study period.

We found the incidence of IPF–CS to increase with age and to be higher in male patients, consistent with other reports from the UK and the USA [4,5,6, 22], but found no evidence that the increase in incidence over time was restricted to, or significantly varied by, age group or gender. Findings from others regarding such age and gender effects have been mixed [4, 5, 15]. Our estimates of IPF–CS occurrence and survival were adjusted by age group, sex, and health authority, suggesting that the observed trends in incidence and survival are unlikely to be due to demographic changes in the UK population over time. Other explanations should nevertheless be considered. For instance, it is possible that changes in disease ascertainment and recording of IPF–CS could explain some of the observed rise in incidence. Given that improvements in ascertainment should result in the detection of milder cases, such an explanation is compatible with the changing, albeit inconsistent, survival found using both narrow and broad case definitions. Furthermore, the widely heralded publication of international evidence-based guidelines on the diagnosis and management of IPF in 2011 [23] may explain the slightly greater increase in incidence observed in 2012 due to heightened physician awareness, as might the approval in Europe of pirfenidone in March 2011.

We found regional variation in the incidence of IPF–CS across the UK, with highest rates in the North West and Northern Ireland, and lowest rates in the South East (using the broad case definition), consistent with findings by Navaratnam et al. [5]. In their earlier study period, Gribbin et al. [4] also found rates to be lowest in the South East, but high rates were reported for Scotland in addition to the North West. This regional variation may reflect geographical differences in environmental and occupational risks and is likely to have important implications for national resource utilization. Our study also highlights the impact of using different diagnostic codes for IPF on estimates of incidence and prevalence. The observed similarity in patterns of survival when using either narrow or broad case definitions suggests that both sets of Read codes capture a similar range of cases across disease severity. The similarity of outcomes seen in both groups, when compared to well-defined cohorts of patients with IPF identified through tertiary centers, suggests that the chosen diagnostic codes are successfully identifying patients with disease conforming to the internationally accepted criteria for a diagnosis of IPF.

Our study has several strengths. The large size of the CPRD cohort provided a representative snapshot of the UK population and the long study period enabled ascertainment of precise estimates of disease occurrence, mortality, and temporal change. The use of the CPRD GOLD database enabled analysis of the large dataset required to attempt to estimate the epidemiological burden of IPF; however, we recognize that the main potential weakness of our study concerns the validity of the IPF diagnosis.

Case ascertainment was based on diagnostic coding used by general practitioners in primary care, rather than standard diagnostic criteria. We accept that there is a possibility of misclassification of IPF [19]. Unfortunately, it was not possible to validate the diagnostic coding because of the large quantity of data, but limited informal re-evaluation suggested that subjects did indeed have IPF. In a previous study evaluating IPF using data from the GPRD, Hubbard et al. [24] found the validity of IPF diagnoses in the database to be high; 95% of identified cases were found to be true positives. Furthermore, IPF can be difficult to diagnose, with patients usually only identified in secondary, or even tertiary, centers by multidisciplinary teams experienced in the diagnosis of ILD. We therefore believe that it is unlikely that a general practitioner would record a diagnosis of IPF without having received notification from a respiratory specialist in secondary care. The similar pattern in survival seen over time using both narrow and broad case definitions in our study also supports the validity of the IPF–CS diagnosis in our cases. With regard to case ascertainment, we feel that it is unlikely that many cases in the community would have been missed. This is because patients with IPF suffer chronic and progressive exertional dyspnea and cough for which they usually seek medical attention [1]. However, under-ascertainment of cases in our study is possible owing to the reliance on the transfer of information between secondary and primary care, and so some cases diagnosed in secondary care may have been missed. Another limitation is the possibility that some incident cases were misclassified and were actually prevalent cases. This would not have affected the observed trends in increased incidence or survival over time, but may have overestimated survival because prevalent cases have greater average survival than incident cases [3].

It should also be noted that a considerable proportion of patients were lost to follow-up (22% and 32% of patients in the narrow and broad definitions, respectively, over the first 5 years after diagnosis). Follow-up time for the incidence analysis was slightly lower using the broad definition than the narrow definition, as additional patients would have been identified as prevalent IPF patients at study initiation (using the broad definition) and were therefore excluded from the analysis. In some cases, censoring of patients may be linked to disease outcome; for example, IPF patients are more likely to move to a nursing home and change to another general practice prior to death, which would lower the estimated mortality rate.


This study provides estimates of the epidemiological burden of IPF in the UK. Our finding that the survival of patients with IPF–CS is very poor and has changed little over recent years undoubtedly reflects the lack of effective treatments available to patients with this devastating disease during this time period. The antifibrotic therapies pirfenidone, in 2013, and nintedanib, in 2015, have been approved for reimbursement in the UK by NICE for the treatment of adults with mild-to-moderate IPF. Since then pirfenidone has been shown, in a clinical trial setting, to reduce disease progression and to improve survival [25,26,27]. Whether the use of antifibrotic therapies [28] translates into increased survival in real-world patients with IPF will need to be evaluated in coming years. Electronic medical records, such as those in CPRD GOLD, constitute valuable data sources for the evaluation of IPF. Unlike CPRD GOLD, disease registries in most countries are limited by the fact that registration of IPF is not mandatory and therefore inclusion of cases is open to bias.

The results of this study provide an important benchmark against which the effects of changes in the management and treatment of IPF can be measured, and thereby lay the foundation for further research.