Background

Idiopathic pulmonary fibrosis (IPF) is a diffuse parenchymal lung disease of unknown etiology associated with a median survival of 3 to 5 years after diagnosis [1]. Disease behavior is variable among patients, with some individuals remaining relatively stable over long periods, while others may experience a slow progressive decline, rapid decline, or suffer acute exacerbation [2]. Predicting the clinical course in IPF is challenging due to the heterogeneous nature of the disease, but it remains a critically important goal for both clinical and research purposes. Knowledge of an individual’s probability of disease progression or risk of death may affect timing of drug therapies or listing for lung transplantation. In clinical trials of therapeutics, accurate prognostication is desirable to maximize the likelihood of detecting treatment effects through cohort enrichment. For these reasons, several models of disease behavior for IPF have been developed with the common goal of accurate prognostication. Each model has contributed valuably to our understanding of IPF, identifying key clinical, physiologic, radiologic, pathologic, and biologic features associated with outcomes of interest.

Clinical models of disease behavior

Early risk prediction models incorporated baseline clinical and radiographic parameters to predict mortality in IPF. The composite clinical, radiological, and physiological scoring system identified age, clubbing, smoking history, lung volumes, end-exercise hypoxemia, and chest radiographic evidence of pulmonary hypertension and interstitial abnormalities as being associated with survival [3, 4]. The Composite Physiologic Index was similarly developed; it incorporates three lung function parameters to predict mortality and accounts for the confounding effects of concomitant emphysema in IPF patients, which prior models had not addressed [5].

More recently, du Bois et al. [6] developed a risk scoring system based on age, history of respiratory hospitalization, baseline forced vital capacity (FVC), and change in FVC over 24 weeks to predict mortality. This was subsequently modified to include a functional parameter, the 6-minute walk distance (6MWD), together with the change in 6MWD over 24 weeks [7]. Ley et al. [8] derived and validated the ‘gender, age, physiology’ (GAP) model, which uses four readily available baseline parameters, namely gender, age, FVC, and diffusion capacity of the lung for carbon monoxide (DLCO), to generate staging and risk prediction scores. An alternative model in which the extent of fibrosis on high-resolution computed tomography of the chest was used in place of the DLCO performed equally well [9]. The original du Bois and GAP models have subsequently been combined to provide an integrated baseline and longitudinal risk prediction approach [10].
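To make the structure of a point-based staging system such as the GAP model concrete, the following is a minimal sketch of how the four baseline parameters could be converted into a point total and a stage. The point values and stage cut-offs are not given in this article; they are reproduced here from memory of the published GAP index and should be checked against Ley et al. [8] before any use. The function names and the handling of boundary values are illustrative assumptions, and the sketch is not intended for clinical decision making.

```python
from typing import Optional


def gap_points(male: bool, age_years: int, fvc_pct: float,
               dlco_pct: Optional[float]) -> int:
    """Sum GAP-style points from four baseline parameters.

    Point values are an assumption recalled from the published GAP
    index; verify against the original publication [8].
    Pass dlco_pct=None when the patient cannot perform the DLCO test.
    """
    points = 1 if male else 0          # gender: female 0, male 1

    if age_years > 65:                 # age: <=60 -> 0, 61-65 -> 1, >65 -> 2
        points += 2
    elif age_years > 60:
        points += 1

    if fvc_pct < 50:                   # FVC % predicted: >75 -> 0, 50-75 -> 1, <50 -> 2
        points += 2
    elif fvc_pct <= 75:
        points += 1

    if dlco_pct is None:               # DLCO cannot be performed -> 3
        points += 3
    elif dlco_pct <= 35:               # DLCO % predicted: >55 -> 0, 36-55 -> 1, <=35 -> 2
        points += 2
    elif dlco_pct <= 55:
        points += 1

    return points


def gap_stage(points: int) -> str:
    """Map a point total (0-8) to a stage: 0-3 -> I, 4-5 -> II, 6-8 -> III."""
    if points <= 3:
        return "I"
    if points <= 5:
        return "II"
    return "III"


# Hypothetical patient: 63-year-old man, FVC 60% predicted, DLCO 40% predicted
example_points = gap_points(male=True, age_years=63, fvc_pct=60, dlco_pct=40)
example_stage = gap_stage(example_points)
```

The sketch illustrates why such models are attractive in practice: every input is routinely collected at baseline, and the score is a simple sum that requires no longitudinal follow-up.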

These clinical models have demonstrated the impact of cohort characteristics on calibration of risk. This is most obvious in comparing risk in referral center-based cohorts and clinical trial cohorts. Models derived in center-based cohorts appear to significantly overestimate mortality risk in clinical trial cohorts, where patients are highly selected [11]. Additionally, age and gender appear to be more relevant prognostic variables in clinical cohorts, perhaps through capturing the influence of comorbidities, while adding relatively little in a clinical trial cohort. Thus, calibration of risk prediction models to the population of interest appears critical to accurate quantification of risk.

Clinical risk prediction models provide important prognostic tools for practice and clinical trial development. However, their performance remains modest, likely because clinical markers are limited in their ability to directly assess the underlying pathobiology and disease activity. Translational studies are providing novel tools in the form of molecular and genetic biomarkers to address this limitation.

Molecular and genetic biomarker-based models of disease behavior

Several recent studies have identified molecular and genetic biomarkers associated with clinical outcomes in IPF [12]. These can be divided into three categories: genetic, protein, and cellular.

Genetic biomarkers associated with worse survival in IPF include mucin 5B promoter polymorphisms [13], shorter leucocyte telomere length [14], and the toll-interacting protein single nucleotide polymorphism [15]. Protein biomarkers that have been associated with worse outcomes in IPF include surfactant proteins A (SP-A) [16] and D [17], Krebs von den Lungen-6 (KL-6) [18, 19], CC-chemokine-ligand-18 [20], C-X-C motif chemokine 13 [21, 22], matrix metalloproteinase (MMP)-3 [22] and MMP-7 [23], fibulin-1 [24], interleukin-8 and intercellular adhesion molecule-1 [23], osteopontin [25], periostin [26, 27], and collagen degradation products [28]. Cellular biomarkers associated with worse outcomes in IPF include regulatory T cells (Tregs) [29], semaphorin 7a+ Tregs [30], and circulating fibrocytes [31].

Molecular and genetic biomarkers seem certain to add to the predictive abilities of currently available clinical risk prediction models. To date, few studies have examined this additive benefit and rigorous validation is lacking, but superior model performance has been suggested with certain combinations of clinical variables and biomarkers [13, 18, 23, 32]. Song et al. [18] proposed that the combination of at least three biomarkers (e.g., MMP-7, SP-A, and KL-6) improved risk prediction over clinical variables alone. Clearly, more needs to be done to clarify the additive role of molecular and genetic biomarkers.

Conclusions

Taken together, these early reports highlight the potential for more accurate modeling of disease behavior in IPF. However, several important limitations remain. First, while survival is unquestionably a clinically meaningful outcome, it is of less use to patients and clinicians than pre-mortality outcomes such as disease progression. No model to date accurately predicts pre-mortality outcomes such as loss of lung function or acute exacerbation. Second, available models demonstrate only modest prediction accuracy. Potential explanations for this include the inability to capture comorbidities (e.g., cardiac disease, cancer) that lead to death in IPF patients, the lack of reliable biomarkers of disease activity, and the failure to account for processes such as acute exacerbation. Lastly, quantification of risk may differ between patient populations, suggesting that models may need to be tailored to the population of interest.

Future research will need to address these and other limitations. We anticipate that models that combine clinical and biological variables will lead to improved prognostication for patients and improved cohort enrichment strategies in clinical trials. To develop these integrated models, we believe that a centralized registry of well-characterized patients with systematically collected bio-specimens will prove essential [33].