Introduction

Idiopathic pulmonary fibrosis (IPF) is a specific form of fibrosing interstitial pneumonia characterized by progressive worsening of dyspnea and lung function [1]. In the United States, the annual incidence of IPF has been estimated as 6.8–8.8 cases per 100,000 using narrow case definitions (requiring a definite pattern of Usual Interstitial Pneumonia [UIP] on high-resolution computed tomography [HRCT]), and as 16.3–17.4 cases per 100,000 using broad case definitions (including patients with a possible UIP-pattern on HRCT) [2]. Although IPF has a poor prognosis, with a median survival time from diagnosis of 2 to 3 years, the clinical course of IPF varies considerably [1],[3]. Symptoms experienced by patients with IPF include non-productive cough, fatigue and chronic dyspnea, with the latter being the most prominent and disabling [4]. The morbidity associated with IPF has a broad and profound impact on patients’ health-related quality of life (HRQL) [4],[5].

As IPF is a progressive disease with no cure, HRQL and other patient-centered outcomes are important endpoints to evaluate in research and clinical practice [6]. Although no disease-specific measure of HRQL has been established as suitable for longitudinal research in patients with IPF, several HRQL instruments (and others, including symptom and generic quality of life questionnaires) have been used [7],[8]. Which patient-centered instrument(s) (including HRQL questionnaires) to use in a particular study depends on a number of factors, including the design of the study, the intervention being assessed, the hypotheses being tested, and the characteristics of the comparator group (general population, patients with IPF of different severity, patients with another disease, etc.). In any given situation, it is uncertain whether a generic HRQL instrument will perform as well as or better than a disease-specific HRQL instrument.

In this review, we focused on the St George’s Respiratory Questionnaire (SGRQ). Although originally developed for use in patients with chronic obstructive pulmonary disease (COPD) and asthma [8], it has frequently been used to evaluate HRQL in patients with IPF. The SGRQ is a 50-item questionnaire split into three domains: symptoms (assessing the frequency and severity of respiratory symptoms), activity (assessing the effects of breathlessness on mobility and physical activity), and impact (assessing the psychosocial impact of the disease) [9]. Scores are weighted such that every domain score and the total score range from 0 to 100, with higher scores indicating a poorer HRQL.
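As a rough illustration of how weighting can produce a 0 to 100 range, the sketch below assumes that a component score is the summed weight of the items a patient endorses, expressed as a percentage of the maximum possible summed weight for that component. The function name and the item weights are hypothetical placeholders and are not the published SGRQ weights.

```python
def component_score(endorsed_weights, all_weights):
    """Return a 0-100 score: endorsed weight as a percentage of the maximum weight."""
    maximum = sum(all_weights)
    if maximum <= 0:
        raise ValueError("all_weights must sum to a positive value")
    return 100.0 * sum(endorsed_weights) / maximum


# Hypothetical weights for a four-item component (NOT the real SGRQ weights).
all_weights = [80.0, 60.0, 70.0, 90.0]
# Weights of the two items this hypothetical patient endorsed.
endorsed = [80.0, 70.0]

score = component_score(endorsed, all_weights)  # higher means poorer HRQL
```

Under this scheme, a patient who endorses no items scores 0 and a patient who endorses every item scores 100, which matches the stated range and direction of the scale.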

The aim of this review was to assess the appropriateness of the SGRQ for measuring HRQL in patients with IPF by examining the evidence relating to the psychometric performance of the SGRQ in this population. A revised version of the SGRQ, the SGRQ-I, has been developed for use in patients with IPF [10]; however, studies assessing this tool are limited, and SGRQ-I data are not covered in this manuscript.

Methods

Search strategy and data extraction

A comprehensive literature review was conducted to identify articles that evaluated the psychometric properties of the SGRQ in patients with IPF. Following a PubMed search (see Additional file 1), articles were excluded if they were not published between 1 January 1991 (date of first publication of the SGRQ) and 31 August 2013, were not published in English, did not report data on the psychometric properties of the SGRQ in patients with IPF or duplicated clinical trial data reported in another article (Figure 1). Data extracted from the studies included study characteristics (country, duration, design, sample size), participant characteristics (age, gender, time since diagnosis, forced vital capacity [FVC]% predicted, diffusing capacity for carbon monoxide [DLCO]% predicted) and results of the psychometric tests.

Figure 1 Selection of articles to be included in the review

Articles were selected that assessed any of the following psychometric properties of the SGRQ: internal consistency, convergent validity, known groups validity, test-retest reliability (reproducibility), responsiveness, minimal important difference (MID), and floor and ceiling effects [11]. Internal consistency refers to the degree to which the individual items within an instrument correlate with each other (i.e., tap the same underlying construct). This is determined using Cronbach’s coefficient alpha, with ≥0.70 considered to indicate acceptable internal consistency for a multi-dimensional instrument. Convergent validity describes the degree to which two measures, hypothesized to measure the same construct, correlate. Known groups validity refers to the extent to which scores on an instrument distinguish groups that differ on a key variable, usually clinical in nature. For the described validity measures, correlations were regarded as weak if <0.30, moderate if 0.30–0.60, and strong if >0.60 (in absolute value) [12]. Test-retest reliability assesses the ability of an instrument to produce consistent scores over repeated measurements in patients who are clinically stable. Responsiveness assesses the ability of an instrument to detect change in individuals who are hypothesized to have changed on the underlying construct (HRQL) and who are known to have experienced change in clinical status. MID estimates identify the smallest difference in the score on an instrument that patients perceive as important. Floor and ceiling effects are limitations that occur when an individual scores at the extremes of an instrument; if a patient’s score is the lowest or highest possible value, the instrument is unable to detect a reduction or increase, respectively.
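The two numerical criteria described above can be sketched as follows. This is a minimal illustration and not the analysis code used in any of the reviewed studies. It uses the standard formula for Cronbach’s alpha, and the function names, the response data, and the choice to treat a coefficient of exactly 0.30 as moderate are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def classify_correlation(r):
    """Label a correlation by its absolute size (boundary handling is an assumption)."""
    size = abs(r)
    if size < 0.30:
        return "weak"
    if size <= 0.60:
        return "moderate"
    return "strong"


# Hypothetical responses: five respondents answering three items.
responses = [[1, 2, 1], [2, 3, 3], [3, 3, 4], [4, 5, 4], [5, 5, 5]]
alpha = cronbach_alpha(responses)
acceptable = alpha >= 0.70  # threshold for acceptable internal consistency
```

Because the sign of a correlation only reflects the direction in which the two scales are scored, the classification is applied to the absolute value of the coefficient.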

Results

A total of 30 papers were included in the review (Figure 1; Table 1).

Table 1 Studies included in this review

Internal consistency

Data from a clinical trial of bosentan have been used to determine the internal consistency of the SGRQ in patients with IPF. Cronbach’s alpha was 0.66 for the symptoms score and ≥0.84 for each of the SGRQ activity, impact and total scores [10],[34].

Convergent validity

Convergent validity was evaluated by extracting cross-sectional and longitudinal correlations between SGRQ scores and other patient-reported outcome measures (Table 2), an assessment of exercise capacity (Table 3), pulmonary function tests (PFTs) or partial pressure of arterial oxygen (Table 4), and assessments of fibrotic abnormalities on HRCT (Table 5).

Table 2 Correlation coefficients between SGRQ scores and other patient-reported assessments of health status
Table 3 Correlation coefficients between SGRQ scores and the 6MWD as a measure of exercise capacity
Table 4 Correlation coefficients between SGRQ scores, pulmonary function tests and arterial blood gas analysis
Table 5 Correlation coefficients between SGRQ scores and extent of fibrosis on HRCT

Patient-reported outcome measures

In nine studies, investigators provided information on the correlation between SGRQ scores and other patient-reported outcome measures (BDI [Baseline Dyspnea Index], D-12 [Dyspnea-12], K-BILD [King’s Brief Interstitial Lung Disease questionnaire], UCSD-SOBQ [University of California San Diego Shortness of Breath Questionnaire], CQLQ [Cough Quality of Life Questionnaire], a single-item dyspnea assessment, SF-36 Physical Component Summary score [SF-36 PCS] and the Borg Dyspnea Index) (Table 2). Moderate to strong correlations were observed between the SGRQ total score and the total scores on these instruments and, in general, between the SGRQ domain scores and the total scores on these instruments (Table 2). Likewise, moderate to strong correlations were observed between SGRQ domain or total scores and the total, physical complaints, extreme physical complaints, and functional ability sub-scale scores of the CQLQ (r = 0.34 to 0.81) [21], the total and sub-scale scores of the K-BILD (r = -0.59 to -0.89) [27], the SF-36 PCS, a composite score measuring overall physical health (r = -0.52 to -0.74) [10] and the Borg Dyspnea Index (r = 0.35 to 0.56) [10],[15]. For most measures and their sub-scales, correlations were weakest with the SGRQ symptoms score (when compared with other SGRQ domains or the total score).

In two studies, investigators evaluated correlations between SGRQ change scores and change scores from other patient-reported outcome measures (Table 2). In one study, correlations were moderately strong between change scores for the SGRQ activity, impact and total scores and change scores from the single-item dyspnea assessment (r = 0.59, 0.56 and 0.45, respectively) [28]. In the other study, investigators found that the correlation between the BDI change score and SGRQ total change score was -0.29 and not significant [25]. However, the BDI was designed to measure dyspnea severity at a single point in time and not to measure change in dyspnea severity [42].

Measures of exercise capacity

Correlation coefficients between SGRQ scores and a measure of exercise capacity are presented in Table 3. Distance covered during the 6-minute walk test (6MWD) is frequently used as a measure of exercise capacity in patients with IPF, and change in 6MWD has been shown to be a predictor of mortality in these patients [16]. In five cross-sectional studies in patients with IPF, investigators examined the relationship between the SGRQ total score and 6MWD. The strength of these correlations was moderate to strong in three studies (r = -0.45 to -0.72) [15],[28],[40] and weak in two studies (r = -0.26 and -0.28) [10],[16]. In four cross-sectional studies, investigators examined the relationship between the SGRQ domain scores and the 6MWD [10],[28],[39],[40]; the strength of these correlations was moderate to strong for the activity score in all four studies (r = -0.32 to -0.72), moderate to strong for the impact score (r = -0.41 to -0.63) and moderate for the symptoms score (r = -0.32 to -0.41) in three studies. In three studies, investigators examined the relationship between change in the SGRQ total score and change in 6MWD [16],[25],[28]; correlation coefficients ranged from -0.23 to -0.43.

Pulmonary function tests and arterial blood gas analysis

Table 4 presents correlations between SGRQ scores and either PFTs or arterial blood gas analysis in patients with IPF. All correlations between the SGRQ total score and these variables were moderate to strong (r = -0.30 to -0.66, and p < 0.05 for all but one). There were moderate to strong correlations between the SGRQ activity score and the majority of pertinent PFT results (e.g., FVC or DLCO) or arterial blood gas analysis in all studies, while correlations between the SGRQ symptoms or impact domain scores and these variables were generally weak to moderate. Results for FVC, the lung function parameter regarded as the most statistically useful physiological indicator of IPF severity, and the one most frequently used as a primary endpoint in contemporary clinical trials, were weakly to moderately correlated with SGRQ total and domain scores (r = -0.34 to -0.45 for the SGRQ total and -0.13 to -0.31 for the SGRQ domains).

HRCT

In one study of patients with IPF, investigators assessed correlations between SGRQ scores and the extent of fibrotic abnormalities on HRCT (degree of ground-glass opacity [CT-alv], interstitial opacity [CT-fib], and both [total score]) (Table 5). Correlations were moderately strong between the SGRQ symptoms, impact and total scores and CT-alv or total scores (r = 0.34 to 0.42) and moderately strong between the SGRQ activity score and both the CT-fib and total scores (r = 0.37 to 0.39) [28].

Known groups validity

Although there are no well-established categories of disease severity in IPF, it may be hypothesized that patients receiving supplemental oxygen represent patients with more severe disease. In two studies, investigators found that SGRQ total scores were worse in patients using supplemental oxygen versus those not using supplemental oxygen [15],[38]. In one study by Chang and colleagues, the magnitude of difference between patients using versus not using oxygen was 4.7 (p < 0.05) [15].

Test-retest reliability (reproducibility)

No studies were found that reported data on the test-retest reliability of the SGRQ in patients with stable IPF.

Minimal important difference

A triangulation approach has been used to determine an MID estimate for SGRQ scores in patients with IPF [34]. Using both distribution- and anchor-based approaches (using FVC, DLCO and the TDI as anchors), the MID for the SGRQ symptoms, activity, impact and total scores was 8, 5, 7 and 7 respectively.

Responsiveness

The responsiveness of the SGRQ domain and total scores has been assessed in one study [34]. Using data from a randomized placebo-controlled trial of bosentan, investigators assessed the ability of the SGRQ to discriminate among IPF patients who had experienced an improvement, decline, or no change in disease status over 6 months, as defined by three clinical anchors (change in FVC, DLCO, and transition dyspnea index [TDI]). With the exception of the SGRQ symptoms score when DLCO was the anchor, changes in SGRQ domain and total scores differed significantly between patients who had declined, remained stable, or improved [34]. Change scores from the SGRQ total and its domains were reported for the DLCO and TDI response categories and ranged from +3 to +13, +1 to -5, and 0 to -12 for patients who declined, remained stable, or improved, respectively. The impact domain discriminated best between all categories of change for all three anchors [34].

SGRQ as an endpoint

In 16 trials, investigators used the SGRQ domain and/or total scores as outcome variables. In four trials, investigators evaluated the within-subject change in SGRQ total score from baseline to end of treatment [22],[23],[32],[37] (Table 6). In all four, improvements were observed in exercise endurance or FVC; in three of these, there was a significant decrease in SGRQ total score from baseline to end of treatment (8–24 weeks).

Table 6 Changes in SGRQ scores in within-subject clinical trials

In the remaining 12 trials, investigators assessed whether the SGRQ domain and/or total scores differed between active and placebo groups (Table 7). In four of these [13],[17],[18],[25], statistically significant between-group differences for the primary endpoint coincided with statistically significant between-group differences in at least one SGRQ total or domain score (range of between-groups difference in SGRQ total score: -6.1 to -13.4). Six studies [17],[20],[26],[29]-[31] reported a lack of statistically significant treatment effect in the primary endpoint or SGRQ scores (range of between-groups difference in SGRQ total score reported in three studies: -0.5 to -3.0; scores were not reported in three studies). In three studies [19],[33],[41], the primary endpoint was not met, but the SGRQ total or domain scores were significantly different between treatment groups (range of between-groups difference in SGRQ total score: -3.3 to -6.1).

Table 7 Changes in SGRQ scores in randomized controlled trials

Four studies [20],[31],[33],[41] reported changes from baseline in SGRQ total score in the placebo group. Adjusting for different trial durations, the SGRQ total score in the placebo arms of these trials deteriorated (increased) by a median of +4.9 (range: 3.2 to 10.6) per 52 weeks.
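The adjustment for trial duration can be illustrated with a short sketch. It assumes that change accrues linearly over time, so that a change observed over a given number of weeks is simply rescaled to 52 weeks; the source does not state the exact method used. The function name and the placebo-arm values are hypothetical and are not data from the cited trials.

```python
from statistics import median


def change_per_52_weeks(change, weeks):
    """Rescale an observed change to a 52-week period, assuming linear change."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return change * 52.0 / weeks


# Hypothetical placebo arms: (change in SGRQ total score, trial duration in weeks).
placebo_arms = [(2.0, 26), (5.0, 52), (9.0, 72), (4.0, 48)]

annualized = [change_per_52_weeks(change, weeks) for change, weeks in placebo_arms]
typical_change = median(annualized)  # positive values indicate deterioration
```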

Floor and ceiling effects

No studies were found in which investigators reported data on floor and ceiling effects for the SGRQ in patients with IPF. However, in most studies, the minimum and maximum achievable SGRQ total scores (0 and 100, respectively) were outside an interval spanning twice the standard deviation around the reported means (Table 1). For the two studies in which investigators reported ranges for baseline SGRQ total scores, ranges did not include minimum or maximum possible values [24],[38], thus confirming the absence of floor or ceiling effects in these studies.
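The indirect check described above can be sketched as follows. The sketch interprets the interval as the mean plus or minus a multiple of the standard deviation, with the multiplier exposed as the parameter k (default 2); this reading of the interval, the function name, and the example values are assumptions. Such a check is only suggestive, because it relies on summary statistics rather than on individual patient scores.

```python
def extremes_outside_interval(mean, sd, k=2.0, minimum=0.0, maximum=100.0):
    """Return True if both scale extremes lie outside mean +/- k * sd."""
    lower = mean - k * sd
    upper = mean + k * sd
    return minimum < lower and upper < maximum


# Hypothetical baseline SGRQ total scores reported as (mean, standard deviation).
no_concern = extremes_outside_interval(45.0, 18.0)      # interval 9 to 81
possible_floor = extremes_outside_interval(30.0, 20.0)  # interval -10 to 70
```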

Conclusions

Measurement standards and psychometric criteria have been proposed to assist with choosing an appropriate instrument to evaluate HRQL in patients with IPF [6],[43]. As with any patient-reported outcome measure used in the study of any condition, an instrument must have face validity, internal consistency, test-retest reliability, longitudinal validity, and minimal floor and ceiling effects in the target patient population.

The constellation of findings from studies identified in our search revealed that in patients with IPF, the internal consistency of the SGRQ activity and impact domains and the SGRQ total score was excellent, whereas the internal consistency of the symptoms domain was moderate and, in most studies, fell below the acceptable threshold of 0.7. The lower internal consistency of the symptoms domain is likely because it asks about a range of respiratory symptoms (cough, sputum, shortness of breath, wheezing and attacks of chest trouble), the majority of which apply to few patients with IPF, whose major symptoms are shortness of breath and cough. In response data, off-target items create a weaker level of inter-relatedness among items in this domain, and thus lower internal consistency. This also contributes to the lower convergent validity of this domain, as the off-target items weaken the associations between its scores and clinical measures of IPF severity (e.g., patients may endorse wheezing or attacks of chest trouble, but these symptoms are unlikely to be related to a person’s FVC). These off-target (for IPF) items in the symptoms domain detract from the SGRQ’s face validity and would likely have been removed or modified in a tool specifically designed for use with patients with IPF. Overall, the symptoms domain may be well-suited for patients with COPD, but is not tailored to precisely assess symptoms in patients with IPF. The non-informative noise in the symptoms domain might also contribute to a less than optimal performance of the SGRQ total score. However, despite its weak face validity in IPF, the symptoms domain performs reasonably well in this population, and its potential to detract from the performance of the SGRQ total score is tempered because it contributes least to the SGRQ total score.

Convergent validity analyses seek to determine whether two measures, hypothesized to measure the same construct, do in fact correlate. Moderate, statistically significant correlations in the expected direction support convergent validity. Very strong or ‘perfect’ correlations suggest redundancy in measurement, so moderate correlations between a patient-reported outcome measure and another clinical variable support convergent validity of the patient-reported outcome measure while confirming that it contributes unique information not captured by the other clinical variable [5]. The SGRQ has been used as a secondary endpoint in several clinical trials conducted in patients with IPF. Among the select few in which the intervention outperformed placebo, SGRQ results were as one would anticipate, i.e., SGRQ scores improved in the group that benefited from the intervention. Although not a formal assessment of responsiveness, consistency between the changes in SGRQ scores and the changes in other endpoints supports responsiveness.

In sum, the limitations of the SGRQ in IPF should be noted, as it was not originally developed for use in patients with IPF. In particular, this applies to possible over-interpretation of results of individual domains. However, the cross-sectional correlations between SGRQ domain and total scores and other measures of patient-reported health status, exercise capacity or lung function, along with the ability of the SGRQ to distinguish patients who experience a change in clinical status or remain stable over time, support the SGRQ as a useful patient-reported outcome measure in IPF.

Limitations to our research include the following. We could identify only one study in which MID estimates for the SGRQ scores in IPF were determined [34]. This study used a triangulation approach and yielded MID estimates that were higher than those reported for COPD [45], but more research with additional datasets is needed to evaluate these estimates. In the meantime, the use of responder rates of patients experiencing a minimum change from baseline in SGRQ scores (or, perhaps more informatively, cumulative distribution plots) may be a useful assessment, as research suggests that it may be less dependent on the exact cutoff, i.e., the precise value of the MID [46].
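A responder analysis and a cumulative distribution of change scores can be sketched as follows. This is a minimal illustration under stated assumptions: a responder is taken to be a patient whose score decreased (improved) by at least the chosen threshold, and the function names and change scores are hypothetical rather than drawn from any reviewed study.

```python
def responder_rate(changes, threshold):
    """Proportion of patients whose score decreased (improved) by at least `threshold`."""
    if not changes:
        raise ValueError("changes must not be empty")
    responders = sum(1 for change in changes if change <= -threshold)
    return responders / len(changes)


def cumulative_distribution(changes):
    """Return (value, proportion of patients with change <= value) for each distinct value."""
    n = len(changes)
    return [
        (value, sum(1 for change in changes if change <= value) / n)
        for value in sorted(set(changes))
    ]


# Hypothetical changes from baseline in SGRQ total score for ten patients.
changes = [-12, -8, -7, -3, 0, 2, 5, 9, -7, 4]

# Responder rates at several candidate cutoffs show how sensitive the result is to the MID.
rates = {cutoff: responder_rate(changes, cutoff) for cutoff in (5, 7, 8)}
curve = cumulative_distribution(changes)
```

Plotting the curve for each treatment group shows the proportion of patients at or below every possible cutoff, so the comparison does not depend on a single MID value.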

No articles were identified that evaluated the test-retest reliability of the SGRQ in patients with stable IPF. Likewise, we could not locate a study in which floor and ceiling effects of SGRQ scores were reported, although an analysis of the reported baseline mean SGRQ total scores and their standard deviations suggested that there was no evidence for either. Furthermore, we did not assess the content validity of the SGRQ in patients with IPF, nor did we include analyses of articles published in languages other than English. Content validity and cultural adaptation are important factors to consider for any patient-reported outcome measure, but these topics were beyond the scope of this evaluation of the SGRQ’s psychometric properties. Therefore, it is evident that more research on the SGRQ is needed in this patient population.

The utility of a patient-reported outcome measure may be assessed only after a wealth of data becomes available. The assessment involves examining how the measure performs in the target population under several circumstances. The cache of available data has greatly advanced our understanding of HRQL in general, and the performance of the SGRQ in patients with IPF. For example, whilst the mean baseline SGRQ total score reported in IPF (around 45; interquartile range: 42–50) is similar to that reported in COPD trials [47],[48], an analysis of the reported changes from baseline in the SGRQ total score in the placebo arms suggests that untreated patients with IPF deteriorate by +4.9 points over a period of 52 weeks. This contrasts with the experience in COPD, where patients on placebo show an improvement of 2–3 points per year [46], and reflects the progressive decline in health status seen in patients with IPF.

Finally, a major factor in this assessment revolves around how confidently response data from the measure can be used to make inferences about patients in the target population. For example, what can be said about a patient with IPF whose SGRQ score is 50? How does day-to-day functioning, or how a patient feels, change for an IPF patient whose SGRQ score increases by 10 over 6 months? Being able to answer these, and similar, questions confidently and accurately will further and more strongly support the validity of the SGRQ as an instrument capable of assessing domains of HRQL in this population. Until then, the balance of the data suggests that the SGRQ may be a suitable secondary endpoint for measuring HRQL in therapeutic trials of IPF.

Additional file