Introduction

Hereditary angioedema due to C1 inhibitor deficiency (C1-INH-HAE) is a rare disease characterized by recurrent episodes of subcutaneous and/or submucosal edema that may cause significant morbidity and can be life-threatening [1, 2]. A recent systematic review of epidemiologic studies estimated its prevalence at between 1.1 and 1.6 per 100,000 inhabitants [3]. The minimal prevalence rate in Spain was 1.09:100,000 inhabitants in 2003 [4].

C1-INH-HAE may cause significant morbidity due to the unpredictability of angioedema attacks, painful attacks, risk of asphyxia and the need for emergency intervention [5]. Different factors such as the low prevalence of the disease, hereditary transmission, improper diagnosis, concern about transmission of the disease to children, worry about access to specific treatments, and side effects of treatments, among others, have a negative effect on health-related quality of life (HRQoL) [6,7,8,9,10,11].

There are currently no reliable biomarkers to monitor the activity of C1-INH-HAE during follow-up visits. HRQoL measurement tools have been recommended for use in clinical practice to facilitate communication, modify or establish therapeutic action, discuss patients’ hidden problems, and monitor treatment response [1, 12]. The SF-36v2 (SF-36 from now on) is a generic health survey that can be applied both to the general population and to specific conditions [13]. This tool has demonstrated good psychometric properties that have been assessed in over 400 articles [14]; however, its psychometric characteristics in this disease are unknown. A generic HRQoL assessment scale validated for C1-INH-HAE is therefore of special interest. It would allow better interpretation of the HRQoL results of patients with C1-INH-HAE who complete the SF-36 and would allow comparisons across disease groups.

The main aim of this study was to assess the psychometric properties of the generic SF-36 questionnaire in adult patients with C1-INH-HAE in order to validate it for use in this disease.

Methods

Study description

Psychometric study

Phase 1 A post-hoc analysis of the descriptive characteristics and psychometric properties of the SF-36 was carried out using data provided by an international sample of adult C1-INH-HAE patients who participated in the pilot study for the HAE-QoL development and validation between 2009 and 2011 [5].

Phase 2 Adult C1-INH-HAE patients in Spain were recruited for the study from the Allergy Department of La Paz University Hospital and the National Association of Patients with Hereditary Angioedema (AEDAF) in order to assess SF-36 reliability (test–retest phase).

Ethics committee

The study was reviewed and approved by the Clinical Research Ethics Committee of La Paz University Hospital (Madrid, Spain) (PI-281 and PI-1881) and the committees of the participating hospitals according to the specific regulations of each country.

Participants

Participation in this study was voluntary. Patients were recruited by physicians from our research group in each country. All patients provided informed consent. The inclusion criteria for the patients were that they were at least 18 years old and had a diagnosis of C1-INH-HAE (type I or II) confirmed by a participating physician based on low plasma levels of functional C1 inhibitor and/or low serum antigenic C1 inhibitor. In some cases, mutations in the SERPING1 gene were also detected to confirm diagnosis [7].

Exclusion criteria were age under 18 years, other types of angioedema, a mental health condition adversely affecting understanding of the study, and/or lack of fluency in the language used to answer the questionnaire.

A convenience sample of patients who were heterogeneous with regard to sex, age, level of education, geographical origin and severity of disease was selected for phase 1, whereas a random sample of patients was obtained for phase 2.

Questionnaires

In the first phase, the patients completed the following questionnaires: a clinical questionnaire on demographic and clinical characteristics (CQ-HAE) [5], the SF-36, and the HAE-QoL v2.0. Validated versions of the HAE-QoL and CQ-HAE questionnaires were available in each of the target languages spoken in the participating countries [5]. Validated versions of the SF-36 in the languages of all participating countries were purchased from QualityMetric for use in the study.

In the second phase, patients were initially required to complete the CQ-HAE, the SF-36 and the HAE-QoL v2.0 questionnaires. Seven to ten days later they completed the SF-36 questionnaire again, as well as a short version of the clinical questionnaire to assess the patients’ clinical stability between the two tests (CQ-retest).

The SF-36 questionnaire consists of 36 questions (items) which encompass 8 domains: physical functioning (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role emotional (RE) and mental health (MH). Additionally, the SF-36 includes a transition question asking respondents to assess changes in their general health condition compared with the previous year. Although this item is not used to calculate scales, it does provide useful information on perceived changes in health or functioning over the year before completing the SF-36 [15]. The SF-36 questionnaire was not designed to generate a global health index. However, it allows the calculation of two summary scores, namely, the physical component summary (PCS) and the mental component summary (MCS), by combining the scores of several domains. A higher SF-36 score indicates higher HRQoL. No missing data were imputed.

HAE-QoL is the first specific HRQoL instrument for adult patients with C1-INH-HAE [5, 6]. It contains 25 items classified under seven HRQoL domains (treatment difficulties, physical functioning and health, disease-related stigma, emotional role and social functioning, concern about offspring, perceived control over illness, and mental health) [5]. The total HAE-QoL score varies between 25 and 135, where a higher score indicates a better HRQoL. HAE-QoL showed strong psychometric properties, with a Cronbach’s alpha coefficient of 0.92 and an intraclass correlation coefficient of 0.87 [5].

The C1-INH-HAE severity in the last 6 months was measured by the C1-INH-HAE severity score previously described [5] (see Table 1).

Table 1 Ad hoc C1-INH-HAE severity score

Data collection

Patients completed written questionnaires either in-person at the hospital or at home, which were subsequently sent to La Paz University Hospital (Madrid, Spain) for data processing and analysis.

Anonymous data were entered into an Excel database in accordance with the regulations of the Organic Law on Protection of Personal Data (LOPD 15/1999), the applicable law at the time the study was conducted. Data entry was verified by two researchers.

Data management and analysis were centralised at La Paz University Hospital (Madrid, Spain).

Statistical analysis

The analysis involved:

  1. Descriptive statistics of the items, including missing values, minimum and maximum scores, ceiling effect, floor effect, mean, standard deviation, median, interquartile range, skewness, kurtosis, corrected homogeneity index (CHI), and internal consistency coefficient (Cronbach’s alpha). A floor or ceiling effect was considered to be present if more than 15% of respondents had the lowest or highest possible score, respectively [16, 17].

  2. Psychometric properties analysis of the SF-36 by means of the reliability and validity evidence study:

    (a) Internal reliability or internal consistency was assessed by Cronbach’s alpha coefficient [18], for both summaries and domains of the SF-36. Values between 0.70 and 0.95 were considered optimal [16].

    (b) Construct validity was studied by means of the convergent validity analysis with the HAE-QoL, the study of several predefined clinical hypotheses and the discriminant validity analysis among known groups:

    • Convergent validity was assessed by calculating the Pearson correlation coefficient between the scores of the physical (PCS) and mental (MCS) component summaries and the 8 domains of the SF-36 and the total score of the HAE-QoL and its domains. Association was deemed to exist when this coefficient was higher than 0.4.

    • It was hypothesized “a priori” that a lower SF-36 score (worse HRQoL) would be found in patients who had ever undergone intubation or a tracheotomy, and in patients who were symptomatic, were under long-term prophylactic treatment, had received an inappropriate treatment for angioedema attacks (e.g. antihistamines) or had received psychological/psychiatric care for C1-INH-HAE in the last 6 months. In a post hoc analysis, it was decided to further include the hypotheses that patients who had had laryngeal angioedema attacks, had required emergency intervention and had had a higher number of angioedema episodes in the last 6 months would also have lower HRQoL. Construct validity was considered supported if clinically differentiable patient groups had significantly different SF-36 scores in the expected direction in the two summaries and at least 4 out of the 8 domains. The Kruskal–Wallis test and post hoc comparisons were carried out.

    • Discriminant validity among known groups: patients were classified into subgroups according to C1-INH-HAE severity in the last 6 months (Asymptomatic, Mild, Moderate and Severe) (Table 1) [5]. The Asymptomatic and Mild subgroups were combined for statistical analysis. Validity of known groups was considered supported if clinically differentiable patient groups had significantly different SF-36 scores in the expected direction.

    The Mann–Whitney U test and the Student t test were used to compare two independent samples, whereas the Kruskal–Wallis test or one-way ANOVA was used for three or more independent samples. In addition, post hoc comparisons were adjusted using the Bonferroni correction.

    (c) Reliability: The test–retest reliability was measured by means of the intraclass correlation coefficient (ICC) in a group of subjects considered stable with regard to their personal situation and clinical condition during the retest period. The ICC was considered acceptable if ≥ 0.7 [16].

    (d) Minimal clinically important difference (MCID): Two methods based on the distribution of values were used to estimate MCID:

    • MCID-1: The half standard deviation (SD) approach, which has been shown to approximate the threshold of discrimination for a clinically meaningful change or difference in PRO scores for patients with chronic diseases [19].

    • MCID-2: The standard error of measurement (SEM), which is widely accepted to represent the MCID of an instrument [20, 21], calculated by multiplying the standard deviation of the instrument by the square root of one minus its reliability coefficient [SD*square root (1-reliability)]. ICC was used as the reliability coefficient.
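The two distribution-based estimates described above can be illustrated with a minimal Python sketch. It is not the code used in the study (the analysis was run in SPSS and STATA); the function names and the example scores are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which form was used.

```python
import math
import statistics


def mcid_half_sd(scores):
    """MCID-1: half of the standard deviation of the observed scores.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    return 0.5 * statistics.stdev(scores)


def mcid_sem(scores, reliability):
    """MCID-2: standard error of measurement, SD * sqrt(1 - reliability).

    `reliability` is the reliability coefficient of the scale
    (the ICC in the approach described in the text).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return statistics.stdev(scores) * math.sqrt(1.0 - reliability)


# Hypothetical domain scores, for illustration only (not study data).
example_scores = [40, 50, 60]
half_sd = mcid_half_sd(example_scores)
sem = mcid_sem(example_scores, reliability=0.84)
```

Because the SEM shrinks as reliability approaches 1, a domain with a high ICC yields a smaller MCID-2 than MCID-1 for the same score distribution.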

Hypothesis tests were performed at a 95% confidence level. Data analysis was performed using SPSS v.12.0 and STATA v. 12 statistical software.
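As an illustration of the floor and ceiling criterion defined in the descriptive statistics step (more than 15% of respondents at the lowest or highest possible score), the following Python sketch may be helpful. It is not the study's own code; the function name, the use of `None` for unanswered items, and the example responses are assumptions made for this example.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Return the fraction of respondents at the scale extremes.

    Unanswered items (None) are excluded from the denominator; this
    handling is an assumption, in line with no imputation being done.
    An effect is flagged only when the fraction exceeds the threshold.
    """
    answered = [s for s in scores if s is not None]
    if not answered:
        raise ValueError("no answered items")
    n = len(answered)
    floor_fraction = sum(1 for s in answered if s == min_score) / n
    ceiling_fraction = sum(1 for s in answered if s == max_score) / n
    return {
        "floor_fraction": floor_fraction,
        "ceiling_fraction": ceiling_fraction,
        "floor_effect": floor_fraction > threshold,
        "ceiling_effect": ceiling_fraction > threshold,
    }


# Hypothetical responses on a 0-100 scale (not study data).
responses = [100, 100, 50, 0, 75, None, 100, 25, 50, 75]
result = floor_ceiling(responses, min_score=0, max_score=100)
```

In this hypothetical sample, 3 of 9 answered responses are at the maximum, so a ceiling effect is flagged, while 1 of 9 at the minimum stays below the 15% threshold.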

Results

Patient demographic and clinical characteristics

Phase 1 International study

Three hundred and thirty-two adult patients with C1-INH-HAE participated in the study. Sufficiently complete data were available from 290 patients. The countries participating in the multi-centre study were (from highest to lowest representation in the sample) Spain (n = 42), Germany (n = 42), Hungary (n = 38), Brazil (n = 34), Denmark (n = 27), Poland (n = 22), Canada (n = 21), Romania (n = 19), Austria (n = 18), Argentina (n = 16) and Israel (n = 9). Characteristics of the patients included in the study are shown in Table 2.

Table 2 Characteristics of international multicenter study (phase I) and the re-test (phase II) sample groups

Psychometric analysis of SF-36

The descriptive study of the SF-36 items can be seen in Table 3. The CHI of individual items varied between 0.489 and 0.880. The non-response rate per item was very low and varied from 1% to 3.4%. Two hundred and sixty-six patients (91.7%) had no missing data. An item ceiling effect was present in 25 out of the 35 items included in the SF-36. This ceiling effect was very high, mainly in the domains “PF” (items SF3b to SF3j, which varied between 65.0% and 89.5%), “RE” (items SF5a to SF5c, ranging between 48.6% and 52.6%) and “RP” (items SF4a to SF4d, ranging between 38.2% and 46.5%). The SF9c item of the “MH” domain also showed a marked ceiling effect (50.2%). The only domain in which no item had a ceiling effect was “VT” with 4 items (SF9a, SF9e, SF9g, SF9i). In general, the floor effect was very low: only 3 of the 35 items showed a minor floor effect, namely SF3a in the “PF” domain (27.9%), and SF11b and SF11d in the “GH” domain (17.3% and 21.2%, respectively).

Table 3 Descriptive analysis of the SF-36 item scores

At the level of the SF-36 domains, no floor effect was observed, while a moderate ceiling effect was observed in 5 out of 8 domains: 31.8% for “PF”; 30.8% for “RP”; 24.8% for “BP”; 32.9% for “SF” and 41.0% for “RE” (see Table 4).

Table 4 Descriptive analysis of the SF-36 dimension scores

The SF-36 showed good to excellent internal consistency. Cronbach’s alpha coefficient varied from 0.82 to 0.93 for the domains (Table 4).
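For readers less familiar with the coefficient, Cronbach’s alpha for a domain with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below illustrates this standard formula; it is not the study's own code, the function name and example data are hypothetical, and dropping respondents with any unanswered item (complete-case analysis) is an assumption.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for one domain.

    `rows` holds one list of item scores per respondent. Respondents
    with any unanswered item (None) are dropped; this complete-case
    handling is an assumption made for the example.
    """
    complete = [r for r in rows if all(v is not None for v in r)]
    if len(complete) < 2:
        raise ValueError("need at least two complete respondents")
    k = len(complete[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_variances = [
        statistics.variance([r[i] for r in complete]) for i in range(k)
    ]
    # Sample variance of each respondent's total (summed) score.
    total_variance = statistics.variance([sum(r) for r in complete])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical item scores for a two-item domain (not study data).
example_rows = [[1, 2], [2, 1], [3, 3], [None, 2]]
alpha = cronbach_alpha(example_rows)
```

Values between 0.70 and 0.95 would fall in the range the Methods describe as optimal.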

In the convergent validity study, the summaries of the physical component (PCS) and mental component (MCS) of the SF-36 showed a good correlation with the HAE-QoL total score (0.45 and 0.64 respectively, P < 0.001). The MCS presented higher correlations with all the HAE-QoL domains than the PCS. The lowest correlation was found between the “Concern about offspring” domain of the HAE-QoL with both summaries of the SF-36 (MCS 0.30 and PCS 0.17).

Similarly, the total score of the HAE-QoL showed a good correlation with the “PF” (0.64), “SF” (0.59) and “GH” (0.58) domains of the SF-36. Furthermore, statistically significant mild-to-moderate correlations (≥ 0.4) were observed among most of the SF-36 and the HAE-QoL domains, except for the “Concern about offspring” domain of the HAE-QoL, with which all the correlations were < 0.4. Moreover, a mild correlation (< 0.4) was observed between the “PF” domain of the SF-36 and the “Disease-related stigma” (0.35), “Treatment difficulties” (0.38) and “Perceived control over disease” (0.39) domains of the HAE-QoL, and between “VT” of the SF-36 and “Disease-related stigma” of the HAE-QoL (0.39). These results have been previously reported [5] and are shown in Table 5.

Table 5 Convergent validity of the SF-36 with the HAE-QoL

Construct validity based on combined a priori predefined and post hoc defined hypotheses according to clinical criteria was confirmed in 7 out of the 8 hypotheses. Four out of the eight hypotheses (50%) showed significant differences in all the domains and the two summaries, three in the two summaries and at least 4 of the domains, and only one of the hypotheses was not satisfied. Details are summed up in Table 6.

Table 6 Construct validity according to predefined hypotheses regarding clinical criteria

In the discriminant validity assessment, significant differences were observed in both SF-36 summaries and all SF-36 domain scores among the 3 categories of the C1-INH-HAE severity scale (Asymptomatic-Mild, Moderate, Severe) (see Table 7).

Table 7 Discriminant validity between known groups of the SF-36

Phase 2 Thirty-seven adult patients with C1-INH-HAE participated in the test–retest reliability study (phase 2 of the psychometric study). Thirty patients had all data completed and 20 of them were considered stable. The demographic and clinical characteristics of these patients are shown in Table 2.

The ICC (95% confidence interval) can be seen in Table 8 and varied between 0.758 for the “VT” domain and 0.962 for the “SF” domain.

Table 8 Intraclass correlation coefficient (phase 2: test–retest)

The MCID for the different domains and the two component summaries of the SF-36 is shown in Table 9.

Table 9 Estimation of the minimal clinical important difference

Discussion

The SF-36 is one of the most commonly used generic HRQoL questionnaires worldwide, both in studies that measure the impact of a disease on HRQoL in different groups of patients [22,23,24,25,26,27,28,29,30,31] and in studies that assess the effect of certain therapeutic interventions on HRQoL [32,33,34,35,36,37,38]. It has also been used as a reference in the validation of new instruments [39,40,41,42,43,44]. The SF-36 has been used to measure HRQoL in patients with C1-INH-HAE [25,26,27,28,29,30,31] and to assess the effect of some therapeutic interventions [35, 36, 38, 45,46,47]. However, we have found no studies on its psychometric properties in patients with C1-INH-HAE and, to the best of our knowledge, it has yet to be validated for use in C1-INH-HAE.

The psychometric analysis in this study yields satisfactory results overall and provides support for validating the SF-36 as a tool for assessing HRQoL in C1-INH-HAE patients. The SF-36 showed good internal consistency, with all Cronbach’s α coefficient values being higher than 0.7. Similar data were observed for the eight domains in other studies [42,43,44, 48].

The extremely low rate of unanswered questions indicates that the questionnaire was suitable for patients with C1-INH-HAE. However, further analysis reveals an elevated ceiling effect in the majority of individual items. This suggests that either a greater choice of answers should be included at the top of the scale or respondents did not consider those items to be relevant to C1-INH-HAE. In either case, it would clearly limit the content validity of the SF-36 in C1-INH-HAE.

The ceiling effect is present in 5 out of the 8 SF-36 domains (“RE”, “SF”, “PF”, “RP”, “BP”). However, we should take into account that we adopted a very strict definition of this effect (> 15% of respondents obtaining the highest possible score), in comparison to other studies in which the threshold was as high as 60% [49]. The presence of the ceiling effect indicates that there may be a lack of response options for items at the top of the scale, which would imply a limited content validity. Consequently, patients with the highest score cannot be distinguished from one another, and thus reliability would be reduced. It could also indicate that these domains are not relevant to C1-INH-HAE patients. In contrast, no floor effect was found in the SF-36 domains, which suggests that response options are not lacking at the bottom of the scale. The SF-36 therefore has certain content validity limitations that may affect its use in C1-INH-HAE. Similar findings have already been described in a study in which the author found a low sensitivity of the SF-36 when assessing subtle variations of functional status and emotional functioning in patients with brain tumors [50].

As there is no single gold standard assessment tool for HRQoL, we analysed convergent criterion validity by comparing data from the SF-36 and HAE-QoL questionnaires. In our study, correlations obtained among the SF-36 domains and summary scores, and the HAE-QoL total and domain scores were mostly mild to moderate (> 0.40) and statistically significant, which indicates some agreement between the two instruments. The strongest correlations were seen between the HAE-QoL total score and the “BP” and “RP” domains, as well as the “MCS” of the SF-36. Higher correlations were also observed among related domains of both questionnaires (such as the “MH” domain of both questionnaires, “Physical functioning and health” with “RP” and “Emotional and Social roles” with “SF”) than among other unrelated domains. Based on these results, we can assume that coherence and equivalence are verified for the quality-of-life concept, as assessed by these two instruments. This indicates that both scales coincide in subjective and objective aspects that make up the construct, although their conceptual structures and items differ. Furthermore, the lack of strong correlations might be due to the fact that SF-36 is a generic questionnaire while the HAE-QoL is specifically for patients with C1-INH-HAE. This would also explain the low correlations observed between the “Concern about offspring” domain and the SF-36 domains and their physical and mental summaries, as this aspect is specific for C1-INH-HAE and other hereditary diseases and could not be adequately considered by a generic questionnaire such as the SF-36.

For construct validity, the recommended quality criterion that at least 75% of pre-established hypotheses be confirmed [16] was fulfilled using the combination of “a priori” and “post hoc” defined criteria, with 87.5% (7/8) of hypotheses confirmed. It is worth noting that patients who presented one factor that could be considered a priori a determinant of the impact on HRQoL (having undergone intubation or a tracheotomy at least once) showed no significant differences. Past intubation or tracheotomy procedures may have no impact on current HRQoL as they may have been performed years earlier and, as a result, are no longer of concern at the time of questioning. Therefore, this would not be a good criterion on which to assess the construct validity of the instrument. This issue also arose in the psychometric study of the HAE-QoL [5]. With respect to other factors, such as the effect of long-term prophylaxis (LTP), no significant differences were observed in the “RP”, “RE”, and “SF” domains, in which there was a ceiling effect, and in the “VT” domain, which had neither floor nor ceiling effects. Having angioedema symptoms in the last 6 months was not associated with significant differences in the “PF”, “SF”, and “RE” domains, all of which exhibited a ceiling effect.

Analysis of the discriminant validity of the SF-36 shows good discrimination among patients with different levels of C1-INH-HAE severity in the last 6 months. There were significant differences among the 3 severity groups across all domains and the two summaries, with HRQoL lower when the severity of the disease was higher. These data show the capacity of the SF-36 to distinguish among these known groups.

An examination of test–retest reliability shows that the generic SF-36 questionnaire is stable in patients with C1-INH-HAE, as it meets the recommended standards of the GA2LEN taskforce for assessing Patient-Reported Outcomes in allergy [51].

The MCID calculated by two different distribution methods shows that the generic SF-36 questionnaire could be useful as a tool for detecting real changes in HRQoL in patients with C1-INH-HAE. MCID has been evaluated to a lesser degree than other psychometric properties in other studies that validate SF-36 in other diseases.

The main limitations of the study include the post hoc design of the study and the different sample sizes among participating countries.

Despite these disadvantages, the internationally accepted scientific recommendations for the validation of HRQoL measurement instruments have been followed [15, 17], and data on reliability and content and construct validity have been highly acceptable. Moreover, as an international multicentric study, it provides results on which to base generalization, unlike studies with less diverse patient samples.

Conclusions

This is the first study to assess the psychometric properties of a generic instrument, the SF-36, in adult patients with C1-INH-HAE.

The psychometric analysis of the SF-36 has shown that it has limited content validity, with a high ceiling effect in many of the items and in several domains. Despite this limitation, it shows no floor effect at the domain level and has high internal consistency, together with good construct validity and high test–retest reliability in C1-INH-HAE.

This validation will facilitate the interpretation of HRQoL studies performed using the SF-36 in adult C1-INH-HAE patients and lays the groundwork for future studies on how C1-INH-HAE affects HRQoL in comparison with other diseases in which SF-36 is used to assess HRQoL.