Background

Hemophilia is a rare X-linked chronic genetic blood coagulation disorder seen predominantly among males. It is caused by a deficiency of clotting factors VIII or IX in blood plasma. It affects about 400,000 people across the world and about 20,000 in the United States (US) [1, 2]. Patients with hemophilia experience bleeding into joints and muscles which in severe cases can lead to chronic pain, reduce the range of joint motion and eventually progress to chronic arthritis [3].

For patients living with hemophilia, merely treating and preventing bleeding episodes and other physical symptoms using clotting factor concentrates is not enough. Patients with hemophilia must be careful about participating in activities such as contact sports because immediate bleeding may ensue. Long-term impairments in mobility and impact on functional status due to reduced range of joint motion may also limit the activities in which patients can participate. This can affect social participation and peer integration [4, 5]. Employment and occupational disabilities can occur as well. Also, the disease can influence the mental well-being of patients, among whom signs of depression, anxiety and psychological distress are common [6]. Thus, the physical, mental and social consequences of the disease serve to reduce the health-related quality of life (HRQOL) of patients. Therefore, HRQOL assessment is now recognized as an important health outcomes endpoint which can help decide and optimize treatment options among patients with hemophilia. Overall HRQOL is a multidimensional, subjective concept which incorporates physical functioning, psychological functioning, social interaction, and somatic sensation [7].

One key aspect of measuring HRQOL of a population is the selection of the appropriate instrument. The SF-12 Health Survey version 2 (SF-12v2) is a generic measure of HRQOL [8]. Generic instruments allow for comparison of patients’ health status across disease states and conditions [9]. Generic HRQOL measures may be less sensitive to certain key aspects or symptoms of a particular disease state and as a result may not be able to capture small changes in the HRQOL of patients having a certain disease [10]. On the other hand, disease-specific HRQOL measures focus on problems that may be specific to a disease population. However, these instruments cannot be used to compare HRQOL across different disease states. Such information may be important to clinicians and policy makers in making key treatment and resource allocation decisions. Given their underlying utility, it is necessary to obtain evidence about the appropriateness of use of generic HRQOL measures (such as the SF-12v2) in different patient populations [11].

Initial evidence regarding the reliability and validity of the SF-12 in the general US population was provided by Ware and colleagues in 1996 using data from the National Survey of Functional Health Status (NSFHS) and the Medical Outcomes Study (MOS) [12]. The instrument has since been evaluated for use among general populations in several different countries such as Denmark, Germany, United Kingdom, Netherlands, United States and others [13,14,15,16] as well as among patients with different diseases including Parkinson’s disease, stroke, diabetes mellitus, inflammatory rheumatic disease, hemodialysis [17,18,19,20,21]. The results of these studies suggest that the SF-12 has good psychometric properties. The SF-12v2 is an abbreviated version of the SF-36, which is one of the most commonly used generic HRQOL measures [9].

Although the SF-12v2 has been used to assess the HRQOL of hemophilia patients [22, 23], its psychometric properties have never been established among patients with hemophilia. To ascertain that the SF-12v2 is appropriate for use among hemophilia patients, its psychometric properties must be established in this population. Therefore, this study evaluated the psychometric properties of the SF-12v2 among adult patients with hemophilia in the US. The psychometric properties of the SF-12v2 assessed included: convergent validity, discriminant validity, known-groups validity, factorial validity using confirmatory factor analysis, and internal consistency reliability. Presence of floor and ceiling effects was also examined.

Methods

Setting

This cross-sectional study used a web-based, self-administered survey distributed to a national convenience sample of adults with hemophilia in the United States. Study approval was obtained from the University of Mississippi Institutional Review Board under the exempt status.

Potential participants were sent an email explaining the objective and scope of the study. This email assured the respondents that their information would be kept confidential. The email also contained a URL link to the survey which was programmed in Qualtrics [24]. The survey was open from October 31, 2015 to January 31, 2016. All respondents were provided $10 Amazon gift cards for participation in the study.

Participants

The sample included adults (≥ 18 years of age) with hemophilia A or B. Patients with other blood coagulation disorders such as Von Willebrand’s disease were excluded from the study sample. The sample was recruited with the help of a market research vendor company called Rare Patient Voice [25] which maintains a panel of hemophilia patients who were primarily recruited at hemophilia-related conferences and patient advocacy group meetings across the US. Considering hemophilia is a rare disease, patients were also recruited using a Facebook community of hemophilia patients called Hemo Friends and at the University of Mississippi Medical Center (UMMC) hemophilia treatment center (HTC) to maximize the analyzable sample size for the current study. In this study, 169 (77.5%) patients were recruited using the Rare Patient Voice panel, 44 (20.2%) from the Hemo Friends Facebook community, and 5 (2.3%) from UMMC. Given the nature of the statistical analysis plan for this study (i.e., confirmatory factor analysis), an a priori sample size of 200 patients with hemophilia was considered to be adequate [26].

Instruments

Patients with hemophilia were asked to describe their HRQOL using the SF-12 Health Survey Version 2 (SF-12v2). The SF-12 is the shorter version of the SF-36 [12]. The SF-12v2 is a generic health profile instrument with 12 items which compose 8 health concepts forming a health profile [8]. These eight sub-domains are: physical functioning (PF), role-physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role-emotional (RE), and mental health (MH). These eight sub-domain scores can be weighted and summarized into two component scores: the physical component summary (PCS) score and the mental component summary (MCS) score. According to the theoretical test model, the items from the physical functioning, role-physical, bodily pain, and general health sub-domains are primarily indicators of PCS while vitality, social functioning, role-emotional, and mental health items are primarily indicators for MCS [18]. For the SF-12v2, the norm-based PCS and MCS scores for the general US population have a mean of 50 and a standard deviation of 10 with higher scores indicating a better health status [8]. PCS, MCS, and sub-domain scores were calculated using the scoring software available from Optum (using 2009 US norms).

Hemophilia-related symptom severity was reported using the Patient Global Impression of Severity (PGI-S). The PGI-S is a single self-reported item that asks respondents to rate the severity of their disease condition. In this study, the PGI-S was worded: “When thinking about all of the hemophilia-related symptoms that you may have experienced during the past 4 weeks, please indicate the one option that best describes how your symptoms overall have been: (1) no symptoms, (2) mild symptoms, (3) moderate symptoms, or (4) severe symptoms.” A similar self-reported symptom severity measure has been used in studies of males with lower urinary tract symptoms secondary to benign prostatic hyperplasia and women with stress urinary incontinence [27, 28].

Analysis

A descriptive analysis of the individual SF-12v2 items was conducted in terms of means and standard deviations (SD). Missing data, if any, were reported as frequencies and percentages on a per-item level. Kurtosis and skewness coefficients were also calculated, and variables with an absolute value of the skew index > 3.0 and kurtosis index > 10.0 were considered to be non-normal [26]. Descriptive statistics were calculated for all other study variables in the form of frequencies and percentages for categorical variables, and means and standard deviations for continuous variables.
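The skew and kurtosis screening described above can be sketched in a few lines. This is a minimal illustration, not the study's code: it assumes moment-based indices (excess kurtosis, population moments), it assumes a variable is flagged when either cutoff is exceeded, and the response vector is hypothetical.

```python
def skew_index(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def kurtosis_index(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def is_non_normal(values, skew_cut=3.0, kurt_cut=10.0):
    """Assumption: exceeding either cutoff is enough to flag the variable."""
    return abs(skew_index(values)) > skew_cut or abs(kurtosis_index(values)) > kurt_cut


# Hypothetical responses to one 5-point item (not study data).
item_responses = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(skew_index(item_responses), kurtosis_index(item_responses))
print(is_non_normal(item_responses))
```

Statistical packages differ in whether they apply small-sample corrections to these indices, so values from this sketch may not match software output exactly.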

Confirmatory factor analysis (CFA) was used to evaluate the factor structure of the SF-12v2 among patients with hemophilia. CFA is a structural equation modeling technique which can be used to evaluate the fit of a theoretically-based measurement model. Three measurement models were tested. First, a 1-factor model (Model 1) which forced all the SF-12v2 items to load on a single latent factor. Second, a 2-factor model based on the approach adopted by Okonkwo and colleagues (Model 2) [18] where the PF, RP, and BP items were specified to load on a latent physical health factor (LPF), the RE and MH items were specified to load on a latent mental health factor (LMF), and the three GH, VT, and SF items were allowed to load on both latent factors. The residuals for the two PF items were allowed to correlate in both the 1-factor and 2-factor models as modeled by Okonkwo et al. [18]. Third, a 2-factor model employed by Maurischat and colleagues (Model 3) [20] where the GH, PF, RP, and BP items were specified to load on a LPF and the RE, MH, VT, and SF items were allowed to load onto a LMF. The residuals for each of the two PF, RP, RE, and MH items were allowed to correlate as modeled by Maurischat et al. [20]. In both models 2 and 3, LPF and LMF were allowed to correlate.
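As a rough check on the Model 3 specification, the degrees of freedom can be counted by hand. The sketch below assumes that, with categorical indicators, the model is fit to the item correlations while thresholds are left unrestricted, and that each latent factor is scaled by fixing its variance (so every loading is free). The per-subdomain item counts follow the description above; the dictionary and function names are illustrative only.

```python
# Number of SF-12v2 items in each sub-domain (12 items in total).
ITEMS_PER_SUBDOMAIN = {
    "PF": 2, "RP": 2, "BP": 1, "GH": 1,
    "VT": 1, "SF": 1, "RE": 2, "MH": 2,
}

# Model 3: each sub-domain loads on exactly one latent factor.
MODEL_3 = {
    "LPF": ["GH", "PF", "RP", "BP"],
    "LMF": ["RE", "MH", "VT", "SF"],
}

# Sub-domains whose two items have correlated residuals in Model 3.
CORRELATED_RESIDUAL_PAIRS = ["PF", "RP", "RE", "MH"]


def degrees_of_freedom(n_items, n_loadings, n_factor_corrs, n_resid_corrs):
    """Unique item correlations minus freely estimated parameters."""
    n_correlations = n_items * (n_items - 1) // 2
    return n_correlations - (n_loadings + n_factor_corrs + n_resid_corrs)


n_items = sum(ITEMS_PER_SUBDOMAIN.values())
n_loadings = sum(
    ITEMS_PER_SUBDOMAIN[sub] for subs in MODEL_3.values() for sub in subs
)

# One factor correlation (LPF with LMF) and four residual correlations.
df_model_3 = degrees_of_freedom(
    n_items, n_loadings, 1, len(CORRELATED_RESIDUAL_PAIRS)
)
# Adding one more residual correlation uses up one degree of freedom.
df_one_extra_residual = degrees_of_freedom(
    n_items, n_loadings, 1, len(CORRELATED_RESIDUAL_PAIRS) + 1
)
print(df_model_3, df_one_extra_residual)
```

Under these assumptions the count for Model 3 is 66 correlations minus 17 free parameters.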

Considering that the items on the SF-12v2 are ordered categorical variables with limited response options along with the possibility of item responses that skewed toward one end (i.e., the presence of floor or ceiling effects), weighted least squares mean- and variance-adjusted (WLSMV) estimation for categorical indicators was used to quantify the hypothesized relationships [29]. All CFA models were estimated using Mplus version 7.31 (Muthén & Muthén, Los Angeles, CA). Model fit for each model was assessed using the following five fit statistics: χ2 statistic, the root mean square error of approximation (RMSEA), the Tucker Lewis Index (TLI), the comparative fit index (CFI), and the weighted root mean square residual (WRMR). Bagozzi and Yi (2012) suggest that for a well-fitting model, the RMSEA, TLI, and CFI must be ≤ 0.08, ≥ 0.92, and ≥ 0.93, respectively [30]. For a good fitting model, WRMR must be less than or equal to 1 [31].
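The cutoffs listed above can be applied mechanically to a set of fit indices. The following is a small sketch under the thresholds as stated in this section; the function name and the example values are hypothetical and are not results from the study.

```python
# Cutoffs as stated in the text: (direction, threshold) per fit index.
CUTOFFS = {
    "RMSEA": ("<=", 0.08),
    "TLI": (">=", 0.92),
    "CFI": (">=", 0.93),
    "WRMR": ("<=", 1.0),
}


def check_fit(indices):
    """Return, for each index, whether it meets its stated cutoff."""
    results = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        value = indices[name]
        if direction == "<=":
            results[name] = value <= cutoff
        else:
            results[name] = value >= cutoff
    return results


# Hypothetical fit indices for illustration only.
hypothetical_fit = {"RMSEA": 0.06, "TLI": 0.95, "CFI": 0.96, "WRMR": 1.10}
fit_flags = check_fit(hypothetical_fit)
print(fit_flags)
```

The χ2 statistic is left out of the check because the text gives no numeric cutoff for it.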

Given that the LPF and LMF loadings are sample specific, an additional model was fit to estimate the correlations of the latent factors with PCS and MCS scores from the standard algorithm.

Factor loadings from the CFA models and item-scale correlations were used to assess convergent validity among the items. The size of the factor loading is an indication of the amount of variance in a particular item that is explained by the latent construct. For the current study, standardized factor loadings that were statistically significant and greater than 0.5 were considered to be indicative of good convergent validity [26, 31]. Statistical significance of the factor loadings was considered as a minimum requirement because a significant loading could be weak or moderate in strength.
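To make the link between loading size and explained variance concrete: for an item that loads on a single factor in a standardized solution, the squared loading is the share of item variance attributable to that factor. The sketch below applies the two criteria described above; the function names and example values are illustrative assumptions.

```python
def variance_explained(std_loading):
    """Share of item variance explained by the factor (single-factor item)."""
    return std_loading ** 2


def supports_convergent_validity(std_loading, p_value,
                                 min_loading=0.5, alpha=0.05):
    """Loading must be both statistically significant and greater than 0.5."""
    return p_value < alpha and std_loading > min_loading


# A loading right at the 0.5 boundary explains a quarter of the variance.
print(variance_explained(0.5))

# Hypothetical loadings: significant and strong, then significant but weak.
print(supports_convergent_validity(0.62, 0.001))
print(supports_convergent_validity(0.30, 0.001))
```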

Higher item-scale correlations (Pearson’s correlation between score on an individual item in a sub-domain with the total score on the underlying sub-domain) indicate that expected items in the same sub-domains correlate strongly with each other. This approach of establishing convergent validity has been used by previous studies [32]. Item-scale correlations of 0.1–0.29 were considered small, 0.3–0.49 moderate, and ≥ 0.50 strong [33]. A strong correlation of the items belonging to the GH, PF, RP, and BP sub-domains with PCS was hypothesized. Similarly, a strong correlation between items representing the RE, MH, VT, SF sub-domains with MCS was hypothesized.
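An item-scale correlation and its size label can be computed as follows. This is a sketch only: it assumes non-overlapping bands with 0.50 as the lower bound for a strong correlation (the conventional cutoff), and the item and scale scores are made up.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_size(r):
    """Label the absolute correlation: small, moderate, or strong."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"


# Hypothetical item responses and summary scale scores for six respondents.
item_scores = [1, 2, 2, 3, 4, 5]
scale_scores = [30.0, 38.0, 35.0, 44.0, 52.0, 58.0]

r = pearson_r(item_scores, scale_scores)
print(round(r, 3), correlation_size(r))
```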

To assess latent construct discriminant validity, the fit of the best fitting 2-factor model obtained from the factorial validity analysis was compared to that of a similar model where the latent factor correlation (i.e., correlation between LPF and LMF) was fixed to 1; this test was carried out using the DIFFTEST option in Mplus [34, 35]. A significant difference in the model fit (χ2 statistic) between the two models was suggestive of discriminant validity [36].

Item discriminant validity [37] was assessed using item-to-other-scale correlations, with lower correlations (≤ 0.40) considered suggestive of adequate discriminant validity. The reasoning behind this technique was that items from different domains should have low or no correlations with each other. A weak correlation of the items belonging to the GH, PF, RP, and BP sub-domains with MCS was hypothesized. Similarly, a weak correlation between items representing the RE, MH, VT, SF sub-domains with the PCS summary scale score was hypothesized.

Known-groups validity is the ability of an instrument to differentiate among individuals who have varying levels of disease severity. One-way ANOVA was used to compare mean PCS and MCS scores from the SF-12v2 across hemophilia patients with different symptom severity levels measured using the PGI-S.
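The one-way ANOVA F statistic used for this comparison is the ratio of between-group to within-group mean squares. The sketch below computes it from raw group scores; the three small groups are invented for illustration and do not represent the PGI-S groups in the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum(
        (v - sum(group) / len(group)) ** 2
        for group in groups
        for v in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups (not study data).
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f_stat, df_b, df_w)
```

Converting F to a p-value requires the F distribution, which is left to a statistics package here.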

In order to evaluate the internal consistency reliability for the SF-12v2, Cronbach’s alpha (α) was calculated for the LPF and the LMF items. An α ≥ 0.70 was considered to be suggestive of adequate internal consistency reliability [32].
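Cronbach's alpha can be computed directly from item-level responses as k/(k-1) times one minus the ratio of summed item variances to the variance of the total score. The following minimal sketch uses sample variances; the response matrix is hypothetical, with one row per respondent and one column per item.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item responses (one row per respondent)."""
    n_items = len(rows[0])
    # Variance of each item across respondents.
    item_variances = [
        statistics.variance([row[i] for row in rows]) for i in range(n_items)
    ]
    # Variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in rows])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five respondents to four items.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), alpha >= 0.70)
```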

To assess the floor and ceiling effects of the SF-12v2, the percentage of adults with hemophilia with the least possible and the maximum possible PCS and MCS were determined. Floor and ceiling effects were considered to be present if more than 20% of the respondents received the lowest or the highest possible PCS or MCS score [32, 38]. Given the estimation technique used for the CFA and the treatment of the items as categorical indicators, floor and ceiling effects of the individual SF-12v2 items were not considered to be problematic.
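The floor and ceiling criterion above reduces to counting respondents at the extremes of the score range. A sketch, assuming the lowest and highest possible scores are supplied by the caller and using invented scores:

```python
import math


def floor_ceiling(scores, min_possible, max_possible, threshold_pct=20.0):
    """Percent of respondents at each extreme and whether it exceeds the threshold."""
    n = len(scores)
    at_floor = sum(1 for s in scores if math.isclose(s, min_possible))
    at_ceiling = sum(1 for s in scores if math.isclose(s, max_possible))
    floor_pct = 100.0 * at_floor / n
    ceiling_pct = 100.0 * at_ceiling / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold_pct,
        "ceiling_effect": ceiling_pct > threshold_pct,
    }


# Hypothetical scores on a scale that runs from 10 to 50.
example = floor_ceiling([10, 10, 20, 30, 50], min_possible=10, max_possible=50)
print(example)
```

The comparison is strict, so exactly 20% at an extreme does not count as an effect, matching the "more than 20%" wording.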

Results

The final study sample consisted of 218 adults with hemophilia (Table 1). The majority of the sample had hemophilia A (77.5%), were male (79.5%), and were Caucasian (68.5%). The mean age of the study sample was 35.45 (SD = 12.3) years. Hepatitis C (36.5%) and depression (38.4%) were the most commonly reported comorbidities.

Table 1 Demographic and clinical characteristics of the sample

Table 2 shows the mean scores, and skewness and kurtosis coefficients on a per-item basis for the SF-12v2. The skewness and kurtosis coefficients for all items on the SF-12v2 were found to be within a range of − 1.00 and 0.63. The mean PCS was 43.68 (SD = 10.20) and the mean MCS was found to be 46.48 (SD = 10.09) among adults with hemophilia. Mean PCS and MCS scores were lower than the norm scores for the general healthy US population. This indicated that adults with hemophilia had a worse overall HRQOL as compared to the US norm population. There was no missing data for any of the SF-12v2 items.
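Because PCS and MCS are norm-based with a mean of 50 and a standard deviation of 10, the gap between the sample means and the US norm can be expressed in standard deviation units. The short sketch below does this arithmetic for the two reported means; the function name is illustrative.

```python
def norm_gap_sd(score, norm_mean=50.0, norm_sd=10.0):
    """Distance of a norm-based score from the norm mean, in SD units."""
    return (score - norm_mean) / norm_sd


# Reported sample means for adults with hemophilia.
pcs_gap = norm_gap_sd(43.68)
mcs_gap = norm_gap_sd(46.48)
print(round(pcs_gap, 3), round(mcs_gap, 3))
```

On this scale the mean PCS sits roughly 0.63 SD and the mean MCS roughly 0.35 SD below the norm.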

Table 2 SF-12v2 item-level characteristics and PCS/MCS scores among adults with hemophilia

The three measurement models tested to examine the factorial validity of the SF-12v2 among adults with hemophilia can be found in Fig. 1. The model fit indices for the three models can be found in Table 3. The two-factor model based on the approach used by Maurischat et al. [20] had the best fit among the three models (Chi-square [df] = 270.183 [49]; CFI = 0.952; TLI = 0.935; RMSEA [90% CI] = 0.144 [0.127–0.162]; WRMR = 1.250). Based on the modification indices for Model 3, residuals for items 9 and 10 (i.e., MH09 and VT10) were correlated in addition to the correlated residuals already specified in the Maurischat et al. model. This significantly improved model fit of the final model (Chi-square [df] = 172.778 [48]; CFI = 0.972; TLI = 0.962; RMSEA [90% CI] = 0.109 [0.092–0.127]; WRMR = 0.947).
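The reported RMSEA values can be approximately reproduced from the chi-square, degrees of freedom, and sample size. The sketch below uses the common single-group formula with N - 1 in the denominator; this is an assumption, since software may use N instead, although both versions round to the same three decimals for these inputs.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


SAMPLE_SIZE = 218

# Model 3 before and after adding the MH09-VT10 residual correlation.
rmsea_model_3 = rmsea(270.183, 49, SAMPLE_SIZE)
rmsea_final = rmsea(172.778, 48, SAMPLE_SIZE)
print(round(rmsea_model_3, 3), round(rmsea_final, 3))
```

Both values remain above the 0.08 cutoff stated in the Methods, even though CFI, TLI, and WRMR for the final model meet their thresholds.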

Fig. 1

a: Single-Factor Model (Model 1) for the SF-12v2. b: Two-Factor Model (Model 2) for the SF-12v2 based on Okonkwo et al. c: Two-Factor Model (Model 3) for the SF-12v2 based on Maurischat et al.

Table 3 Summary of model fit indices for the SF-12v2 confirmatory factor models

For the best fitting model (i.e., Model 3 in Fig. 1c), the correlation between LPF and the PCS score was 0.996 (p < 0.0001) while the correlation between LMF and MCS was > 0.999 (p < 0.0001). This suggested that there was a high and significant correlation between the sample-specific latent factors (i.e., LPF and LMF) and the PCS and MCS scores calculated using population-based weighting coefficients.

The standardized factor loadings for the final study model (Fig. 1c) can be found in Table 4. All factor loadings were statistically significant (p < 0.05), and all except that of MH09 on LMF were greater than 0.5. Table 5 depicts the item-scale correlation matrix. Items comprising the PF, RP, GH, and BP sub-domains had a strong and statistically significant correlation with the PCS, while the RE, MH, VT, and SF items were strongly correlated with the MCS summary scale score. Overall, the standardized factor loadings and item-scale correlations suggested acceptable convergent validity for the SF-12v2 among adults with hemophilia.

Table 4 Standardized factor loadings for the final two-factor model of HRQOL (Model 3) for the SF-12v2 among adults with hemophilia
Table 5 Item-scale correlations for the SF-12v2 among adults with hemophilia

The fit of the final two-factor model (Model 3) where the correlation between LPF and LMF was freely estimated was compared to that of a model where the correlation between LPF and LMF was fixed to one. Although the correlation between LMF and LPF was high (r = 0.83), the test yielded a significant difference in the chi-square value (Δχ2 [df] = 18.686 [1]; p < 0.0001), suggesting that LPF and LMF are not perfectly correlated (i.e., latent construct discriminant validity) [34, 36]. Items comprising the PF, RP, GH, and BP subdomains had a weak correlation with the MCS summary scale score, while the RE, MH, VT, and SF items had a weak to moderate correlation with the PCS summary scale score, supporting item discriminant validity. Overall, the SF-12v2 was found to have acceptable discriminant validity among adults with hemophilia.
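The p-value for the reported difference statistic can be checked against a chi-square distribution with 1 degree of freedom, for which the upper tail has a closed form. Note that under WLSMV the difference statistic is not a simple subtraction of the two model chi-squares; the sketch below only converts the already-adjusted statistic reported above into a p-value, and the function name is illustrative.

```python
import math


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Reported difference test for the free versus fixed factor correlation.
p_value = chi_square_p_value_1df(18.686)
print(p_value < 0.0001)

# Sanity check: 3.841 is the familiar 5% critical value for 1 df.
print(round(chi_square_p_value_1df(3.841), 3))
```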

The ability of the SF-12v2 to discriminate among hemophilia patient groups defined by the PGI-S (i.e., no symptoms, mild symptoms, moderate symptoms, and severe symptoms) was assessed using a one-way ANOVA (Table 6). Differences in PCS and MCS scores between individual groups were assessed using Tukey’s honestly significant difference (HSD) tests. The mean PCS (50.10 vs 47.39 vs 40.79 vs 35.24; p < 0.0001) and MCS (50.61 vs 46.80 vs 46.96 vs 42.29; p = 0.007) scores were significantly different across the four symptom severity levels. A definite gradation was observed in PCS mean scores with increasing levels of symptom severity on the PGI-S, whereas MCS mean scores did not decrease strictly monotonically (46.80 for mild vs 46.96 for moderate symptoms).

Table 6 Known-groups validity for the SF-12v2 components among adults with hemophilia

The internal consistency reliability for the SF-12v2 was found to be satisfactory with the Cronbach’s alpha value of 0.848 for LPF and 0.785 for LMF.

Less than 20% of the study sample received the lowest or highest possible PCS or MCS summary scale score, which was indicative of the absence of floor and ceiling effects. The minimum and maximum PCS scores for the study sample were 16.95 and 67.63, respectively. The minimum and maximum MCS scores were 15.75 and 68.91, respectively. The minimum and maximum PCS scores for the general US population as per the SF-12v2 scoring manual are 4.92 and 69.24, respectively [8], while the minimum and maximum MCS scores for the US norm population are 8.14 and 73.24, respectively. Therefore, none of the respondents from our study sample received the lowest or highest possible score as compared to the general US population.

Discussion

As HRQOL continues to evolve as a key endpoint among patients with hemophilia, so does the need for psychometrically-sound generic instruments which measure HRQOL. Such instruments not only allow one to ascertain the burden of hemophilia on patient HRQOL, but also compare their HRQOL to the healthy US population and across subgroups of individuals suffering from other diseases. The current study assessed the validity (factorial, convergent, discriminant, and known-groups) and internal consistency reliability of the SF-12v2, a generic measure of HRQOL, among adults with hemophilia.

Factorial validity of the SF-12v2 was tested by examining the model fit indices across three different models. A two-factor model based on the approach adopted by Maurischat and colleagues [20] was found to be the best fitting model in this population. Previous studies have also conceptualized the SF-12v2 as a two-factor model where items related to the GH, PF, RP, BP subdomains loaded onto a LPF while items related to the RE, MH, VT, SF subdomains loaded onto a LMF and the error covariances for items which belonged to the same subdomain (PF, RP, RE, and MH) were correlated. Items belonging to the same subdomain were expected to have additional commonality not explained by the latent factors due to similarities in item wording (i.e., a shared method effect), which warranted the specification of residual correlations for these items. A similar two-factor model for the SF-12v2 was found to have acceptable fit among patients with inflammatory rheumatic disease [20] and diabetes mellitus [19]. In the current study, modification indices suggested an additional residual correlation between MH09 (felt calm and peaceful) and VT10 (had a lot of energy). Residuals for these items on the SF-12 have been previously shown to be correlated by McBride et al. [39] in a sample of diagnostic orphans (i.e., adults with a type of alcohol dependence or use disorder) and by Fleishman and Lawrence [40] in a population of non-institutionalized US civilians.

The SF-12v2 was found to have good convergent and discriminant validity among adults with hemophilia. These findings were supported by factor loadings, the latent factor correlation, and correlations between the individual items and SF-12 subdomains. Although the latent factor correlation was high (0.83), the test for construct discriminant validity suggested that this correlation was significantly different from one. Additionally, a one-factor model had the worst model fit in the factorial validity analysis. These results provide evidence that a two-factor HRQOL model was appropriate and that the two latent factors (LPF and LMF) did indeed measure distinct concepts. This is important as a two-factor model forms the basis of the commonly reported PCS and MCS scores. Because other studies have reported smaller correlations between latent physical and mental factors using the SF-12 (i.e., range 0.5–0.7) [18,19,20], future research should examine reasons for the higher latent factor correlation found in the current study. The high correlations between the sample-specific latent factors and the PCS and MCS scores calculated using the standard scoring approach also provide support for the use of the summary scores in HRQOL research with adults with hemophilia. Such high correlations have also been observed in studies with different populations and slightly different factor structures for the latent variables [18], providing evidence of the generalizability of the standard scoring approach for the component summaries.

The results of the current study lend support to the known-groups validity of the SF-12v2 in terms of its ability to discriminate across different symptom severity levels among adults with hemophilia. PCS and MCS means were found to be significantly different across the four symptom severity groups. Additionally, significant decreases in PCS scores were associated with increasing levels of symptom severity. Although the severe symptoms group and no symptoms group were notably different on MCS scores as expected, the overall linear trend observed with PCS scores and symptom severity was not seen in the case of MCS scores. The mean MCS for the moderate symptom severity group was slightly greater, although not statistically different, than the MCS for the mild symptom severity group.

The internal consistency reliability of the LPF and LMF summary scales was found to be good. The PCS and MCS scale scores did not indicate the presence of any floor or ceiling effects. These results may indicate that the SF-12v2 is sensitive in capturing the variation in HRQOL among adults with hemophilia.

The results of the current study must be interpreted in the light of certain limitations. The cross-sectional nature of the study precluded the assessment of the predictive validity as well as test-retest reliability of the SF-12v2. Future studies should adopt a longitudinal design in order to explore these aspects of the psychometric profile of the SF-12v2. Adults with hemophilia who participated in this study are likely to have higher physical functioning because of their ability to participate in survey research. Also, future studies must examine the measurement invariance of the SF-12v2 among adults with hemophilia in addition to testing its psychometric properties in order to ensure the appropriateness of its use in this patient population.

This was the first study to assess the psychometric properties of the SF-12v2 among adults with hemophilia. Considering that hemophilia is a rare genetic disorder, most previously published reports have employed smaller sample sizes. To the best of our knowledge, this is the first US-based study to capture the HRQOL of such a large population of adults with hemophilia. The study sample included an even distribution of patients from all regions of the country, which supports the generalizability of the study results to most adults with hemophilia in the US.

Conclusions

This study provides evidence about the acceptable psychometric properties of the SF-12v2 among adults with hemophilia in the US. The SF-12v2 was found to be a valid and reliable generic measure of HRQOL among adults with hemophilia. The scale demonstrated adequate factorial, convergent, discriminant, and known-groups validity. The scale was found to have adequate internal consistency reliability and no evidence of floor or ceiling effects was found. Overall, the results provide a basis for the future use of the SF-12v2 among adults with hemophilia and for incorporating the HRQOL information obtained from these studies into health policy and clinical decision making.