Background

Cardiovascular disease remains the most common cause of disease burden in Europe, with coronary heart disease (CHD) its most common manifestation [1]. Patient-centered outcomes research emphasizes the importance of outcomes that patients notice and are aware of, such as symptoms, functional status, and health-related quality of life (HRQoL). Thus, patient-perceived health status is an important measure of health [2]: it reflects the effects of disease and treatment from the patient’s own perspective and thereby helps to identify health-related problems, facilitate communication with patients, follow the trajectory of disease, and evaluate intervention and treatment effects [3, 4].

To identify patient needs and evaluate intervention and treatment effects, HRQoL instruments, which come in two major designs (generic and disease-specific), need to be reliable, valid, and responsive [3,4,5,6]. Because generic instruments address multiple aspects of HRQoL across a broad spectrum of diseases or patient groups, they are less specific and less sensitive than disease-specific instruments [7].

Coronary heart disease-specific HRQoL instruments have been designed for patients with heart failure, arrhythmias, angina pectoris, and myocardial infarction [8]. When a patient has more than one diagnosed heart condition, however, deciding which disease-specific HRQoL instrument to use becomes complicated. To overcome this problem, core disease-specific instruments have been developed to cover different heart disease diagnoses [8]. The HeartQoL was developed as a core CHD-specific HRQoL instrument in patients with angina pectoris, myocardial infarction, and ischemic heart failure. The instrument includes 14 items with a 10-item physical and a 4-item emotional subscale [9, 10].

The HeartQoL has been psychometrically evaluated in 11 peer-reviewed publications in patients speaking at least one of 27 European languages, Mandarin Chinese, Persian (Farsi) and Malay (Bahasa Malaysia) [11]. The dimensionality of the HeartQoL has mainly been evaluated using Mokken scale analysis or factor analysis. While the Mokken scale analyses have confirmed the suggested two-factor model [1,2,3,4,5,6], evaluations with exploratory and/or confirmatory factor analyses have shown some problems with cross-loadings, correlated error variances, and/or poor model fit in different language versions [7,8,9, 12,13,14]. One problem with the Mokken scale analyses is that only Loevinger’s H coefficient has been used to justify scalability. Thus, potential issues with correlated error variances or local dependency remain unknown based on these studies. Further, as Mokken scale analysis is conducted on each subscale individually, potential issues with cross-loadings are not addressed. From that perspective, confirmatory factor analysis (CFA) models taking the ordinal nature of data into account, or multidimensional item response theory models, seem to be the most appropriate methods to use in future evaluations of the HeartQoL.

Translation of HRQoL instruments is another critical aspect that can have serious consequences for validity and reliability. A poor translation process leading to a questionnaire that is not equivalent to the original language questionnaire limits the comparability of responses across populations with different languages or cultures [15]. Therefore, the aim of this study was to translate the HeartQoL into Icelandic and evaluate its psychometric properties in patients with CHD in Iceland.

Methods

Participants

Patients with CHD were recruited from two main hospitals in Iceland between October 2017 and May 2019 for this psychometric study. All patients ≥ 18 years who were admitted electively or acutely for CHD (angina, acute myocardial infarction, acute myocardial ischemia, elective percutaneous coronary intervention and/or coronary artery bypass graft) were eligible for the study. Information on age, sex, diagnosis, and treatment was collected from medical records. Eligible patients were asked to complete a battery of questionnaires at discharge. Patients who did not speak or write Icelandic and those unable or unwilling to respond to the questionnaire were excluded from the study.

A total of 446 patients consented to participate in the study; 50 patients had completely missing HeartQoL data, and the final sample included 396 patients (88% response rate). The mean age was 64.4 (SD = 8.8) years; 315 (79.6%) were males; most patients were cohabiting (n = 291, 73.5%) and had at least upper secondary education (n = 259, 68.3%). A subsample (n = 47) of these 396 patients also completed the HeartQoL after 14 days to evaluate test-retest reliability. Patients who completed the test-retest assessment were significantly more often men (p = 0.031), with no differences in age, cohabitation, education, or admission diagnosis. Detailed information about the sample is presented in Table 1.

Table 1 Sample characteristics (n = 396)

Instruments

Consenting patients completed demographic questions, the HeartQoL to assess disease-specific HRQoL, the Short-Form 12v2 Health Survey (SF-12v2) [16] to assess generic HRQoL, and the Hospital Anxiety and Depression Scale (HADS) [17, 18] to assess symptoms of anxiety and depression.

HeartQoL

The 14-item HeartQoL is divided into two subscales, a 10-item physical and a 4-item emotional subscale. All 14 items are scored on a 4-point Likert-type scale ranging between 0 (‘A lot’) and 3 (‘No’). The subscale scores are calculated as the mean rating across the items, with higher scores indicating better HRQoL [9]. The HeartQoL was translated and culturally adapted from English to Icelandic using standard procedures as described by Ware (1995) [19] and the Medical Outcomes Trust committee (2002) [5]. Two individuals, one a healthcare professional, the other not, both fully bilingual in Icelandic (as their first language) and English, independently translated the HeartQoL into Icelandic. The back-translation into English was conducted by two different translators, one a healthcare professional, the other not, both also fully bilingual in Icelandic and English and blinded to the original version. The back-translations were sent to the developer of the HeartQoL, who compared them to the original English version of the HeartQoL. Items not accurately translated were discussed and re-translated by the research group in collaboration with the developer of the instrument until a consensus was reached. The Icelandic version of the HeartQoL can be found in the Appendix.
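The scoring rule described above can be illustrated with a minimal Python sketch. It assumes that the ratings of one subscale are supplied in a list, that a missing answer is coded as None, and that missing answers are simply skipped when the mean is taken; the function name and the example answers are hypothetical and are not part of the instrument or of the study data.

```python
from typing import Optional, Sequence

VALID_RATINGS = (0, 1, 2, 3)  # 0 = 'A lot' ... 3 = 'No', as described in the text


def subscale_score(ratings: Sequence[Optional[int]]) -> Optional[float]:
    """Mean rating across the answered items of one subscale.

    Missing answers are passed as None and skipped (an assumption of this
    sketch). Returns None when no item was answered.
    """
    answered = [r for r in ratings if r is not None]
    for r in answered:
        if r not in VALID_RATINGS:
            raise ValueError(f"rating out of range: {r!r}")
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical answers from one patient (not study data).
physical = [3, 3, 2, 2, 1, 3, 3, 2, None, 3]  # 10 physical items, one missing
emotional = [3, 2, 3, 2]                      # 4 emotional items

physical_score = subscale_score(physical)    # mean of the 9 answered items
emotional_score = subscale_score(emotional)  # mean of the 4 answered items
```

Higher values of either score indicate better HRQoL, in line with the scoring direction of the instrument.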

Short-Form 12v2 Health Survey

The SF-12v2 is a 12-item, generic HRQoL instrument designed to measure health status from the patient’s perspective and covers eight health dimensions: Physical functioning (PF), Role physical (RP), Bodily pain (BP), General health (GH), Vitality (VT), Social functioning (SF), Role emotional (RE), and Emotional health (EH). Based on a scoring algorithm, each dimension has a possible score range between 0 and 100, with higher scores indicating better HRQoL. In addition, two component summary scores can be calculated, one each for physical (PCS-12) and mental (MCS-12) health [16]. The instrument has been shown to be reliable and valid for use in patients with CHD [20]. The Icelandic version of the SF-36, from which the SF-12v2 was derived, has acceptable psychometric properties [21].

Hospital Anxiety and Depression Scale

The 14-item HADS is designed to screen for anxiety and depression. Each item is scored on a response scale ranging between 0 and 3, with subscale scores between 0 and 21 with higher scores implying more problems [17, 18]. The Icelandic version of the HADS has demonstrated satisfactory psychometric properties among university students [22]. Evaluated with ordinal alpha, both subscales demonstrated good internal consistency in the present study, 0.90 for anxiety and 0.88 for depression.

Data analysis

Descriptive statistics were used to present the sample: means and standard deviations for continuous normally distributed data, medians and quartiles for ordinal data, and frequencies for nominal data. The unpaired t-test and the chi-square test were used to compare patients who did and did not complete the test-retest assessment. To take the ordinal nature of item and scale scores into account, all analyses, except for the evaluation of test-retest reliability, were based on non-parametric statistics.

Item analysis was used to evaluate the basic measurement properties on an item level [23]. The score distribution was evaluated using median, quartiles, frequencies, skewness, and kurtosis statistics. A normal distribution has skewness and kurtosis values close to 0. Item discrimination was evaluated using item-total correlations adjusted for overlap based on polychoric (polyserial) correlations (rpc). Higher item-total correlations imply a higher discriminating capacity and should therefore exceed 0.2 [24], although higher cut-off levels are also reported in the literature.
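The item-level statistics described above can be sketched in a few lines of Python. This is an illustration only: it uses moment-based skewness and excess kurtosis, and it replaces the polychoric (polyserial) correlations used in the study with an ordinary Pearson correlation between each item and the sum of the remaining items. All function names and the toy responses are hypothetical.

```python
from math import sqrt


def central_moment(xs, k):
    """k-th central moment (population form)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness; 0 for a symmetric distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def pearson(xs, ys):
    """Pearson correlation; both inputs must have non-zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def corrected_item_total(rows, item):
    """Correlate one item with the sum of the other items (overlap removed)."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    return pearson(item_scores, rest_scores)


# Toy responses (rows = patients, columns = items scored 0-3); not study data.
rows = [[0, 0, 1], [1, 1, 1], [2, 2, 3], [3, 3, 2], [3, 2, 3]]
r_item0 = corrected_item_total(rows, 0)
discriminates = r_item0 > 0.2  # the minimum cut-off mentioned in the text
```

Removing the item from the total before correlating is what "adjusted for overlap" refers to; without that step an item would partly be correlated with itself.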

The factor structure was evaluated using CFA for ordered category indicator variables, that is, the weighted least squares mean and variance adjusted (WLSMV) method, and polychoric correlations [25]. Two CFA models were considered: a baseline two-factor model and a modified two-factor model. The model fit was evaluated in terms of factor loadings and goodness-of-fit indices including the Root Mean Square Error of Approximation (RMSEA) with 90% confidence interval (CI), Standardised Root Mean Square Residual (SRMR), Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI). Satisfactory model fit was defined according to Hu and Bentler [26], with RMSEA close to 0.06 or below, CFI and TLI close to 0.95 or greater, and SRMR close to 0.08 or below. Missing data were treated with listwise deletion since the WLSMV estimation cannot handle missing values, in contrast to full information maximum likelihood (FIML).
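As a small illustration of how the fit criteria above might be applied, the following Python sketch compares a set of fit indices with the stated cut-offs. It treats the cut-offs as hard thresholds, which is a simplification because the criteria are worded as "close to"; the function name and the index values are hypothetical and are not results from this study.

```python
def meets_cutoffs(rmsea: float, cfi: float, tli: float, srmr: float) -> dict:
    """Compare fit indices with the Hu and Bentler cut-offs cited in the text.

    Hard thresholds are an assumption of this sketch; the criteria are
    formulated as 'close to' the respective values.
    """
    return {
        "RMSEA": rmsea <= 0.06,
        "CFI": cfi >= 0.95,
        "TLI": tli >= 0.95,
        "SRMR": srmr <= 0.08,
    }


# Hypothetical fit indices for illustration only.
fit = meets_cutoffs(rmsea=0.08, cfi=0.99, tli=0.99, srmr=0.05)
unsatisfactory = [name for name, ok in fit.items() if not ok]
```

In practice a borderline value would be judged together with its confidence interval and the other indices rather than by a single threshold.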

Convergent and divergent validity were examined simultaneously using the multitrait-multimethod (MTMM) approach [27]. Spearman correlations (rs) were used to estimate associations between the HeartQoL physical and emotional subscales and the PCS-12 and MCS-12 subscales. To support convergent and divergent validity, the strongest correlations are expected between similar constructs measured by different methods (HeartQoL physical subscale and PCS-12 and HeartQoL emotional subscale and MCS-12), while the weakest correlations are expected between different constructs measured by different methods (HeartQoL physical subscale and MCS-12 and HeartQoL emotional subscale and PCS-12). Convergent and divergent validity were also evaluated by correlating the HeartQoL scales with the eight health dimensions of the SF-12v2 and HADS subscales of anxiety and depression. To support convergent and divergent validity, the HeartQoL physical subscale should correlate more strongly with the SF-12v2 physical dimensions (PF, RP, BP and GH) while the HeartQoL emotional subscale should correlate more strongly with the SF-12v2 mental dimensions (VT, SF, RE and EH). Further, the HeartQoL emotional subscale should correlate more strongly with symptoms of anxiety and depression measured by the HADS compared to the HeartQoL physical subscale. All analyses of the convergent and divergent validity were based on the original measurement model of the HeartQoL.
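The logic of the MTMM comparison can be sketched as follows. The Python code below computes a Spearman correlation as a Pearson correlation on average ranks (so that ties are handled) and then checks whether every same-construct correlation exceeds every different-construct correlation. The function names are hypothetical, and the strict "every convergent value above every divergent value" rule is one possible reading of the expectation described above, not necessarily the exact decision rule used in the study.

```python
from math import sqrt


def average_ranks(xs):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mtmm_pattern_supported(convergent, divergent):
    """True if all same-construct correlations exceed all others."""
    return min(abs(r) for r in convergent) > max(abs(r) for r in divergent)


# Toy scores for illustration only (not study data).
rho = spearman([1.0, 2.5, 2.5, 3.0], [40, 55, 50, 60])
```

For example, convergent correlations of 0.73 and 0.63 together with divergent correlations of 0.35 and 0.29 would satisfy this rule.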

An ordinal version of Cronbach’s alpha for Likert-type scales was used to estimate the internal consistency reliability on the original measurement model of the HeartQoL [28]. Ordinal alpha has the same interpretation as the traditional Cronbach’s alpha coefficient, i.e., > 0.7 was considered satisfactory [24, 28]. Cronbach’s alpha was also calculated to facilitate comparisons with previous studies. Composite reliability was estimated with ordinal omega and calculated for both the baseline CFA model and the modified CFA model [29].
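To make the two kinds of reliability coefficient concrete, the following Python sketch computes the traditional Cronbach’s alpha from raw item scores and a composite reliability (omega) from standardized factor loadings. It is a simplified illustration: the ordinal versions used in the study are based on polychoric correlations, and the omega formula shown here assumes a single factor with no cross-loadings and no correlated error variances. The function names, toy scores, and loadings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Traditional Cronbach's alpha; rows = respondents, columns = items."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def composite_reliability(loadings):
    """Omega from standardized loadings of one factor.

    Assumes uncorrelated errors and no cross-loadings, so each error
    variance equals 1 - loading**2 (an assumption of this sketch).
    """
    common = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)
    return common / (common + error)


# Toy data for illustration only (not study data).
rows = [[0, 0, 1], [1, 1, 1], [2, 2, 3], [3, 3, 2], [3, 2, 3]]
alpha = cronbach_alpha(rows)
omega = composite_reliability([0.8, 0.8, 0.8, 0.8])
```

When a model includes cross-loadings or correlated errors, those terms enter the omega calculation, which is one reason composite reliability can differ from alpha.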

Test-retest reliability of the scale scores, based on the original measurement model of the HeartQoL, was evaluated using intraclass correlations (ICC) (two-way mixed effect, single measurement, absolute agreement); values between 0.50 and 0.75, 0.75 and 0.90 and > 0.90 indicate moderate, good and excellent reliabilities, respectively [30].
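The ICC form named above (two-way model, single measurement, absolute agreement) can be computed from the mean squares of a two-way layout. The Python sketch below is a minimal illustration with hypothetical function names and toy scores; it returns only the point estimate, without the confidence interval, and the label used for values below 0.50 is an assumption because the text does not name that range.

```python
def icc_absolute_single(rows):
    """ICC, two-way model, single measurement, absolute agreement.

    rows = subjects, columns = measurement occasions (e.g., test and retest).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))


def classify_icc(icc):
    """Verbal labels following the ranges given in the text."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "below moderate"  # range not labelled in the text


# Toy test-retest scores (not study data): every retest is one point higher.
shifted = [[1, 2], [2, 3], [3, 4]]
icc_shifted = icc_absolute_single(shifted)
```

Because absolute agreement is required, the systematic one-point shift in the toy data lowers the coefficient even though the rank order of the subjects is perfectly preserved.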

Statistical significance was set at p < 0.05. All analyses were conducted with R 4.2.3 (R Foundation for Statistical Computing, Vienna, Austria), including the following packages: irr 0.84.1, lavaan 0.6–13, psych 2.2.9, semTools 0.5–6, sjmisc 2.8.9, and summarytools 1.0.1.

Ethical considerations

The study was approved by the National Bioethics Committee for Medical Research Ethics (17–159) and permission to examine patient records was granted by hospital authorities. The study was conducted in accordance with the Helsinki Declaration [31].

Results

Item analysis and score distribution

The HeartQoL items demonstrated either a discrete uniform distribution (items 3–6, 8 and 13) or a positively skewed distribution with ceiling effects (items 1, 2, 7, 9–12 and 14). The observed item-total correlations (rpc) were between 0.74 and 0.88 for the physical subscale and between 0.74 and 0.81 for the emotional subscale, indicating good discrimination. Missing data among the items were low (2–4%), except for item 13, which was not answered by 11% of the patients (Table 2). Both subscales demonstrated a negatively skewed distribution, somewhat more pronounced for the emotional than for the physical subscale (skewness = -0.67 vs. -0.37) (Table 3).

Table 2 HeartQoL item analysis (n = 396)
Table 3 Score distribution for the HeartQoL, SF-12v2, and HADS (n = 325–390)

Factor structure

Factor loadings for the two-factor baseline model ranged between 0.74 and 0.91 for the physical subscale and between 0.74 and 0.93 for the emotional subscale (Table 4). While the CFI, TLI, and SRMR indicated satisfactory model fit, the RMSEA did not (Table 5). Inspection of the standardized residual covariance matrix (the difference between the model-implied covariance matrix and the sample covariance matrix) and the modification indices identified problems with cross-loadings for items 8 and 9 and correlated error variances between items 3 and 4. The suggested revisions are conceptually defensible: items 8 and 9 reflect both physical and emotional health, while items 3 and 4 capture physical effort but at different levels. Consequently, a modified two-factor model was examined. All fit indices improved, with the RMSEA close to the suggested criterion of 0.06. Additionally, the RMSEA did not deviate significantly from the criterion of 0.05 (p = 0.085), and the 90% CI covered both the 0.05 and 0.06 criteria (Tables 4 and 5).

Table 4 Standardized factor loadings, error variances (brackets), factor correlations, and reliability estimates for the HeartQoL (n = 314)
Table 5 Goodness-of-fit indices for the confirmatory factor analyses of the HeartQoL (n = 314)

Convergent and divergent validity

The MTMM analysis supported convergent and divergent validity. As expected, the strongest correlations were found between similar constructs measured with different scales (rs = 0.73 and 0.63) and the weakest correlations between dissimilar constructs measured with different scales (rs = 0.35 and 0.29) (Table 6).

Table 6 Multitrait-multimethod (MTMM) based on Spearman rank order correlation coefficients between the HeartQoL physical and emotional subscales and the SF-12v2 physical and mental component scores (n = 286)

The correlations between the HeartQoL, SF-12v2, and HADS supported both convergent and divergent validity. As expected, the HeartQoL physical subscale correlated more strongly with the SF-12v2 physical health dimensions (i.e., PF, RP, and BP; rs = 0.58–0.67) than with the mental dimensions (i.e., SF, RE, and EH; rs = 0.29–0.47). Similarly, the HeartQoL emotional subscale correlated more strongly with the SF-12v2 mental health dimensions (rs = 0.49–0.65) than with the physical dimensions (rs = 0.28–0.39). Furthermore, the HeartQoL emotional subscale correlated more strongly with HADS symptoms of anxiety and depression than the HeartQoL physical subscale (rs = -0.69 and -0.59 and rs = -0.34 and -0.55, respectively). Despite the expected correlation pattern, the differences between the two subscales were minor regarding the SF (rs = 0.47 vs. 0.50) and RE (rs = 0.42 vs. 0.49) in the SF-12v2 and HADS depression (rs = -0.55 vs. -0.59) (Table 7).

Table 7 Convergent and divergent validity based on Spearman rank order correlation coefficients between the HeartQoL subscales and SF-12v2 health dimensions and HADS subscales (n = 286)

Reliability

Internal consistency reliability, as measured by ordinal alpha coefficient, was high for the HeartQoL physical (0.96) and emotional (0.90) subscales. The corresponding traditional Cronbach’s alpha values were 0.93 and 0.84, respectively. Composite reliability was 0.96 and 0.90 for the physical and emotional subscales in the baseline CFA model and 0.85 and 0.63 in the modified CFA model (Table 4).

Test-retest reliability was good for the HeartQoL physical (ICC = 0.84, 95% CI = 0.72, 0.91, n = 46) and emotional (ICC = 0.78, 95% CI = 0.64, 0.88, n = 47) subscales.

Discussion

Following translation, the present study evaluated the psychometric properties of the Icelandic version of the HeartQoL, taking the ordinal nature of the data into account. Overall, the two-factor structure was confirmed, although the CFA models identified some problems with cross-loadings and correlated error variances. In addition, the instrument demonstrated good construct validity and reliability in the present sample of patients with CHD.

Overall, the amount of missing data was small, except for HeartQoL item 13, which had 11% missing values. This suggests that the meaning of exercise in the Icelandic language may focus on elite sport or intense training, which should be considered in future revisions and evaluations of the Icelandic version of the HeartQoL. In case of missing data for this item, users of the Icelandic version of the HeartQoL may calculate the mean score without this item.

The two-factor model generally demonstrated an acceptable model fit, although cross-loadings for items 8 and 9 and correlated errors between items 3 and 4 were observed. The problem with cross-loadings for items 8 and 9 has been reported in previous HeartQoL studies [14, 32] and may arise because these items are not as clear indicators of physical and emotional attributes as the other indicators in the respective subscale. Problems with correlated errors have not been widely reported, although they have been reported for the Persian HeartQoL [10, 13]. Among other reasons, correlated errors can arise from overlapping items measuring the same attribute and from multidimensionality [25]. The correlated errors between items 3 and 4 may arise because both items measure physical effort but at different levels. The hypothesized dimensionality of the HeartQoL has been evaluated using Mokken scale analysis in patients with CHD; for example, the original HeartQoL validation study [9] and the evaluations of both the German [33] and Italian [34] versions of the HeartQoL concluded that the global scale and the two subscales are unidimensional based solely on the Loevinger H coefficient. Thus, the assumption of unidimensionality needs to be further explored in future evaluation studies of the HeartQoL.

Convergent and divergent validity of the HeartQoL were supported in the correlation analyses, including the MTMM analysis and the correlations with the eight health dimensions of the SF-12v2 and the HADS subscales. Although the HeartQoL subscales correlated significantly with all other scales in the present study, no correlation was above 0.70. This finding is not unexpected: the HeartQoL is a disease-specific instrument while the SF-12v2 and HADS are generic instruments, so the measures are expected to overlap, but not strongly. These findings underpin the importance of using both generic and disease-specific instruments to measure HRQoL in patients with CHD. It was unexpected that the physical and emotional subscales of the HeartQoL correlated about equally strongly with HADS depression and with the SF and RE dimensions of the SF-12v2. This finding is somewhat hard to explain because the differences were, as expected, more pronounced for the EH dimension of the SF-12v2 and for HADS anxiety.

Both HeartQoL subscales demonstrated high internal consistency reliability measured by both ordinal alpha and Cronbach’s alpha. It should be noted that the ordinal alpha was higher than Cronbach’s alpha, illustrating the importance of taking the ordinal nature of the data into account [35]. The internal consistency measured by Cronbach’s alpha was consistent with levels that have been reported in a study of 22 European languages [14], in English [10], German [33] and Italian [34]. Despite satisfactorily high alpha values on the raw scores, these reliability coefficients should be interpreted with caution in models with a poor fit [25]. Since ordinal and Cronbach’s alpha are based on raw scores, that is, on models without cross-loadings and correlated error variances, these results have probably overestimated the reliability of the HeartQoL scales. In contrast, the composite reliability corrects alpha’s overestimation bias in multidimensional data and in the presence of correlated error variances [36], which may explain why the composite reliability coefficients differed from the ordinal and Cronbach’s alpha in the modified two-factor model.

Test-retest reliability was good for the HeartQoL subscales. However, the lower limit of the 95% CI for the emotional subscale was below 0.75, indicating that moderate rather than good test-retest reliability could not be excluded. Compared to a study by Lee et al. [37] we observed a higher test-retest reliability in the physical subscale with similar levels in the emotional subscale. However, Lee et al. [37] did not report the CI; therefore, no strong conclusions can be drawn based on these differences.

The translation of the HeartQoL followed recommended procedures for patient-reported outcome measures [38], using the same standard forward and backward translations and pilot testing that were previously used in translating the HeartQoL from English into other languages [11]. We therefore argue that the Icelandic version is linguistically equivalent to the original English version of the questionnaire. However, as problems with missing data were detected for item 13, further exploration of this item, for example with cognitive interviews, is recommended. In addition, to make valid and meaningful comparisons between different countries and cultures, evaluation of measurement invariance across different language versions is needed. Until more evidence is obtained about the findings from the present study, the Icelandic version of the HeartQoL should be used as recommended.

Study strengths and limitations

The HeartQoL questionnaire is a heart disease-specific HRQoL instrument for patients with CHD and has been validated internationally in 30 languages in 11 peer-reviewed publications [10, 12,13,14, 33, 34].

Strengths of this study include the population of patients with angina, acute myocardial infarction, acute myocardial ischemia, elective percutaneous coronary intervention and/or coronary artery bypass graft. The rigorous translation process and the use of appropriate statistical methods, such as CFA models for ordinal data, are also strengths of this study. This is important since parametric methods tend to underestimate the association between indicator variables, increase the risk for identifying pseudo factors, and create incorrect test statistics and standard errors [25].

Limitations of our study include the fact that, as this is a sub-study of a larger empirical study, no a priori sample size calculation was conducted. Further, because of the restricted sample size within each diagnosis, we analyzed the group of patients with CHD as a single cohort. However, the sample size of 396 patients was considered large enough, with 10–20 observations for each free parameter in the model [39], and prior simulation studies have shown that the WLSMV performs well in small samples [25]. Nevertheless, these limitations mean that the measurement properties within specific cardiac diagnoses need to be addressed in future validation studies of the Icelandic HeartQoL. Finally, the proportion of women who completed the test-retest assessment was smaller than the proportion who did not, and the opposite was found for men. On the other hand, no significant differences were found in age, cohabitation, education, and admission diagnosis. We can only speculate why women were underrepresented in the test-retest assessment, but a reasonable explanation may be that they were significantly older than the men (M = 66.6 vs. 63.8, t(394) = 2.62, p = 0.009).

Conclusion

The Icelandic version of the HeartQoL has sound measurement properties in patients with CHD. Users of the Icelandic HeartQoL can use the original scoring but should be aware of problems with cross-loadings and correlated error variances which may increase measurement errors for the physical and emotional subscales.