Background

Quality of life (QOL) is of central importance in daily life, and most people wish to maximise or at least maintain it. The growing emphasis on QOL is apparent in public-health evaluation research, in which the patient’s QOL is considered the endpoint [1]. In health-care economics, health-related QOL is reflected in quality-adjusted life years (QALYs) as the utility in cost-utility analyses [2]. It is in evaluating complex therapeutic situations like those frequently encountered in medical rehabilitation that health-related quality of life (HRQOL) from the patient’s perspective has demonstrated particular advantages over “harder” biological endpoints (e.g. [3, 4]). Multidisciplinary rehabilitation with its biopsychosocial disease model does not aim merely to change certain physical functions; rather, it aims to improve or restore the patient’s activity and participation as defined in the International Classification of Functioning, Disability and Health (ICF) of the WHO [5]. It is thus not surprising that HRQOL has become a widely acknowledged and valued parameter in medical rehabilitation evaluations [6].

Improving HRQOL also makes obvious sense in cardiac rehabilitation [7, 8]. However, to measure such therapeutic results accurately and reliably, we need psychometrically tested instruments. HRQOL instruments are either generic or disease-specific [1]. Disease-specific instruments address complaints that are characteristic of certain diseases, whereas generic instruments enquire about aspects beyond the specific illness. Which type to employ depends on the question being investigated. Generic questionnaires permit comparison among various diseases and also enable effects to be identified in areas not directly related to the given illness. Disease-specific instruments are, however, considered more sensitive and thus better suited to measuring changes in disease-related aspects [9–11].

While there are several generic instruments of proven quality in the German rehabilitation system (e.g. IRES-3 [12] and the German SF-36 version [13]), there is a lack of German-language, disease-specific instruments applicable in cardiac rehabilitation. The MacNew Heart Disease Health-related Quality of Life questionnaire (MacNew) [14] is just such an instrument. The English-language MacNew [15, 16] evolved from the interview version of the Quality of Life after Myocardial Infarction instrument [17, 18]. The reliability of the English version is considered very good, with Cronbach’s α values between 0.93 and 0.95 [19]. Its validity and sensitivity to change are generally satisfactory. There are indications that the original three-factor model does not provide optimal fit, which is why Dempster et al. [20] suggest using a five-factor model. Reference data describing clinical significance are available for the English version [21]; the authors designate a mean difference of 0.5 points as a “minimal clinically important difference”.

The MacNew is now available in a German-language version and has been used in various projects (e.g. [22–24]). However, to our knowledge, psychometric reanalyses have been carried out only within the Austrian and Swiss health-care systems and in small patient cohorts (68 < N < 200) [14, 25–27]. To use the MacNew within the German health-care system, we need to investigate its psychometric properties in the German population. Moreover, this psychometric examination should involve a large patient cohort so as to yield results that are both convincing and representative of rehabilitation patients in the German system. The aim of this analysis was to determine the psychometric properties of the MacNew questionnaire in a large cohort of patients in rehabilitation by testing acceptance, ceiling and floor effects, reliability (internal consistency), factor structure, construct validity, and sensitivity to change.

Methods

Patients

The questionnaires were given only to patients able and willing to fill them out and who had provided informed consent. 8654 patients from 37 cardiac rehabilitation clinics throughout Germany were asked to participate, of whom 5692 agreed. The percentage of patients that did not fill out the questionnaire (decliners) was 34.2%. The most important reason for exclusion was refusal to participate (N = 745), followed by cognitive or physical limitations (N = 575), language problems (N = 263), transfer to another institution (N = 59), and discontinuing rehabilitation (N = 29). 149 patients gave other reasons for not participating, and 1374 patients gave no reason for refusing enrolment (patients could name more than one reason). Thus we had data at hand for our final analysis from 5692 patients in rehabilitation who participated in the quality assurance programme of the statutory health insurance funds in medical rehabilitation between autumn 2004 and spring 2007 (QS-Reha®-Verfahren [6, 28]). The study has been approved by the ethics committee of the University of Freiburg (approval number 265/2000). Patients were queried at the start (N = 5692, 100%) and at discharge from the rehabilitation centre (N = 5169; 90.8%), as well as six months after having been discharged (N = 3663; 64.4%).

Table 1 provides information on the study patients. N = 523 patients left the rehabilitation centre and thus the study between the start and end of rehabilitation: N = 97 patients refused to continue participating, N = 24 quit rehabilitation, N = 67 patients were transferred, and N = 9 patients stopped for other reasons. A total of N = 339 patients in rehabilitation gave no reason whatsoever for dropping out of the study. Here, too, they could have provided more than one reason.

Table 1 Cohort description

Instruments

Höfer et al. published the German version of the MacNew in 2003 [14]. This questionnaire contains 27 items grouped into an emotional scale (12 items), a social scale (11 items) and a physical scale (5 items), with item 6 being assigned to both the emotional and the physical scale. All the items taken together make up a global scale. The English version [16] contains 26 items, as the item "sexual intercourse" has been omitted from scoring due to high numbers of missing values. However, it has mostly been retained in the questionnaire, as it could provide important information for therapeutic decisions. The allocation of the items to scales differs in the English version as well, although a factor analysis revealed their loading pattern to be similar to that in the German version [27]. In the English version, the emotional scale has 14 items, the physical scale 10 items, and the social scale 3 items, with item 26 being allocated to both the physical and the social scale. The MacNew poses questions on complaints associated with heart disease that became apparent during the previous two weeks. Patients complete the self-administered MacNew using a seven-point Likert scale (1 = minor to 7 = severe). The resulting scale scores likewise range from one to seven.

We also employed the IRES-3 [29], a questionnaire used to document generic QOL whose 144 items are accommodated in 27 scales forming eight dimensions. The eight dimensions of the IRES are: “physical health”, “pain”, “ability to function in daily life”, “ability to function at work”, “emotional well-being”, “social integration”, “health behaviour” and “dealing with the disease”.

Analyses

Practicability and distribution characteristics

We investigated the acceptance of the MacNew by referring to the percentage of missing values per item and scale. This was followed by testing ceiling and floor effects per item and scale level. We noted a ceiling or floor effect whenever over 50% of the answers fell into the highest or lowest of the seven categories, respectively.
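The item-level checks described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name, the use of `None` for a missing answer, and the choice to compute the floor and ceiling shares over answered responses only are assumptions, and the example data are hypothetical.

```python
from typing import Optional, Sequence


def item_distribution(
    responses: Sequence[Optional[int]],
    low: int = 1,
    high: int = 7,
    threshold: float = 0.5,
) -> dict:
    """Summarise one item: percentage missing plus floor/ceiling flags.

    Assumption: missing answers are coded as None, and the floor/ceiling
    shares are computed over the answered responses only.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("item has no answered responses")
    missing_pct = 100.0 * (len(responses) - len(answered)) / len(responses)
    share_low = sum(r == low for r in answered) / len(answered)
    share_high = sum(r == high for r in answered) / len(answered)
    return {
        "missing_pct": missing_pct,
        "floor": share_low > threshold,     # over 50% in the lowest category
        "ceiling": share_high > threshold,  # over 50% in the highest category
    }


# Hypothetical answers of ten patients to a single seven-point item.
example = [7, 7, 7, 6, None, 7, 5, 7, 7, None]
summary = item_distribution(example)
print(summary)
```

The same function can be applied to scale scores if these are first rounded or binned into the seven categories; how that was handled in the study is not stated in the text.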

Reliability

To test internal consistency, we calculated Cronbach’s α [30] for each of the MacNew scales. We then examined the corrected item-scale correlations. An item was defined as selective when it correlated sufficiently with the total score of its scale: coefficients of 0.30 or higher were classified as “moderate”, those of 0.50 or higher as “good”.
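As a minimal sketch of these two reliability statistics, the following computes Cronbach's α from the item and total-score variances and the corrected item-scale correlation as the Pearson correlation between an item and the sum of the remaining items. It assumes complete cases and sample variances; the function names and the two-item example are hypothetical and serve only to make the arithmetic checkable by hand.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def corrected_item_total(items, index):
    """Correlate one item with the sum of all other items of the scale."""
    rest = [sum(scores) - scores[index] for scores in zip(*items)]
    return pearson(items[index], rest)


def selectivity(r):
    """Classification used in the text: from 0.30 moderate, from 0.50 good."""
    if r >= 0.50:
        return "good"
    if r >= 0.30:
        return "moderate"
    return "insufficient"  # label assumed; the text names only two classes


# Hypothetical two-item scale answered by four respondents.
scale = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(scale)             # 2 * (1 - (10/3) / (16/3)) = 0.75
r_item0 = corrected_item_total(scale, 0)  # 3 / sqrt(5 * 5) = 0.6
print(round(alpha, 2), round(r_item0, 2), selectivity(r_item0))
```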

Factor structure

To test the unidimensionality of the MacNew scales, we carried out a confirmatory factor analysis. This involved first imputing the missing data using the expectation-maximisation algorithm in NORM software [31] and then testing the validity of the three-factor model of the MacNew [27]. We relied on the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI) and root mean square error of approximation (RMSEA) as indications of model quality. CFI and TLI values >0.90 indicate a good fit; RMSEA values <0.10 suggest a moderate fit and values <0.05 a good fit [32]. In addition to testing the original German model of the MacNew [27], we tested each scale on its own. For cross-validation, we halved the sample at random. In the first half, we developed a new model by explorative factor analysis (principal component analysis with VARIMAX rotation) after eliminating items that had (1) more than 10% missing values or (2) a ceiling or floor effect. An item was assigned to a scale when the assignment was plausible in terms of content and the item loaded clearly on one factor (≥0.50 on one factor, <0.40 on all the others). The resulting factor model was then cross-validated in the other half of the sample via confirmatory factor analysis. Measurement errors were not allowed to intercorrelate.
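Two mechanical steps of this procedure, the random halving of the sample and the loading rule for assigning items, can be sketched as below. The factor extraction itself and the content-based plausibility check are not reproduced. The function names, the fixed seed, the comparison of absolute loadings, and the example loadings are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_half(ids: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split patient identifiers into two halves (seed is arbitrary)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


def assign_item(
    loadings: Sequence[float], primary: float = 0.50, cross: float = 0.40
) -> Optional[int]:
    """Return the index of the factor an item is assigned to, or None.

    Rule from the text: at least 0.50 on one factor and below 0.40 on all
    others. Comparing absolute loadings is an assumption of this sketch.
    """
    magnitudes = [abs(value) for value in loadings]
    best = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    others = [m for i, m in enumerate(magnitudes) if i != best]
    if magnitudes[best] >= primary and all(m < cross for m in others):
        return best
    return None


development, validation = split_half(range(10), seed=1)

# Hypothetical rotated loadings on four factors.
clear_item = assign_item([0.72, 0.10, 0.21, 0.05])   # assigned to factor 0
cross_loading = assign_item([0.59, 0.61, 0.10, 0.02])  # rejected: loads on two
weak_item = assign_item([0.45, 0.20, 0.10, 0.05])    # rejected: below 0.50
print(clear_item, cross_loading, weak_item)
```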

Construct validity

Construct validity was tested using the IRES-3 dimensions [12] and Pearson correlations. We hypothesised that the MacNew emotional scale would correlate significantly and closely (r > 0.50, cf. [33]) with the IRES “emotional well-being” dimension and that its correlation with the other IRES dimensions would be at most moderately close (between 0.30 and 0.50, cf. [33]). We likewise hypothesised that the MacNew’s social scale would correlate significantly and closely with the IRES “social integration” dimension, the physical scale of the MacNew with the IRES dimension “physical health”, and the MacNew’s global scale with the IRES total “Rehab-Status” score. To assess whether the correlations differed significantly from one another, we calculated Steiger’s test.
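The correlation step and the closeness bands named in these hypotheses can be illustrated with a short sketch. Steiger's test for comparing dependent correlations is not implemented here. The function names, the label "weak" for correlations below 0.30, and the example scores are assumptions.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two score lists of equal length."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def closeness(r):
    """Bands from the text: r > 0.50 close, 0.30 to 0.50 moderately close."""
    size = abs(r)
    if size > 0.50:
        return "close"
    if size >= 0.30:
        return "moderate"
    return "weak"  # label assumed; the text does not name this band


# Hypothetical scale scores of four patients on two instruments.
macnew_scores = [1, 2, 3, 4]
ires_scores = [2, 1, 4, 3]
r = pearson_r(macnew_scores, ires_scores)  # 3 / sqrt(5 * 5) = 0.6
print(round(r, 2), closeness(r))
```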

Sensitivity to change

It is imperative when measuring outcomes that the instrument used be capable of demonstrating change during rehabilitation. To test whether the MacNew meets this requirement, we calculated the standardized effect sizes [34] between the start and discharge of rehabilitation, and between the start of rehabilitation and six months after discharge. The standardized effect size is the difference between the means at two measurement points (e.g. start and end of rehabilitation) divided by the standard deviation at baseline (rehabilitation start). The strength of these effects can be taken to indicate the sensitivity to change of the questionnaire. As in Cohen [33], effect sizes of 0.20 were considered “small”, around 0.50 “medium”, and >0.80 “large”.
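The effect size defined above can be written out as a short sketch. The use of the sample standard deviation (n − 1 denominator), the restriction to patients with both measurements, and the exact cut-offs that turn Cohen's reference values into bands are assumptions; the scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, later):
    """(mean at later time point - mean at baseline) / SD at baseline.

    Assumption: both lists contain the same patients and the sample
    standard deviation is used.
    """
    return (mean(later) - mean(baseline)) / stdev(baseline)


def effect_label(effect_size):
    """Bands built from Cohen's reference values (cut-offs are assumed)."""
    size = abs(effect_size)
    if size > 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


# Hypothetical scale scores of three patients at admission and discharge.
admission = [2, 4, 6]   # mean 4, SD 2
discharge = [3, 5, 7]   # mean 5
ses = standardized_effect_size(admission, discharge)  # (5 - 4) / 2 = 0.5
print(ses, effect_label(ses))
```

Because the denominator is the baseline standard deviation rather than the standard deviation of the change scores, this measure does not benefit from a high correlation between the two time points.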

All statistical analyses were carried out using PASW Statistics 19 software; AMOS 19 was used for the confirmatory factor analysis.

Results

Practicability and distribution characteristics

Our practicability and distribution results are presented in Tables 2 and 3. The proportion of missing values was 9.5% in one item (item 17 "limitations in sports because of a heart problem") and 26.2% in another (item 27 "limitations in sexual intercourse"). All the other items had between 1.9 and 6.3% missing values. None of the scales in the MacNew showed more than 12.1% missing values, and all the answer categories apply to all items. We observed no signs of floor or ceiling effects in any of the items or scales in the MacNew.

Table 2 Item properties of the MacNew questionnaire (N = 5692)
Table 3 Scale properties of the MacNew questionnaire

Reliability

We obtained internal consistency values of α = 0.93 in the emotional, α = 0.89 in the social, and α = 0.78 in the physical scale (Table 3). The selectivity coefficients (corrected item-scale correlations) were over 0.50 for all items except items 22, 23 and 27 (Table 2), which belong to the social scale.

Factor structure

The fit indices of the confirmatory factor analysis indicate a moderate model fit for the original German three-factor model [27] (Table 4).

Table 4 Model fit of the various factor models

When regarding the scales of this model individually, one notes that the fit indices of the emotional and physical scales suggest a good model fit, while the model fit of the social scale is poor. Developing a new model we removed item 27 “sexual intercourse”, as it had shown 26.2% missing values. No item had to be removed because of ceiling or floor effects. In the next step we omitted items 11 “more dependent”, 15 “lack self-confidence”, 17 “sports/exercise limited”, 20 “restricted or limited”, 24 “excluded” and 25 “unable to socialise”, as they had failed to load convincingly on any one of the factors.

Table 5 illustrates our final model. The fit indices of this model demonstrate good model fit in the cross-validation (Table 4). The scale characteristics of our modified model are found in Table 6.

Table 5 Factor loadings of the MacNew items for our modified factor model
Table 6 Scale properties of the MacNew questionnaire for our modified factor structure

Construct validity

Testing construct validity, we observed that the MacNew emotional scale correlated at r = 0.73 with the IRES-3 “emotional well-being” dimension (Table 7). The correlation between the MacNew social scale and the IRES “social integration” dimension was r = 0.16, and that between the MacNew physical scale and the IRES dimension “physical health” was r = 0.68. The MacNew’s global scale and the IRES total “Rehab-Status” score correlated at r = 0.73. Thus we observed correlation patterns that conform to our hypotheses in all but the social scale. All the relevant correlations differed significantly from each other according to Steiger’s test.

Table 7 Pearson correlation of the MacNew scales with the IRES-3 dimensions

Sensitivity to change

The standardized effect size of each of the scales between the start and end of rehabilitation lay between 0.66 (emotional and physical scale) and 0.69 (social scale). The effect size of the global scale between these two timepoints was 0.74 (Table 8). The effect size between the start of rehabilitation and the 6-month follow-up ranged from 0.62 (physical scale) to 0.92 (social scale). The global scale’s effect size here was 0.86 (Table 9).

Table 8 Effect sizes from rehabilitation-start to rehabilitation-end
Table 9 Effect sizes from rehabilitation-start to 6-month follow-up

Discussion

For most MacNew items, the proportion of missing values ranged between 1.9 and 6.3%, which agrees with previous reports [16]. We observed nearly 10% missing values for item 17 “sports/exercise limited”. This relatively high value may have to do with the fact that most patients are in an acute-care facility at the start of rehabilitation when they first fill out the questionnaire, and thus cannot engage in sports or take exercise. This supposition is supported by the observation that the missing values at discharge from rehabilitation amounted to just 3.8%. More than a quarter of all patients did not answer item 27 “sexual intercourse”. This is probably because it is a taboo subject that many patients are reluctant to address. Another reason for the high rate of missing values here could be the patient’s disease course and the infeasibility of having sex with a partner while in hospital, a pattern reflected in our data: the percentage of this item’s missing values was still 31.7% at discharge from rehabilitation and fell markedly to 19.6% at the 6-month follow-up, which is lower than at the start of rehabilitation but still relatively high compared to the other items. Yet this decrease may also be an artefact, since we can assume that only those patients particularly motivated to take part in the study answered the 6-month follow-up, thereby providing fewer missing values at that point in time. However, this possibility is not confirmed by our data, as the percentage of missing values at that point in time is nearly as high as at the start of rehabilitation. The number of missing values in the “sports/exercise” item falls within an acceptable range, while the “sexual intercourse” value is much too high. Perhaps these items should be removed from the questionnaire to reduce the total number of missing values.
Yet one should make such a step dependent on the purpose for which the MacNew is being administered: if it is being employed to evaluate a therapy, one should minimise the production of missing values as far as possible. If however the aim is to identify areas requiring intervention, such items can provide helpful hints, and even if patients fail to answer them, we can still query them about important aspects of daily life. The percentage of missing values on the scale level falls within an acceptable range, and neither floor nor ceiling effects were evident on the scale or item levels. Thus one can consider the MacNew to be an acceptable and comprehensible questionnaire.

Cronbach’s α fell within an acceptable range in the MacNew factor model proposed by Höfer et al. [27]. Moreover, the corrected item-total correlation coefficients showed good selectivity in nearly all items, being in the moderate range in only three items of the social scale. This also supports the reliability of the MacNew. However, to convincingly evaluate reliability, the unidimensionality of the scales must be verified [35], which we did in a confirmatory factor analysis. As the fit indices reveal only a moderate model fit, the unidimensionality is questionable, and thus the MacNew’s reliability cannot be definitively assessed. One cause of this suboptimal fit of the three-factor model could be the heterogeneous content of the social scale. This is suggested, first, by its corrected item-total correlation coefficients and, second, by the fact that the scale combines items addressing physical limitations (e.g. items 17 and 26) with items whose content is more social in nature (e.g. items 23 and 25). Our investigation of each scale via confirmatory factor analysis strengthened our suspicion that the cause of the moderate model fit lies in the social scale. While the fit indices show a good model fit in the emotional and the physical scales, those of the social scale reveal an unsatisfactory model fit. These model fit issues have been addressed by several other investigators: Höfer et al. [19] described item allocations that differ between languages. Even for the English version alone, there are two different models: Dempster et al. developed a five-factor model from what was originally a poorly fitting three-factor model [16]. They accommodate 25 items of the MacNew in five scales: emotional, restrictions, physical symptoms, perception of others, and social functioning. Yet even their five-factor model [20] was not entirely verified in our confirmatory factor analysis (results not reported here).
In fact, it performed worse in our analysis than the three-factor model [27]. There also seem to be problems with the factor model in the German version of the MacNew. Höfer et al. [25–27] report three different loading patterns in three different analyses, in which the items do not load unequivocally on the scales. In their analysis in 2005, the item “worn out” loads at 0.59 on the social scale and at 0.61 on the emotional scale. In the original German version, this item is assigned to the emotional scale.

All in all, there seems to be a lack of consensus regarding the MacNew’s factorial quality. We therefore attempted to develop a new factor model for the MacNew using the aforementioned procedures, and arrived at a four-factor solution containing 20 items incorporated in the emotional, participation, perception of others, and symptoms scales. In naming the scales, we follow the example set by Dempster et al. [20] in every instance except for the social scale, which we have named “participation scale” in line with the ICF; in our opinion this designation describes the content of those items more accurately. The fit indices of our model indicate good model fit, or at least a better fit than is possible in the original German version. Cronbach’s α is somewhat smaller in our model than in the original German version. Since our new model reveals a good model fit, we can assume at least acceptable accommodation. A limitation of our new model is that the remaining psychometric properties have not yet been assessed. Moreover, whoever uses our model must also question whether an instrument of such brevity is adequate for the job at hand. The fact that the perception of others scale contains only two items is admittedly problematic. Yet the advantage is that our model is more economical. In short, we believe that all of the factor models reported thus far are less than optimal, and improvement is especially needed in the social scale. Here, it would be worthwhile to integrate new items reflecting participation more globally as in the ICF, or items which reveal limitations in social activities as in the IRES questionnaire. Both would require the production of new items, however. Until this has been done, we can only recommend use of the physical and emotional scales. It is not possible to interpret the social scale unequivocally as long as there is such uncertainty as to what it actually captures.
The global scale could be used anyway as the items of the MacNew reflect the broad understanding of health as defined by the WHO.

We can consider the construct validity as verified except for that of the social scale. Apart from this scale, the correlation patterns of the MacNew scales with the IRES dimensions conform to our hypotheses. The poor correlation between the MacNew social scale and the “social integration” IRES dimension may be caused by the fact that the two scales capture different content. In the MacNew, the social scale addresses largely physical aspects, as in item 17 “sports/exercise limited” or item 26 “physically restricted”, while the IRES dimension exclusively addresses the patient’s social support. A further sign of this is the close correlation between this MacNew scale and the IRES dimension “physical health”. The poor performance of the social scale in terms of construct validity serves to highlight its need for revision. Yet the overall construct validity of the MacNew can be considered proven. It is not possible to compare our results directly with others in the English- and German-language literature, as those studies used the SF-36 to test construct validity (e.g. [36]). However, the correlations with the SF-36 also conform to the hypotheses for the MacNew scales (e.g. [20] for the English version). Those studies observed no signs of inadequate construct validity in the social scale.

The MacNew is also capable of capturing short- and mid-term changes: moderate to large effects are apparent between the start of rehabilitation and its conclusion, and between the start of rehabilitation and the 6-month follow-up. As we used a more conservative effect size measure, this indicates that the MacNew is able to assess changes. Overall, our results regarding the psychometric properties resemble those from other investigators testing the quality of both the English and German versions of the MacNew (see [19]).

A limitation of our study is that our results cannot be extrapolated entirely, as made obvious by our study cohort’s recruitment: 34% of those qualifying for study participation could not be enrolled for various reasons. This corresponds roughly with usual drop-out rates in similar investigations. Thus we cannot claim that our study patients are entirely representative. Moreover, our patients were all insured by the statutory health insurance, and they were all inpatients. Nevertheless, our study cohort does not differ in age or gender substantially from cohorts in other investigations examining psychometric properties of the MacNew (e.g. [14, 26]). What is positive is that our patients came from throughout Germany and from different clinics, which in turn supports the claim of their being sufficiently representative. We are unable to make any claims as to the test-retest reliability of the MacNew within the framework of this study design. Höfer et al. [27], however, report good test-retest reliability with a test-retest correlation of over 0.80.

Conclusions

All in all, we can consider the German version of the MacNew Heart Disease Health-related Quality of Life questionnaire to be a suitable instrument with which to document the impairment experienced by individuals with heart disease during inpatient cardiac rehabilitation. However, the factor structure of the social and global scales remains somewhat problematic. Therefore the social and the global scale must be interpreted cautiously.