Background

The consideration of "quality of life" (QoL) in clinical studies, and the various attempts to make this construct measurable in order to determine therapeutic success, is an ongoing process. This is particularly true of integrative approaches to disease management, which offer holistic care drawing on a variety of therapeutic directions. Here, QoL has become a central evaluation parameter. It also serves as an aid in choosing a treatment strategy for chronically ill patients [1], which is a challenging therapeutic aim, and is at least as significant as somatic parameters [2]. QoL has therefore become a leading criterion in many outcome studies alongside somatic and economic factors. In the course of this development, the concept of QoL has been explicitly listed as an outcome parameter in many medical societies' guidelines [3].

However, there is a variety of opinions regarding the factors that contribute to QoL. According to a WHO definition, QoL relates to the physical, psychological and social well-being of an individual as laid out by formal health terms [4]. According to this definition, it is necessary to differentiate between a general and a health-related QoL [5]. The former relates to aspects that exist independently of any particular disease (e.g. items such as "being spontaneous" or "feeling exhausted"), whereas the latter focuses on particular characteristics of specific diseases (e.g. factors such as "walking distance" or "pain" in rheumatic diseases).

Despite the methodological difficulties involved in making QoL measurable, numerous instruments for measuring disease-specific aspects of QoL have been developed in the recent past [6-8]. An advantage of disease-specific instruments is that they register the strains and limitations of a specific disease precisely, rather than those of diseases in general. In addition, disease-related questionnaires make it easier to register the course of a clinical disease (the "course of disease sensitivity" of questionnaires). The majority of current recommendations by health economists and clinical pharmacological associations include suggestions regarding the use of disease-specific and general QoL questionnaires [9].

In Germany, one of the newer instruments attempting to measure general QoL with a distinct focus is the "Herdecke Questionnaire for Quality of Life" (HLQ, after its German acronym) [10, 11]. Clinical research projects have been reluctant to employ the HLQ, although it was evaluated on a sample of healthy subjects, some reference values from clinical studies on different diseases exist, and the HLQ takes a very comprehensive view of QoL [12]. This is mainly because conclusive validation based on a large sample is still missing. To improve this situation, this study aimed to show the characteristics of the HLQ, to describe its external validation using other test instruments, and to develop a short form of the questionnaire.

Methods

Data for this study derive from a model project on the treatment of patients using naturopathic methods at Blankenstein Hospital, Hattingen. To investigate the benefits and limits of naturopathic treatment in the field of in-patient care, the Department of Naturopathy was established as a model at the Blankenstein Hospital in Hattingen and was scientifically evaluated by the Chair of Medical Theory and Complementary Medicine of Witten/Herdecke University. This evaluation began on July 1st 1999 and was completed on March 31st 2003. It focused on the following question: "How does a three-week in-patient treatment with naturopathic methods affect the QoL of the patients, regarding a pre-post-comparison and a follow-up carried out after 6 months?" Detailed information concerning this model project and its scientific evaluation can be found in [13] and [14].

In total, 2,461 patients between 16 and 92 years (mean age 58.0 ± 13.4 years) were included in this study. The socio-demographic characteristics of the patients are shown in Table 1.

Table 1 Socio-demographic data of the patient population

Alongside the HLQ, other standardized questionnaires were used. These included the MOS-SF-36 Health Survey [15], Zerssen's Mood-Scale Bf-S [16], the Giessener Physical Complaints Questionnaire GBB-24 [17] and McGill's Pain Perception Scale SES [18].

The HLQ as referred to in this study uses 39 five-point Likert scales ranging from 0 to 4 (agreement/disagreement or often/never). In contrast to the SF-36, the items are not defined by situations related to daily life and household situations (shopping, career situations, physical activity). As a result, the HLQ is very suitable for registering QoL, particularly in monitoring the course of a disease or therapeutic intervention [19]. As an evaluation scheme, Schulte et al. [10] described 5 scales for the 39-item HLQ, unfortunately without any confirmation by factor analysis: Physical Well-being (4 items), Vitality (9 items), Mental behavior (10 items), Presence of Personality (9 items) and Social Environment (7 items). All scales are expressed in percentage values from 0 = lowest to 100 = highest QoL.
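
The percentage scaling described above can be illustrated with a minimal sketch. It assumes that a scale value is the sum of its item scores divided by the maximum possible sum, and that negatively worded items have already been reverse-coded so that higher scores mean higher QoL; neither detail is stated in the text, and the function name `scale_score` is hypothetical.

```python
def scale_score(item_scores, max_item_score=4):
    """Rescale the sum of 0..max_item_score item scores to a 0..100 value.

    Assumption (not taken from the HLQ manual): items are already coded so
    that a higher score means a higher quality of life.
    """
    if not item_scores:
        raise ValueError("a scale needs at least one item")
    if any(not 0 <= s <= max_item_score for s in item_scores):
        raise ValueError("item scores must lie between 0 and max_item_score")
    return 100.0 * sum(item_scores) / (max_item_score * len(item_scores))


if __name__ == "__main__":
    # Four hypothetical item responses on the 0-4 Likert scale.
    print(scale_score([1, 3, 4, 0]))
```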

The main question of this study relates to the re-examination of the HLQ by means of a factor and reliability analysis and the explorative evaluation of the factors. External validation was performed by correlating the HLQ scales with those of the external test instruments: MOS-SF-36 Health Survey [15], Zerssen's Mood-Scale Bf-S [16], the Giessener Physical Complaints Questionnaire GBB-24 [17] and McGill's Pain Perception Scale SES [18].

Factor analysis was performed using principal components analysis with Varimax rotation on 35 of the 39 items. Items #13 (avoided conflicts), #14 (behavior of others was unclear to me), #15 (was glad) and #29 (reduced sexual activity) were omitted following the positive preliminary results on the reliability of the HLQ by Kroez et al. [20]. To determine the internal consistency of the questionnaire, reliability analysis was performed using Cronbach's alpha. Both factor analysis and reliability analysis were performed for the long and the short version of the HLQ.
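
For readers unfamiliar with the reliability coefficient, the following sketch shows the standard formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). It is an illustration only: the study itself used SPSS, and the function name and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical responses of four patients to a two-item scale.
    print(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]))
```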

For the short form, only relevant items with a factorial weight of > 0.6 were selected. This method of selection was originally suggested by Grimley [21] and has successfully been applied elsewhere [22, 23]. Coefficients of determination (R-square) of short and long form scales were calculated to evaluate the proportion of variance of the original HLQ which can be explained by the short form.
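
The two steps of this paragraph, item selection by factorial weight and the coefficient of determination between short and long form, can be sketched as follows. This is a simplified illustration under stated assumptions: the threshold is applied to the absolute value of each item's largest loading, R-square is taken as the squared Pearson correlation of the two scale scores, and all names and numbers are hypothetical rather than taken from the study.

```python
from math import sqrt


def select_short_form_items(loadings, threshold=0.6):
    """Group items by the factor on which they load most strongly.

    loadings maps an item name to its list of loadings, one per factor.
    An item is kept only if its largest absolute loading exceeds threshold
    (using the absolute value is an assumption of this sketch).
    """
    selected = {}
    for item, weights in loadings.items():
        factor = max(range(len(weights)), key=lambda j: abs(weights[j]))
        if abs(weights[factor]) > threshold:
            selected.setdefault(factor, []).append(item)
    return selected


def r_squared(short_scores, long_scores):
    """Squared Pearson correlation between short- and long-form scale scores."""
    if len(short_scores) != len(long_scores) or len(short_scores) < 2:
        raise ValueError("need two equally long lists with at least two values")
    n = len(short_scores)
    mx, my = sum(short_scores) / n, sum(long_scores) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(short_scores, long_scores))
    sxx = sum((a - mx) ** 2 for a in short_scores)
    syy = sum((b - my) ** 2 for b in long_scores)
    return (sxy / sqrt(sxx * syy)) ** 2


if __name__ == "__main__":
    # Hypothetical rotated loadings of three items on two factors.
    demo = {"a": [0.72, 0.10], "b": [0.55, 0.30], "c": [0.05, -0.65]}
    print(select_short_form_items(demo))
```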

Evaluation of responsiveness of the HLQ over a course of time was achieved by analyzing the change of HLQ-total score from the time of admission to the time of discharge by using a dependent t-test and calculation of Cohen's effect size (ES). Cohen's guidelines were used to classify the magnitude of effect sizes: 0.2 represents a small effect, 0.5 a moderate effect, and 0.8 a large effect.
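
A minimal sketch of this responsiveness analysis is given here. The text does not state which variant of the effect size was used; the sketch assumes the mean change divided by the standard deviation at admission, which is one common definition. The label "negligible" for values below 0.2 and all function names and example scores are additions of this sketch.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """t statistic of the dependent (paired) t-test on post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long lists with at least two values")
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def effect_size(pre, post):
    """Mean change divided by the baseline SD (assumed definition)."""
    return (mean(post) - mean(pre)) / stdev(pre)


def classify_effect(es):
    """Apply Cohen's guideline thresholds of 0.2, 0.5 and 0.8."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Hypothetical total scores of four patients at admission and discharge.
    admission = [40, 50, 60, 70]
    discharge = [50, 55, 70, 85]
    es = effect_size(admission, discharge)
    print(round(es, 2), classify_effect(es))
    print(round(paired_t_statistic(admission, discharge), 2))
```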

The statistical evaluation of the data was performed using the SPSS software package, version 10.0.

Results

The descriptive statistics of each item, the reliability parameters and the difficulty index are given in Table 2. Considering the high percentage of patients with chronic rheumatic diseases, an item-difficulty index between 0.26 (Item: "I suffered from physical pain") and 0.73 (Items "Family life was a burden" and "I felt over directed") can be regarded as sufficient. This also holds for item-total correlations with values between 0.27 and 0.69 (median: 0.55) for the original HLQ and between 0.32 and 0.61 (median: 0.46) for the short version (HLQ-S). These ranges are considered optimal for psychological test instruments.
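
The two item statistics mentioned above can be sketched as follows. The text does not define them, so the sketch assumes a common convention for rating-scale items: the difficulty index is the mean item score divided by the maximum score, and the item-total correlation is the Pearson correlation between an item and the sum of the remaining items (the "corrected" variant). Function names and data are hypothetical.

```python
from math import sqrt


def item_difficulty(scores, max_item_score=4):
    """Mean item score as a fraction of the maximum score (assumed definition)."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / (len(scores) * max_item_score)


def corrected_item_total_correlation(rows, item_index):
    """Pearson correlation of one item with the sum of all other items."""
    item = [r[item_index] for r in rows]
    rest = [sum(r) - r[item_index] for r in rows]
    n = len(rows)
    mx, my = sum(item) / n, sum(rest) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(item, rest))
    sxx = sum((a - mx) ** 2 for a in item)
    syy = sum((b - my) ** 2 for b in rest)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical responses of four patients to three items (0-4 scale).
    data = [[0, 0, 1], [1, 1, 1], [2, 2, 3], [3, 3, 3]]
    print(item_difficulty([r[0] for r in data]))
    print(corrected_item_total_correlation(data, 0))
```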

Table 2 Descriptive statistics and reliability parameters of HLQ-Items

The structural analysis of the HLQ items yielded surprising results. The scales found by factor analysis (Table 2) were only partly congruent with the scales in the original publication [10]. Instead, we found a new and stable 6-factor model which fits the data better than the original 5-scale model derived by Schulte et al., which used a multitrait analysis approach (developed by Hays et al. [24]). This is underlined by a Kaiser-Meyer-Olkin measure of sampling adequacy of 0.957 and a highly significant Bartlett test of sphericity (p < 0.001). The cumulative variance explained by this model is 54.7%.

Correlation analysis (Table 3) of the earlier HLQ scales with the new scales revealed significant correlations between the scales "Social Environment (SOC)" and "Social Interaction (SOCI)" (r = 0.923 for the HLQ-L and r = 0.860 for the HLQ-S). Unfortunately, such a clear correspondence between one old HLQ scale and a single factor of our current analysis was not found for the other scales. However, "physical well-being (PWB)" of the old HLQ correlated well with the new "motility (MOT)" scale (HLQ-L r = 0.929 and HLQ-S r = 0.958), while the old "vitality (VIT)" scale correlates with the "mental balance (MB)" scale (HLQ-L r = 0.840 and HLQ-S r = 0.681). The old scales "presence of personality (PERS)" and "mental behavior (MEB)" are represented well by the new scale "initiative power and interest (IPI)" (see Table 3).

Table 3 Partial correlation of HLQ scales with other instruments and with the old HLQ scales, adjusted for gender and age. Abbreviations: SF-36: PF-physical function, RP-role physical, BP-bodily pain, GH-general health, VT-vitality, SF-social function, RE-role emotional, MH-mental health, MCS-mental component summary, PCS-physical component summary; GBB: SE-severity of exhaustion, GS-gastric symptoms, LP-limb pain, HS-heart symptoms; Zerssen's Mood Scale Bf-S; SES: AFF-affective pain, SENS-sensory pain; HLQ-OLD: PWB-physical well-being, VIT-vitality, MEB-mental behaviour, PERS-presence of personality, SOC-social environment.

The internal consistency of the instruments (HLQ-L and HLQ-S), both for the total score (Cronbach's α is 0.935 for the HLQ-L and 0.862 for the HLQ-S) and for the subscales of the HLQ-L (Cronbach's α between 0.621 and 0.885), can be regarded as excellent. The highest alpha reliability in the HLQ-L was obtained for the "Initiative Power and Interest" scale, the lowest for the 2-item scales "Digestive Well-Being" (0.621) and "Physical Complaints" (0.692).

The mean difference between the scales of the HLQ-S and the HLQ-L for all patients is between 1.20 ("Initiative Power and Interest") and 2.24 points ("Social Interaction") on a percentage scale. The absolute differences are clustered in groups and are given in Table 4. Although the overall mean difference is low, the share of absolute differences greater than 10 percent ranges between 17.9% ("Initiative Power and Interest") and 26.8% ("Social Interaction"). However, with correlation coefficients ranging from 0.899 to 0.964, the proportion of variance of the HLQ-L that can be explained by the short form ranges between 79% and 93% and can thus be regarded as adequate for a short version.

Table 4 Comparison of the HLQ-L and the HLQ-S.

The correlation of the HLQ with other test instruments is shown in Table 3. There are acceptable correlations with r > 0.5 between the mental-health-associated scales of the HLQ and those of the other instruments, for example the "mental health" scale of the SF-36. In detail, the scales "Initiative Power and Interest", "Social Interaction" and "Mental Balance" of the HLQ correlate well with "mental health", the "mental component summary", "social function" and "vitality" of the SF-36 and with Zerssen's Bf-S Mood-Scale. The "motility" scale of the HLQ correlates with "physical function" and "vitality" of the SF-36, with the "severity of exhaustion" of the Giessener Physical Complaints Questionnaire GBB-24, and somewhat more weakly with the "role physical", "bodily pain" and "physical component summary" scales of the SF-36 and "limb pain" of the GBB-24. The "physical complaints" subscale of the HLQ correlates well with "bodily pain" of the SF-36 and its "physical component summary" scale, and also with the "affective pain" subscale of McGill's Pain Perception Scale SES. Among the SF-36 scales, the factor "general health" is not represented by the HLQ scales. The factors "gastric symptoms" and "heart symptoms" from the GBB-24 scales and "sensory pain" from the SES are not represented by the HLQ.

Table 5 HLQ-scales (Mean ± SD) of patients separated into diagnostic-, age- and gender-specific groups.

According to the diagnostic spectrum (Table 5), the scales "Motility (MOT)" and "Physical Complaints (PHY)" show particularly low values in patients suffering from rheumatic diseases. Also, in contrast to the other scales of the HLQ, these two appear to have little correlation with age, which indicates suitable discriminatory power of the HLQ with respect to age and different types of disease.

The results of the responsiveness analysis are presented in Table 6. We found a high sensitivity of the HLQ scales to change over the course of treatment, with highly significant changes in the means and effect sizes between 0.39 (Digestive Well-Being) and 0.92 (Mental Balance).

Table 6 Responsiveness of HLQ-scales measured with Cohen's effect size.

Discussion

The aim of our study was to confirm the structure and consistency of the HLQ. Surprisingly, we found that the original scales presented earlier [10] were not in accordance with the results of this factor analysis. However, the scales "IPI – Initiative Power and Interest", "SOCI – Social Interaction", "MB – Mental Balance", "MOT – Motility", "PHY – Physical Complaints" and "DWB – Digestive Well-Being" show good reliability and sufficiently differentiate the diagnostic groups, especially patients suffering from connective tissue and soft tissue disorders from those with metabolic and nutritional disorders or hypersensitivity reactions.

The HLQ sub-scales "Initiative Power and Interest", "Social Interaction" and "Mental Balance" correlate well with the corresponding SF-36 scales and with Zerssen's Bf-S Mood-Scale, which indicates that these qualities share several interconnections. Our findings also showed, however, that the HLQ covers several aspects of health, such as "Appetite and Digestive Affections", which are not well covered by existing QoL measures. Nevertheless, with only two items, the subscale "digestive well-being" has to be strengthened by additional items. This is also true for the scale related to physical complaints and pain. With correlation values of 0.11 (physical total scale of the SF-36) and 0.29 (sensory pain SES), it is quite obvious that this scale is deficient and needs improvement with respect to the quality and number of its items.

Because internal consistency reliability is a poor predictor of responsiveness [25], we measured the responsiveness of the HLQ directly using Cohen's effect size. Taking the highly significant results of the t-test statistics into account, and being aware of the methodological limitations inherent in assessing a questionnaire's responsiveness by means of effect sizes [26], we can nevertheless conclude that the HLQ shows sufficient responsiveness for use in a clinical setting.

In our opinion, the HLQ is more sensitive to health changes brought about by complementary therapies, including anthroposophic medicine or homeopathy. This does not mean that the HLQ is only suitable for such therapies. Although there is a trend to regard QoL questionnaires as specific to particular complementary therapies, such as mistletoe treatment in cancer patients [27], we do not favor such labels, as this might result in an inflation of "new" QoL measures for each new therapeutic situation [28].

QoL is a multidimensional construct composed of functional, physical, emotional, social and spiritual well-being [29, 30], with several interconnections between the distinct constructs of well-being. The HLQ scales "Social Interaction", "Mental Balance", "Motility" and "Physical Complaints" share similarities with these constructs, but the HLQ highlights two further significant topics, i.e. "Initiative Power and Interest" and "Digestive Well-Being". The highly relevant topic of spirituality and illness is addressed in another instrument developed by our group, the SpREUK questionnaire, with its sub-scales "Search for meaningful support", "Positive interpretation of disease", "Trust in external guidance" and "Support through spirituality/religiosity" [22, 31, 32].

Our evaluation indicates an adequate representation of aspects like "mental well-being" and "depression", which are essential in defining QoL, and shows special features of the HLQ that highlight its uniqueness in the group of generic QoL measures. Particularly in clinical studies in which the use of huge psychometric test batteries is inappropriate for reasons of feasibility or patient compliance, the HLQ now serves as an economical test instrument. To conclude, we can state that this study presents necessary foundations and developments for existing and future studies that wish to use the HLQ as a reliable and valid instrument.