Background

The PGWBI questionnaire is a validated Health Related Quality of Life (HRQoL) measure, widely used in clinical trials and epidemiological research to provide a general evaluation of self-perceived psychological health and well-being [1–10]. In the late sixties Harold Dupuy, a psychologist at the National Center for Health Statistics, developed his Psychological General Well Being Schedule, a 68-item questionnaire to measure the degree of 'happiness', or the potential psychological distress, of the American population. The questionnaire was considered one of the first generic measures of health-related quality of life with a specific interest in mental health.

Some years later Dupuy, together with John E. Ware, revised the questionnaire, and a final version of 22 selected items was validated under the name of PGWB Index (PGWBI). Extensive reference data for this version, generated in the US general and patient populations, were published and fully described in 1984 in "Assessment of Quality of Life in Clinical Trials of Cardiovascular Therapies" [11].

About ten years later the index was also introduced in Europe. The PGWBI was adapted into many languages and cross-culturally validated for use in several countries under the coordination of the MAPI Research Institute. As a result, different language versions of the PGWBI are available on the MAPI website [12].

In Italy various research activities in the field of Outcome Research were started in the framework of the MiOS project, a multidisciplinary initiative to study in depth different kinds of subjective outcome measures for health assessment. In 2000, as part of the MiOS project, the PGWBI was validated in a representative sample of 1129 Italian citizens above 15 years of age. The results of this study were published in the Italian user manual and were recognized as reference data for self-perceived health in the Italian general population [13–15]. During the same period the MiOS group also validated the Italian version of the Short Form-12 (SF-12), derived from the original longer SF-36 [16, 17]. The successful validation of the SF-12 set the ground for the development of an abbreviated and more user-friendly version of the original PGWBI. The reason for reducing the number of items was to achieve higher acceptability of the questionnaire in the population, aiming for shorter administration times, better response rates and lower rates of missing data.

The main objective of this study was to reduce the number of items of the original 22-item PGWBI while keeping adequate validity and reliability of the questionnaire. The step-wise approach to identify the best and most relevant set of items and the results of the application of the new PGWB-Short (PGWB-S) in various settings, including samples of general and specific patient populations, are described in this paper.

Methods

Development and validation strategy of the PGWB-S

The original PGWBI consists of 22 self-administered items, rated on a 6-point scale, which assess the psychological and general well-being of respondents in six HRQoL domains: anxiety, depressed mood, positive well-being, self-control, general health and vitality [see Additional file 1]. Each domain is defined by 3 to 5 items. The scores for all domains can be summed to provide a summary score, which reaches a maximum of 110 points, representing the best achievable "well-being".

Item reduction for the development of the short version of the PGWBI started from the reference data set obtained during the year 2000, when the original (long) questionnaire was administered for the first time in Italy to a representative sample of the general population (development sample). The survey was carried out by DOXA, the Italian branch of the Gallup International association. Methods and results are available elsewhere [13–15].

Based on these data, the twenty-two items of the questionnaire were analyzed in a linear multiple regression model with the objective of finding the combination of items most relevant for the determination of the summary score. For comparability with the longer version, a score transformation was applied to convert the lowest and highest possible scores to 0 (worst possible level of well-being) and 110 (maximum level of well-being), respectively.
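The transformation itself is not spelled out in the text. A minimal sketch, assuming each item is coded 0 to 5 (which is consistent with 22 items on a 6-point scale summing to at most 110) and assuming a simple linear rescaling of the raw six-item sum, might look as follows. The function name and its parameters are illustrative and do not come from the PGWBI manual.

```python
def rescale_short_score(item_scores, item_min=0, item_max=5, target_max=110):
    """Linearly rescale the raw sum of the short-form items to 0..target_max.

    Assumption: every item is coded item_min..item_max (0..5 here).
    """
    n = len(item_scores)
    if n == 0:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if not item_min <= score <= item_max:
            raise ValueError(f"item score {score} outside {item_min}..{item_max}")
    raw = sum(item_scores)
    lowest = n * item_min    # worst possible raw sum
    highest = n * item_max   # best possible raw sum
    return (raw - lowest) * target_max / (highest - lowest)
```

Under these assumptions six answers of 5 map to 110 and six answers of 0 map to 0, so short-form scores can be read on the same scale as the 22-item summary score.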

The new shorter questionnaire was then administered in three different settings in Italy for the purpose of its further validation. All studies took place during the year 2004 and their characteristics are summarized in Table 1.

Table 1 Characteristics of studied samples

Development and validation samples

Study 1 (general population)

In 2004 DOXA carried out a further cross-sectional survey to norm the new short version of the questionnaire in a representative sample (n = 1015) of community-dwelling Italians. The approach was similar to that of the previous DOXA survey [13–15]. A multi-step random sampling method was adopted to draw a large representative sample from the Italian population. The universe to which the national survey referred was the 49.2 million Italians of all regions aged 15 years or more, stratified according to region and size of the place of residence. The sampling units were chosen in three stages. In the first stage, the municipalities where the interviews were to be conducted were chosen. In the second stage, an adequate number of electoral wards were extracted at random in each municipality so that various types of urban areas were represented (e.g., central, suburban, outskirts and isolated houses). Finally, the names and addresses of the persons to be contacted were extracted at random from the electoral lists of the areas selected in the second stage. Mean scores for all items and the global summary measures were calculated according to the established algorithm and weighted by gender, age and size of the municipality, in the proportions found in the universe to which the study referred.
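The weighting procedure used by DOXA is not described in detail here. As an illustration only, the sketch below assumes simple cell weighting, in which each respondent is weighted by the ratio between the share of his or her stratum in the universe and its share in the sample. The stratum labels, the function name and the input layout are hypothetical.

```python
from collections import Counter


def poststratified_mean(records, population_shares):
    """Weighted mean score under simple cell weighting (an assumed scheme).

    records: list of (stratum, score) pairs, one per respondent.
    population_shares: dict mapping stratum -> share of the universe.
    """
    if not records:
        raise ValueError("no respondents")
    counts = Counter(stratum for stratum, _ in records)
    n = len(records)
    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, score in records:
        sample_share = counts[stratum] / n
        weight = population_shares[stratum] / sample_share
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total
```

A stratum may be any hashable label, for example a tuple combining gender, age class and municipality size, so that the three weighting variables named above can be handled jointly.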

Study 2 (student population)

The purpose of this classroom experiment was to determine the self-perceived psychological and general well-being of a random sample of students in the second year of Psychology (n = 246) at the Catholic University of Milan. An additional 154 students in the second year of the Faculty of Motor Sciences were included in the sample. For the purpose of comparison, the original PGWBI and the new PGWB-S were self-administered by all students one hour apart. The order of the questionnaires (long and short form) was randomly allocated, and the summary scores of both questionnaires were then compared.

Study 3 (patient population)

The study was performed in the Gastroenterology and Rheumatology ward of the Sacco Hospital in Milan. Twenty-eight patients suffering from chronic inflammatory bowel disease were enrolled in the study. Both questionnaires were self-administered in the context of a planned medical visit, and item scores and summary measures were calculated.

Analysis

Because the goal was to identify the best set of items that might reproduce the summary score of the longer version, we first selected the items in the development sample (DOXA 2000) using a multiple step-wise regression procedure, aiming for the minimum number of items that would explain at least 90% of the variance of the summary score of the original (22-item) questionnaire. In line with previous experience in the development and cross-cultural validation of the SF-12 from the original SF-36 [17–19], items were identified by step-wise selection, starting with the item that alone explained the largest share of the variance and adding items until their combination explained at least 90% of it. Within the model, combinations of items were compared to find the one that best reproduced the mean value of the summary score. The most predictive items were selected to form the new structure of the PGWB-S questionnaire and were then aggregated into a new summary score. Subsequently, the performance of the new shorter questionnaire was assessed in an additional DOXA sample and in two other independent settings.
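The selection logic described above can be sketched as a plain forward selection on explained variance. This is an illustration under stated assumptions, not the software procedure actually used in the study: it adds, at each step, the item giving the largest R² and stops once the 90% threshold is reached, without the entry or removal significance tests that statistical packages usually apply. The names `r_squared` and `forward_select` are hypothetical.

```python
def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in columns]
    p = len(cols)
    # Augmented normal equations [X'X | X'y]
    A = [[sum(cols[i][k] * cols[j][k] for k in range(n)) for j in range(p)]
         + [sum(cols[i][k] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(A[r][i]))
        if abs(A[pivot][i]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(beta[j] * cols[j][k] for j in range(p)) for k in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    return 1.0 - ss_res / ss_tot


def forward_select(items, total, threshold=0.90):
    """Add items one at a time until R^2 against the total reaches the threshold.

    items: dict mapping item name -> list of scores (one per respondent).
    total: list of summary scores of the full questionnaire.
    Returns a list of (item name, cumulative R^2) in order of entry.
    """
    selected, history = [], []
    remaining = set(items)
    r2 = 0.0
    while remaining and r2 < threshold:
        best = max(sorted(remaining),
                   key=lambda name: r_squared(
                       [items[s] for s in selected + [name]], total))
        r2 = r_squared([items[s] for s in selected + [best]], total)
        selected.append(best)
        remaining.remove(best)
        history.append((best, r2))
    return history
```

Applied to a data set with one column per item and the 22-item summary score as `total`, the returned list gives the order of entry and the cumulative share of variance explained at each step, in the form in which the results of the selection are reported.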

As emphasised in the literature [18, 20–22], great care was taken to ensure and document the basic characteristics of the questionnaire in terms of acceptability, internal consistency (Cronbach's alpha coefficient), known-group validity and stability of results across samples and sub-groups. Descriptive statistics, correlation coefficients, univariate and multivariate regression analyses were used to evaluate the performance of the long and short questionnaire, in each sample and across relevant subgroups.
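For reference, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio between the sum of the item variances and the variance of the total score. A minimal sketch using the standard formula is given below; the function name and the input layout (one list of item scores per respondent) are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from a list of respondents, each a list of k item scores."""
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total (summed) score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

With perfectly concordant items the coefficient equals 1, and it decreases as the items become less interrelated.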

Results

During the step-wise selection process, six items were identified that predicted 90% of the variance of the summary score when the original long questionnaire was applied to a random sample of the Italian population. Item 20 alone accounted for 60%, whereas Items 7, 21, 5, 6 and 18 added a further 15%, 8%, 3%, 3% and 2%, respectively (Table 2 and Figure 1). These items were confirmed as the new 6-item structure of the questionnaire.
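As a quick arithmetic check of the figures reported above, the cumulative share of variance can be accumulated item by item. The percentages are those given in the text and are rounded, so the totals are approximate.

```python
from itertools import accumulate

# Reported shares of variance in the development sample:
# the first item alone, then the increment added by each further item.
increments = [60, 15, 8, 3, 3, 2]

cumulative = list(accumulate(increments))
# cumulative == [60, 75, 83, 86, 89, 91]

# Number of items needed to reach the 90% target
items_needed = next(i + 1 for i, c in enumerate(cumulative) if c >= 90)
# items_needed == 6
```

On these rounded figures five items reach about 89%, and the sixth is the one that takes the total past the 90% target.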

Table 2 Selection of items in the PGWB-S
Figure 1 PGWBI short: items selection.

To evaluate the performance of the new instrument, 1443 subjects were assessed in the different settings. Socio-demographic characteristics varied according to the case-mix evaluated: the mean age ranged from 21.5 years in the study involving University students to 51.3 years for the sample representing the general Italian population. The proportion of males ranged from 11% to 50% across the studies (Table 1).

The step-wise selection process, repeated in Study 2, confirmed the relevance of the six items identified in the development sample of the previous DOXA study. The six items predicted 88% of the variance of the summary score when the long questionnaire was applied to the Study 2 sample. Item 20 alone accounted for 55%, whereas Items 21, 7, 6, 5 and 18 added a further 11%, 9%, 6%, 4% and 3%, respectively.

As to the acceptability indicators, a satisfactory response rate of 100% was reached in all studies (Table 1). No missing or out-of-range data at item level were recorded in any of the samples.

Descriptive statistics of raw item scores are presented in Table 3 according to the different studies. As expected, the sample of University students reported the highest mean value of the summary measure (75.3; range 44–106), while the lowest was reported by the sample of hospital patients with chronic inflammatory bowel disease (71.5; range 63.7–79.4). In the sample of University students, where both versions of the questionnaire were self-administered in a cross-over design, the mean value of the summary score of the long form (74.8; range 41–105) confirmed the results of the short form. The subjective mental health perception was measured without specific relation to the perceived physical health; overall values therefore did not show differences between groups and were relatively stable across studies.

Table 3 Mean values of PGWB-S Items and Summary scores according to studies

The patterns of correlation between the summary scores and the single items according to the different studies are presented in Table 4. Subjects in the general population had the lowest correlations between single items and the summary score, whereas the highest correlations with the summary score were observed among the University students. The lowest correlation estimates in all studies were observed consistently for 'item 18', the question on self-control (range 0.52–0.72), whereas the highest correlation (0.89) was observed in study 2 for 'item 05', regarding anxiety.

Table 4 Correlations of Item and Summary scores according to studies

The internal consistency, which measures the extent to which the items are interrelated, was expressed by Cronbach's alpha coefficients calculated for each study. Table 5 shows the Cronbach's alpha coefficients obtained for the PGWBI and the PGWB-S in the individual study settings. The smallest value was 0.80 and the highest 0.92, indicating that the summary score showed good internal reliability. The coefficients were all at least 0.80, showing acceptable reliability also when compared with those of the original full-length (22-item) instrument (0.94 in the DOXA study and 0.96 in Study 2). Finally, Table 6 reports the sex-adjusted summary scores by age group in study 1, compared with the study in which the original PGWBI was administered. Mean values of the summary scores decreased with age, ranging from 85.4 in the young to 71.7 in the elderly in one study and from 81.8 to 63.9 in the other. An impact of ageing on self-perceived mental health can thus be observed. In the past, a clear age trend has been documented for physical health measures, while mental health measures have been shown to be less sensitive to the age effect [13, 16, 17]. The raw summary scores for studies 2 and 3, although referring to restricted age groups and small sample sizes, fitted into the overall age trend observed in the two large field trials in the Italian population.

Table 5 Cronbach's Alpha Coefficients of summary scores according to studies
Table 6 Mean values of summary scores according to age and study

Discussion

The extensive international experience with the original PGWBI and the large amount of data generated in recent years [23–30] provided the basis for a meaningful item reduction, resulting in the development of a new shorter instrument in Italy. The aim of this study was to identify the lowest number of items that would be sufficient to maintain the validity of the original questionnaire. Through multiple step-wise regression analysis we identified 6 items that reproduced at least 90% of the variance of the PGWB summary Index.

Compared to the PGWBI, where the global summary score is generated by summing the 22 items pertaining to the six subscales, the new PGWB-S is constructed on the basis of only six items representing five of the six original subscales. When tested in various samples of the Italian population, the PGWB-S showed satisfactory acceptability (response rate, missing data) and validity across strata. The good compliance, expressed by the absence of missing data, was probably favoured by the structured person-to-person interviews in study 1 and the self-administration in controlled settings in studies 2 and 3. Nevertheless, it is worth mentioning that the relatively short time needed to answer the six questions might have contributed positively to this result.

With respect to validity, the global summary scores varied across the different groups, reflecting the expected degree of variation related to the baseline characteristics of the participants.

The relatively high values of the Cronbach's alpha coefficients observed in the samples indicate good reliability of the questionnaire when compared with the gold standard of the original PGWBI. In spite of the slightly lower precision of the 6-item questionnaire in comparison with the original, the PGWB-S proved to be a robust instrument, suitable as a generic measure of HRQoL and a good tool for population surveys, where it can be easily administered.

Our study has a few limitations that should be considered.

The first pertains to the method adopted to select the relevant items. Alternative methodologies are indeed often used alone or in combination for this purpose, such as item-total correlations using Cronbach's alpha coefficients and principal components factor analysis. Our choice was essentially based on the experience gained in the development of the SF-12; the approach has the advantage of being straightforward, easy to replicate and readily understood by lay people. On the other hand, we cannot exclude that other methods could yield different outputs. One might also argue that the present results, in terms of item selection and performance of the new shorter index, might be the result of the specific characteristics of the development and validation samples. Pending further independent validation, and to add information about the performance of the new questionnaire, we tested the robustness of our findings by performing a cross-validation in an independent sample of 755 cases, extracted from an ongoing clinical trial in which the original (long) PGWBI was used together with other patient-reported measures [31]. In this data set, we first replicated the step-wise item procedure to cross-validate the selection of the PGWB-S items and then estimated how well the PGWB-S developed in the original DOXA sample would explain the variance of the longer 22-item questionnaire. As to the item selection, the first 6 items explained 92% of the variance; 3 were the same as in the DOXA sample, while the other 3 were different but belonged to the same scales as the items selected in the DOXA sample. All 6 original items ranked within the top ten. In addition, the original 6 items explained more than 90% of the variance of the longer index from the 22-item questionnaire. Finally, when the amount of variance explained was estimated in each sex and age stratum, the figures ranged from 90.2% to 95.1%.

It is important to keep in mind that, at the current stage of development, the generalizability of the findings is confined exclusively to the Italian setting, and results cannot be transferred to other cultural and linguistic settings. We cannot exclude that additional analyses of data from other countries could ultimately lead to a different item selection. Nonetheless, these results can be considered a first step in the validation process of the PGWB-S and a promising starting point for future research on this matter.

Declaration of the authors

The author(s) declare that they have no competing interests.

Study funding

The study was partially supported by Bracco SpA with a grant for the conduct of cross-sectional population surveys by DOXA.