1 Introduction

The measurement of subjective well-being (SWB) has become an area of growing prominence for academics, healthcare professionals and policymakers (Krueger and Stone 2014), who agree that it is vital to be able to monitor well-being adequately in order to develop health policies and strategies (WHO 2011). Two general approaches to understanding SWB have been proposed (Deci and Ryan 2008). The first approach, based on hedonic theories, has generated possibly the most studied models of well-being (Gallagher et al. 2009). Diener, who has published extensively on SWB (Diener 1984; Diener 2009), defines it as the experience of high levels of pleasant emotions and moods, low levels of negative emotions and moods and high life satisfaction. The alternative approach is that of the eudaimonic tradition (Waterman 1993); here, SWB is considered to be the outcome of positive goal pursuits (Ryan et al. 2008). Ryff (1989) presented a model of eudaimonic well-being built on the assumption that individuals strive to function fully and realise their unique talents. Recent research has also focused on the conceptual structure of ‘being, belonging and becoming’ (Chaplin et al. 2017), which can be seen as related to the assessment of positive affect, life satisfaction (hedonic) and personal growth (eudaimonic).

SWB is characterised as a multi-domain concept (Dodge et al. 2012). A recent review of adult measures by Linton et al. (2016) found 99 instruments which included 196 different dimensions of well-being. These were clustered around six key themes: psychological well-being, social well-being, physical well-being, spiritual well-being, activities and functioning, and personal circumstances, and these themes were measured with between 1 and 317 items (Linton et al. 2016). From an examination of the dimensions, it was found that they included both strength-based issues, such as joy and social acceptance, and deficit-based issues, such as anxiety and pain. This observation is also reported in reviews of paediatric SWB (Pollard and Lee 2003). Strength-based or deficit-based approaches can be used as a method to conceptualise well-being measurement. Despite extensive literature and the development of a large number of measures (Tsang et al. 2012; Amerijckx and Humblet 2014), there is little consensus on the theoretical grounding and dimensions of well-being (Ryff 1989). This has led to SWB often being considered synonymous with the concept of health-related quality-of-life (HrQoL) (Taillefer et al. 2003). SWB can, however, be characterised as a strength-based concept (Tsang et al. 2012) with the purpose of quantifying continued growth and development, and HrQoL as a deficit-based concept (Schipper et al. 1996) with the purpose of quantifying the degree to which a medical condition or its treatment impacts the individual’s life (Guyatt et al. 1993).

It is useful to think of SWB as a strength-based concept within health research, as this refers back to the measurement of health as originally conceived by the WHO (1948), which requires not just the absence of clinically defined symptoms but also the positive aspects of good health. Recent patient-centred initiatives (Epstein 1999) and a shift towards a focus on health promotion (Coverdale and Long 2015) have also encouraged the documentation of an individual’s strengths and resources as measures of well-being.

Another clear trend in well-being research has been a focus on the relationship between physical activity and well-being, especially in young people (Calfas and Taylor 1994; Nieman 2002; Kanning and Schlicht 2010). Schools-based research has illustrated a strong association between well-being, physical activity, cognition and day-to-day mood (Huppert 2009), although this association is not always found (Konijnenberg and Fredriksen 2018). A combination of these approaches suggests a model of well-being that may be more appropriate for a youth population, bringing together emotional reactions, sense of achievement and physical health or activity.

The methods used to measure SWB have also varied over time. Linton and colleagues (Linton et al. 2016) found that the majority of instruments use a Likert-type scale. Visual analogue scales (VAS) tend to be used for global well-being measurement, as in the EQ-5D and SF-36, but there is wider scope for their application (Brazier et al. 1999). VAS can measure a continuum across the full range of a concept, from negative to positive, and therefore have important advantages for the measurement of well-being (Parkin and Devlin 2006).

In this research, we turn our attention to the measurement of child well-being. Age-sensitive models of well-being are important because the relevant dimensions change with age, cognitive development and biological maturity. A recent systematic review of well-being instruments for adolescents (Rose et al. 2017) has identified eleven instruments that meet the criteria of following a strength approach and covering both affective and functional aspects, although it is noted that measurement instruments for children tend to be more subjective than objective measures (Pollard and Lee 2003). Tsang and colleagues (Tsang et al. 2012) focused on SWB in children and described 17 measures categorised into indicators based on deficits or strengths, or both. Deficit-oriented indicators included poor emotional awareness, negative affect, inhibition and social anxiety, whereas strength-oriented indicators were fewer and included positive affect, emotional regulation, social and family involvement and personal adaptation in terms of resilience or coping.

Clinical-based outcome measures are usually deficit-oriented indicators related to symptoms, for example social exclusion, depression and physical limitations. Although the documentation of deficits is essential for eligibility requirements of clinical services, it can be questioned whether this is the best way to evaluate well-being when compared to the general population. Reports indicate that there is a reduced quality of life associated with medical conditions (Olsson et al. 2013); however, the use of deficit-oriented indicators and a focus on a clinical population may lead to the over-estimation of psychological or social problems in certain populations due to a focusing effect (Brazier and Tsuchiya 2010). One good example of this is in relation to the psychosocial effects of short stature (Erling 2004; Chaplin et al. 2012), where deficit-based measures designed for clinical populations have been claimed to exaggerate problems when applied to the general population (Sandberg et al. 1994). Well-being measurement as an outcome of treatment can therefore contribute to the ongoing debate in relation to the psychosocial outcome of growth hormone treatment for short stature (Bullinger et al. 2009; Chaplin et al. 2011; Chaplin et al. 2012; Quitmann et al. 2016; Sandberg and Voss 2002; Voss 2001; Voss and Sandberg 2004). In order to explore this issue and compare clinical and population samples, an instrument with questions and indicators that can identify positive strengths and abilities in both populations is needed.

To meet this need, the Gothenburg Well-Being scale for children (GWB Child) (Erling et al. 2002) was developed. The scale was designed as a multi-dimensional questionnaire to measure well-being with reference to a general population (Wiklund et al. 1994). The questionnaire uses a series of VAS scales in order to measure the continuum between negative and positive aspects of well-being with the aim that the instrument is applicable to both clinical and non-clinical populations. It has been used on an annual basis to follow up the progress of children with short stature in clinical trials of growth hormone treatment and in a number of research studies (Erling et al. 1994; Wirén et al. 1998; Wirén et al. 2001; Erling et al. 2002; Erling 2004; Chaplin et al. 2011).

GWB Child has been shown to have good internal consistency (ICC > 0.8) and has been used in both child and young adolescent populations (Erling et al. 1994), having been validated previously on children reaching puberty aged 9–13 years. However, knowledge of the psychometric properties of the instrument in an older adolescent population (up to 20 years) is required for clinical follow-up. This study aimed to test the factorial structure of GWB Child (six factors) in an older adolescent population. The questionnaire was therefore included in a study of students in the final year of high school (age 17–20 years). This also provided the opportunity to examine measurement invariance across gender in a large population. In healthcare research, a common approach to well-being measurement has been to develop a new instrument for each population and context. However, this takes time and resources, can make it difficult to compare results across conditions and populations, and may waste useful historical data. A better approach may be to adapt, update or incorporate the existing models that have already been developed for use in similar contexts or populations.

1.1 Hypotheses

Our first hypothesis (H1) was that the factor structure developed for GWB Child (Wiklund et al. 1994) would also show good model fit in the data from the adolescent population.

Our second hypothesis (H2) was that an improved item pool which included items previously excluded due to poor fit in the younger age group can be found to have better model fit. The theoretical model incorporates emotional reactions, sense of achievement and physical activity. The robustness of the model fit was tested in a second population sample and assessed for measurement invariance across datasets and gender.

Our third hypothesis (H3) was that convergent and discriminant validity can be demonstrated in the final model. Convergent validity would be shown by a positive correlation between well-being and resilience (Stewart-Knox et al. 2012; Gomez et al. 2013) and between well-being and body image, an important aspect of adolescent life (Erling 1999; Frisén 2007; Mond et al. 2010; Delfabbro et al. 2011). Although these concepts are related to well-being, they are not equivalent (Avalos et al. 2005; Holmqvist and Frisen 2012); therefore, a moderate correlation can be predicted. The well-being scores are expected to differ between boys and girls; therefore, the scale should discriminate across gender, with boys having significantly higher well-being scores (Moksnes and Espnes 2013).

2 Methods

2.1 Ethics

Ethical approval was obtained from the Regional Ethics Committee (approval no. 444–08). Informed consent was obtained from all participants prior to the investigation, which included a comprehensive questionnaire and anthropometric measurements.

2.2 Participants and Test Procedure

The study was embedded within a larger study of linear growth, weight status, health and related behaviours in a population of 18-year-olds who had been followed from birth (Sjöberg et al. 2012). This study gave the opportunity for the validity to be assessed but, due to the number of procedures and questionnaires included in the main study, there was no opportunity for the inclusion of additional well-being measures which might have been used to further establish the convergent validity. Forty high schools out of a total of 47 in Gothenburg and surrounding areas participated. These ranged in size and contributed between 14 and 395 pupils to the survey. Participants completed a questionnaire as part of a larger survey examining growth, health, lifestyle and body image (Sjöberg et al. 2012). A study team visited the schools during normal lesson times to administer the questionnaire and take anthropometric measurements. Students sat together in their classroom to complete the questionnaire.

2.3 Instruments

Sociodemographic information was collected using a self-administered questionnaire which included age, gender and lifestyle questions.

2.3.1 The GWB Scale

The item pool generated for the GWB instrument was composed of a series of 43 polar-opposite adjective pairs (e.g. sad–happy or tense–relaxed) aimed at describing behavioural attributes, feelings and important concepts of well-being identified through discussions with children and teachers (Wiklund et al. 1994). The GWB Child six-factor model was developed in three gender-balanced populations of 9-year-olds (N = 117), 11-year-olds (N = 116) and 13-year-olds (N = 109). Thirty-four of the 43 questions were retained from the exploratory factor analysis (EFA) (Wiklund et al. 1994). Convergent validity of the GWB Child was assessed by correlation with the ‘I Think I Am’ self-perception scale (Ouvinen-Birgerstam 1985; r = .64; p < .001). The internal consistency of this scale was shown to be excellent (Cronbach’s α = .88). The factor structure was identified by the test creators (Erling et al. 1994) using EFA with alpha factoring and varimax rotation. Thirty-four items with a factor loading of ≥0.40 were combined to form six factors: Alertness (11 items, e.g. contented/not contented), Self-esteem (seven items, e.g. shy/brave), Mood (four items, e.g. happy/sad), Elation (five items, e.g. talkative/quiet), Stability (four items, e.g. relaxed/tense) and Vitality (three items, e.g. quick/slow). No other information concerning the factor analysis is provided in the published works.

2.3.2 Resilience Scale (RS-11)

Resilience was measured using the RS-11 scale (Schumacher et al. 2005; von Eisenhart Rothe et al. 2013; Kocalevent et al. 2015), which is based on an earlier 25-item scale (Wagnild and Young 1993; Wagnild and Collins 2009). Construct validity of the RS-11 was assessed by correlation with the general self-efficacy (GSE) scale (Schwarzer and Jerusalem 1995; r = .70; p < .001). The internal consistency of this scale was shown to be excellent, with Cronbach’s α = .91 (Schumacher et al. 2005). The RS-11 uses an equally weighted summed index composed of all eleven questions. The resulting sum score was then transformed into a 0–100 scale, where higher scores represent greater resilience.
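The transformation of an equally weighted sum score onto a 0–100 scale can be sketched as a simple linear rescaling. This is a minimal illustration only: the function name, the example answers and the 1–7 response format are assumptions made for the sketch and are not taken from the text above.

```python
def rescale_sum_score(item_scores, item_min, item_max):
    """Linearly map an equally weighted sum score onto a 0-100 scale."""
    n_items = len(item_scores)
    raw = sum(item_scores)
    lowest = n_items * item_min    # sum if every item is answered at the minimum
    highest = n_items * item_max   # sum if every item is answered at the maximum
    return 100.0 * (raw - lowest) / (highest - lowest)


# Hypothetical respondent: eleven items, ASSUMED 1-7 response format
answers = [5, 6, 4, 7, 5, 5, 6, 4, 5, 6, 5]
score = rescale_sum_score(answers, item_min=1, item_max=7)
print(round(score, 1))
```

Under these assumptions a respondent who chooses the lowest option throughout scores 0 and one who chooses the highest option throughout scores 100, so higher values represent greater resilience.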

2.3.3 Body Image Scale

Body image was assessed via the ‘I Think I Am’ method (Ouvinen-Birgerstam 1985), which was developed in Sweden and used for studying various aspects of self-esteem. It had also been used for validation in the original GWB Child (Wiklund et al. 1994). Eight statements relating to body image were used. These statements constitute an internally consistent scale (Cronbach’s α = .83) and summarise body image, body development and satisfaction with body image. For each statement, four response alternatives were available (‘true’, ‘partly true’, ‘partly not true’ and ‘not true’). A summary variable was constructed, and the body image questions were transformed to a score between 0 and 100, where 0 represents total dissatisfaction and 100 represents the most positive perceived body image.

2.4 Data Analysis

Analysis was conducted using IBM SPSS Statistics for Windows, Version 21 (IBM Corp., Armonk, NY, USA). Descriptive statistics (means, standard deviations, medians, skewness and kurtosis) were used to compare groups. Item sensitivity was assessed through skewness and kurtosis for deviance from normal distribution (Kline 2011).
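As an illustration of this kind of item screening, the sketch below computes moment-based skewness and excess kurtosis and flags items outside chosen limits. It is not the SPSS procedure: SPSS reports bias-corrected sample estimates, whereas the simple population moments are used here, and the limits of 3 and 10 are assumed rule-of-thumb defaults rather than values stated in this paper. The response values are hypothetical.

```python
from statistics import fmean


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])


def skewness(values):
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def flag_item(values, max_abs_skew=3.0, max_abs_kurt=10.0):
    """True if the item exceeds the ASSUMED screening limits."""
    return (abs(skewness(values)) > max_abs_skew
            or abs(excess_kurtosis(values)) > max_abs_kurt)


# Hypothetical VAS responses (0-100) to a single item
item = [10, 35, 50, 55, 60, 62, 70, 75, 80, 95]
print(round(skewness(item), 2), round(excess_kurtosis(item), 2), flag_item(item))
```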

The population was divided into two equal-sized, randomly allocated sub-samples to be used as a training and a test dataset. The use of a second dataset allowed us to test whether the model would generalise to an independent population. Hypotheses H1 and H2 were tested using maximum-likelihood confirmatory factor analysis (CFA) in IBM SPSS AMOS 25. Model fit was assessed using the Goodness of Fit Index (GFI), Adjusted Goodness of Fit Index (AGFI), Normed Fit Index (NFI), Tucker-Lewis Index (TLI), Comparative Fit Index (CFI), root mean square error of approximation (RMSEA) and standardised root mean square residual (SRMR). The model was considered to have good fit if GFI, AGFI, NFI and TLI were higher than 0.9 and CFI was higher than 0.95 (Kline 2011), and if RMSEA and SRMR were lower than 0.05 (Hooper et al. 2008). χ2 was reported, but because it is extremely sensitive to sample size and rejects reasonable models if the sample is large (Andrich 1988), it was not relied upon for model fit (Bergh 2015); therefore, TLI, CFI and RMSEA, which are less affected than other indices by sample size and model complexity (Bollen and Long 1993), were preferred. Reliability of the scale was assessed using Cronbach’s alpha.
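The fit criteria described above can be expressed as a small lookup, sketched here for illustration. The cut-offs are those quoted in the text; the function name and the index values in the example are hypothetical and were chosen only to exercise the check.

```python
# Indices that should exceed their cut-off (values as quoted in the text)
HIGHER_IS_BETTER = {"GFI": 0.90, "AGFI": 0.90, "NFI": 0.90, "TLI": 0.90, "CFI": 0.95}
# Indices that should fall below their cut-off
LOWER_IS_BETTER = {"RMSEA": 0.05, "SRMR": 0.05}


def unmet_criteria(fit):
    """Return the names of the reported indices that miss their cut-off."""
    unmet = [name for name, cut in HIGHER_IS_BETTER.items()
             if name in fit and fit[name] <= cut]
    unmet += [name for name, cut in LOWER_IS_BETTER.items()
              if name in fit and fit[name] >= cut]
    return unmet


# HYPOTHETICAL index values, not results from this study
example = {"GFI": 0.93, "AGFI": 0.91, "NFI": 0.92, "TLI": 0.89,
           "CFI": 0.96, "RMSEA": 0.062, "SRMR": 0.03}
print(unmet_criteria(example))  # ['TLI', 'RMSEA']
```

Returning the list of unmet criteria, rather than a single pass or fail flag, keeps visible which indices drive the judgement when, as in this study, some indices are preferred over others.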

Hypothesis 3 (H3) was that the model could differentiate between genders and demonstrate a relationship to the associated concepts of resilience and body image.

3 Results

A total of 5687 students completed the GWB scale (49 items); of these, 5% (291 cases) were excluded listwise: four had clearly spoiled responses to the VAS (reporting 0 for all questions following the first page); 77 had completely missing Body Image, Resilience and demographic data; 66 were missing all Body Image and Resilience items; 144 had greater than 40% item non-response, amounting to more than three pages of the questionnaire. Although these data could be considered to be missing at random, the large amount of missing data led us to conclude that these students had not fully participated in the survey, making their remaining responses untrustworthy. A binary logistic regression was used to assess whether it was possible to predict this group by demographic variables (age, gender, BMI or school). This revealed that both gender and school were predictors (χ2 = 29.5; df = 4; p < .001). Gender was another variable that had not always been completed by the students in this missing-items group, but of the 163 cases where gender was indicated in the sample with missing data, it was found that boys were twice as likely as girls to return questionnaires with missing data (N = 108/55). Of the 40 schools surveyed, 33 had questionnaires with missing data, with a mean rate of 3%. The two schools with the highest rate of missing data (9% of cases) were examined, but there was no obvious reason why these two medium-size city schools should be treated differently from the others. Therefore, listwise deletion of these 291 cases does not appear to introduce a bias or lower the power to detect effects.

Of the remaining 5395 cases, 13% (685 cases) returned the three questionnaires with some missing data, resulting in 885 missing data points (0.2%). Values were imputed using both multiple imputation and linear imputation in SPSS following reversal of negative items. The difference between these two techniques for the VAS was on average 0.0001 for the mean of the scales. The study population therefore comprised 5395 students: 2709 girls (50%) and 2686 boys (50%), mean age 19 (SD = 0.45; age range 17–20 years), see Table 1. Table 1 also describes the training (N = 2661) and test datasets (N = 2734). The datasets could not be predicted based on demographic variables of age, gender, BMI, or school (χ2 = 3.88; df = 4; p > .05) and no significant differences between outcome variables were found (Table 3). This allowed the stability of the factor solution to be investigated in a second random subsample (test dataset).

Table 1 Sample characteristics of the adolescent population, divided by gender and datasets for training and testing

Table 2 presents the descriptive statistics of the results from the well-being (GWB), resilience (RS-11) and body image questionnaires. The total dataset was normally distributed with a mean skewness of −0.18 (SE = 0.033; range from −1.192 to 1.056) and mean kurtosis of −0.310 (SE = 0.067; range from −1.098 to 1.480). Normality was also maintained across gender, although differences between gender were found on all well-being dimensions and on Body Image. Boys had significantly higher well-being scores, although Stress Balance was better for girls. There was no gender difference in resilience (Table 3).

Table 2 Descriptive statistics for outcome variables of well-being (GWBa), resilience (RS-11) and body image
Table 3 Gender differences across the outcome variables

3.1 Confirmatory Factor Analysis

Table 4 presents the three models. The full list of factor loadings for all three models is presented in Appendix 1. Model 1, a data-derived, six-factor solution with 34 items (Gothenburg Well-Being scale for children, GWB Child), was found to have poor model fit in this population of older adolescents (GFI = .832, TLI = .770, CFI = .790, RMSEA = .073). Modification of the scale by co-variance and removing the more poorly performing items did not improve the model fit sufficiently without compromising the factor structure (data not reported here).

Table 4 Goodness-of-fit statistics for the original, hypothesised and final models across training/test dataset and gender

Therefore, Hypothesis H1 does not hold and the GWB Child model was rejected.

Model 2, a theoretically hypothesised, five-factor bifactor solution with 22 items (Gothenburg Well-Being scale for adolescents, GWB Adolescent), was found to fit the data fairly closely (GFI = .915, TLI = .888, CFI = .902, RMSEA = .066). Model 2 is shown in Fig. 1. This model was modified by removing the more poorly performing items to produce the final model, Model 3 (GWB Adolescent), which was composed of three factors with one general factor.

Fig. 1

A theoretically hypothesised, five-factor model with 22 items (Model 2). The loading for each item is shown above the arrow on the left side. The correlation coefficients between the first-order and second-order factor are shown beside the lines between the factors. Standardised estimates are shown. Chi-square test (χ2/df = 12.71), Goodness of Fit Index (GFI = .915), Adjusted Goodness of Fit Index (AGFI = .894), Normed Fit Index (NFI = .894), Tucker-Lewis Index (TLI = .888), Comparative Fit Index (CFI = .902), Root Mean Square Error of Approximation (RMSEA = .066) and Standardised Root Mean Square Residual (SRMR = .054)

The CFA results from the training dataset (A) and the test dataset (B) are shown in Table 4. In the training dataset (A), Model 3 was found to have achieved good fit (GFI = .978, TLI = .970, CFI = .980, RMSEA = .059). The graphical presentation of Model 3 CFA is shown in Fig. 2. Next, we tested for measurement fit of Model 3 in the test dataset (B) using multi-group CFA (Table 5). This also indicated a good fit (GFI = .985, TLI = .979, CFI = .986, RMSEA = .049) (Fig. 3). Measurement invariance was tested between groups (Table 6) and between genders (Figs. 4 and 5, Table 7), based on delta CFI (Cheung and Rensvold 2016). Taking the practical approach of examining the difference in CFI using the threshold of < .002 (Meade et al. 2008), it is possible to draw the conclusion that Model 3 is invariant across the two groups. It can also be concluded, based on the CFI difference, that the model is invariant across gender (Table 7).
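As a cross-check, the RMSEA values for Model 3 can be approximately recovered from the χ2/df ratios and sample sizes given in the captions of Figs. 2 and 3. The sketch below assumes the common single-group formula, RMSEA = sqrt(max(χ2/df − 1, 0) / (N − 1)); whether the software used exactly this form is an assumption, and the function name is illustrative.

```python
import math


def rmsea_from_ratio(chi2_over_df, n):
    """RMSEA from the chi-square/df ratio and sample size (single-group form)."""
    return math.sqrt(max(chi2_over_df - 1.0, 0.0) / (n - 1))


training = rmsea_from_ratio(10.25, 2661)  # ratio and N from the Fig. 2 caption
test = rmsea_from_ratio(7.59, 2734)       # ratio and N from the Fig. 3 caption
print(round(training, 3), round(test, 3))  # 0.059 0.049
```

Both values agree with the reported RMSEA to three decimal places, which suggests that the reported ratios, sample sizes and RMSEA values are mutually consistent.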

Fig. 2

Model 3, training dataset (N = 2661). GWBa = Gothenburg Well-Being scale for adolescents. The loading for each item is shown above the arrow on the left side. The correlation coefficients between the first-order and second-order factor are shown beside the lines between the factors. Standardised estimates are shown. Chi-square test (χ2/df = 10.25), Goodness of Fit Index (GFI = .978), Adjusted Goodness of Fit Index (AGFI = .959), Normed Fit Index (NFI = .978), Tucker-Lewis Index (TLI = .970), Comparative Fit Index (CFI = .980), Root Mean Square Error of Approximation (RMSEA = .059) and Standardised Root Mean Square Residual (SRMR = .026)

Table 5 Cronbach’s alpha coefficient for the GWBa scale and subscales and correlation with outcome variables
Fig. 3

Model 3, test dataset (N = 2734). GWBa = Gothenburg Well-Being scale for adolescents. The loading for each item is shown above the arrow on the left side. The correlation coefficients between the first-order and second-order factor are shown beside the lines between the factors. Standardised estimates are shown. Chi-square test (χ2/df = 7.59), Goodness of Fit Index (GFI = .985), Adjusted Goodness of Fit Index (AGFI = .972), Normed Fit Index (NFI = .984), Tucker-Lewis Index (TLI = .979), Comparative Fit Index (CFI = .986), Root Mean Square Error of Approximation (RMSEA = .049) and Standardised Root Mean Square Residual (SRMR = .021)

Table 6 Model fit summary: measurement invariance across training and test datasets
Fig. 4

Model 3, boys in the combined training and test datasets (N = 2686). GWBa = Gothenburg Well-Being scale for adolescents. The loading for each item is shown above the arrow on the left side. The correlation coefficients between the first-order and second-order factor are shown beside the lines between the factors. Standardised estimates are shown. Chi-square test (χ2/df = 8.71), Goodness of Fit Index (GFI = .982), Adjusted Goodness of Fit Index (AGFI = .966), Normed Fit Index (NFI = .982), Tucker-Lewis Index (TLI = .976), Comparative Fit Index (CFI = .984), Root Mean Square Error of Approximation (RMSEA = .054) and Standardised Root Mean Square Residual (SRMR = .019)

Fig. 5

Model 3, girls in the combined training and test datasets (N = 2709). GWBa = Gothenburg Well-Being scale for adolescents. The loading for each item is shown above the arrow on the left side. The correlation coefficients between the first-order and second-order factor are shown beside the lines between the factors. Standardised estimates are shown. Chi-square test (χ2/df = 7.64), Goodness of Fit Index (GFI = .984), Adjusted Goodness of Fit Index (AGFI = .971), Normed Fit Index (NFI = .982), Tucker-Lewis Index (TLI = .977), Comparative Fit Index (CFI = .985), Root Mean Square Error of Approximation (RMSEA = .050) and Standardised Root Mean Square Residual (SRMR = .026)

Table 7 Model fit summary: measurement invariance across gender

An alternative approach provided by AMOS is to use a multigroup model (Table 6.2 and 6.3). This uses the traditional invariance-testing approach based on the χ2 difference test. Using this method, the delta χ2 value between the unconstrained and configural models was found to be 13.87 with 6 degrees of freedom. This χ2 difference value is statistically significant at p < .05. Based on this result, it could be concluded that one or more of the factor loadings are not operating equivalently across the two groups and that Model 3 is not completely equivalent across the training and test datasets. Although these results are presented here for comparison, we concluded that, given that the datasets were randomly assigned and could not be predicted based on demographic variables (as described above), the results from the CFI analysis were the most trustworthy. This was also true for the gender analysis.
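The significance of the reported χ2 difference can be checked directly. The sketch below uses the closed-form upper-tail probability of the χ2 distribution, which holds only for even degrees of freedom; the function name is illustrative, and a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


p_value = chi2_sf_even_df(13.87, 6)  # delta chi-square and df as reported
print(round(p_value, 3))  # 0.031
```

The resulting p of approximately .03 is consistent with the statement that the difference is significant at p < .05, although it would not be at the .01 level.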

3.2 Reliability and Validity

Table 5 presents reliability and convergent validation results. Reliability of the total scale was indicated by a Cronbach’s alpha of 0.59. Cronbach’s alpha for the internal scales ranged from .70 to .89. The correlations among the four well-being scores show a good correlation between Mood and the GWB Adolescent (GWBa) total and between Activation and GWBa total, but a poor relationship between Stress Balance and GWBa total. Table 5 also shows the correlation between the well-being factors and the theoretically linked Resilience and Body Image variables. GWBa total explains 18% of the variance in Resilience (r = .42) and 12% of the variance in Body Image (r = .35). This is in line with the expectation of a modest agreement between the concepts that these measures represent. Discriminant validity scores for gender differed as predicted. Table 2 shows that boys recorded a mean GWBa total score of 61.79 (SD 10.86), as opposed to the lower GWBa total score for girls of 58.78 (SD 11.81). This difference was statistically significant (p < .001) in the predicted direction. Boys scored significantly higher than girls on the factors Mood, Activation and GWBa total but lower than girls on the factor Stress Balance.
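Two quantities in this paragraph follow from simple formulas: Cronbach’s alpha, computed from the item variances and the variance of the summed score, and the proportion of shared variance, which is the squared correlation. The sketch below is illustrative only; the function names and the response data are hypothetical, while the two correlations are those reported above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of k item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


def variance_explained(r):
    """Proportion of variance shared by two measures with correlation r."""
    return r ** 2


# HYPOTHETICAL responses from five people to a three-item subscale
responses = [[60, 65, 58], [40, 45, 50], [80, 78, 85], [55, 50, 52], [70, 72, 68]]
alpha = cronbach_alpha(responses)

shared_resilience = variance_explained(0.42)  # reported r with Resilience
shared_body_image = variance_explained(0.35)  # reported r with Body Image
print(round(shared_resilience, 2), round(shared_body_image, 2))  # 0.18 0.12
```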

Evidence of convergent and discriminant validity was found; therefore, the third hypothesis (H3) was accepted. Model 3 does have divergent and convergent validity.

4 Discussion

The results suggest that the nine-item model (Model 3) had acceptable model fit in older adolescents, good validity and acceptable reliability and can therefore be used in the measurement of well-being in this age group. The new factor structure provides a better model fit to the data with fewer questions than the GWB Child scale. The GWB Adolescent scale is composed of nine VAS bipolar adjective pairs and has three factors and one general well-being factor. These factors cover emotional well-being, activation and stress balance. Measurement invariance was found for Model 3 between the two datasets and between genders, despite significant differences between the gender scores on the well-being dimensions, suggesting that the model can be used for group comparisons and that the model applies equally across gender. The GWB Adolescent scale has some unique qualities: it is based on an approach grounded in the experience of a paediatric population, and the results were analysed on the basis of a theoretically hypothesised factor-analytic model that has been validated in a school-based population.

The relationship between well-being and resilience (Sagone and de Caroli 2014) and between well-being and body image, especially in this age group (Delfabbro et al. 2011), is well known. The correlation between these variables, as found in this study, contributes to the evidence for convergent validity of the instrument.

One unexpected result is the poor correlation between Stress Balance and Resilience (r = .22). One explanation might be that well-being captures a psychological state at a single point in time; resilience, on the other hand, is a dynamic concept, taking into account how someone has reacted to stressors in the past. The poor correlation between Stress Balance and Resilience may indicate that these are unrelated concepts. The possibility of high well-being and yet low resilience has been noted before and is referred to as the ‘resilience paradox’ (Mguni et al. 2012).

There are some limitations to this study which should be considered. In the collection of the items to be included in the GWB Adolescent scale we have referred back to the original GWB study (Wiklund et al. 1994) and taken the complete set of bipolar adjective pairs identified from discussions with children, young adults and teachers. These included issues that were relevant to older adolescents such as sexual interest; unfortunately, as in the original research, it was not possible to validate all of these items. This suggests that there is still work to be done to include this aspect of an adolescent’s experience in the concept of well-being. An important limitation is that the data is drawn from a narrow age category (17–20 years) due to a focus on young people in the last year of school. However, this is an interesting stage of life, on the cusp of adult life, and therefore provides a unique dataset.

The low Cronbach’s alpha for the total scale is disappointing. Other combinations of items that prioritised the alpha were possible, but here we have focused on testing the model and applying model fit criteria. The intention with the GWB Adolescent scale is to capture a broad concept of well-being and, given that this model is theory based, an alpha of 0.6 can be considered acceptable for a new measure in a new population (Nunnally 1988). A two-item scale (Stress Balance) might not be considered sufficient to provide adequate identification for the construct; however, there were no other items that improved model fit theoretically and loaded on this construct. The Cronbach’s alpha for Mood and Stress Balance was high (> 0.80), which is surprising, given the small number of items in these scales and the fact that Cronbach’s alpha tends to underestimate the reliability of a scale when there are only a few items. This may indicate that some of these items are basically the same item repeated (Cortina 1993). Further cognitive debriefing for these items may therefore be indicated.

The difficulty of updating a questionnaire and its factor structure has been explored in this study. Questionnaires must be allowed to evolve over time, either by renewing or updating the questions and analytic model at discrete time points or by allowing the questionnaire to operate dynamically, updating continuously in an online fashion. This process is made considerably easier where the measure is unidimensional and developed using item-response techniques (IRT). In an IRT framework, new questions could be added into existing measures without needing to re-validate the entire item bank. In this study we were aware that the original questionnaire was far too long and that a short form was required; the resulting 9-item questionnaire can be seen as a substantial improvement in terms of reduced burden on the respondent.

The importance of transparent reporting of the development, monitoring and validation of instruments should also be addressed here. We were in the privileged position of having access to some of the original focus group findings used in the development of the original questionnaire. However, this information was far from complete and there was no opportunity to complement it. Nonetheless, even this put us in a better position than other researchers wanting to update questionnaires. It should be a requirement of test construction that a repository for this information be created for the purpose of potential future revision.

This study has shown that the GWB Adolescent scale is appropriate for this age group, that it has good construct validity and is acceptable for use with a community of young people. The bidirectional nature of the questions means that positive affect and strength-based approaches are used, which is also in line with the modern approach to well-being measurement. Future research is needed into the psychometric properties of the GWB Adolescent scale and how the questionnaire operates in different populations of young people. The relationship to resilience should be further investigated to determine the boundaries of these two concepts.

In conclusion, the nine-item, three-factor, strength-based GWB Adolescent scale offers reliable psychometric properties which will extend our understanding of well-being in older adolescents.