1 Introduction

The interdisciplinary nature of research on wellbeing has led to inconsistencies in its conceptualisation and measurement. Psychologists, economists, public health and policy researchers, and educators – amongst others – place varying, and often competing, emphases on different constituent elements. This complexity is compounded by the assumed multidimensional nature of wellbeing (Diener, 2000; Kern et al., 2015; Keyes & Waterman, 2003), and the use of unstandardised terminologies and metrics (Black et al., 2019; Keyes et al., 2002). This paper examines how constituent sub-domains of wellbeing combine, interrelate, or parallel one another to reflect a psychometric construct with different theoretical meanings and different proposed structural conceptions.

In the psychological sciences, clearly defined constructs serve as building blocks for further empirical and theoretical work. However, conceptual clarification of these constructs often plays a marginal role in psychological research (Bringmann et al., 2022). Wellbeing, in particular, can be conceived of as a superordinate construct, subsuming several subordinate constructs (Okun & Stock, 1987), hereon referred to as the ‘sub-domains’ of wellbeing. However, analyses considering predictors and distributions of wellbeing are often conducted with no clear focus on a chosen ‘structural conception’ of the construct – i.e., is it multidimensional, and if so, which sub-domains does it encompass and how do they relate to one another? With multiple schools of thought describing wellbeing in silos, underlying structural conceptions of wellbeing may often be chosen on an ad-hoc basis, or with no deliberate reasoning provided, leading to further measurement and analytical challenges. There is thus often a gap between theoretical assumptions and associated empirical constructs. By mapping out competing wellbeing measurement models, we can examine how varying theoretical conceptions may be supported by empirical analysis.

1.1 Hedonic and Eudemonic Wellbeing

Commonly, the study of wellbeing uses research paradigms of hedonia and eudemonia (Kashdan et al., 2008; Ryan & Deci, 2001; Ryff et al., 2021; Thorsteinsen & Vittersø, 2020). These represent distinct theories which can differentially drive the operationalisation of wellbeing sub-domains. Hedonic wellbeing, often associated with words like ‘pleasure’ and ‘happiness’, has a utilitarian basis and is commonly accepted as comprising the sub-domains of life satisfaction, positive affect, and negative affect (Diener, 2000; Kahneman et al., 1999). Eudemonic wellbeing, meanwhile, is based on Aristotle’s notion of ‘telos’: that the ‘good life’ is one that strives towards purpose and betterment. There is less consistent agreement on its constituent sub-domains, which focus on constructs related to flourishing and functioning, typically including (but not limited to) self-determination, motivation, autonomy, optimism, and positive relationships (Deci & Ryan, 2000; Keyes & Annas, 2009; Ryff, 1989). This paper covers the following sub-domains in line with the above literature and available data: life satisfaction, positive affect, and negative affect (hedonic wellbeing; Diener, 2000; Kahneman et al., 1999); and autonomy, self-esteem, optimism, and positive relationships (eudemonic wellbeing; Ryff, 1989; Ryff et al., 2021).

1.2 The Relationship(s) Between Hedonic and Eudemonic Factors in Measurement Models

Some researchers advocate for discrete measurement of hedonic and eudemonic wellbeing on the basis of their theoretical separation (Delle Fave & Bassi, 2009; Keyes & Annas, 2009; Ryan & Huta, 2009), and previous factor analytic studies have found distinctions between ‘happiness’ (hedonic) and ‘meaning’ (eudemonic) measures, with a moderate or weak relationship between the two (McGregor & Little, 1998; Ryff & Keyes, 1995). These distinctions have also been validated by evidence of distinct external correlates (Clarke et al., 2023; Huta, 2016; Joshanloo et al., 2021). However, others point to a high correlation between the two factors in measurement models; for example, recent factor analytic studies by Disabato et al. (2016) and Strelhow et al. (2019) find high hedonic-eudemonic correlations of .96 and .90 respectively. Evidence is therefore mixed as to whether these constructs are distinguishable, and there is a need for further theory-guided structural analyses.

Hedonic and eudemonic wellbeing are often thought of as correlated but distinct constructs in factor analysis. Keyes et al. (2002) maintain that while the two constructs are intercorrelated (at 0.84), any best-fitting model of positive psychological functioning would still maintain a distinction between them, based on a study of model fit. However, this approach to modelling has been questioned. First, there is the contention that best fit should not in fact be used to argue for the presence of distinct factors in correlated factors models (Gignac & Kretzschmar, 2017). Second, as mentioned above, studies that find a lack of discriminant validity and a large correlation between the two suggest that hedonia and eudemonia might capture the same latent wellbeing construct (Disabato et al., 2016; Kashdan et al., 2008). The key tension highlighted in these ongoing debates is that while an exclusive research focus on either wellbeing paradigm may risk oversimplification, a composite score combining all measures together, or the use of a single general factor, may miss nuances between the two paradigms or, importantly, may not be psychometrically supported.

1.3 Measurement Issues

These mixed results may also depend on the variety and inconsistency of the chosen measurement scales representing the underlying factors. Content validity of psychological constructs is often justified on a theoretical basis but cannot be more concretely quantified (Steger et al., 2023), and constructs and their measures are thus easily conflated. The distinction between hedonic and eudemonic wellbeing needs to be analysed on both conceptual and measurement grounds. As such, attention must be paid to the variety of scales being used when assessing the distinction between the two conceptions of wellbeing. For instance, several of the above studies use combinations of the same scales to represent hedonic and eudemonic wellbeing factors respectively. The inconsistencies in these studies could be a result of the range of different measurement scales chosen and their related psychometric properties, as well as of the prevalence of jingle-jangle fallacies (Brookman-Byrne, 2020) in the literature base (e.g., ‘meaning’ as a direct substitute for ‘eudemonia’; the interchangeable use of ‘happiness’ and ‘wellbeing’).

For instance, the Satisfaction with Life Scale (SWLS; Diener et al., 1985) has been commonly used as a part of hedonic wellbeing, while Ryff’s Psychological Well-being Scale (PWBS; Ryff & Keyes, 1995), composed of six sub-scales, is the most frequent measure of eudemonic wellbeing. Some studies that use both of these find modest and high correlations. For example, Linley et al. (2009) find a hedonic-eudemonic correlation of 0.76 using SWLS and the Positive and Negative Affect Scales (PANAS; Watson et al., 1988) for hedonic wellbeing and PWBS for eudemonic wellbeing; Disabato et al. (2016) find a correlation of 0.96 using SWLS, the Subjective Happiness Scale (Lyubomirsky & Lepper, 1999) and the Center for Epidemiological Studies Depression Scale (CES-D; Radloff, 1977) for hedonic wellbeing, and PWBS and the Meaning in Life Questionnaire (Steger et al., 2006) for eudemonic wellbeing. Other studies include the same two scales yet find distinct values and distinct correlates. For example, Joshanloo et al. (2021) use SWLS and the Negative and Positive Affect Scale (NAPAS; Mroczek & Kolarz, 1998) for hedonic wellbeing and PWBS for eudemonic wellbeing, and their findings support the distinctiveness of hedonic and eudemonic wellbeing based on differential relationships with the other criteria under study.

Mixed results are more expected when the scales used do not overlap across studies. For example, Strelhow et al. (2019) still use PWBS for eudemonic wellbeing, but use the Brief Multidimensional Students’ Life Satisfaction Scale (BMSLSS; Seligson et al., 2003) and Core Affects Scale (CAS; Russell & Carroll, 1999) for hedonic wellbeing, and find a high factor correlation between hedonic and eudemonic factors, while McGregor and Little (1998)’s principal component analysis uses the single Purpose in Life Scale to find two separate happiness and meaning factors within it. Vanhoutte and Nazroo (2014)’s study on elderly populations, meanwhile, uses a set of eudemonic scales called CASP (Hyde et al., 2003) alongside a range of hedonic measures covered above (SWLS; CES-D), and finds the distinction between hedonic and eudemonic wellbeing to be less relevant in later life than cognitive-affective wellbeing models. In light of this, this paper adds to the theoretical and measurement literature by using commonly used and previously validated measurement scales (see Section 2.2), often specifically used for adolescents’ wellbeing, most of which have not been used in the studies covered above. Loevinger (1957) distinguishes between three components of construct validity: substantive (focused on content), structural (focused on dimensionality), and external (focused on relationships with other constructs). This paper focuses on the relationships between associated constructs (i.e., the various sub-domains of hedonic and eudemonic wellbeing) by testing the external discriminant validity of the scales used in an established survey programme (see Section 2). To aid the conceptual focus of this paper, we rely on the substantive and structural validity of the sub-domains under study having been established in previous studies of the same measures (Black et al., 2022). This then allows us to discuss the possibility of the hierarchical and multi-dimensional nature of overall wellbeing as a conceptual construct.

1.4 Adolescence

This paper focuses on adolescence, a developmental period wherein work on the structure of wellbeing domains is lacking, with existing studies of measurement models typically focusing on adults (Disabato et al., 2016; Vanhoutte & Nazroo, 2014). The measurement, conceptualisation, and determinants of overall wellbeing as well as its sub-domains may differ across ages, and research conducted on adult samples cannot be uncritically applied to children and adolescents (McLellan & Steward, 2015). Previous studies have found that low wellbeing in adolescence predicts negative outcomes later in adult life (Kansky et al., 2016; Layard et al., 2014), and there has been much focus on adolescents’ wellbeing in relation to their mental health (Kessler et al., 2005), physical health (Richmond-Rakerd et al., 2021), educational attainment (Patel et al., 2007; Smith et al., 2021) and employment (Witt et al., 2018). However, these studies have conceptualised and modelled wellbeing in varied ways, leading to gaps in consistent, systematic measurement and analysis. A more rigorous understanding of adolescent wellbeing could thus provide a fruitful avenue for early intervention.

1.5 Research Questions and Models

In light of the above, this study analyses three measurement models, questioning (1) is there empirical support for a hedonic-eudemonic distinction? and (2) is there the possibility of essential unidimensionality? Building on Disabato et al. (2016) and Kashdan et al. (2008), we ask whether the distinction between hedonic and eudemonic latent factors is empirically supported in multidimensional wellbeing measurement models. Notably, we also explicitly test whether wellbeing can be considered to be ‘essentially unidimensional’ (i.e., can the majority of variance between items be explained by a general wellbeing factor?) by estimating a bifactor model in which each item loads both onto its theorised subdomain and a general factor.

The three alternative models are presented as follows:


    Model 1 (Fig. 1) is a correlated factors model (Brown, 2015) with seven first-order factors: life satisfaction, positive affect, negative affect, autonomy, optimism, self-esteem, and positive relationships. The sub-domains are correlated constructs with no superordinate wellbeing domain or distinction between hedonic and eudemonic wellbeing. This model is used to assess the strength and uniformity in factor covariances across all sub-domains.

    Fig. 1 Model 1: Correlated factors model


    Model 2 (Fig. 2) is a second-order model with two higher-order factors and the seven lower-order factors used in Model 1, operationalising the hedonic-eudemonic theoretical differences in a single hierarchical model (i.e., in line with the two-factor structure of Keyes et al., 2002). The higher-order factors explicitly model the shared variance between the subordinate factors (i.e., the sub-domains assigned to each paradigm). Based on the literature above that links hedonic and eudemonic wellbeing factors, the two higher-order factors are correlated. This model is used to assess the magnitude of the correlation between higher-order hedonic and eudemonic factors, considering whether findings are consistent with similar work done in other populations, such as by Disabato et al. (2016), Strelhow et al. (2019) or Vanhoutte and Nazroo (2014).

    Fig. 2 Model 2: Second-order two-factor model


    Model 3 (Fig. 3) is a bifactor model, with items loading onto each of the above sub-domains respectively, as well as simultaneously onto a single general wellbeing factor. Bifactor models can be used to assess ‘essential unidimensionality’ under more realistic conditions than a strict unidimensional model; the latter is often unrealistic in psychological measurement (Rodriguez et al., 2016). While previous studies on conceptualising wellbeing have tested purely unidimensional models (e.g., Vanhoutte & Nazroo, 2014) or a higher-order general factor model (e.g., Vanhoutte & Nazroo, 2014; Disabato et al., 2016), the bifactor model is a more robust and direct test of essential unidimensionality (Rodriguez et al., 2016), presenting an avenue for building on the previous models. This model thus extends the literature by considering whether the use of a general wellbeing factor is in fact psychometrically supported.

    Fig. 3 Model 3: Bifactor model
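To make the three structures concrete, the following is a minimal sketch that writes each model out as lavaan-style syntax strings (`=~` for "is measured by", `~~` for a covariance). It is an illustration under stated assumptions, not the specification used in the study: the item names and item counts are hypothetical placeholders (only the labels A1 and A4 appear in this paper's text), the factor names are invented for readability, and the choice to give a single-item sub-domain no specific factor in the bifactor model is an assumption made here because a one-indicator specific factor would not be separately identified.

```python
# Hypothetical item labels per sub-domain. The real items and item counts are
# given in the paper's Tables 2 and 3 and are NOT reproduced here.
SUBDOMAINS = {
    "life_sat": ["LS1"],
    "pos_affect": ["PA1", "PA2", "PA3"],
    "neg_affect": ["NA1", "NA2", "NA3"],
    "autonomy": ["A1", "A2", "A3", "A4"],
    "self_esteem": ["SE1", "SE2", "SE3"],
    "optimism": ["OP1", "OP2", "OP3"],
    "relationships": ["PR1", "PR2", "PR3"],
}
HEDONIC = ["life_sat", "pos_affect", "neg_affect"]
EUDEMONIC = ["autonomy", "self_esteem", "optimism", "relationships"]


def measurement_lines(subdomains):
    """One 'factor =~ item + item ...' line per first-order factor."""
    return [f"{name} =~ {' + '.join(items)}" for name, items in subdomains.items()]


def correlated_factors(subdomains):
    """Model 1: seven first-order factors, no higher-order structure."""
    return "\n".join(measurement_lines(subdomains))


def second_order(subdomains, hedonic, eudemonic):
    """Model 2: first-order factors load on two correlated higher-order factors."""
    lines = measurement_lines(subdomains)
    lines.append("hedonic =~ " + " + ".join(hedonic))
    lines.append("eudemonic =~ " + " + ".join(eudemonic))
    lines.append("hedonic ~~ eudemonic")
    return "\n".join(lines)


def bifactor(subdomains):
    """Model 3: every item loads on a general factor and on its specific factor.

    Assumption: sub-domains with a single item get no specific factor.
    Orthogonality between factors would be set when fitting (not shown).
    """
    all_items = [item for items in subdomains.values() for item in items]
    lines = ["general =~ " + " + ".join(all_items)]
    lines += [
        f"{name}_s =~ {' + '.join(items)}"
        for name, items in subdomains.items()
        if len(items) > 1
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(second_order(SUBDOMAINS, HEDONIC, EUDEMONIC))
```

Writing the models side by side shows that they share the same item-to-sub-domain assignments and differ only in how the relationships among sub-domains are represented.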

2 Materials and Methods

2.1 Sample

The current study draws on baseline survey data from the #BeeWell project, a three-year cohort study collecting data on the domains and drivers of wellbeing for adolescents aged 12–15 (i.e., in Year Groups 8 and 10 in the baseline survey) attending secondary schools in the Greater Manchester (GM) city region in the UK.

The baseline #BeeWell sample consisted of 37,978 adolescents from 165 schools across the 10 Local Authorities of GM. Demographic characteristics of the sample are presented in Table 1. The #BeeWell survey sample composition mirrors the 11–16 GM population in terms of sex, English as an Additional Language (EAL), Special Educational Needs (SEN), Free School Meal eligibility and Ethnicity (#BeeWell Research Team, 2022), and also broadly reflects national averages for England for these categories, with the exception of ethnicity (Black et al., 2022).

Table 1 Demographic characteristics of sample

Participants who had missing data for all of the variables in the study were removed, bringing the sample size down to 37,630.

2.2 Measures

The #BeeWell survey uses self-report measures, which were selected in consultation with young people across GM and a questionnaire advisory group (#BeeWell Research Team, 2021), considering issues of accessibility, completion burden, and validity. An analysis of the psychometric properties of each scale used in the survey has been undertaken by Black et al. (2022). Tables 2 and 3 present the relevant measures used in the current study.

Table 2 Measures of Hedonic wellbeing
Table 3 Measures of Eudemonic wellbeing

2.3 Analytic Strategy

All analyses were conducted in RStudio v.2021.09.0. The code and a synthetic dataset are available at https://osf.io/q6d2g/. Under the project’s ethics, the real dataset cannot be made publicly available until the end of the project.

Reverse-worded items were recoded as necessary. Descriptive statistics for each of the items and reliability estimates for each measurement scale were computed.

Three alternative measurement models, as outlined in Section 1.5 of the introduction, were fitted using Confirmatory Factor Analysis (CFA; see Figs. 1, 2 and 3), using the latent variable analysis (lavaan v. 0.6–11) package in R (Rosseel, 2012). The Robust Maximum Likelihood (MLR) estimator was used so that missing data could be handled via full information maximum likelihood (Kline, 2016; Lei & Shiverdecker, 2020), treating variables as continuous. Given that most survey variables are measured on a response format with five or more categories, maximum likelihood estimation is considered appropriate (Rhemtulla et al., 2012).

2.3.1 Model Fit

Model fit for these CFA models was judged in line with Hu and Bentler (1999)’s widely-used cut-offs: RMSEA < 0.06; SRMR < 0.08; TLI and CFI > 0.95. Chi-square statistics, reported but not interpreted, were not considered a useful benchmark as they are sensitive to sample size and contain a restrictive hypothesis test (Schermelleh-Engel et al., 2003).
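The cut-offs above can be applied mechanically, as in the short sketch below. The function name is hypothetical, and the example values are the Model 1 fit statistics reported in Section 3.1.2; how borderline values should be weighed remains a matter of judgement rather than something this check decides.

```python
def check_hu_bentler(rmsea, srmr, cfi, tli):
    """Return, per index, whether the Hu and Bentler (1999) cut-off is met."""
    return {
        "RMSEA": rmsea < 0.06,
        "SRMR": srmr < 0.08,
        "CFI": cfi > 0.95,
        "TLI": tli > 0.95,
    }


# Model 1 values as reported in Section 3.1.2 of this paper.
model_1 = check_hu_bentler(rmsea=0.048, srmr=0.056, cfi=0.930, tli=0.921)

if __name__ == "__main__":
    for index, met in model_1.items():
        print(index, "meets cut-off" if met else "below cut-off")
```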

To consider relative measures of parsimony, and the trade-off between model fit and model complexity, as part of model selection, we additionally reported AIC (Akaike’s Information Criterion) and BIC (Bayesian Information Criterion) for all three models, with relatively lower AIC and BIC indicating lower levels of information loss and thus ‘better’ fit.
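For reference, both criteria penalise the maximised log-likelihood by the number of free parameters, with BIC applying a heavier penalty as the sample grows. The sketch below uses the standard textbook definitions; the function and argument names are illustrative, and SEM software may report variants (for example, a sample-size-adjusted BIC) that differ from these.

```python
import math


def aic(loglik, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * loglik
```

With a sample as large as the one used here, the `ln(n)` term makes each additional parameter considerably more costly under BIC than under AIC.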

2.3.2 Interpreting the Bifactor Model

For Model 3, the bifactor model, fit statistics show bias in favour of the bifactor model (Greene et al., 2019; Rodriguez et al., 2016) and multidimensional bifactor models are arguably uninterpretable (Eid et al., 2017). Thus, while fit was reported for transparency, it was not used as a basis of comparison. Instead, this model was used to judge essential unidimensionality based on additional indices representing factor strength.

These additional indices (Reise et al., 2013) were calculated using the bifactorIndices package in R. The explained common variance (ECV) represents the proportion of common variance that is due to the general wellbeing factor rather than the specific factors. The percentage of uncontaminated correlations (PUC) is the proportion of unique item correlations that are explained by the general factor alone, i.e., correlations between items that do not share a specific factor. The ECV and PUC were calculated, with values > 0.70 for both considered indicative of ‘essential unidimensionality’ (Black et al., 2022; Rodriguez et al., 2016). Factor variance for each specific factor in the bifactor model was also assessed, with lower variance indicating that items were explained less by their specific factor and more by the general factor.
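As a worked illustration of the two indices, the sketch below computes ECV from standardised loadings and PUC from the number of items on each specific factor, using the usual definitions (Rodriguez et al., 2016). The function names and the toy inputs are hypothetical and are not the study's estimates.

```python
def ecv(general_loadings, specific_loadings):
    """Explained common variance: share of common variance due to the general factor.

    Both arguments are flat lists of standardised loadings.
    """
    general = sum(l ** 2 for l in general_loadings)
    specific = sum(l ** 2 for l in specific_loadings)
    return general / (general + specific)


def puc(items_per_specific_factor, n_items=None):
    """Percentage of uncontaminated correlations.

    items_per_specific_factor: item counts for each specific factor.
    n_items: total number of items; defaults to the sum of the counts, but can
    be larger if some items load on the general factor only.
    """
    p = n_items if n_items is not None else sum(items_per_specific_factor)
    total_pairs = p * (p - 1) // 2
    within_pairs = sum(k * (k - 1) // 2 for k in items_per_specific_factor)
    return 1 - within_pairs / total_pairs


if __name__ == "__main__":
    # Toy example: four items, two specific factors with two items each.
    print(round(ecv([0.6] * 4, [0.4] * 4), 3))
    print(round(puc([2, 2]), 3))
```

Because PUC depends only on how items are distributed across specific factors, it rises as the number of specific factors grows, whereas ECV depends on the estimated loadings.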

2.3.3 Factor Loadings and Covariances

We also considered the arrangement of latent variables in each model with respect to the strength and pattern of factor loadings and covariances. For Model 1, the correlated factors model, covariances across the latent factors are judged in line with Vanhoutte and Nazroo (2014), who use similar criteria for the assessment of models: correlations < 0.60 = weak, and > 0.75 = strong. Beyond strength, the uniformity and consistency of these associations can determine the possibility of further groupings (for instance, by ‘hedonic’ latent factors possibly showing stronger correlations to one another than to the ‘eudemonic’ latent factors).
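The thresholds above translate into a simple labelling rule, sketched below. Two choices here are assumptions rather than part of the stated criteria: values between 0.60 and 0.75 are labelled "moderate", and the absolute value is used so that negative associations (such as those involving negative affect) are judged by magnitude.

```python
def correlation_strength(r):
    """Label a latent correlation: < 0.60 weak, > 0.75 strong, otherwise moderate."""
    magnitude = abs(r)  # assumption: judge negative correlations by size
    if magnitude < 0.60:
        return "weak"
    if magnitude > 0.75:
        return "strong"
    return "moderate"  # assumption: label for the band the criteria leave unnamed
```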

For Model 2, the magnitude of the correlation between the hedonic and eudemonic higher-order factors in the two-factor model was judged against the correlation criteria above to determine whether the higher-order constructs represented separate types of wellbeing. For Model 3, the bifactor model, the pattern of factor loadings can be used to inform discussions on the strength and content of a general factor: for instance, are the loadings even across all items, or dominated by a few items belonging to specific factors (Bornovalova et al., 2020)? Thus, an examination of each model’s factor loadings and structures further informs our discussions on the possibility of a hedonic-eudemonic distinction and the strength of a general factor.

3 Results

Descriptive statistics for the survey items are presented in Table 4.

Table 4 Descriptive statistics

Floor and ceiling effects are reported to assess the distribution of responses for each question, i.e., the proportion of adolescents answering with the lowest (floor) or highest (ceiling) possible score for each item. There is a commonly accepted threshold of 15% for an effect at either end (Terwee et al., 2007); however, the tolerance under MLR estimation can be considered broader, at less than 50% (Rhemtulla et al., 2012). These effects are also influenced by the range of the response format (see Tables 2 and 3). Thus, the higher scores for the five Negative Affect items should be considered in light of their 3-point response scale.
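The floor and ceiling proportions described above can be computed per item as in the following sketch. The function name, the use of `None` for missing responses, and the toy data are all illustrative assumptions.

```python
def floor_ceiling(responses, min_score, max_score):
    """Proportion of non-missing responses at the lowest and highest score."""
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("no non-missing responses")
    floor = sum(r == min_score for r in valid) / len(valid)
    ceiling = sum(r == max_score for r in valid) / len(valid)
    return floor, ceiling


if __name__ == "__main__":
    # Toy item on a 3-point scale with one missing response.
    item = [1, 1, 2, 3, 3, 3, None, 2, 2, 3]
    floor, ceiling = floor_ceiling(item, min_score=1, max_score=3)
    print(f"floor={floor:.2f}, ceiling={ceiling:.2f}, "
          f"exceeds 15%: {floor > 0.15 or ceiling > 0.15}")
```

With only three response options, a sizeable share of responses at either extreme is close to unavoidable, which is why the response format matters when reading these proportions.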

3.1 Confirmatory Factor Analysis

3.1.1 Model Comparisons

Model-fit indices were assessed across the three models to determine which of the three models best represented the data (Table 5).

Table 5 Model fit-indices comparisons

3.1.2 Model 1: Correlated Factors Model

The correlated factors model showed acceptable fit, χ2(385) = 33212.596, p < .001; RMSEA = 0.048 (90% CI [0.047, 0.048]); CFI = 0.930, TLI = 0.921, SRMR = 0.056. CFI and TLI were slightly lower than the acceptable threshold of 0.95; these indices are known to worsen when a model includes a large number of indicators (Kenny & McCoach, 2003). Factor loadings for all items on their respective sub-domain factors were relatively strong (λ = 0.62 to 0.91), with the exception of two items on the Autonomy sub-scale, which loaded at 0.45 and 0.01. These items (A1 and A4) were negatively worded and reverse-scored. All item loadings were statistically significant, apart from A4, which had a p-value of 0.78. Covariances between latent factors ranged from 0.47 to 0.66, considered weak-to-moderate in line with the above criteria. The factors covary with one another in similar ways and within a low range, with no notable groupings formed between the traditionally hedonic or traditionally eudemonic sub-domains. Figure 4 depicts the model’s standardised results.

Fig. 4 Model 1 – Correlated factors model

3.1.3 Model 2: Second-Order Two-Factor Model

The higher-order two-factor model showed acceptable fit, χ2(398) = 34305.991, p < .001; RMSEA = 0.048 (90% CI [0.047, 0.048]); CFI = 0.928, TLI = 0.921, SRMR = 0.058. CFI and TLI are slightly lower than the acceptable threshold, as in Model 1. Factor loadings between first and second-order latent factors were relatively strong and within a small range (λ = 0.63 to 0.76). The standardised correlation between the second-order hedonic and eudemonic factors was high, r = .98. Figure 5 depicts the model’s standardised results.

Fig. 5 Model 2 – Second-order two-factor model

3.1.4 Model 3: Bifactor Model

The bifactor model showed good fit, χ2(376) = 18280.079, p < .001; RMSEA = 0.039 (90% CI [0.038, 0.039]); CFI = 0.962, TLI = 0.956, SRMR = 0.029. Additional indices representing factor strength in the bifactor model (Reise et al., 2013) were calculated: while PUC was above the acceptable threshold of 0.70 (PUC = 0.868), ECV was below the acceptable threshold (ECV = 0.587), indicating that the general factor explained approximately 59% of the common variance between all the items, with the remaining common variance explained by specific factors.

Following the low ECV result, and to reconcile it against the contrasting high PUC, additional model-based reliability indices were also taken into consideration post-hoc. Coefficient Omega reflects the proportion of total variance that is attributable to common sources of variance (McDonald, 1999). That is, how much of the variance in observed total scores can be explained by the modelled factors (Constantinou & Fonagy, 2019; Reise et al., 2013). Meanwhile, to explain how much of the variance in observed total scores can be explained by the general factor alone, we looked at Omega hierarchical. Omega and Omega hierarchical values > 0.70 were considered acceptable (Hancock & Mueller, 2001; Rodriguez et al., 2016). Results for both were acceptable, with Omega = 0.924 and Omega H = 0.729, suggesting essential unidimensionality. Furthermore, Factor Determinacy (FD) is an indicator of model-based reliability, showing the reliability of factor scores, or, the extent to which factor scores are good estimates of individual differences on the factor (Constantinou & Fonagy, 2019). Values closer to 1 indicate better determinacy (Grice, 2001) and have been previously used in the assessment of bifactor models to consider the reliability of using factor scores (Scopel Hoffmann et al., 2022; Panayiotou et al., 2020). FD for the general factor was above the acceptable threshold of 0.90 (Rodriguez et al., 2016) at 0.950, while the FD for all group level factors was below 0.90, further indicating strength of the general factor.
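To show how Omega and Omega hierarchical relate, the sketch below computes both for a total score from a standardised bifactor solution, following the usual definitions (Rodriguez et al., 2016). The function name and the toy loadings are hypothetical; the residual variances are passed in explicitly rather than derived, so that the sketch makes no assumption about how they were estimated.

```python
def omega_total_and_hierarchical(general, specific_by_factor, residual_variances):
    """Return (Omega, Omega hierarchical) for the total score.

    general: general-factor loading for every item.
    specific_by_factor: list of lists, the loadings on each specific factor.
    residual_variances: unique variance for every item.
    """
    general_part = sum(general) ** 2
    specific_part = sum(sum(loadings) ** 2 for loadings in specific_by_factor)
    total_variance = general_part + specific_part + sum(residual_variances)
    omega = (general_part + specific_part) / total_variance
    omega_h = general_part / total_variance
    return omega, omega_h


if __name__ == "__main__":
    # Toy example: four items, two specific factors of two items each.
    general = [0.7, 0.7, 0.7, 0.7]
    specific = [[0.3, 0.3], [0.3, 0.3]]
    residuals = [1 - 0.7 ** 2 - 0.3 ** 2] * 4
    omega, omega_h = omega_total_and_hierarchical(general, specific, residuals)
    print(round(omega, 3), round(omega_h, 3))
```

The gap between the two values indicates how much of the reliable total-score variance is attributable to specific factors rather than to the general factor.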

Items loaded onto the general factor and their respective specific factors in expected directions, apart from A1 and A4, which loaded onto the specific Autonomy factor at −0.04 and −0.19 respectively. Items loaded onto the general factor between 0.40 and 0.82, with the notable exception of A4 at 0.12. This uninterpretable pattern of irregular loadings for Autonomy was further corroborated by a factor variance of 0.00 for it. Autonomy is thus a ‘vanishing factor’, a common problem occurring in bifactor modelling (Eid et al., 2017). Additionally, the factor variance for Negative Affect was close to zero, at 0.03. Vanishing factors affect the convergence of a bifactor model. This was resolved by fixing the latent variable variances to unity (std.lv = T), which allows all factor loadings to be freely estimated; fit statistics and standardised estimates of loadings are reported accordingly. Figure 6 depicts the model’s standardised results.

Fig. 6 Model 3 – Bifactor model

Tables describing these factor loadings and covariances for the three models are also available in the Appendix.

4 Discussion

We considered three alternative structural conceptions of wellbeing to examine the possible hedonic-eudemonic distinction, and to assess the possibility of essential unidimensionality of a general wellbeing factor. We found (1) the magnitude of the correlation between the higher-order factors in Model 2 confirmed that hedonic and eudemonic factors were statistically indistinguishable and (2) additional indices reported in Model 3 indicated essential unidimensionality i.e., a general wellbeing factor was psychometrically supported.

4.1 Models 1 and 2

The fit of Model 1 suggests that there is some value in treating autonomy, self-esteem, optimism, friendships, positive affect, negative affect and life satisfaction at the sub-domain level. This presents us with a practical and parsimonious measurement model to use in any further analysis. However, the uniformity of the factor covariances (i.e., the factors covary with one another in similar ways and within a low range) suggests that any further higher-order grouping would not necessarily represent any notable distinctions. There were no identifiable patterns in how traditionally hedonic factors related to one another as compared to traditionally eudemonic factors. For instance, Negative Affect (M&MF Negative Affect items) had the same relationship with Positive Affect (PANAS-C) as it did with Optimism (EPOCH) (−0.56), and Autonomy (BNFSS-A) was associated almost as strongly with Friendship (CYRM) (0.51) as it was with Positive Affect (PANAS-C) (0.53). On a more granular level, of note were the two low factor loadings of items A1 and A4 in the BNFSS Autonomy sub-scale. We did not remove these, to avoid ‘cherry-picking’ items; however, more work is needed on the impact of negatively worded, reverse-coded items on measuring and conceptualising mental health and wellbeing. In particular, the BNFSS-A scale used for Autonomy has shown worse evidence for unidimensionality when used as a sub-scale, in comparison to the other scales used in these models (Black et al., 2022). Model 2, the second-order model, was thus constructed on the basis of testing theoretical assumptions of hedonic and eudemonic distinctions (Ryan & Deci, 2001; Ryff et al., 2021) rather than on the empirical findings of the first-order model. Furthermore, the finding of a 0.98 correlation between the hedonic and eudemonic higher-order factors in Model 2 mirrors the findings of Disabato et al. (2016) and Strelhow et al. (2019), with the high correlation suggesting negligible discriminant validity and laying the grounds for further testing of a general wellbeing factor. This finding also adds to Kashdan et al. (2008)’s concern that while the hedonic and eudemonic distinction is useful in theory, previous findings of separate factors in empirical practice (Waterman, 2008) are often a function of the disconnect between conceptualisation and measurement; differences in choice of measures and analytic strategies may yield different results. Indeed, other studies have shown that there may be alternative ways of conceptualising dimensions of wellbeing that are better supported empirically. For instance, Vanhoutte and Nazroo (2014) find that rather than a hedonic-eudemonic distinction, a cognitive-affective distinction is more useful in analysis. The de facto distinction that is accepted and embedded in the wellbeing literature (Ryan & Deci, 2001; Waterman, 2008) thus needs to be updated, and new theoretical and empirical research is needed. Huta (2020)’s study represents a start to this by addressing multiple levels within the hedonic-eudemonic dimensions, rather than a binary divide.

4.2 Model 3

A high correlation between two higher-order factors, as seen in previous studies and confirmed by our own finding in Model 2, supported our testing of an underlying general wellbeing factor and, furthermore, of essential unidimensionality, through a bifactor model (Model 3). Considering model fit as a basis of comparison, the bifactor model does best represent the data, as expected (Eid et al., 2017). However, a comparison of model fit alone, between a correlated-factors, second-order, and bifactor model, can lead to false conclusions due to proportionality constraints in higher-order models and biased model fit for bifactor models (Black et al., 2019; Eid et al., 2017; Gignac & Kretzschmar, 2017). Goodness-of-fit, as determined by parameters, should be considered in light of the pattern of loadings and of additional indices like ECV and PUC.

First, when considering the pattern of factor loadings, we find that the loadings on the general factor in Model 3 are relatively homogeneous, rather than dominated by particular groups of items from any of the specific factors. Additionally, there are some notably low factor variances, specifically for Autonomy and Negative Affect, which further suggest strength of the general factor. While items on the Autonomy and Negative Affect scales may then be explained more by a general wellbeing factor and less by their respective specific factors, it is also important to note that these particular scales must be used with caution due to their irregular patterns of factor loadings (see, in particular, items A1 and A4). Second, with reference to the calculation of additional bifactor indices, the high PUC, Omega, and Omega H of the model and the high FD of the general factor suggest that the sub-domains do share a common core of wellbeing. Of note, however, is the ECV result. The ECV is an additional index of dimensionality and, generally, data that have a strong general factor compared to group factors will have a high ECV. However, our result of 0.587 falls below the suggested threshold of 0.70 (Reise et al., 2013). This leads us to interpret our result of essential unidimensionality, as examined through the other reported indices, with caution.

In interpreting the ECV result, it is useful to keep in mind the highly parametrised nature of bifactor structures. The moderating effect of the high PUC, in combination with the ratio of items to specific factors, can be used to interpret the nature of a low ECV: the proportion of common variance explained by the general factor (ECV) is likely to be high when the PUC is high; however, the common variance explained by the general factor is limited by the number of specific factors present. Thus, after a certain number of specific factors, ECV tends to drop (Constantinou & Fonagy, 2019). Moreover, just as caution must be taken in applying thresholds to other goodness-of-fit indices, Reise (2012) argues that no specific benchmark values for ECV can be proposed as inherently high enough to apply unidimensional models to multidimensional data. This means that, rather than arbitrarily applying a proposed cut-off (in our case 0.70), we take the above considerations around the number of specific factors and the high PUC-low ECV relationship to suggest that the low ECV is not, in fact, reason enough for us to reject essential unidimensionality. The low factor variances for five out of six specific factors (including a near-zero and a zero result), the high Omega H and PUC, and a relatively moderate ECV in light of the number of factors and items all present evidence for the use of a general wellbeing factor in further measurement studies. Moreover, this result is in line with recent findings of the applicability of the bifactor model for measuring children’s wellbeing across international samples (Savahl et al., 2023).
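To make the indices discussed above concrete, the following is a minimal illustrative sketch of how ECV, PUC, and Omega H are commonly computed from the standardised loadings of a bifactor model in which each item loads on the general factor and exactly one specific factor. It is not the analysis code used in this study: the function name `bifactor_indices` and the loadings in the example are hypothetical and are not results from the #BeeWell data.

```python
import numpy as np

def bifactor_indices(general, specific, groups):
    """Compute ECV, PUC and Omega H from standardised bifactor loadings.

    general  : loading of each item on the general factor
    specific : loading of each item on its own specific factor
    groups   : label of the specific factor each item belongs to
    (Hypothetical helper; assumes every item loads on exactly one
    specific factor and that the factors are orthogonal.)
    """
    general = np.asarray(general, dtype=float)
    specific = np.asarray(specific, dtype=float)
    groups = np.asarray(groups)
    n_items = general.size

    # ECV: share of common variance attributable to the general factor.
    ecv = np.sum(general**2) / (np.sum(general**2) + np.sum(specific**2))

    # PUC: share of item pairs whose correlation is not influenced by a
    # specific factor, i.e. pairs of items from different specific factors.
    total_pairs = n_items * (n_items - 1) / 2
    counts = np.unique(groups, return_counts=True)[1]
    within_pairs = sum(c * (c - 1) / 2 for c in counts)
    puc = (total_pairs - within_pairs) / total_pairs

    # Omega H: share of total-score variance due to the general factor.
    uniqueness = 1 - general**2 - specific**2
    general_var = np.sum(general) ** 2
    specific_var = sum(np.sum(specific[groups == g]) ** 2
                       for g in np.unique(groups))
    omega_h = general_var / (general_var + specific_var + np.sum(uniqueness))

    return {"ecv": ecv, "puc": puc, "omega_h": omega_h}

# Made-up example: 30 items in 6 specific factors of 5 items each.
example = bifactor_indices(
    general=[0.6] * 30,
    specific=[0.4] * 30,
    groups=np.repeat(np.arange(6), 5),
)
print(example)
```

In this made-up example, spreading the items over six specific factors leaves most item pairs uncontaminated by any specific factor (a high PUC), even though the ECV depends only on the relative size of the general and specific loadings. This is one way to see why the two indices are usually read together rather than against a single cut-off.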

4.3 Implications

The finding of essential unidimensionality and a strong general factor has significant implications for measurement practice. First, as Calderón Garrido et al. (2019) note, if item scores are essentially unidimensional but are treated as multidimensional, the main potential consequences are unnecessary theoretical complexities and weakened factor score estimates that do not allow accurate individual measurements to be made. Second, data that are essentially unidimensional can often be characterised by a total score. This means that the use of subscales, which reflect variation from minor dimensions, is often unnecessary, as the strength of our general wellbeing factor indicates that such variation is minimal. In settings such as schools, where the wellbeing of adolescents is regularly assessed, ease of measurement and scoring is valued; however, the measures used in practice are often not robustly validated (Black et al., 2022). The finding of essential unidimensionality in this paper suggests that the measures used in the #BeeWell survey may be suitable for sum scoring, presenting an avenue for further validation studies using the combination of scales covered here (ONS Life Satisfaction measure, PANAS-C, M & MF; BPNFS-A, RSE, EPOCH, CYRM). The idea that there is a general wellbeing construct necessitates further consideration of how best to conceptualise it in research and a re-thinking of unnecessary distinctions, both in measurement and in theory.
The use of existing measures that cover both hedonic and eudemonic components might be encouraged. For instance, the Warwick Edinburgh Mental Wellbeing Scale (WEMWBS; Tennant et al., 2007) has items that are meant to reflect a combination of hedonic and eudemonic conceptions of wellbeing. However, differences in use once again widen the gap between theory and practice here: its short version (SWEMWBS) is often used interchangeably with the longer version, and although SWEMWBS focuses only on eudemonic components, it is still often used as an integrative measure. In measurement, our finding of essential unidimensionality can be of particular consideration in the future construction of new, robust wellbeing measures. In theory, by accepting a more overarching conception of wellbeing, many of the minor contradictions surrounding the murky construct can be smoothed out.

Rather than recommending a specific wellbeing model, this paper highlights the complexities and differences across competing measurement models and thus advocates for more robust preliminary analysis in analytical studies assessing distributions or predictors of wellbeing. The similarity of the fit statistics between the first-order correlated model (Model 1) and the second-order two-factor model (Model 2) signifies that these measurement models may, in certain contexts, be used interchangeably. For instance, the goals of a study may indeed dictate model preference, and Model 1 would be a useful model if specific sub-domains are of substantive interest. On the other hand, while it is clear that the higher-order distinction into separate hedonic and eudemonic factors does not necessarily add any specific psychometric value to a proposed standardised modelling approach, it can still be used in further validation studies of the similarity or distinction between the higher-order factors. With respect to the bifactor model (Model 3), research on other psychological constructs (e.g., cognitive abilities) has shown that the differences between second-order and bifactor models imply different theories around that structure, and therefore how that measured variable should be represented in further structural equation modelling (Mansolf & Reise, 2017). This paper extends that difference to wellbeing and so emphasises the importance of preliminary analysis of measurement models in further wellbeing studies.

4.4 Strengths, Limitations, and Avenues for Further Research

Wellbeing can be measured in different ways, depending on the chosen conceptual perspective. This paper unpacks some of the complexities in varying conceptualisations of wellbeing, paying particular attention to the theoretical distinction often drawn between hedonic and eudemonic wellbeing. We used a large dataset (n = 37,630), within the unique context of an adolescent sample, and extended the use of a higher-order model used in previous studies (Strelhow et al., 2016; Disabato et al., 2016; Vanhoutte & Nazroo, 2014) through the additional use of a bifactor model. In doing so, we raise considerations around the use of different wellbeing measurement models and their associated limitations. This study uses a bifactor model; future studies can test additional measurement models for further clarity. For instance, Black et al. (2019) suggest the use of a bifactor S-1 model to test general covariances more robustly; an S-1 model is even more useful in the presence of vanishing factors (Panayiotou et al., 2020).

There is also a need for further theoretical research in three aspects. First, deeper theorising in defining wellbeing is required, as opposed to the commonly accepted approach of defining wellbeing using the instruments used to measure it – the use of construct validation, as employed by this paper, is therefore limited by the issues of circularity (Alexandrova & Haybron, 2016; Fabian, 2022). Using Confirmatory Factor Analysis in fields where theory is still nascent, or as inconsistent as it is in wellbeing, may be premature, and methods such as Exploratory Factor Analysis may be more useful for theory generation.

Second, the field of eudemonic wellbeing in particular lacks clarity and consensus around constituent sub-domains, creating difficulties in choices of data collection and model-testing. Reflecting this, we were limited to the eudemonic measures that were available to us, and the use of different measures may yield different results. The construction of models is often constrained by the availability and quality of the data; the choice of statistical model cannot overcome the fundamental limitations of the data or the measures themselves, only work around them. However, we are often beholden to these constraints when using secondary data based on existing measures.

Finally, further research is also needed into alternative ways of distinguishing between types of wellbeing, rather than the traditional hedonic-eudemonic divide which, as is the case here, is not always borne out in empirical research. Significant theoretical (Huta, 2020) and empirical (Vanhoutte & Nazroo, 2014; Joshanloo et al., 2021) work is already being done, including work using structural and correlational designs; this paper highlights the use of bifactor modelling as an additional useful method to further study structural conceptions of wellbeing.

5 Conclusion

The aim of the current study was to unpack some of the interpretations currently used in multiple structural conceptions of wellbeing, and in doing so it made important contributions to the field. First, it provided further empirical evidence that the discriminant validity between higher-order hedonic and eudemonic wellbeing constructs is low. Second, we tested a bifactor model and found evidence of essential unidimensionality. Third, this study added to the literature on wellbeing by studying issues of measurement and modelling in the specific context of adolescence. The findings of this paper encourage the use of a first-order correlational model in some avenues (for instance, correlational study designs with structural models), acknowledging that the similar fit statistics of Models 1 and 2 allow for context-based decisions, and further encourage the use of bifactor models and the consideration of a strong general wellbeing factor. The findings also highlight the need for further theoretical and empirical work on conceptualising wellbeing.