Abstract
Background
Adolescent subjective well-being has gained significant attention in recent years. This attention is reflected in the development of measurement tools and interventions designed to better understand and improve adolescent mental health. While these steps are necessary, problems remain in the measurement of adolescent subjective well-being. Notably, current measurement tools are limited in their content, applicability to various populations, and accessibility.
Aims
This paper examines the psychometric properties of the Survey on Flourishing (SURF) when used with an adolescent sample from the United States.
Method
A sample of 334 adolescents participated in the present study. We assessed the reliability and validity of the SURF by examining its internal consistency, convergent validity, and discriminant validity, and we tested its factor structure using confirmatory factor analysis (CFA).
Results
The SURF demonstrated high internal consistency (α = 0.92), strong positive correlations with convergent measures, and a weak negative correlation with a discriminant measure. A one-factor model best fit the observed data.
Conclusion
The SURF demonstrated good psychometric properties and addressed several shortcomings of current measures. Preliminary data suggest the SURF may be a useful and practical measure of adolescent subjective well-being.
1 Introduction
Flourishing is defined as a state of well-being in which individuals thrive in the social, emotional, and psychological areas of their lives [1,2,3]. It goes beyond the absence of pathology to encompass positive outcomes such as emotional regulation, supportive relationships, meaning and purpose, and life satisfaction. Historically, much research on well-being and flourishing has been conducted with adult populations rather than youth [4]. In recent years, however, there has been a push to better understand adolescent well-being.
Well-being research shows that flourishing is associated with positive life outcomes and circumstances for youth and adults, including supportive social networks and relationships, a positive work life, higher levels of physical and mental health, and improved school performance [3, 5,6,7,8]. Given the prevalence of psychosocial stressors during adolescence and the nature of adolescence as a critical period for social and emotional growth, it is essential to understand and measure adolescent well-being accurately to help adolescents take steps toward flourishing.
Improving adolescents' well-being also has broader societal implications. Historically, developmental science, psychology, education, and other fields have underestimated adolescents, tending to focus on the problems they face (e.g., learning difficulties, mental illness, low motivation, substance use) rather than the strengths they possess [9, 10]. However, positive youth development research and similar research areas identify adolescents as having unique resources that they can use to contribute meaningfully to their community [9, 11]. Although working to improve adolescents' well-being and enabling them to use their strengths to contribute meaningfully to society is an important task, it may be challenging if it cannot be measured. Thus, the purpose of the present study is to examine the psychometric properties of the Survey on Flourishing (SURF), a measure of subjective well-being, using an adolescent sample from the United States.
1.1 Measuring well-being
Well-being is a broad, multifaceted construct that has historically been challenging to define. For decades, well-being was operationalized as the absence of physical or mental dysfunction. More recent research, however, indicates that well-being is not merely the absence of problems but includes assets, strengths, values, and other positive characteristics [12]. In a classic paper, Diener defined subjective well-being as a combination of positive emotion and life satisfaction [13]. Currently, definitions of subjective well-being most commonly include two components: emotional well-being, which consists of positive emotion and life satisfaction, and positive functioning, which encompasses social functioning (e.g., social integration and contribution) and psychological functioning (e.g., autonomy and personal growth) [12].
These components of well-being also apply to adolescents. Researchers have identified specific developmental tasks that may indicate whether a child is doing well. These critical tasks in adolescence include academic achievement, forming close peer relationships, learning to follow rules, participating in extracurricular activities, and forming a sense of self-identity [14]. Critical to healthy adolescent development, these tasks generally align with well-being's social, emotional, and psychological components.
In addition to Diener’s definition of well-being, other models of well-being have been proposed, such as Seligman’s five-factor PERMA model, the Ryff model of psychological well-being, and Keyes’ model of social well-being [2, 15,16,17]. Several measures have been developed using these models, such as the PERMA Profiler, the EPOCH measure of adolescent well-being, the Ryff Scales of Psychological Well-being, and the Keyes Social Well-being Scale [15,16,17,18]. Recent literature reviews have identified many other measures of adolescent well-being, some notable measures including the Youth Quality of Life Instruments (Y-QOL), the Flourishing Scale, the Mental Health Continuum scales (MHC), the Warwick-Edinburgh Mental Well-being Scale (WEMWBS), the Child and Adolescent Wellness Scale (CAWS), the Social-Emotional Health Survey (SEHS), and the World Health Organization 5 well-being index (WHO-5) [19,20,21,22].
Although these measures differ somewhat in their underlying theories and content, researchers generally agree that subjective well-being is a subjectively experienced, multifaceted construct. The substantial overlap among these measures suggests that subjective well-being consists of social (e.g., connection, supportive relationships), psychological (e.g., purpose, achievement), and emotional (e.g., life satisfaction, positive emotion) components. Statistical examinations of some of the most commonly referenced models suggest that a bi-factor approach may be effective for measuring subjective well-being [23, 24]. Specifically, after accounting for general subjective well-being or positive emotion, other components may explain additional (although relatively small) differences in people’s levels of subjective well-being. The literature does not yet provide a consensus on how many secondary factors exist or what they represent; however, the measures identified above generally examine social, psychological, and emotional well-being domains.
1.2 Limitations to current measures
Although the development of measures of adolescent subjective well-being represents significant progress, some limitations may impact their utility. First, measure content must be considered. Well-being researchers agree that subjective well-being is a broad, multifaceted construct. While the three generally agreed-on domains are social, emotional, and psychological well-being, many potentially essential domains, such as gratitude, transcendence, and mindfulness, are not yet well captured. Other measures of subjective well-being focus on a particular facet of well-being; although this approach may be intentional, such measures may be too narrow to capture elements essential to broader adolescent subjective well-being. Similarly, Seligman suggests that no single measure can capture the breadth and depth of well-being [25]. Thus, although these measures may provide valuable information, they may be most useful when paired with measures examining alternate facets of well-being. The depth and breadth of subjective well-being suggest a need for additional measures that expand on the content of current measures.
Second, the generalizability of adolescent well-being measures depends on the samples used to examine their psychometric properties. Several measures, such as the EPOCH, the SEHS, and the Y-QOL, have had their properties thoroughly examined within broad U.S. samples and other populations [16, 26, 27], and sample diversity is an increasingly important priority in the development of well-being measures. However, other measures relied on samples from outside the United States or did not collect nationally representative data when examining psychometric properties. Because it may be inappropriate to expect a measure validated in one setting to perform equally well across cultural groups, collecting a diverse sample is essential for testing performance and validity. Relatedly, sample characteristics are a commonly identified area for improvement in recent well-being literature [20]. Overall, current research suggests a continued need to develop and validate well-being measures across diverse samples.
Finally, other relatively minor limitations may affect the utility of these measures. First, some measures, such as the 100–150 item CAWS, are notably lengthy [28]. Shorter measures may be more time-efficient while still demonstrating good psychometric properties with little item overlap. Second, accessibility determines the extent to which measures can be used for many practical purposes. Of these measures, the PGI is not available in the public domain, and the CAWS and WEMWBS are free to use with developer permission. The SEHS, EPOCH, MHC-SF, Ryff scales, and WHO-5 are open-access measures requiring only developer acknowledgment [19]. Although paid measures may be effective, free measures, such as the EPOCH and the SURF, may be similarly effective while allowing more widespread use.
In summary, although the development of measures of adolescent subjective well-being is a step forward, several measures have limitations, such as content coverage, sample population, accessibility, and length, that impact their utility as adolescent subjective well-being outcome measures. Because adolescent well-being is becoming a greater societal priority, accurate measurement tools must be available to help individuals understand and improve it. In this paper, we examine the psychometric properties of the adolescent version of the Survey on Flourishing (SURF) as a novel measure of subjective well-being that is accessible and quick to administer, including its reliability, validity, and factor structure.
1.3 The Survey on Flourishing
This study uses current research on adolescent well-being to examine the psychometric properties of the Survey on Flourishing (SURF) in an adolescent sample within the United States. The original SURF questionnaire was designed to measure subjective well-being by including items reflecting positive functioning and emotional well-being. In prior studies involving adolescents and adults, the SURF demonstrated good reliability and validity [29], and we expected it to demonstrate similar psychometric properties and structure in the present study. Accordingly, this study aimed to evaluate the utility of the SURF by examining its internal consistency, factor structure, and convergent and discriminant validity.
Regarding reliability, we expected the SURF to demonstrate good internal consistency, with a Cronbach’s alpha (a function of the number of items and their average inter-item correlation) between 0.80 and 0.90. An alpha statistic in this range indicates strong internal consistency, which is one facet of reliability.
Regarding the factor structure of the SURF, we expected the items to load onto a single general factor of adolescent subjective well-being based on a prior study using the SURF in an adult population (see Methods). This past research suggests that the SURF measures a unitary construct, which may represent the “higher level” construct described in bi-factor models.
Regarding the SURF’s validity, we expected the SURF to show strong positive correlations (r > 0.70) with similar measures of well-being, such as the PANAS Positive Affect subscale (PANAS-Pos) and the Satisfaction With Life Scale (SWLS), while showing a weaker negative correlation (between r = −0.5 and r = 0) with discriminant measures such as the PANAS Negative Affect subscale (PANAS-Neg). These predictions were based on a previous study in which the SURF demonstrated similar psychometric properties with adolescents [29]. Strong convergence with the PANAS Positive Affect subscale and the SWLS would suggest that the SURF measures a similar construct, whereas a weak correlation with discriminant measures would indicate that the SURF is distinct from measures of opposing constructs. Together, these comparisons provide evidence that the SURF measures what it purports to measure.
We had also planned to calculate the SURF’s test–retest reliability and Reliable Change Index (RCI). However, we could not conduct these analyses due to invalid second-phase data from the data collection site; we address this further in the discussion section.
2 Method
2.1 Participants
A total of 380 participants completed an online questionnaire using Qualtrics Online Sample from July to October 2021. Before delivering the data, Qualtrics removed 17 individuals who failed an attention check item (i.e., “Please answer ‘Strongly Disagree’ to this item”) and 11 participants whose completion times were more than two standard deviations faster than the average. To ensure consistent and reliable responses, we created a response validity scale using two matched item pairs; 18 participants were excluded because their average response deviation exceeded two standard deviations from the sample average. In total, 46 (12.1%) individuals who initiated the survey were removed from the analysis, resulting in a final sample of 334 participants. Participants ranged in age from 12 to 17 years (M = 14.8), and 176 (52.7%) were female.
Participants’ gender identities were recorded as follows: 47% identified as male, 52.7% as female, and 0.3% as transgender, nonbinary, or another gender identity. Regarding race and ethnicity, 63.2% identified as White, 14.1% as Hispanic or Latino, 13.8% as Black or African American, 4.5% as Asian, 0.6% as Native American/American Indian or Alaska Native, 0.3% as Native Hawaiian or Pacific Islander, and 3.4% as another race or multiple races, with 0.3% choosing not to respond. Regarding region, 19.8% of participants lived in the Northeast, 20.7% in the Midwest, 18.9% in the West, and 40.7% in the South. The sample was representative of the nation based on region, race/ethnicity, and sex, with the exception of adolescents who did not speak English, as the survey was conducted in English. Table 1 provides more detailed demographic information and corresponding percentages based on the 2020 US Census.
2.2 Procedures
The data were collected through an online survey after the Institutional Review Board at Brigham Young University approved the study procedures. All individuals contacted were eligible to participate, and participation was completely voluntary. Inclusion in the study required participants to speak English. Parent consent and child assent forms appeared on the first page of the survey, and all participants completed them before beginning the study. Upon completion of the consent forms, participants who opted out were shown a study completion page, while the remainder were directed to the first page of the study measures. Participants were asked to complete the measures, which took approximately 25 min, in one sitting. Participants were also sent identical study measures 2 weeks after completing the initial measures and were given 1 week to complete the second session. Participants were compensated by Qualtrics Online Sample after each completed session.
2.3 Measures
2.3.1 Survey on Flourishing (SURF)
The Survey on Flourishing (SURF) measured subjective well-being (see Table 2). The SURF is a 20-item Likert scale designed to measure subjective well-being that is relatively brief, sensitive to change, and representative of the breadth of domains that contribute to human flourishing. To go beyond a generic cognitive self-assessment of hedonic well-being (happiness, satisfaction, feeling good), the SURF also taps into domains of eudaimonic well-being (engagement, growth, living well). Consequently, items were designed to assess important areas such as social connection, purpose, contribution, transcendence, and vitality, in addition to typical subjective well-being items assessing positive emotions and life satisfaction. The intention was to produce a global measure of flourishing that acknowledges this construct's multidimensional nature while being sufficiently brief and appropriately sensitive to change, facilitating its use as a research tool for evaluating well-being interventions. The SURF contains four negatively worded items reflecting the content of four positively worded items. These items provided a more robust measurement of the domains represented by those items. Including negatively worded items also may protect against some types of response bias. For all questions, respondents rate their agreement on a 7-point scale ranging from “strongly disagree” to “strongly agree.” The final score is calculated by taking the total of all items. The SURF requires approximately 5–10 min to complete.
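The scoring described above can be illustrated with a short sketch. This assumes, as is typical for such scales, that the four negatively worded items are reverse-scored before summing; the reverse-keyed item positions below are hypothetical placeholders (the actual items appear in Table 2), and this is not the authors' scoring code.

```python
# Illustrative scoring sketch for a 20-item, 7-point scale with four
# reverse-keyed (negatively worded) items. Item positions are hypothetical.
REVERSE_KEYED = {5, 9, 14, 18}  # hypothetical 1-indexed positions

def score_surf(responses):
    """Total score: reverse-keyed items are rescored as 8 - response so
    that higher totals always indicate greater subjective well-being."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        if not 1 <= r <= 7:
            raise ValueError(f"item {i}: response {r} outside the 1-7 scale")
        total += (8 - r) if i in REVERSE_KEYED else r
    return total
```

On this sketch, totals range from 20 to 140, with a scale midpoint of 80 when every item is answered at the neutral point.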
In a study examining the psychometric properties of the SURF with adults using multiple samples (manuscript in preparation), the SURF demonstrated high internal consistency (⍺ = 0.93–0.96) [30]. It also demonstrated convergent validity by correlating significantly with other measures of subjective well-being, including the PERMA profiler (r = 0.82), the Satisfaction with Life Scale (r = 0.74), and the PANAS Positive Affect subscale (r = 0.74). It also negatively correlated with discriminant measures, such as the Negative Affect subscale of the PANAS (r = − 0.61). In a previous study involving a small sample of adolescents from a high school in the Mountain West, the SURF also demonstrated high internal consistency (⍺ = 0.94). It also demonstrated convergence with the PERMA Profiler (r = 0.79), the SWLS (r = 0.75), and the PANAS-Pos (r = 0.69) [29].
2.3.2 Positive and Negative Affect Schedule, short form (PANAS-SF) [31]
The PANAS-SF measured positive and negative affect, as affective experience is essential to subjective well-being. The Positive Affect subscale was used as evidence of convergent validity, and the Negative Affect subscale as evidence of discriminant validity. The PANAS-SF contains 20 items, each a positive affect word (e.g., “enthusiastic” or “inspired”) or a negative affect word (e.g., “scared” or “hostile”). Respondents rate, on a five-point Likert scale, the extent to which they are currently experiencing each emotion. Previous research has estimated the 8-week test–retest reliability at 0.54 for the Positive Affect subscale and 0.45 for the Negative Affect subscale [31].
2.3.3 Satisfaction with Life Scale (SWLS)
The Satisfaction with Life Scale was used to measure overall life satisfaction and convergent validity. The SWLS is a five-item Likert-style scale, which sums the item responses to estimate a total subjective well-being score [32]. It is the most commonly used instrument to measure life satisfaction and research has supported its reliability and validity in many populations, including adolescents [33]. The SWLS has demonstrated good test–retest reliability both after 2 weeks (r = 0.83) and after 1 month (r = 0.84) [34, 35]. Those people expected to report low life satisfaction scores (e.g., prison inmates, women experiencing intimate partner violence, and psychiatric patients) demonstrated low scores on the SWLS [35]. The SWLS also demonstrated convergent validity; the SWLS correlated significantly with other subjective well-being measures, including the Andrews/Withey Scale (r = 0.52–0.68) and the Fordyce Global Scale (r = 0.55–0.82), as well as interviewer ratings (r = 0.43–0.66) and informant reports of well-being (r = 0.28–0.58) [32, 35,36,37].
2.4 Data analyses
The data were analyzed using the Stata v16.1 statistical package. Internal consistency was assessed using Cronbach’s alpha (α) and Pearson bivariate correlations. To investigate the factor structure of the SURF, we compared the fit of four competing models using confirmatory factor analysis.
The primary model we examined was a one-factor model, with the latent variable of “subjective well-being” predicting scores on each item. Our decision to test a one-factor model was grounded in previous research on the original SURF in an adult population, which demonstrated that the SURF items loaded onto a single factor identified as subjective well-being. Additionally, research on similar measures of well-being found that a general factor of subjective well-being explained a large portion of the variance in respondents’ scores [38,39,40]. Although prior research also suggests that a bi-factor model may fit observed data well, we tested a one-factor model because the SURF was not designed with discrete factors in mind. After conducting our a priori analysis, we used modification indices in an exploratory fashion to identify options for improving our primary model’s fit; however, we determined that no modifications were necessary.
The second model we examined was a bi-factor model with all items loading onto a general subjective well-being factor and specific items loading onto secondary factors representing social, emotional, and psychological well-being. These secondary factors were based on conceptual definitions of subjective well-being, and the item assignments were determined by a qualitative examination of each item’s content. We included this model after examining previous literature, which suggested bi-factor models have shown effective fit with alternate measures of well-being, such as the MHC, PERMA, and the WEMWBS discussed above.
In addition to the two models listed above, we ran two other models to investigate the impact of negatively worded items on the SURF. Research suggests that having an unequal number of negatively and positively worded items may cause an unintended “negatively worded item” factor to emerge during a CFA due to response bias [41, 42]. The third model we examined was a two-factor model consisting of a broad subjective well-being factor and a negatively worded item factor, which examined whether the negatively worded items produced a statistical artifact. This model type has been used in a similar measurement study to investigate the impact of negatively worded items [18].
The final model we examined was a one-factor model of broad, subjective well-being, similar to Model 1. However, with this model, we aimed to account for the effect of negatively worded items by allowing the error variances of the four negatively worded items to covary. We compared this model to the two-factor model to explore whether the negatively worded items in the measure comprised an independent factor or fit better in a one-factor model.
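To make the model specifications concrete, the final model (a single factor with covarying error terms among the four negatively worded items) could be written in lavaan-style syntax, as accepted by SEM packages such as R's lavaan or Python's semopy. The item labels and the reverse-keyed positions (s5, s9, s14, s18) below are hypothetical, and the analyses reported here were run in Stata, so this is only an illustrative translation.

```python
# Hypothetical lavaan-style specification for the modified one-factor model:
# one general subjective-well-being (SWB) factor, with the error terms of
# the four negatively worded items allowed to covary. Item numbering is
# illustrative, not the SURF's actual item order.
MODEL_4_SPEC = """
SWB =~ s1 + s2 + s3 + s4 + s5 + s6 + s7 + s8 + s9 + s10 +
       s11 + s12 + s13 + s14 + s15 + s16 + s17 + s18 + s19 + s20
s5 ~~ s9
s5 ~~ s14
s5 ~~ s18
s9 ~~ s14
s9 ~~ s18
s14 ~~ s18
"""
```

Dropping the six `~~` lines recovers the plain one-factor model (Model 1), which makes explicit that the two specifications differ only in the correlated residuals.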
Lastly, the SURF mean scores were correlated with the PANAS positive affect subscale, PANAS negative affect subscale, and the Satisfaction with Life Scale total scores to determine the measure’s convergent and discriminant validity.
3 Results
This study aimed to assess the reliability, validity, and factor structure of the SURF. After cleaning the data, we evaluated its internal consistency, convergent and discriminant validity, and factor structure.
3.1 Data preparation
Before running our principal analyses, we conducted preliminary analyses to determine whether our data met the assumptions of normality for our planned statistical tests. We first identified outliers in the mean scores for the SURF, PANAS scales, and SWLS, defining outliers as observations falling more than two interquartile ranges above or below the median score. We then fenced outliers to these outer bounds (median plus or minus two interquartile ranges). Fencing was used, rather than removing outliers, to minimize the possibility of skewed results from responses far beyond the sample median while retaining participant data, an important consideration given this study’s relatively small sample size. Ultimately, we identified and fenced 11 observations to the lower bound of the SURF total score, 10 observations to the upper bound of the PANAS Negative Affect subscale, and nine observations to the upper bound of the SWLS.
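The fencing procedure can be sketched as follows. This is an illustrative standard-library implementation of winsorizing to the median plus or minus two interquartile ranges, not the study's actual Stata code.

```python
from statistics import median, quantiles

def fence_outliers(values, k=2.0):
    """Winsorize ('fence') values to median +/- k * IQR, retaining every
    observation rather than dropping outliers, as described above."""
    med = median(values)
    # 'inclusive' matches the common linear-interpolation quartile definition
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = med - k * iqr, med + k * iqr
    return [min(max(v, lower), upper) for v in values]
```

For example, in the list `[1, 2, 3, 4, 5, 6, 7, 8, 9, 50]` only the value 50 lies beyond median ± 2 IQR, so only that observation is pulled in to the upper bound; all other responses are returned unchanged.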
We then examined the univariate normality of the SURF, PANAS subscale, and SWLS scores. A joint chi-squared probability test for normality indicated that the SURF mean (p < 0.01), the PANAS Negative Affect subscale mean (p < 0.01), the PANAS Positive Affect subscale mean (p < 0.01), and the SWLS mean (p < 0.01) each departed from normality. However, given the nature of these measures, non-normally distributed data were expected; for example, most participants likely reported low levels of negative affect. Thus, we determined that no data transformations were necessary.
We also conducted a chi-square goodness-of-fit test to determine whether our sample was distinguishable from national statistics. We determined that the sample matched overall US Census data based on race/ethnicity (χ2 (7, N = 334) = 7.99, p = 0.33), gender (χ2 (2, N = 334) = 0.10, p = 0.95), and region (χ2 (3, N = 334) = 5.84, p = 0.12).
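A goodness-of-fit comparison of this kind can be sketched as below. The statistic sums (O − E)²/E over categories, where expected counts come from the population (e.g., census) proportions; the example counts and proportions in the usage note are hypothetical, not the study's demographic data.

```python
# Illustrative chi-square goodness-of-fit statistic comparing observed
# category counts against population proportions (e.g., census shares).
def chi_square_gof(observed, expected_props):
    """Return the chi-square statistic sum((O - E)^2 / E), where each
    expected count E is the total sample size times the population share."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_props))
```

For hypothetical counts `[30, 20, 50]` against population shares `[0.3, 0.3, 0.4]`, the statistic is about 5.83 on 2 degrees of freedom, just under the conventional 0.05 critical value of 5.99, so the sample would not differ significantly from the population.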
3.2 Internal consistency
Results demonstrated that the SURF’s internal consistency was high (α = 0.92; see Table 3). The average inter-item correlation for the SURF was 0.36. These results support our expectation that the measure would have good internal consistency. Of note, the SWLS (α = 0.86), PANAS positive affect subscale (α = 0.94), and the PANAS negative affect subscale (α = 0.92) also showed high internal consistency.
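The reported alpha can be cross-checked against the average inter-item correlation via the Spearman-Brown formula for standardized alpha, which approximates raw coefficient alpha when item variances are similar. With k = 20 items and a mean inter-item correlation of 0.36, this yields approximately 0.92, matching the reported value.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from the number of items (k) and the
    average inter-item correlation (mean_r). This approximates raw alpha
    when item variances are roughly equal."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)
```

This relationship also shows why a modest average inter-item correlation (0.36) can coexist with a high alpha: alpha rises with the number of items as well as with item intercorrelation.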
3.3 Factor structure
Although Cronbach’s alpha provides evidence of internal consistency, it alone does not provide adequate information about the dimensionality or factor structure of the SURF. To obtain this information, we conducted a confirmatory factor analysis (CFA). Previous research examining the factor structure of the original SURF questionnaire found that all test items loaded strongly onto one general factor, identified as subjective well-being. Because we expected the SURF items to be broad enough to allow for differences in interpretation between the general population and adolescents, yet specific enough to retain good interpretability, we expected a one-factor model to fit the observed data well. We compared the fit of our primary model with an alternate bi-factor model based on the current literature.
3.3.1 Model 1
Model 1 consisted of a one-factor model, with each item loading onto a latent variable representing subjective well-being (see Fig. 1). This model was identified according to the three-indicator rule [43]: a single-factor model is identified if it has three or more indicators and no correlated error terms. Our primary one-factor model of subjective well-being demonstrated adequate fit to the data (χ2 (170, N = 334) = 528.51, p < 0.001; model fit statistics can be seen in Table 4). The model’s root-mean-square error of approximation (RMSEA) was 0.08, suggesting moderate fit when considering the parsimony of the model (RMSEA values < 0.08 indicate “acceptable” fit, while values < 0.05 indicate “good” fit) [44]. The standardized root mean squared residual (SRMR), which reports the average difference between the observed and implied covariances for the SURF items, was 0.06, also suggesting moderate fit. The comparative fit index (CFI), which indexes how much better the specified model fits relative to a null model, was 0.87, slightly below the 0.90 threshold conventionally taken to indicate adequate fit [45]. The Bayesian Information Criterion (BIC), which can be compared across models to examine relative fit, was 22,106.87. Taken together, these fit statistics suggest the model demonstrated adequate fit to the data. Per our a priori specifications, we examined modification indices for possible post hoc changes to the model, although we determined that no changes were theoretically supported.
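The RMSEA point estimate can be reproduced directly from the reported chi-square, its degrees of freedom, and the sample size, using the standard formula RMSEA = sqrt(max(χ² − df, 0) / (df (N − 1))).

```python
from math import sqrt

def rmsea(chi2, df, n):
    """RMSEA point estimate from the model chi-square statistic, its
    degrees of freedom, and the sample size. Values at or below df
    (no misfit beyond chance) yield an RMSEA of zero."""
    return sqrt(max(chi2 - df, 0) / (df * (n - 1)))
```

Plugging in the Model 1 values (χ² = 528.51, df = 170, N = 334) gives approximately 0.08, matching the reported estimate; the Model 4 values (χ² = 307.09, df = 164) give approximately 0.05.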
3.3.2 Model 2
The second model we examined was a bi-factor model of subjective well-being. In this model, all items loaded onto a broad factor of well-being and onto one of three specific factors representing social, psychological, and emotional well-being (see Fig. 2). These factors were determined based on previous literature regarding the factor structure of subjective well-being [38,39,40]. Upon running this model, we identified that item 13 had a negative residual variance; to obtain model convergence, we set that item’s error variance to zero and reran the model. This model was theoretically identified using the t-rule (the number of observed values in the covariance matrix exceeded the number of estimated parameters). The bi-factor model demonstrated adequate fit, with χ2 (151, N = 334) = 433.23, p < 0.001; RMSEA = 0.08; SRMR = 0.06; CFI = 0.90; BIC = 22,122.00. These data suggest that this model fit similarly to Model 1.
3.3.3 Model 3
The third model we examined was a two-factor model, with one factor representing subjective well-being and a second representing the negatively worded items (see Fig. 3). We ran this model to explore whether the negatively worded items may have created an artifact in the data. Because the SURF contains an unequal number of positively and negatively worded items, we suspected this might introduce item variance attributable to response bias rather than true scores. This model was also identified according to the three-indicator rule [43]: a two-factor model is identified if each latent variable has three or more indicators, no error terms are correlated, and each indicator loads onto only one factor. Model 3 demonstrated good fit to the data, with χ2 (169, N = 334) = 326.88, p < 0.001; RMSEA = 0.05; SRMR = 0.05; CFI = 0.94; BIC = 21,911.05. This model accounted for approximately 98% of the variance in SURF total scores.
3.3.4 Model 4
Lastly, Model 4 was a one-factor model of subjective well-being similar to Model 1, except that we allowed the error variance terms of the negatively worded items to covary (see Fig. 4). This model was theoretically identified using the t-rule. Model 4 demonstrated good fit to the data, with χ2 (164, N = 334) = 307.09, p < 0.001; RMSEA = 0.05; SRMR = 0.04; CFI = 0.95; BIC = 21,920.32. This model accounted for approximately 92% of the variance in SURF total scores. Although Model 3 yielded a slightly lower BIC, Model 4’s remaining fit indices were the strongest of the four models. Item factor loadings and variance explained by each item can be viewed in Table 5.
To summarize, our analysis revealed that both the primary one-factor model and the bi-factor model showed satisfactory fit. Additionally, the two-factor model, which included the latent variables of subjective well-being and negatively worded items, as well as the modified one-factor model, demonstrated a good fit as well. After assessing all the models, we concluded that the modified one-factor model (Model 4) had the best fit with the data. We will discuss these findings in more detail below.
3.4 Convergent and discriminant validity
To examine the validity of the SURF, we correlated the total scores of the PANAS subscales and the SWLS with the SURF to estimate convergent and discriminant validity. The SURF total scores demonstrated a significant positive correlation with the SWLS (r = 0.70, 95% CI [0.64, 0.75], p < 0.001) and the PANAS positive affect subscale (r = 0.61, 95% CI [0.54, 0.67], p < 0.001). SURF total scores also demonstrated a significant weak negative correlation with the PANAS negative affect subscale, a measure of impaired subjective well-being (r = − 0.20, 95% CI [− 0.30, − 0.09], p < 0.001). All convergent validity correlations can be found in Table 3.
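The confidence intervals reported above can be reproduced with the standard Fisher z-transformation. A minimal sketch, using only the reported r = 0.70 and N = 334 (the raw data are not assumed):

```python
import math

def pearson_ci(r, n, crit=1.96):
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)                # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)      # standard error of z
    lo, hi = z - crit * se, z + crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

lo, hi = pearson_ci(0.70, 334)
print(f"r = 0.70, 95% CI [{lo:.2f}, {hi:.2f}]")  # [0.64, 0.75]
```

The result matches the interval reported for the SURF–SWLS correlation, [0.64, 0.75].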
4 Discussion
4.1 SURF psychometric properties and structure
This study aimed to examine the reliability, factor structure, and validity of the SURF, a measure of subjective well-being, in an adolescent population. Results provided evidence of good internal consistency and convergent/discriminant validity. Regarding factor structure, our primary one-factor model performed similarly to the alternative bi-factor model. The two-factor and modified one-factor models also fit the data well, suggesting that accounting for bias introduced by the negatively worded items improved fit. We ultimately suggest that Model 4 demonstrates the best fit while balancing parsimony.
Our comparison of the one-factor model (Model 1) to the bi-factor model (Model 2) showed that they fit the observed data similarly. Additionally, the specific latent variables in the bi-factor model accounted for very little variance in the scores after extracting the variance accounted for by the general subjective well-being factor. This finding aligns with prior research on other bi-factor well-being models in which the secondary factors are weak relative to the primary factor, and it suggests that the second-level factors in Model 2 provided little utility beyond what the general factor could account for.
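One common way to quantify how weak a bifactor model's specific factors are is the explained common variance (ECV). The sketch below uses purely hypothetical loadings for illustration; it is not based on the Model 2 estimates.

```python
# Explained common variance (ECV): the proportion of common variance
# attributable to the general factor in a bifactor model.
# The loadings below are hypothetical, for illustration only.
general  = [0.70, 0.65, 0.72, 0.68]   # general-factor loadings
specific = [0.20, 0.15, 0.25, 0.10]   # specific-factor loadings

gen_var  = sum(l * l for l in general)
spec_var = sum(l * l for l in specific)
ecv = gen_var / (gen_var + spec_var)
print(f"ECV = {ecv:.2f}")  # values near 1 mean the specific factors add little
```

An ECV near 1 indicates that the general factor dominates, the pattern described for Model 2 above.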
We also examined two additional CFA models in an exploratory manner to better understand the impact of item bias on model fit. Regarding the two-factor model, we expected that bias in participants’ response patterns might produce a negatively worded item factor, and the model’s performance suggests this is likely the case. However, a second possible explanation for this model’s good fit lies in the content of the negatively worded item factor: it may represent a substantive construct such as depression or low mood, as its items appear to reflect the absence of positive emotion, supportive relationships, and meaning. These negatively worded items, however, were created as counterparts to positively worded items assessing the same content, to provide a more robust measure of those item domains. Thus, in the absence of response bias, we would expect the positively and negatively worded items to load onto the same factor. Hence, we conclude that response bias is the most likely explanation for this model’s good fit, although the content of these items may have also played a role.
Model 4 comprised a single factor representing subjective well-being while allowing the error terms of the negatively worded items to covary. After accounting for the error introduced through item response bias, the negatively worded items still loaded significantly onto the broad subjective well-being factor, suggesting that the SURF items represent a unitary construct. Compared with the two-factor model, Model 4 demonstrated a similar fit while maximizing parsimony, and we determined that it demonstrated the best fit with the data. These results also converge with prior research supporting a one-factor model.
Overall, the results from our examination of the SURF’s structure suggest that a one-factor model best fits the data and that the SURF items reflect a unitary construct representing subjective well-being. While the two-factor and modified one-factor models demonstrated similar fit, we expect these fit well because they accounted for the presence of item response bias. We retained the modified one-factor model because it appears to be the most parsimonious option while still demonstrating a good fit with the data (see Table 4).
These results have important implications for the SURF’s interpretability and its relationship with other measures. Because the factor analysis suggests that the SURF measures a unitary construct, its scores can be compared with other broad measures of subjective well-being, such as the SWLS and the PANAS subscales, and the SURF does correlate with these measures in the expected directions. However, the unitary nature of the SURF limits our ability to compare its scores to measures with more specific subscales, such as the EPOCH subscales. Although the SURF was developed to incorporate various domains of well-being, these results suggest it should be interpreted as a unitary construct, and efforts to further establish its convergent and discriminant validity should include other broad-based measures of well-being, or at least measures with a total score representing an overarching construct. Some possibilities include the YOQ-30.2, the Flourishing Scale, the EPOCH, or the Y-QOL.
4.2 Contributions to adolescent well-being measurement
As we describe how the SURF contributes to this body of literature, we first wish to highlight again Seligman’s statement that due to the breadth of well-being, no single measure can fully capture it. The SURF is one of many measures of subjective well-being currently in use. Although there are many quality measures of adolescent well-being, we believe the SURF has a place among these measures as a helpful tool to understand youths’ well-being better. The SURF displays several characteristics that may make it a valuable resource in youth subjective well-being measurement.
First, the SURF is an accessible measure of adolescent subjective well-being: it is quick to administer, and users have open access to the measure with acknowledgment. Many current measures of adolescent subjective well-being fall short in one or more of these areas, making them impractical for widespread use. The SURF provides a valuable alternative, broadening the pool of accessible instruments that clinicians, researchers, and others can use to measure subjective well-being as fits their purposes.
Similarly, the SURF contains content not included in many adolescent subjective well-being measures, including items on mindfulness, transcendence, and gratitude, constructs essential to well-being. The items examining these constructs performed well in the factor analysis and accounted for an amount of variance in the total score similar to items typically seen in subjective well-being measures (see Table 5). These items’ strong factor loadings suggest that they fit the intended construct (i.e., well-being), and the variance they explain suggests that they contribute meaningfully to the overall SURF score. Because it is difficult to capture the breadth of this construct with a single measure, multiple measures may be needed to best understand adolescent subjective well-being. The SURF can complement other measures in this effort, especially given its inclusion of items examining unique constructs relevant to well-being that are not often included in well-being instruments. Overall, the accessibility of the SURF combined with the domains it measures provides an effective, research-supported measure that can help researchers and practitioners better understand adolescent well-being and how to improve it.
The SURF also provides a reliable measurement tool whose psychometric properties have been examined using a broad sample within the United States. Because of the difficulty of collecting data from adolescent participants, many researchers use convenience sampling or existing infrastructure (e.g., school systems) to recruit participants. Although data may be gathered more easily through these methods, they often limit the generalizability of findings. The collection of census-matched data early in the SURF’s development provides promising evidence of the measure’s utility. However, additional data from a larger sample must be collected to accurately assess national response patterns to SURF items and maximize generalizability.
5 Conclusion
Overall, this study contributes to the current literature by providing a reliable and valid measure of subjective well-being. The study also had several strengths worth noting. First, it employed a sample matched to 2020 US Census data, which adds to the generalizability and utility of the measure. This must be tempered, however, by our relatively small sample size. Although we met Clark and Watson’s suggestion that researchers collect at least 10 responses per scale item, with an ideal ratio of 15:1 or 20:1, a larger sample would provide a greater understanding of the SURF’s performance as described in this study and of how it will perform among various populations [31, 32]. A second strength relates to our adherence to recommendations for transparency in research methods and planned analyses [46, 47]. Although we did not pre-register our analyses, we specified a priori, given our research questions, which statistical procedures we planned to run before conducting any analyses.
Additionally, we specified which exploratory analyses were performed after running our primary analyses. Having a data-analysis plan reduces bias that may result from questionable research practices, which research has shown to be remarkably common among social scientists [48, 49]. Thus, our commitment to adhering to our data-analysis plan and transparently reporting the results increases replicability and provides evidence for the robustness of our findings.
5.1 Limitations
Although our study has notable strengths, it is also important to recognize limitations that may have affected our results. Many of these relate to our decision to use Qualtrics Online Sample to distribute the study and collect the data. First, although working with online panels allowed us to collect a sample matched to national proportions, research on Amazon’s Mechanical Turk (MTurk), a similar online data collection service, suggests that response quality from such agencies may often be low. In addition to the data screening methods used by Qualtrics, we therefore introduced safeguards (e.g., validity metrics) to promote high response quality [50].
Second, some research suggests that online data collection sites may saturate samples with participants who are not representative of the intended population, despite their responses to demographic questions [50]. Although we did not detect abnormalities suggesting participants were outside the intended population, this remains a risk and a limitation of our study.
Third, because the online panel oversaw distribution of the study measures, the second administration was distributed to participants at an incorrect time, contrary to our outlined methods (see Methods section). This invalidated the retest responses and prevented us from analyzing important results such as test–retest data.
5.2 Future directions
This study highlights several future directions for improving the psychometric evidence for the SURF. First, we recommend collecting additional samples to replicate these results. Although this study provides a first step, the robustness of these findings would be strengthened if other adolescent samples displayed similar results. Future studies may also consider collecting higher-quality samples that are less influenced by sampling bias.
In addition to replication, several steps would extend our understanding of the SURF’s psychometric properties. First, prioritizing the collection of test–retest data would provide evidence for the reliability and stability of SURF scores over time. Future studies might also include additional convergent and discriminant validity measures, possibly including objective measures such as those examining the physical environment, access to resources, and physiology. Because of funding limitations and our desire to limit questionnaire length to maximize completion rates, including measures such as the EPOCH or the Flourishing Scale was not possible in this study. Using other similar and dissimilar measures would provide greater evidence that the SURF measures what it purports to measure.
Regarding the sample, the current study provides good initial data on the SURF’s psychometrics. However, gathering SURF data from a larger sample would provide further information about the robustness of the measure and greater support for its broad use. As more data are collected across demographics (i.e., age ranges, gender, race/ethnicity, etc.), conducting measurement invariance analyses may help clarify whether the SURF performs as expected across various populations.
Lastly, we recommend that researchers include the SURF in intervention studies to improve adolescents' subjective well-being. One goal we had when developing the SURF was that the measure would provide a way for researchers to track changes in adolescents' subjective well-being over time and in response to intervention. By including the SURF as an outcome measure in future research, researchers may calculate the SURF's reliable change index (RCI) and determine whether this measure is an appropriate tool for that use. Addressing these issues will help further establish the SURF as a valid and reliable measure of adolescent subjective well-being.
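The reliable change index mentioned here follows the standard Jacobson–Truax formulation. A minimal sketch, in which the scores and SD are hypothetical; only the internal consistency (α = 0.92) comes from this study:

```python
import math

def reliable_change_index(pre, post, sd, reliability):
    """Jacobson-Truax RCI: observed change over the SE of the difference."""
    sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
    se_diff = math.sqrt(2) * sem            # SE of the difference score
    return (post - pre) / se_diff

# Hypothetical example: a 10-point gain on the SURF, assuming SD = 10
# and using the internal consistency reported in this study (alpha = 0.92).
rci = reliable_change_index(60, 70, sd=10, reliability=0.92)
print(f"RCI = {rci:.2f}")  # |RCI| > 1.96 suggests reliable change
```

In practice, the SD and reliability would come from a normative or baseline sample rather than the illustrative values used here.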
Data availability
The authors confirm that all data generated or analyzed during this study are included in this published article.
Code availability
The authors confirm that the custom code used during this study is included in this published article.
References
Dodge R, Daly AP, Huyton J, Sanders LD. The challenge of defining wellbeing. Int J Wellbeing. 2012;2:222–35.
Seligman ME. Flourish: a visionary new understanding of happiness and well-being. Policy. 2011;27:60–1.
Witten H, Savahl S, Adams S. Adolescent flourishing: a systematic review. Cogent Psychol. 2019;6:1640341.
Moore KA, Keyes CLM. A brief history of well-being in children and adults. In: Well-being: positive development across the life course. Mahwah: Lawrence Erlbaum Associates Publishers; 2003. p. 1–11.
Diener E, Biswas-Diener R. Happiness: unlocking the mysteries of psychological wealth. Hoboken: Wiley; 2011.
Diener E, Ryan K. Subjective well-being: a general overview. South Afr J Psychol. 2009;39:391–406.
Waigel NC, Lemos VN. A systematic review of adolescent flourishing. Eur J Psychol. 2023;19:79.
Kim T, Jang C-Y, Kim M. Socioecological predictors on psychological flourishing in the US adolescence. Int J Environ Res Public Health. 2020;17:7917.
Damon W. What is positive youth development? Ann Am Acad Pol Soc Sci. 2004;591:13–24.
Lerner JV, Phelps E, Forman Y, Bowers EP. Positive youth development. In: Handbook of adolescent psychology: individual bases of adolescent development, vol 1. 3rd ed. Hoboken: John Wiley & Sons; 2009. p. 524–58.
Shek DT, Dou D, Zhu X, Chai W. Positive youth development: current perspectives. Adolesc Health Med Ther. 2019;10:131–41.
Magyar JL, Keyes CLM. Defining, measuring, and applying subjective well-being. In: Positive psychological assessment: a handbook of models and measures. 2nd ed. Washington, DC: American Psychological Association; 2019. p. 389–415.
Diener E. Subjective well-being. Psychol Bull. 1984;95:542–75.
Masten AS, Coatsworth JD. The development of competence in favorable and unfavorable environments: lessons from research on successful children. Am Psychol. 1998;53:205.
Butler J, Kern ML. The PERMA-Profiler: a brief multidimensional measure of flourishing. Int J Wellbeing. 2016. https://doi.org/10.5502/ijw.v6i3.526.
Kern ML, Benson L, Steinberg EA, Steinberg L. The EPOCH measure of adolescent well-being. Psychol Assess. 2016;28:586–97.
Fernandes HM, Vasconcelos-Raposo J, Teixeira CM. Preliminary analysis of the psychometric properties of Ryff’s scales of psychological well-being in Portuguese adolescents. Span J Psychol. 2010;13:1032–43.
Ryff CD, Keyes CL. The structure of psychological well-being revisited. J Pers Soc Psychol. 1995;69:719–27.
Rose T, Joe S, Williams A, Harris R, Betz G, Stewart-Brown S. Measuring mental wellbeing among adolescents: a systematic review of instruments. J Child Fam Stud. 2017;26:2349–62.
Orth Z, Moosajee F, Van Wyk B. Measuring mental wellness of adolescents: a systematic review of instruments. Front Psychol. 2022;13:835601.
Bentley N, Hartley S, Bucci S. Systematic review of self-report measures of general mental health and wellbeing in adolescent mental health. Clin Child Fam Psychol Rev. 2019;22:225–52.
Diener E, Wirtz D, Tov W, Kim-Prieto C, Choi D, Oishi S, Biswas-Diener R. New well-being measures: short scales to assess flourishing and positive and negative feelings. Soc Indic Res. 2010;97:143–56.
Jovanović V. A bifactor model of subjective well-being: a re-examination of the structure of subjective well-being. Personal Individ Differ. 2015;87:45–9.
Savahl S, Casas F, Adams S. Considering a bifactor model of children’s subjective well-being using a multinational sample. Child Indic Res. 2023;16:2253–78.
Seligman M. PERMA and the building blocks of well-being. J Posit Psychol. 2018;13:333–5.
You S, Furlong M, Felix E, O’Malley M. Validation of the Social and Emotional Health Survey for five sociocultural groups: multigroup invariance and latent mean analyses. Psychol Sch. 2015;52:349–62.
You S, Furlong MJ, Dowdy E, Renshaw TL, Smith DC, O’Malley MD. Further validation of the social and emotional health survey for high school students. Appl Res Qual Life. 2014;9:997–1015.
Copeland EP, Nelson RB, Traughber MC. Wellness dimensions relate to happiness in children and adolescents. Adv Sch Ment Health Promot. 2010;3:25–37.
Linford L, Bekker J, Ameen J, Warren J. Implementation of a positive psychology curriculum in a high school setting: a mixed methods pilot study. J Posit Sch Psychol. 2022;6:25–37.
Warren JS, Linford L, Salazar GC, Jackman K. The survey on flourishing (SURF): development of a comprehensive measure of well-being. 2024.
Watson D, Clark LA, Tellegen A. Development and validation of brief measures of positive and negative affect: the PANAS scales. J Pers Soc Psychol. 1988;54:1063.
Diener E, Emmons RA, Larsen RJ, Griffin S. The satisfaction with life scale. J Pers Assess. 1985;49:71–5.
Neto F. The satisfaction with life scale: psychometrics properties in an adolescent sample. J Youth Adolesc. 1993;22:125–34.
Alfonso VC, Allison DB, Rader DE, Gorman BS. The extended satisfaction with life scale: development and psychometric properties. Soc Indic Res. 1996;38:275–301.
Pavot W, Diener E, Colvin CR, Sandvik E. Further validation of the satisfaction with life scale: evidence for the cross-method convergence of well-being measures. J Pers Assess. 1991;57:149–61.
Larsen RJ, Diener E, Emmons RA. An evaluation of subjective well-being measures. Soc Indic Res. 1985;17:1–17.
Diener E, Colvin CR, Pavot WG, Allman A. The psychic costs of intense positive affect. J Pers Soc Psychol. 1991;61:492–503.
Reinhardt M, Horváth Z, Morgan A, Kökönyei G. Well-being profiles in adolescence: psychometric properties and latent profile analysis of the mental health continuum model – a methodological study. Health Qual Life Outcomes. 2020;18:95.
Bartholomaeus JD, Iasiello MP, Jarden A, Burke KJ, van Agteren J. Evaluating the psychometric properties of the PERMA profiler. J Well-Being Assess. 2020;4:163–80.
Shannon S, Breslin G, Prentice G, Leavey G. Testing the factor structure of the Warwick-Edinburgh Mental Well-Being Scale in adolescents: a bi-factor modelling methodology. Psychiatry Res. 2020;293:113393.
DiStefano C, Motl RW. Further investigating method effects associated with negatively worded items on self-report surveys. Struct Equ Model Multidiscip J. 2006;13:440–64.
Merritt SM. The two-factor solution to Allen and Meyer’s (1990) affective commitment scale: effects of negatively worded items. J Bus Psychol. 2012;27:421–36.
Davis WR. The FC1 rule of identification for confirmatory factor analysis: a general sufficient condition. Sociol Methods Res. 1993;21:403–37.
Whittaker TA. A beginner’s guide to structural equation modeling (3rd ed.). Struct Equ Model Multidiscip J. 2011;18:694–701.
Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model Multidiscip J. 1999;6:1–55.
Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci. 2018;115:2600–6.
Silberzahn R, Uhlmann EL, Martin DP, et al. Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv Methods Pract Psychol Sci. 2018;1:337–56.
John LK, Loewenstein G, Prelec D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci. 2012;23:524–32.
Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22:1359–66.
Kees J, Berry C, Burton S, Sheehan K. An analysis of data quality: professional panels, student subject pools, and Amazon’s mechanical Turk. J Advert. 2017;46:141–55.
Acknowledgements
We would like to acknowledge the Brigham Young University Department of Psychology for providing us with funding for recruitment purposes. We would also like to acknowledge Dr. Sam Hardy and Dr. Gary Burlingame for the assistance they provided in the development of this research project.
Funding
The Brigham Young University Department of Psychology provided funding for this project.
Author information
Authors and Affiliations
Contributions
All authors contributed substantially to the conception and design of the work, analysis, interpretation of data, manuscript preparation, and revision. All authors approved this work for publication.
Corresponding author
Ethics declarations
Institutional approval
The BYU IRB approved this study (approval #2021-185), and all procedures followed established ethical standards.
Informed consent
Parents/legal guardians provided informed consent, and all youth provided informed assent.
Competing interests
The authors have no competing financial or non-financial interests directly or indirectly related to this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Salazar, G.C., Warren, J.S. Preliminary development of the Survey on Flourishing: measuring subjective well-being in an adolescent sample. Discov Psychol 4, 85 (2024). https://doi.org/10.1007/s44202-024-00190-x