Introduction

The COVID-19 pandemic has not only created threats to physical health, but it has also had a negative impact on the mental health of the population [1, 2]. The effects of COVID-19 on mental health and well-being are profound and long-lasting [3, 4], extending beyond individuals who have been directly affected by the disease [5]. The COVID-19 pandemic has provoked similar reactions in terms of emotions and concerns at the population level in different countries around the world [6]. Recent studies during the pandemic have reported a decrease in well-being compared to pre-pandemic level, also correlating negatively with the presence of symptoms of anxiety and depression [7]. However, after the peak periods of diagnosed cases and deaths during the first waves of the pandemic, an increase in well-being is observed as a consequence of decreased symptoms of anxiety and depression [7]. Thus, the lower number of deaths during the second wave compared to the first wave of the pandemic and the flexibility of prevention behaviors seem to support the hypothesis that subjective well-being varies as a function of the intensity of the COVID-19 pandemic and associated social constraints [8].

In this context, well-being is an important dimension of perceived quality of life, which can be used as an outcome measure in different populations or as an indicator of the effectiveness of different treatment conditions [9]. Thus, findings on well-being are useful for improving mental health services [10] and guiding governmental decisions in health [11]. Therefore, there is an urgent need to assess changes in well-being on a multinational scale during the COVID-19 pandemic [12]. In this sense, in order to assess possible differences between various cultural groups, culturally valid measurement scales must be available. To do this, scales must be examined with different samples to determine which aspects have universal utility and which are applicable only to certain groups [13]. Without such assessments, it is not possible to be certain of the applicability of the results of cross-cultural studies [14, 15].

One of the most widely used scales to measure well-being in clinical and non-clinical studies is the WHO-5 well-being index (WHO-5) [16], which has been translated into more than 30 languages worldwide [17, 18]. There is no single definition of well-being, since it can be interpreted according to the sociocultural context in which an individual operates [19]. For this reason, brief scales that assess subjective well-being globally, in a single dimension, have been recommended for several years [20]. The WHO-5 evaluates well-being, understood as the degree of well-being experienced by each person according to a subjective evaluation of their life, which includes a set of cognitive judgments and affective reactions based on previous experiences, the current state of life and expectations [21]. The WHO-5 is a brief (less than 1 min), generic rating scale that measures subjective well-being over a 2-week period [9] and was developed in response to the need for a measure that reflects a single dimension with high clinical validity [17]. The WHO-5 is derived from the 10-item version (WHO-10), which included positively worded items to measure subjective well-being and negatively worded items to measure distress [18]. The WHO-5 retained only the positively worded items, in accordance with the World Health Organization (WHO) definition of good health, in which positive well-being is considered a reflection of mental health [22].

Psychometric evidence for the WHO-5 has been evaluated in different countries, settings, and populations [9, 17, 22–27], resulting in a robust measure of subjective well-being. This has led the WHO-5 to be used in different studies measuring the level of subjective well-being and its relationships with numerous psychological and social variables during the current COVID-19 pandemic [7, 28–30]. However, no studies have been reported using the WHO-5 in multiple Latin American countries during the COVID-19 pandemic. Moreover, a recent study that evaluated the validity evidence of the WHO-5 in 35 countries did not include Latin American countries [9]. This highlights the need to evaluate the usefulness of the WHO-5 as a cross-cultural measure of subjective well-being during the COVID-19 pandemic in the Latin American context.

The comparability of the WHO-5 across countries is an important issue, as different cross-national studies use the scale to assess and compare subjective well-being across countries [9]. However, measurement invariance (MI) is a prerequisite for conducting comparative studies [31]. The absence of MI would not allow for certainty that the presence (or absence) of differences in a construct between different groups can be attributed to real differences in the construct rather than caused by differences in psychometric characteristics of the measurement instrument [32]. Specifically, the absence of MI could be caused by different meanings or understandings of the construct between groups, differences in the degree of social desirability or social norms, different reference points when making self-statements, different responses to extreme items, the presence of items more applicable to one group than another, or the incorrect translation of one or more items [33]. Thus, establishing MI for measures is a growing need [34].

Despite the importance of MI in cross-cultural studies [35], there are still few instruments that assess aspects of mental health and have evidence of cross-cultural MI. Thus, while the WHO-5 was used to compare subjective well-being in 31 European countries before the pandemic [36] and during the pandemic in England, Ireland, New Zealand, and Australia [37], no evidence of MI between countries has been reported in these studies. Even a systematic literature review regarding the WHO-5 has not reported outcome information about MI [17]. Only a recent study [9], but with data collected in 2015, reported the presence of metric invariance but not scalar invariance among 35 European countries. Furthermore, based on Item Response Theory (IRT) models, low levels of differential WHO-5 functioning were observed at medium levels, increasing at more extreme levels.

Cross-country MI and the assessment of certain item characteristics are some of the important psychometric issues that remain unclear about the WHO-5 [38]. Therefore, this study examined the MI of the WHO-5 in samples from 12 different Latin American countries. Additionally, the characteristics and performances of the WHO-5 items were evaluated based on IRT. The use of Classical Test Theory (CTT) models allows for confirming previous psychometric results of the WHO-5, while IRT results improve the understanding of its psychometric properties, since IRT provides information about the difficulty and discrimination capacity of the WHO-5 items, as well as the identification of the items that are the most accurate to measure subjective well-being.

Method

Participants

A total of 5183 individuals with diverse occupational backgrounds, selected by convenience sampling from twelve countries (Argentina, Bolivia, Chile, Colombia, Cuba, Ecuador, El Salvador, Guatemala, Mexico, Paraguay, Peru and Uruguay), participated in this study. The inclusion criteria were: (1) being of legal age, according to the legislation of each participating country; (2) being a resident of one of the 12 participating countries; and (3) giving informed consent. The 12 countries were not selected systematically, since the participation of as many countries as possible was sought. The inclusion of the countries was the result of a negotiation process based on the potential interest of the researchers from each country in participating and the possibility of meeting the research requirements. The Soper software [39] was used to calculate the minimum number of participants in each country. For this, we considered the number of observed variables (the 5 items of the WHO-5), the number of latent variables in the model to be evaluated (one, subjective well-being), the anticipated effect size (λ = 0.3), the significance level (α = 0.05) and the statistical power (1 − β = 0.95). The software recommended a minimum sample size of 100 participants in each country. The average sample size per country was 432, ranging from 252 (Bolivia) to 877 (Paraguay). Furthermore, the sample size in each country far exceeded the recommended 5:1 ratio of participants to items [40].

Men accounted for only 1509 of the participants (29.11%), and the mean age of the sample was 33.52 years (SD = 12.90 years). Participants from Argentina and Guatemala had the highest mean age, while participants living in Cuba and Ecuador had the lowest mean age. Table 1 provides country-specific demographic information.

Table 1 Sociodemographic characteristics of participants in the Americas

Instruments

WHO-5 well-being index (WHO-5) [16]. The WHO-5 is a five-item, self-administered measure that assesses general subjective well-being over the past 2 weeks. The Spanish version was used [21]: (1) “I have felt cheerful and in good spirits” [“Me he sentido alegre y de buen ánimo”]; (2) "I have felt calm and relaxed" [“Me he sentido tranquilo(a) y relajado(a)”]; (3) “I have felt active and energetic” [“Me he sentido activo(a) y con energía”]; (4) "I have woken up feeling well and rested" [“Me he levantado sintiéndome bien y descansado(a)”]; (5) “My daily life has had interesting things for me” [“Mi vida diaria ha tenido cosas interesantes para mí”]. People answer the five positively worded items of the WHO-5 on a four-alternative Likert-type scale, from "0 = never" to "3 = always". Thus, the total score ranges from 0 to 15, with higher scores indicating greater subjective well-being.
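The scoring rule described above can be illustrated with a minimal Python sketch. The function name `who5_total` and the input checks are assumptions made for illustration; the study itself reports only that the five item responses (0 to 3) are summed into a total score.

```python
from typing import Sequence

N_ITEMS = 5        # the WHO-5 has five items
MIN_RESPONSE = 0   # "never"
MAX_RESPONSE = 3   # "always"


def who5_total(responses: Sequence[int]) -> int:
    """Return the WHO-5 total score (0-15) for one respondent.

    `responses` holds the five item answers in order, each coded 0-3
    as in the four-alternative response format described in the text.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not isinstance(r, int) or not MIN_RESPONSE <= r <= MAX_RESPONSE:
            raise ValueError(
                f"response {r!r} is outside {MIN_RESPONSE}-{MAX_RESPONSE}"
            )
    # Higher totals indicate greater subjective well-being.
    return sum(responses)
```

For instance, a respondent answering 1, 2, 3, 0 and 2 to items 1 to 5 would obtain a total score of 8.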

Procedure

The study followed all the guidelines for the communication of results of online questionnaires and surveys (CHERRIES) [41] in its adaptation to Spanish [42]. In the 12 countries, data were collected through an online survey administered using Google Forms© from February 15 to March 25, 2021. The online survey was disseminated via social media and online communication channels, such as Facebook, Instagram, WhatsApp and email. The survey began with a section explaining the objective of the study and requesting informed consent. The study ensured the confidentiality of the participants' information and allowed participants to stop answering the questions at any time.

The evaluations and procedures performed in the study were reviewed by the Institutional Committee for the Protection of Human Subjects in Research (CIPSHI) of the University of Puerto Rico (No. 2223-006), which approved the research protocol to ensure confidentiality of the data, sampling and informed consent. All methods were performed in accordance with the relevant guidelines and regulations. All subjects participated anonymously and voluntarily. In addition, they gave their informed consent online at the beginning of the survey.

Data analysis

A Confirmatory Factor Analysis (CFA) was performed using the mean- and variance-adjusted diagonally weighted least squares (WLSMV) estimator due to the ordinal nature of the items [43]. Model fit was assessed based on the chi-square test (χ2), the RMSEA index, the SRMR index, the CFI and the TLI. For the RMSEA and SRMR, values lower than 0.05 indicate an excellent fit, whereas values between 0.05 and 0.08 indicate an acceptable fit [44]. Likewise, CFI and TLI values greater than 0.95 indicate a good fit, while values greater than 0.90 indicate an acceptable fit [45]. Internal consistency reliability was estimated by calculating Cronbach’s alpha and omega coefficients for categorical variables [46]. Values above 0.70 indicate adequate reliability [47].
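The fit cutoffs stated above can be summarized in a short Python sketch. The function names and the label "poor" for values outside the stated ranges are assumptions made for illustration; the text defines only the excellent/good and acceptable ranges.

```python
def classify_error_index(value: float) -> str:
    """Classify an RMSEA or SRMR value (lower is better).

    Below 0.05 is treated as excellent and 0.05-0.08 as acceptable,
    following the cutoffs in the text. The "poor" label for larger
    values is an assumption of this sketch.
    """
    if value < 0.05:
        return "excellent"
    if value <= 0.08:
        return "acceptable"
    return "poor"


def classify_incremental_index(value: float) -> str:
    """Classify a CFI or TLI value (higher is better).

    Above 0.95 is treated as good and above 0.90 as acceptable,
    following the cutoffs in the text. The "poor" label for smaller
    values is an assumption of this sketch.
    """
    if value > 0.95:
        return "good"
    if value > 0.90:
        return "acceptable"
    return "poor"
```

Under these rules, an RMSEA of 0.075 would be read as acceptable, and a CFI of exactly 0.95 would fall in the acceptable rather than the good range, because the text requires values greater than 0.95.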

The evaluation of MI between countries was carried out based on Multigroup Confirmatory Factor Analysis (MGCFA). The MGCFA consists of a sequence of hierarchical invariance models: configural invariance, where the same factor structure is specified in all groups; metric invariance, where equality of factor loadings is assumed; scalar invariance, where factor loadings and thresholds are equal; and strict invariance, where, in addition to equality of factor loadings and thresholds, equality of residuals is also assumed. The successive models were compared using the variation of the chi-square statistic (Δχ2), whose non-significant values (p > 0.05) suggest the presence of MI between groups. Likewise, a modeling strategy based on the variation of the CFI index (ΔCFI) was used. A difference of less than 0.010 would indicate the presence of MI of the model between groups [46]. Finally, the variation of RMSEA values (ΔRMSEA) was used, where a difference of less than 0.015 is indicative of MI of the model between groups [48]. Once MI was tested, composite scores were calculated from the sum of the scale items with the objective of assessing differences in subjective well-being between countries. The magnitude of the differences between countries was calculated using Cohen's d.
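As an illustration of the two decision steps in this paragraph, the following Python sketch applies the ΔCFI and ΔRMSEA cutoffs to a pair of nested models and computes Cohen's d for two groups of composite scores. Several details are assumptions rather than the study's procedure: the function names, the reading of each difference as a worsening of fit (a drop in CFI, a rise in RMSEA), and the pooled-standard-deviation form of d, which the text does not specify.

```python
import math
from statistics import mean, variance
from typing import Sequence

DELTA_CFI_CUTOFF = 0.010    # cutoff for the change in CFI, from the text
DELTA_RMSEA_CUTOFF = 0.015  # cutoff for the change in RMSEA, from the text


def invariance_step_supported(
    cfi_less: float, cfi_more: float, rmsea_less: float, rmsea_more: float
) -> bool:
    """Check one step of the invariance sequence.

    `*_less` are the indices of the less constrained model (e.g. configural)
    and `*_more` those of the more constrained one (e.g. metric).
    Assumption: only a worsening of fit counts against invariance.
    """
    cfi_drop = cfi_less - cfi_more
    rmsea_rise = rmsea_more - rmsea_less
    return cfi_drop < DELTA_CFI_CUTOFF and rmsea_rise < DELTA_RMSEA_CUTOFF


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d for two independent groups, using the pooled SD (assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # A negative d means group_a has the lower mean.
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

With hypothetical composite scores of 2, 4 and 6 in one group and 4, 6 and 8 in another, both groups have a standard deviation of 2 and the means differ by 2 points, so d = −1.0.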

Item and test analysis based on IRT was performed with a 2-parameter Graded Response Model (GRM) [49] (2-PLM) specific for ordinal and polytomous items [50]. Discrimination (a) and difficulty (b) parameters were estimated. The a parameter evaluates the slope at which item responses vary as a function of the level of the latent trait; whereas, the b parameters evaluate the amount of the latent trait necessary for the item to be responded to. Because the WHO-5 has four response options, there are three b-parameter estimates, i.e., one estimate per threshold. The threshold estimates identify the level of the latent trait at which a person has a 50% chance of scoring equal to or greater than a specific response option. Finally, item information curves (IIC) and test information curves (TIC) were calculated.
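To make the role of the a and b parameters concrete, the sketch below computes GRM response probabilities and item information in Python for a single item with four response options. It assumes the standard logistic form of the graded response model, in which the probability of responding in category k or higher is 1 / (1 + exp(−a(θ − b_k))). The function names and any parameter values passed to them are hypothetical and are not the estimates reported in this study.

```python
import math
from typing import List, Sequence


def cumulative_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """P(response >= k | theta) for every category boundary.

    The list is padded with 1.0 (everyone reaches the lowest category)
    and 0.0 (no one exceeds the highest category).
    """
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Probability of each response option (0, 1, 2, 3 for three thresholds)."""
    cum = cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Item information at theta: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = cumulative_probs(theta, a, thresholds)
    # Derivative of each cumulative curve; zero for the padded 1.0 and 0.0.
    slopes = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        if p_k > 0.0:
            info += (slopes[k] - slopes[k + 1]) ** 2 / p_k
    return info
```

Evaluating `item_information` over a grid of θ values traces an item information curve, and summing the result over the five items at each θ gives the test information curve. At θ equal to a threshold b, the corresponding cumulative probability is 0.5, which matches the interpretation of the thresholds given above.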

Statistical analyses were performed in the RStudio environment for R. Specifically for the CFA, the “lavaan” package was used [51], the MI was performed with the “semTools” package [52] and the “ltm” package was used for the GRM [53].

Results

Validity based on internal structure and reliability

Table 2 presents the descriptive statistics of the WHO-5 items (mean, standard deviation, skewness and kurtosis) and the polychoric correlation matrix for each of the countries. Table 3 shows that the unidimensional model of the WHO-5 presents adequate fit indices in all countries, especially in Guatemala (RMSEA = 0.000 [0.000–0.067]; CFI = 1.00; TLI = 1.00) and Mexico (RMSEA = 0.075 [0.027–0.125]; CFI = 0.95; TLI = 0.99). In addition, all items have high factor loadings in all countries.

Table 2 Descriptive analysis of the items and polychoric correlation matrix
Table 3 Fit indices, factorial weights and reliability of the unidimensional model in American countries

Based on the CFA results, reliability was estimated for each model in each of the countries. Table 3 reports adequate reliability indices for the WHO-5 unidimensional model in each of the countries evaluated (α ≥ 0.94; ω ≥ 0.77).

Factorial invariance by country

Table 4 presents the sequence of invariance models tested across the countries participating in the study. The factor structure of the WHO-5 showed evidence of metric invariance (ΔCFI = 0.01), scalar invariance (ΔCFI = −0.01) and strict invariance (ΔCFI = 0.00).

Table 4 Unidimensional model fit indices and invariance models by country

Additionally, Fig. 1 shows the WHO-5 scores in each country. Most of the differences between countries were small or moderate in size. Among the most important results, Chile presented lower subjective well-being scores than Guatemala (d = −0.77), Colombia (d = −0.60) and El Salvador (d = −0.53).

Fig. 1 Comparison of scale scores by country

Item response theory model: graded response model (GRM)

The results of the CFA provide evidence of two important assumptions for IRT, namely the presence of unidimensionality and, consequently, of local independence. Accordingly, a 2-PLM GRM for polytomous and ordinal items was used. All items presented discrimination (a) parameters greater than 1, which is considered good discrimination (Zickar et al. 2002). Also, all b parameters increased monotonically. Therefore, a greater presence of the latent trait is necessary for people to choose the higher response options. All these results are shown in Table 5.

Table 5 Discrimination and difficulty parameters for the scale items

Finally, Fig. 2 shows the IIC and TIC. The IIC indicate that items 3, 2 and 1 are the most accurate in assessing subjective well-being, whereas the TIC indicates that the WHO-5 as a whole is more reliable in the range of the scale between −1 and 1.5.

Fig. 2 Item and test information curves for the scale

Discussion

The COVID-19 pandemic has generated public health problems, economic, political and social crises that affect Latin American countries; in addition to having a significant impact on the mental health of the population. For example, it is estimated that, in Latin America, about 231 million people lived in poverty by the end of 2020 [54], in addition to there being a high number of people with severe mental illness who do not have adequate treatment [55]. This leads to the urgent need to have measurement instruments that are useful to identify strategies that promote, prevent, and treat adverse psychological consequences in Latin American countries. Thus, this study aimed to examine the MI of the WHO-5 in samples from 12 different countries.

The results give further support to the evidence of validity and reliability of the WHO-5, demonstrating the presence of solid psychometric properties for the Spanish version applied to Latin American countries. The evaluation of the factor structure of the WHO-5 confirmed the unidimensionality of the scale in the 12 participating countries. This suggests that the Spanish version of the WHO-5 retains the structure of the original scale [17] and of versions applied in other samples and languages [9]. Similarly, reliability coefficients are very high in each of the countries (α and ω ≥ 0.90), except in El Salvador (α = 0.94; ω = 0.77) and Bolivia (α = 0.91; ω = 0.89), where they are still acceptable. The unidimensionality of the WHO-5 in the participating countries suggests that all items measure the same construct from a single factor [56]. As with other satisfaction or well-being scales, this would allow item responses to be summed into a total score for use in epidemiological and psychological studies [57]. However, it is important to mention that the RMSEA values of the factor model were higher than those recommended in most countries, except Mexico and Guatemala [43, 45]. This is to be expected, since in factor models with few degrees of freedom, such as the five-indicator model evaluated in this study, the RMSEA tends to perform poorly even if the model is adequately specified [58, 59]. In this regard, it is a mistake to discard factor models that have high RMSEA values and few degrees of freedom without taking into account other types of information, such as the other fit indices or the factor loadings of the model, which in the current study are very adequate [58].

The results of the IRT analysis indicate that all items of the WHO-5 were highly discriminative, especially item 3 (“I have felt active and energetic” [“Me he sentido activo(a) y con energía”]). That is, item 3 allows us to adequately distinguish between individuals who have different levels of subjective well-being. This result is consistent with a study conducted in 35 countries where item 3 was also one of the items that allowed for a better and more accurate assessment of people with moderate and high levels of subjective well-being [9]. Item 3 would provide more information on subjective well-being during the COVID-19 pandemic because feeling active and engaging in activities improves quality of life and well-being [60]. Thus, those people who experience subjective well-being may respond more to this item compared to others. Likewise, the difficulty parameters of the items were ascending. This would indicate that a higher level of the latent trait (in this case, subjective well-being) is needed to respond to the higher response categories (high subjective well-being). Finally, the item information curves indicate that the WHO-5 items are more informative at medium or high levels of subjective well-being.

To conduct cross-cultural studies, it is important to conduct MI analysis to assess the possibility that the latent constructs remain the same in different samples from various countries and to generalize the findings to other cultural contexts, as well as to be able to compare levels of subjective well-being across country populations [61, 62]. Overall, the results of the present study indicate that the WHO-5 is invariant at the strict level across samples from different Latin American countries. Specifically, configural invariance was supported, indicating that the unidimensional structure is equivalent across the 12 countries. That is, participants from every country conceptualize subjective wellbeing in a similar way on a single common underlying factor. Furthermore, there is evidence to support metric invariance, where factor loadings were equal across all samples and indicate that people in different countries respond to the items in the same way. The presence of metric invariance is an important prerequisite for meaningful comparisons between different groups [35, 63]. Thus, this finding would allow us to compare regression coefficients and covariance between different groups. Likewise, the presence of scalar invariance indicates that the observed scores are related to the latent scores. In this sense, people who have the same score in the latent construct (subjective wellbeing) would obtain the same score in the observed variable derived from the WHO-5, regardless of whether they belong to one group or another. Scalar invariance is necessary to compare latent means between groups [15]. Finally, strict invariance equated factor loadings, thresholds, and item residuals. The fit of the strict invariance model, observed in this study, would indicate that item measurement errors are the same across countries and that internal consistency is equivalent across the countries assessed [64]. 
The aforementioned findings support the idea that, when comparing different Latin American countries, it can be assumed that the WHO-5 measures the same psychological construct (subjective wellbeing) in all groups. Therefore, comparisons are valid and differences and/or similarities between countries can be interpreted in a meaningful way [15].

While the main objective of the study was to demonstrate the MI of the WHO-5, we also assessed the differences in scale scores between the participating country samples. For this, composite scores were calculated and not latent variables. Calculating latent variables would have meant choosing a reference group to compare all the other groups [65]. Thus, since it was not possible to identify a single country as a reference group, and considering the importance of assessing the differences between countries, we chose to compare composite measures. The results indicated the presence of moderate and small size differences in subjective well-being among most countries, meaning that people in these countries differed relatively little in their subjective well-being scores during the COVID-19 pandemic. The differences may be partly explained by different orientations towards happiness associated with cultural differences in subjective well-being [66]. This suggests that while the WHO-5 allows for a general assessment of subjective well-being, more in-depth assessments are needed in future studies due to the complexity of subjective well-being [67].

Among the countries that show a greater difference, Chile presented lower subjective well-being scores than Guatemala (d = −0.77), Colombia (d = −0.60) and El Salvador (d = −0.53). A possible explanation is that, according to a previous study, Chile is one of the countries with the highest symptoms of dysfunctional anxiety due to COVID-19 in Latin America [68], associated with a significant increase in the number of new infections that followed a false perception of security in the population after the successful start of the vaccination campaign. This situation could have caused a decrease in the perception of subjective well-being of the Chilean population. In general, the differences in means between all countries should be interpreted with caution. More studies are needed to investigate the validity of the WHO-5 in samples that vary, for example, in demographic background. Nevertheless, despite group-level differences, the findings have the potential to add further evidence to the construct validity of the scale.

The current study also has a number of limitations that should be mentioned. First, the study included only a few Latin American countries, mainly from South America (Argentina, Bolivia, Chile, Colombia, Ecuador, Paraguay, Uruguay and Peru), and very few from Central and North America (Cuba, El Salvador, Guatemala and Mexico). Future studies should work with samples from more Latin American countries to obtain more solid conclusions. In addition, the samples in each country were largely comprised of university-educated individuals, who may have certain privileges in terms of socio-economic status and access to health care and who do not necessarily represent the different characteristics of the general populations of each of the countries evaluated. Also, the majority of participants, in all countries, were female. Previous studies have suggested that higher rates of mood, anxiety and stress disorders occur in women [69]. Therefore, gender differences could account for part of the results in this study. However, the difference in the number of men and women in each country did not allow us to examine gender differences in the study. Thus, future studies should investigate the MI of the WHO-5 by sex within countries to assess whether men and women respond to the items differently. Similarly, participants were selected by convenience sampling. All of the above may limit the generalizability of the results and calls for careful interpretation. Additionally, no other measurement instruments were used in this study. Therefore, it was not possible to examine how the WHO-5 is associated with other constructs, so the study does not provide information about the convergent or divergent validity of the WHO-5. We also did not assess the possible effect of social desirability of responses, which may have been minimized by assuring anonymity in data collection.
Also, the cross-sectional design did not allow us to control for cohort effects, nor to assess test–retest reliability and longitudinal MI. Finally, while the total sample (> 5000) may be large [70], the number of participants in some countries may be considered small [71]. Small or moderate sample sizes are common in social science and psychology research [72]. This could generate inadequate conditions for estimating psychometric parameters and the replicability of findings [73]. However, it is important to consider different aspects to assess replicability, such as the large magnitude of factor loadings, and the convergence of methods (CTT and IRT) to assess psychometric properties, which can generate greater confidence in the results obtained. The limitations mentioned here should be considered by future studies to better understand the replicability of the results and to obtain other psychometric evidence needed to complement the substantive WHO-5 research.

Despite these limitations, this study has several strengths. Including several Latin American countries provides more generalizable results with respect to WHO-5 MI than previous studies. Similarly, the findings attempt to fill a gap in the existing literature on the measurement invariance and cross-cultural applicability of the WHO-5 in Latin American countries, thus improving future research in this region. Furthermore, assuring MI is a prerequisite for having an unambiguous interpretation of differences in mean scores and examining the relationships of the WHO-5 with other variables of interest in different settings. Thus, research during the COVID-19 pandemic could incorporate an assessment of subjective well-being as a valid outcome measure in different Latin American countries. In addition, the results may be useful for planning interventions to promote subjective well-being in different Latin American countries. For example, the WHO-5 could be administered to the general population during different periods of the current pandemic or after the pandemic to monitor changes in subjective well-being.

In conclusion, the WHO-5 showed MI in Argentina, Bolivia, Chile, Colombia, Cuba, Ecuador, El Salvador, Guatemala, Mexico, Paraguay, Peru, and Uruguay. This may contribute to the progress of the study of subjective well-being from a cross-cultural perspective. Therefore, this instrument may be useful for assessing subjective well-being in these countries during the COVID-19 pandemic, since the differences between scores in the twelve countries can be attributed to differences in subjective well-being and not to other characteristics of the scale, such as comprehension of the items or familiarity with their response formats. Furthermore, for practical purposes, having a short measure (the WHO-5 has five items) is beneficial for people who have little time to complete longer surveys. Thus, researchers and practitioners can benefit from using the brief and empirically sound WHO-5 to assess subjective well-being in different countries. However, despite the results, future studies on the possible cultural variations in the conceptualization and assessment of subjective well-being could use other more emic approaches based more on the creation of measurement instruments that consider participants' specific cultural perspectives rather than on adaptation or translation.