Measurement Invariance of the Meaning in Life Questionnaire Across 17 Countries

The Meaning in Life Questionnaire assesses presence of and search for meaning in life. Although the questionnaire has shown promising psychometric properties in samples from different countries, the scale's measurement invariance across a large number of nations has yet to be assessed. This study aims to address this gap, providing insight into how meaning in life is constructed and experienced across countries and into the extent to which cross-country comparisons can be made. A total of 3867 adult participants from 17 countries, aged 30–60, balanced by gender, and with at least secondary education, completed the questionnaire as part of the Eudaimonic and Hedonic Happiness Investigation. Single sample confirmatory factor analysis, multigroup confirmatory factor analysis, and alignment optimization were applied to investigate the scale's performance across the samples. Good psychometric properties and high levels of approximate measurement invariance emerged for the Presence subscale after removal of item 9, the only reverse-phrased item. Performance of the Search subscale varied more across samples, suggesting caution in interpreting the related results supporting approximate measurement invariance. The conceptualization of presence of meaning operationalized in the corresponding subscale (without item 9) appears consistent across countries, whereas search for meaning seems to be less universally homogeneous and requires further exploration. Moreover, the Meaning in Life Questionnaire does not reflect the conceptual distinction between "purpose" and "meaning" currently acknowledged by researchers. This issue should be further explored in studies addressing the scale's performance across cultures.


Introduction
Previously, studies investigating well-being across countries and populations were primarily based on the assessment of life satisfaction and/or positive emotions, namely the hedonic dimensions of well-being. More recently, researchers have paid increasing attention to other components of well-being belonging to the conceptual realm of eudaimonia (Vittersø, 2016). Among them, the experience of meaning in life has emerged as a core facet of eudaimonic well-being, in light of its relevance for mental health, goal setting and pursuit, and existential fulfilment. Given the important role of culture in establishing people's values, assumptions, and needs (Markus & Kitayama, 1991), human processes and patterns may manifest differently across cultures. Far from being a neutral concept, meaning in life is a value-laden construct, derived from cultural worldviews and referring to an experience deeply embedded in social relationships (Baumeister & Landau, 2018; Marujo & Neto, 2014; Van Tongeren et al., 2018; Wissing, 2014; Wissing et al., 2019; Yu & Chang, 2018). The analysis of data collected in different nations and continents may therefore shed light on the dynamics between the intrapersonal, interpersonal, contextual, and cultural dimensions contributing to the construction and experience of meaning in life. To pursue this goal, however, it is necessary to verify whether psychometric instruments assessing meaning in life operate well across cultures and contexts.
The most frequently used measure of perceived meaning in life is the Meaning in Life Questionnaire (MLQ), developed by Steger and colleagues (2006), which has been validated in specific countries or in comparative studies involving two or three countries, primarily the target country and the US (examples of such studies are reviewed below). The growing international popularity of the MLQ as a highly representative measure of perceived meaning in life, however, calls for a more careful exploration of its invariance properties, in order to establish the extent to which scores on the scale can be compared across countries. This investigation is especially relevant in the current international scenario, characterized by increasing efforts of world agencies and organizations to identify and use reliable and easily administered well-being measures in order to develop large-scale interventions and policies aimed at promoting citizens' individual and collective welfare (e.g., OECD; WHO; European Social Survey; Gallup Poll and the World Happiness Report). To this end, international studies conducted in the basic research domain to evaluate the solidity of well-being measures, at both the conceptual and the cross-cultural level, are of paramount importance. Due to the above-mentioned focus on measures of life satisfaction and emotion, there is still a lack of information on measures of meaningfulness that can be reliably used in multinational studies and, subsequently, in larger social surveys. One of the projects recently devoted to understanding well-being facets across countries is the Eudaimonic and Hedonic Happiness Investigation (EHHI; Delle Fave et al., 2011, 2013, 2016), based on a large international collection of both qualitative and quantitative data across 17 countries, including, among other measures, the Meaning in Life Questionnaire.
The assessment of the MLQ's measurement invariance is a basic prerequisite for adequately comparing perceived meaning across samples and for using this indicator in more complex explorations of well-being in relation to demographic characteristics. The countries included in the EHHI (listed in the Method section) were selected based on connections and collaborations between local research teams and may not be representative of the world population. Nevertheless, they come from six continents and substantially extend the geographic reach of the MLQ's cross-cultural assessment, with samples from North America, Latin America, Western Europe, Eastern Europe, Oceania, South Asia, the Middle East, and Africa. The countries represent individualistic and collectivistic cultures, include diverse dominant religions, and have varying socioeconomic profiles as indicated by measures such as gross domestic product and the Human Development Index. In addition, studies of the measurement invariance of the MLQ across countries have so far typically included two or three countries; 17 countries far exceed this number. Also, each country's sample of adults was diverse with respect to age, gender, and level of educational attainment.
Overall, the findings from a study of the cross-country invariance of the MLQ across 17 countries may provide researchers and policy makers in different countries with solid evidence on the adequacy of the MLQ for basic and applied studies, as well as for interventions.

The Meaning in Life Questionnaire
In the past few decades, attention to meaning in life grew considerably in clinical and well-being research, leading to the development of several conceptual models and operational definitions of the construct. One of the most frequently used models of meaning in life was developed by Steger and colleagues (Steger et al., 2006, 2008b). It comprises two dimensions: the presence of meaning in life, representing the individual's perception of their life's meaning and purpose; and the search for meaning in life, which refers to the person's desire and efforts to attain and/or increase the meaning and purpose of their life. Theoretically, these dimensions have been postulated to be distinct and relatively independent (Steger et al., 2006). The model was operationalized in the Meaning in Life Questionnaire (MLQ; Steger et al., 2006), which consists of 10 items, five pertaining to the presence of meaning and five to the search for meaning. The two subscales were designed to be relatively orthogonal (Steger et al., 2006). Compared to other instruments evaluating meaning in life, the MLQ has been considered to have outstanding psychometric properties (Brandstätter et al., 2012).

Psychometric Properties
In the original validation study, conducted among US college students, the MLQ displayed adequate internal consistency and test-retest reliability, as well as structural, convergent, and discriminant validity (Steger et al., 2006). The English version of the scale also showed good psychometric properties among Australian adolescents (Rose et al., 2017), South African students (Temane et al., 2014), and Latina/o youth attending college in the United States (Vela et al., 2017). Portuguese versions displayed overall good psychometric properties among older adults in Portugal (Simões et al., 2010) and in Brazil (Damásio & Koller, 2015; De Aquino et al., 2015), but in the Brazilian context better fit was achieved when the subscales were analyzed separately (Damásio & Koller, 2015). Chinese versions displayed good psychometric properties among Hong Kong students, caregivers, and clinical samples (Chan, 2014, 2017; Datu & Yuen, 2021 [in the latter, the residual terms of items 6 and 7 were allowed to correlate]) and among Chinese students attending middle school (Wang, 2013) and college (Liu & Gan, 2010; Wang & Dai, 2008). The Hausa version performed well among internally displaced persons in Nigeria (Chukwuorji et al., 2019). The Spanish translation attained good psychometric properties among college students (Steger et al., 2008a), while among adults and adolescents in Argentina its model fit improved after removal of item 9 (Góngora & Solano, 2011). Good psychometric properties were also found for the Italian version, tested among adolescents (Di Fabio, 2014) and adults (Negri et al., 2020), as well as for the translations into Turkish (Boyraz et al., 2013), Hindi (Singh et al., 2016), Hungarian (Martos & Konkolÿ Thege, 2012), Japanese (Steger et al., 2008c), Greek (Pezirkianidis et al., 2016), Latvian (Kolesovs, 2019), and Arabic and Hebrew (Abu-Raiya et al., 2021). The scale also performed well in Romania (language of administration not specified; Balgiu, 2020).

Cross-cultural Measurement Invariance
Overall, these studies attest to the MLQ's potential for use in different countries, languages and cultural groups. However, data are lacking on the similarity of scale performance and the comparability of mean scores across a larger number of nations. Differences in individual and social values may affect the way in which participants perceive items or interpret response scales; this issue is particularly relevant for a culturally rooted construct such as perceived meaning in life. Therefore, the measurement invariance of the MLQ must be evaluated in order to correctly interpret comparative findings derived from different cultural and linguistic groups. Measurement invariance refers to the equivalence of a construct across groups or time (Putnick & Bornstein, 2016); it indicates that a construct has the same meaning for different groups or over time. There are four levels of measurement invariance: configural (equivalent model form), metric (equivalent factor loadings), scalar (equivalent item intercepts), and strict (equivalent item residuals). Meaningful comparisons of means between groups usually require that at least scalar invariance is supported, because it implies that items have comparable loadings and intercepts across the compared groups (Putnick & Bornstein, 2016). Achieving measurement invariance in cross-cultural research is quite challenging. Many factors can cause a lack of measurement invariance, such as using instruments developed in one culture in other cultural contexts, different frames of reference for self-judgments across cultures, varying cross-cultural response patterns, social desirability, and inappropriate translation (Chen, 2008).
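Why scalar invariance matters for mean comparisons can be illustrated with the standard linear factor model, in which the expected score on an item in a given group equals the item intercept plus the loading times the group's latent (factor) mean. The minimal Python sketch below uses hypothetical parameter values, not MLQ estimates, and the function name is illustrative: two groups with identical latent means produce identical observed item means only when their intercepts are equal, so a difference in intercepts alone creates an observed difference that has nothing to do with the construct.

```python
def expected_item_mean(intercept: float, loading: float, latent_mean: float) -> float:
    """Model-implied item mean under a linear factor model: intercept + loading * latent mean."""
    return intercept + loading * latent_mean

# Hypothetical parameters for one item in two groups (not MLQ estimates).
loading = 0.8
latent_mean_a = latent_mean_b = 0.0  # the groups do NOT differ on the construct

# Case 1: equal intercepts (scalar invariance holds) -> equal observed means.
same_a = expected_item_mean(4.0, loading, latent_mean_a)
same_b = expected_item_mean(4.0, loading, latent_mean_b)

# Case 2: unequal intercepts (e.g., a response style or translation effect)
# -> an observed mean difference appears although the latent means are equal.
diff_a = expected_item_mean(4.0, loading, latent_mean_a)
diff_b = expected_item_mean(4.5, loading, latent_mean_b)

print(same_b - same_a)  # 0.0
print(diff_b - diff_a)  # 0.5
```

In the second case, a naive comparison of observed means would wrongly attribute the 0.5-point gap to the construct, which is why at least scalar invariance is normally required before comparing means.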
The few studies that have attempted to assess the measurement invariance of the MLQ never involved more than three cultural groups. Full metric, partial scalar, and partial residual variance invariance were obtained between Turkish and US samples (Boyraz et al., 2013); in addition, the two groups did not differ significantly in their Presence and Search subscale scores. Metric invariance of the MLQ was supported among US adolescents with Latin American, Asian, and European backgrounds (Kiang & Fuligni, 2010); the mean score for Presence of meaning was significantly higher for Latin American and Asian American adolescents than for European Americans, while Search for meaning was significantly higher for Asian Americans than for the other two groups. Scalar invariance was detected for US and Japanese college students (Steger et al., 2008c), with US students reporting significantly higher Presence of meaning and Japanese students reporting higher Search. Taken together, these findings indicate that some level of measurement invariance was generally detected across cultural groups, while subscale mean scores varied across samples.

Aims of the Study
Although these findings suggest a certain level of measurement equivalence of the MLQ across cultures, to the best of our knowledge no studies have been conducted yet to assess the instrument's measurement invariance across a large number of countries. In addition, the presently unclear pattern of the two subscales' mean scores across cultural groups may be better disentangled through an exploration of country rankings on a large set of countries.
To address these knowledge gaps, the first aim of the present study was to evaluate the measurement invariance of the MLQ across 18 samples from 17 countries. The second aim, if supported by findings of invariance, was to compare the mean scores of the Presence and Search subscales. The study contributes to existing literature in three ways. First, the measurement invariance of the MLQ was explored for the first time across a large number of countries, providing information on the scale's cross-country utility. Second, information was provided on noninvariant items across samples/countries, beyond the general model fit, offering insight for possible scale revisions. Third, the estimated factor mean scores across the 18 samples were compared.

Method
The present study analyzed data from the international Eudaimonic and Hedonic Happiness Investigation (EHHI) project, which explores different facets of well-being across countries through a mixed-methods research design (Delle Fave et al., 2011, 2013, 2016). For this study, data collected through the Meaning in Life Questionnaire and a sociodemographic questionnaire were analyzed.

Participants
The total sample consisted of 3867 participants from 17 countries: Australia, Croatia, Italy, Portugal, South Africa, Argentina, Brazil, India, Mexico, New Zealand, the United States of America, Norway, Hungary, Chile, Israel, Peru, and the Czech Republic. India contributed two samples, one from the Northern region (Delhi and Haryana) and one from the Southern state of Tamil Nadu, which were analyzed separately because of their linguistic and cultural differences. Data for each sample were collected by local researchers. Most samples comprised n = 216 participants, except for South Africa (n = 215), Argentina (n = 208), New Zealand (n = 215), and Israel (n = 205). Each local sample was balanced by gender, age group (following the threefold partition 30-39, 40-49, and 50-60 years), and education level (secondary and tertiary education).
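The reported sample sizes can be cross-checked against the stated total: with 18 samples, of which four deviate from the standard n = 216, the remaining 14 samples contribute 216 participants each. The short Python check below only restates figures given in the text; the variable names are illustrative.

```python
# Samples that deviate from the standard size of 216 (figures from the text).
exceptions = {"South Africa": 215, "Argentina": 208, "New Zealand": 215, "Israel": 205}
n_samples = 18  # 17 countries, with India contributing two samples
default_n = 216

total = default_n * (n_samples - len(exceptions)) + sum(exceptions.values())
print(total)  # 3867
```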
The global sample consisted of participants whose mean age was 44.27 years (SD = 8.81); 87.1% were employed; 70.8% reported having an average standard of living; as for civil status, 71.4% of the participants were married or cohabitating, 16.8% single, 9.4% separated or divorced, and 1.8% widowed; and as for religion, 35.2% reported never practicing, 32.0% occasionally, and 31.3% regularly.

Procedure
The MLQ was administered in the local language in all countries, except in multilingual South Africa, where data were gathered in English, the lingua franca commonly used by people with secondary education. In countries where a version of the scale in the local language was not available, translation was performed by a person fluent in both the local language and English, and back-translated by another bilingual person; adaptations were made to the translated version if needed (Van de Vijver & Leung, 1997).
Ethics approval for the study was obtained in each country according to the local institutional and legal requirements, and in line with the Declaration of Helsinki. Local researchers recruited the participants through nonprobability sampling methods, such as face-to-face interaction, poster advertisements in public areas, snowball sampling, and word of mouth. Participation was voluntary. After signing a written informed consent form, participants completed the questionnaires autonomously and returned them to the local researcher in person, by mail, or by email; in Australia, the USA, and New Zealand it was also possible to complete the questionnaire online.

Measures
Sociodemographic Questionnaire
Participants were invited to provide information concerning their gender, age, education level, standard of living, employment status and job typology, marital status, number of children, and religion.

Data Analysis
Following Cieciuch et al. (2018), we first applied confirmatory factor analysis to examine the fit of the baseline model for each sample. Then, multigroup confirmatory factor analysis (MGCFA) was applied to assess the measurement invariance of the MLQ across the 18 samples. MGCFA (cf. Byrne & Van de Vijver, 2010; Van de Vijver & Leung, 2011; Vandenberg & Lance, 2000) involves a hierarchical set of steps. The first step consists of fitting a configural model, which assumes that the number of factors and the configuration of fixed and freely estimated parameters hold across all groups, without restricting any parameters to be equal. If configural invariance is confirmed, metric invariance is tested by constraining the factor loadings to be equal across groups. If metric invariance does not hold, a stepwise procedure can be implemented in which loadings are freed one by one until a sufficient level of partial metric invariance is obtained. The third step, scalar invariance, requires item intercepts to be equal across all groups. Stepwise freeing of intercepts that prove not to be equal can result in partial scalar invariance. Scalar invariance is a prerequisite for comparing factor mean scores across groups.
With a large number of groups, complete scalar invariance, which requires exactly the same values for a large set of parameters, is highly improbable. Also, making post hoc adjustments to obtain partial invariance in a stepwise fashion is problematic, as these procedures may result in models that are not replicable (Marsh et al., 2018). The recent increase in cross-cultural and multinational studies has prompted the development of statistical techniques better suited to analyses involving a large number of groups. Kim et al. (2017) have provided an overview of five approaches that serve this purpose.

Alignment Optimization
Taking these guidelines into account, the approach deemed most appropriate for the present study was the alignment optimization method developed by Asparouhov and Muthén (2014). This method has recently been used to assess measurement invariance across many groups (e.g., Glassow et al., 2021; Jang et al., 2017; Raudenská, 2020; Sischka et al., 2020; Zakariya et al., 2020). Alignment optimization starts with a well-fitting configural model and then seeks an optimal invariance pattern using a simplicity function similar to the rotation criteria applied in exploratory factor analysis. The algorithm seeks a solution in which noninvariance is concentrated in a minimal number of parameters, while the majority of the parameters are approximately invariant. The final aligned model has the same fit as the configural model, just as a rotated solution in exploratory factor analysis simplifies the loading matrix without compromising the fit of the unrotated model. The method also produces estimates of the factor mean scores that can be used to compare groups on the relevant construct (presence of and search for meaning in this study), if evidence is found for adequate levels of measurement invariance.
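The quantities behind the simplicity function can be sketched as follows, based on our reading of the formulation in Asparouhov and Muthén (2014). Configural estimates obtained with factor mean 0 and variance 1 in every group are re-expressed for candidate group factor means (alpha) and variances (psi); the total loss then sums a component loss function f(x) = sqrt(sqrt(x² + eps)) over all pairwise group differences in loadings and intercepts, weighted by the square root of the product of the two group sizes. The value of eps, the function names, and the toy parameters are assumptions for illustration. The sketch only evaluates the loss; it does not perform the numerical minimization, or the free-alignment identification, that Mplus carries out.

```python
import math
from itertools import combinations

def clf(x, eps=0.01):
    # Component loss function; eps is an assumed small constant. Its concave
    # fourth-root shape favours a few large differences over many moderate ones.
    return math.sqrt(math.sqrt(x * x + eps))

def align(lam0, nu0, alpha, psi):
    """Re-express configural estimates (factor mean 0, variance 1 in each group)
    for group factor means alpha[g] and factor variances psi[g]."""
    lam, nu = [], []
    for g in range(len(lam0)):
        sd = math.sqrt(psi[g])
        lam.append([l / sd for l in lam0[g]])
        nu.append([v - alpha[g] * l / sd for v, l in zip(nu0[g], lam0[g])])
    return lam, nu

def total_loss(lam, nu, n, eps=0.01):
    """Weighted sum of clf over all group pairs, for every item loading and intercept."""
    loss = 0.0
    for g1, g2 in combinations(range(len(lam)), 2):
        w = math.sqrt(n[g1] * n[g2])
        for p in range(len(lam[g1])):
            loss += w * clf(lam[g1][p] - lam[g2][p], eps)
            loss += w * clf(nu[g1][p] - nu[g2][p], eps)
    return loss

# Toy data: three groups share invariant item parameters but differ in factor
# mean and variance. All values are hypothetical, not MLQ estimates.
true_lam = [0.8, 0.7, 0.9]
true_nu = [3.0, 3.5, 2.8]
true_alpha = [0.0, 0.5, -0.3]
true_psi = [1.0, 1.44, 0.81]
n = [216, 216, 216]

# Estimates a configural model (mean 0, variance 1 per group) would produce.
lam0 = [[l * math.sqrt(p) for l in true_lam] for p in true_psi]
nu0 = [[v + a * l for v, l in zip(true_nu, true_lam)] for a in true_alpha]

naive = total_loss(*align(lam0, nu0, [0.0] * 3, [1.0] * 3), n)
aligned = total_loss(*align(lam0, nu0, true_alpha, true_psi), n)
print(naive > aligned)  # True: the true means/variances yield the simpler pattern
```

Because the aligned parameters are only a reparameterization of the configural estimates, every candidate set of means and variances reproduces the same model fit; the loss merely ranks these equivalent solutions by simplicity.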
Following Asparouhov and Muthén (2014) and Muthén and Asparouhov (2018), the quality of estimation under the alignment method was evaluated in a simulation study in which data were generated from the estimated parameter values. Two approaches to aggregation were applied. First, the correlation between the true factor means and the estimated factor means was calculated for each replication and averaged over all replications. This value reflects the performance of the alignment method given both the extent of noninvariance and the sample size. Second, the correlation was computed between the true factor means and the estimated factor means averaged over the replications. This value is largely independent of sample size and indicates the performance of the alignment method for the extent of noninvariance studied. Guidelines suggest that the correlation values should be at least 0.980.
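The two aggregation approaches differ only in when the averaging takes place: the first averages the correlations, while the second correlates the true means with the averaged estimates, so sampling noise largely cancels out before the correlation is computed. The Python sketch below assumes hypothetical true means for 18 groups and uses simple Gaussian noise as a stand-in for alignment estimates; it is not a reproduction of the Mplus Monte Carlo setup.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def aggregate(true_means, estimates):
    """estimates[r][g] is the estimated factor mean of group g in replication r.
    Returns (mean of per-replication correlations, correlation with averaged estimates)."""
    per_replication = sum(pearson(true_means, est) for est in estimates) / len(estimates)
    averaged = [sum(col) / len(col) for col in zip(*estimates)]
    return per_replication, pearson(true_means, averaged)

# Hypothetical true factor means for 18 groups; noisy copies stand in for estimates.
random.seed(1)
true_means = [0.0, 0.3, -0.2, 0.5, 0.1, -0.4, 0.2, 0.6, -0.1,
              0.4, -0.3, 0.0, 0.25, -0.15, 0.35, 0.05, -0.05, 0.45]
estimates = [[m + random.gauss(0, 0.1) for m in true_means] for _ in range(500)]

r_per_rep, r_of_avg = aggregate(true_means, estimates)
GUIDELINE = 0.980
for label, r in (("per replication", r_per_rep), ("of averaged means", r_of_avg)):
    print(f"{label}: {r:.3f} ({'meets' if r >= GUIDELINE else 'below'} guideline)")
```

With the noise level assumed here, the per-replication value stays well below 0.980 while the value based on averaged estimates is close to 1, which mirrors the pattern the two indices are designed to separate.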

Statistical Program and Analysis Specifications
All analyses were conducted using Mplus Version 8.0 (Muthén & Muthén, 1998-2017). Since the MLQ comprises more than five response options, the robust maximum likelihood (MLR) estimator, which treats the data as continuous, was deemed appropriate, as it corrects for deviations from normality (Blunch, 2008; Rhemtulla et al., 2012). As recommended by Asparouhov and Muthén (2014), free alignment was specified, in which all samples' factor means and variances were estimated freely. Missing data were handled with full information maximum likelihood estimation (for the total sample, 0.25% of the MLQ data were missing).
Upon request, the data used for this study are available from the second author and analysis code from the corresponding author.

Single Sample CFA
To assess model fit for the single sample CFAs, we examined the global fit indices reported in Table 1. Nonsignificant χ² statistics (i.e., yielding large p-values) and comparative fit index (CFI) values larger than 0.900 (earlier guidelines) or 0.950 (later guidelines) support good fit; root mean square error of approximation (RMSEA) values of 0.050 or less indicate good fit, and values less than 0.080 reasonable fit (Byrne, 2012). Since the χ² statistic is largely dependent on sample size, it is reported but not interpreted. Possible sources of misfit were identified from small factor loadings on the target factor, large item residual variances, small item R² values, and large modification indices (Byrne, 2012). Table 1 shows that a 2-factor CFA model, in which the Presence and Search factors were allowed to correlate, did not yield good fit for several samples (CFI < 0.900 for 11 samples; 0.900 < CFI < 0.950 for 4 samples; RMSEA > 0.080 for 15 samples). Item 9 ("My life has no clear purpose"), the only reverse-phrased item in the scale, yielded a relatively small factor loading on the target factor (Presence), a relatively large residual variance, and a relatively small R² value for almost all samples. After the item was dropped, fit improved for several samples but in many cases remained inadequate (CFI < 0.900 for 7 samples; 0.900 < CFI < 0.950 for 5 samples; RMSEA > 0.080 for 12 samples).
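The CFI and the cut-offs listed above can be expressed compactly. The Python sketch below uses the common textbook definition of the CFI (model noncentrality relative to that of the independence baseline model) and encodes the CFI and RMSEA bands cited from Byrne (2012); the chi-square values in the example are hypothetical, not results from this study. The degrees of freedom correspond to a 10-item, 2-factor model with correlated factors (34) and its independence baseline (45).

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index from the model and baseline (independence) chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def classify_cfi(value):
    """CFI bands used in the text: > 0.950 good, 0.900-0.950 adequate only by earlier guidelines."""
    if value > 0.950:
        return "good"
    if value > 0.900:
        return "acceptable (earlier guideline only)"
    return "poor"

def classify_rmsea(value):
    """RMSEA bands used in the text: <= 0.050 good, < 0.080 reasonable."""
    if value <= 0.050:
        return "good"
    if value < 0.080:
        return "reasonable"
    return "poor"

# Hypothetical chi-square values for one sample (not results from this study).
value = cfi(chi2_model=85.0, df_model=34, chi2_baseline=900.0, df_baseline=45)
print(round(value, 3), classify_cfi(value))
```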
In the next step, therefore, the Presence and Search subscales were analyzed separately. Considering the CFI values, the global fit for the Presence subscale was good for almost all samples (CFI < 0.900 for 1 sample; 0.900 < CFI < 0.950 for 2 samples). In some cases, large RMSEA values did not support good fit (RMSEA > 0.080 for 6 samples), but when the degrees of freedom are small (as in these single-factor models), RMSEA tends to falsely indicate poor model fit (Kenny et al., 2015). Although global model fit was generally good for the Presence subscale, item 9 tended to show small factor loadings, large residual variances, and small R² values. Dropping the item produced a model with acceptable fit for all samples considering the CFI values (CFI > 0.900 for all samples). (Notably, RMSEA > 0.080 for 10 samples, but this model had only 2 degrees of freedom, consistent with the above-stated tendency of RMSEA to falsely reject model fit.) For the Search subscale, model fit varied across samples, with five samples attaining CFI values lower than 0.900. Modification indices suggested that fit might be improved by correlating the residual variances of items 2 ("I am looking for something that makes my life feel meaningful") and 10 ("I am searching for meaning in my life") for some samples, and of items 3 ("I am always looking to find my life's purpose") and 8 ("I am seeking a purpose or mission for my life") for other samples. In both cases we deemed the correlations justifiable on the basis of linguistic similarity, as items 2 and 10 both refer to "meaning" and items 3 and 8 to "purpose". These adapted models were applied to all samples, but the small incremental improvement obtained across the samples did not justify the addition of a parameter.
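The degrees-of-freedom issue can be made concrete. For a single-factor CFA on p items with the factor variance fixed to 1, the covariance matrix supplies p(p + 1)/2 moments and the model estimates p loadings and p residual variances, giving 5 df for the five Presence items and 2 df once item 9 is dropped. The Python sketch below uses the common RMSEA formula with N − 1 (some programs use N instead) and a hypothetical, identical excess chi-square for both models, showing how the same absolute misfit yields a much larger RMSEA when df is small.

```python
import math

def one_factor_df(n_items):
    """df of a single-factor CFA with factor variance fixed to 1:
    p(p+1)/2 observed (co)variances minus p loadings and p residual variances."""
    return n_items * (n_items + 1) // 2 - 2 * n_items

def rmsea(chi2, df, n):
    """Common RMSEA formula; some programs use n instead of n - 1."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

n = 216  # typical sample size in this study
for items in (5, 4):
    df = one_factor_df(items)
    # Same hypothetical excess chi-square (chi2 - df = 5) for both models.
    print(f"{items} items: df = {df}, RMSEA = {rmsea(df + 5.0, df, n):.3f}")
```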

Measurement Invariance across the 18 Samples using MGCFA
The results of the MGCFA measurement invariance testing are shown in Table 2. Since the single sample CFAs did not support the 2-factor model, MGCFA was performed separately for the two subscales. We were hesitant to apply MGCFA to the Search subscale, since the baseline model did not yield good fit for all samples. Nonetheless, we decided to report the related findings, with a cautionary note concerning their possible lack of reliability.
The configural model fit was interpreted using the guidelines for assessing measurement invariance across 10 or more groups (Rutkowski & Svetina, 2014). CFI values larger than 0.950 and RMSEA values less than 0.100 indicate adequate fit.

The smaller the values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), the better the fit. The results of the chi-square test are also reported, but not interpreted, due to the test's sensitivity to sample size. For both subscales, the configural models displayed adequate fit without the need to remove any items. However, based on findings from the single sample CFAs suggesting that item 9 may be problematic, we repeated the analyses for the Presence subscale without item 9, and support was found for configural invariance.
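Both criteria penalize the model's deviance (−2 times the log-likelihood) by the number of free parameters k: the AIC adds 2k, whereas the BIC adds k·ln(N) and therefore penalizes complexity more heavily in large samples. The Python sketch below uses these standard definitions with hypothetical log-likelihoods and parameter counts; N is set to the total sample size reported above.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k ln(N)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical log-likelihoods and parameter counts for two nested models.
models = {"less constrained": (-15000.0, 270), "more constrained": (-15040.0, 202)}
n_obs = 3867
for name, (ll, k) in models.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n_obs), 1))

# Smaller values indicate better fit.
best_by_bic = min(models, key=lambda m: bic(*models[m], n_obs))
print("preferred by BIC:", best_by_bic)
```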
To assess whether metric and scalar invariance hold, the likelihood ratio test (difference in chi-square between nested models) was applied. It compares the fit of the metric model with that of the configural model, and the fit of the scalar model with that of the metric model. Results are reported but not interpreted because of the test's dependency on sample size. Instead, the differences between the CFI and RMSEA values of nested models (ΔCFI and ΔRMSEA, respectively) were used for interpretation (Chen, 2007; Cheung & Rensvold, 2002). Rutkowski and Svetina (2014) suggested that ΔCFI < 0.020 indicates adequate levels of invariance when the number of groups is large, while ΔRMSEA should be less than 0.030 to support metric invariance and less than 0.010 to support scalar invariance. In this study, support was found for metric invariance for each subscale, but not for scalar invariance. In view of the lack of exact scalar invariance, we applied alignment optimization.
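The decision rule described above can be written as a small function. In the Python sketch below, ΔCFI is the drop in CFI and ΔRMSEA the increase in RMSEA when moving from the less constrained to the more constrained model (a decrease in RMSEA therefore counts as support); the cut-offs are those attributed to Rutkowski and Svetina (2014) in the text, while the function name and the fit values in the example are hypothetical.

```python
def invariance_supported(level, cfi_less, cfi_more, rmsea_less, rmsea_more):
    """Apply the many-group cut-offs cited in the text.
    *_less: less constrained model (configural for metric, metric for scalar);
    *_more: more constrained model being tested."""
    rmsea_limits = {"metric": 0.030, "scalar": 0.010}
    delta_cfi = cfi_less - cfi_more          # CFI usually drops as constraints are added
    delta_rmsea = rmsea_more - rmsea_less    # negative if RMSEA improves
    return delta_cfi < 0.020 and delta_rmsea < rmsea_limits[level]

# Hypothetical fit values (not this study's results).
configural = {"cfi": 0.980, "rmsea": 0.060}
metric = {"cfi": 0.970, "rmsea": 0.065}
scalar = {"cfi": 0.900, "rmsea": 0.110}

print(invariance_supported("metric", configural["cfi"], metric["cfi"],
                           configural["rmsea"], metric["rmsea"]))
print(invariance_supported("scalar", metric["cfi"], scalar["cfi"],
                           metric["rmsea"], scalar["rmsea"]))
```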

Approximate Measurement Invariance using Alignment Optimization
Alignment optimization was applied to each subscale, but findings for the Search subscale should be interpreted with caution. Table 3 displays the contribution of each item's intercept and loading to the optimized simplicity function, where higher values indicate higher levels of measurement noninvariance. Table 3 also shows the R² values, where R² indicates the proportion of the parameter's variation that is attributable to the variation of its underlying factor's mean and variance rather than to error variance (Ivanova et al., 2018). Higher R² values indicate higher levels of measurement invariance, with R² = 1 implying that all the variance in the parameter is attributable to variation in the underlying factor's mean and variance (Ivanova et al., 2018). Items 4 (Presence) and 8 (Search) displayed the highest level of measurement invariance, while item 9 (Presence) exhibited the lowest, mainly because of the noninvariance of its intercepts. Table 4 displays the (non-)invariance results for the measurement intercepts and factor loadings; samples obtaining a significantly noninvariant measurement parameter are denoted by an X. For the full Presence subscale, the intercepts were invariant for items 5 and 6, but noninvariant for item 1 in two samples (Mexico and Norway) and for item 4 in two samples (North and South India). Noninvariant intercepts for item 9 were identified in 10 of the 18 samples. The factor loadings of all Presence items were invariant, except for the loading of item 9 for New Zealand. Overall, 30.56% of the item 9 parameters (11 of 36) were noninvariant, which exceeds the 25% rule of thumb suggested by Muthén and Asparouhov (2014). After item 9 was omitted from the Presence subscale, only 3 intercepts and no loadings were noninvariant, suggesting that estimated factor mean scores can be compared across samples in subsequent steps.
For the Search subscale, the intercepts of items 3 and 8 were invariant, as were the loadings of items 2, 8, and 10. Item 8 was therefore fully invariant, while for the other items the number of noninvariant parameters ranged from one (2.78%) for item 3 to five (13.89%) for item 10. No Search item had more than 25% noninvariant parameters, supporting subsequent comparison of estimated factor mean scores.
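The percentages above follow from the fact that each item has one intercept and one loading per sample, that is, 36 parameters across the 18 samples. The Python sketch below recomputes them from the counts reported in the text (item 9: 10 noninvariant intercepts plus 1 loading; Search item 3: one; item 10: five; item 8: none) and applies the 25% rule of thumb; the function name is illustrative.

```python
def percent_noninvariant(n_noninvariant, n_groups=18):
    """Share of an item's parameters (one intercept + one loading per group) flagged noninvariant."""
    return 100.0 * n_noninvariant / (2 * n_groups)

RULE_OF_THUMB = 25.0  # maximum share of noninvariant parameters per item
counts = {"Presence item 9": 11, "Search item 3": 1, "Search item 10": 5, "Search item 8": 0}

for item, k in counts.items():
    pct = percent_noninvariant(k)
    status = "exceeds" if pct > RULE_OF_THUMB else "within"
    print(f"{item}: {pct:.2f}% ({status} the 25% rule)")
```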

Table 4 Approximate measurement (non-) invariance for intercepts and loadings across samples
Samples (number in the original dataset): Australia (1), Croatia (2), Italy (4), Portugal (5), South Africa (7), Argentina (8), Brazil (9), North India (10), Mexico (13), New Zealand (14), USA (15), Norway (16), Hungary (17), South India (18), Chile (19), Israel (20), Peru (22), Czech Republic (23)
Intercepts, Presence (all items):
Item 1: X for Mexico and Norway
Item 4: X for North India and South India
Items 5 and 6: no X
Item 9: X for 10 of the 18 samples
Note. X indicates that the particular sample obtained a significantly noninvariant measurement parameter. Sample numbers represent their numbers in the original dataset and are therefore non-sequential.

Table 4 also shows that the number of noninvariant parameters per sample was very low. For the Presence subscale, after removal of item 9, 15 samples showed no noninvariant parameters, while three samples (North India, Mexico, and Norway) had one (12.5% of their eight parameters). For the Search subscale, nine samples yielded no noninvariant parameters, six samples (Australia, Croatia, Italy, Portugal, New Zealand, and Chile) had one (10%), and three samples (North India, Mexico, and the Czech Republic) had two (20%). Table 5 displays the estimated factor mean scores (item 9 removed from the Presence subscale) for all samples when the alignment method was applied, ordered from highest to lowest. For Presence of meaning, Mexico attained the highest mean score and Israel the lowest; for Search for meaning, North India scored the highest and Portugal the lowest. The table also shows, for each sample, which of the other samples had significantly lower estimated factor mean scores (α = 0.05). For example, for Peru, three samples (Australia, Italy, and Israel) yielded significantly lower factor mean scores on Presence. When interpreting the estimated factor mean scores and corresponding country rankings, however, it is important to keep in mind that the samples were not representative of each country's population and that the findings may therefore not represent true population scores.

Monte Carlo Simulation Check
The simulation study allowed us to check how well the alignment method worked under conditions closely resembling the data under examination. Two approaches were used to aggregate the correlations between true and estimated factor means, and they produced diverging results. In both cases, we used n = 216, 500 replications, and 18 samples. First, the correlations between the true factor means and the estimated factor means, calculated for each replication and then averaged over all replications, were 0.961 and 0.971 for the Presence and Search subscales, respectively. These values fell somewhat below the guideline value of 0.980, possibly because the sample sizes were too small to support trustworthy comparisons of factor means across groups. To estimate which sample sizes could provide more trustworthy results, we ran additional simulation studies. As reported in Table 6, sample sizes of 400 and 300 may yield the recommended correlation of 0.980 for the Presence and Search subscales, respectively. Second, the correlations between the true factor means and the estimated factor means averaged over the replications were 0.999 and 0.998, respectively. These values exceed the suggested guideline and therefore support adequate performance of the alignment method.
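Why the two aggregation approaches diverge can be illustrated with a toy simulation. This sketch is not the alignment method and does not reproduce the study's Monte Carlo design. Purely for illustration, it assumes normally distributed true group means (SD 0.25) and an estimation error with standard error 1/sqrt(n) per group. The function names are hypothetical. Averaging the estimates over replications before correlating (the second approach) cancels most of the estimation error. Averaging the per-replication correlations (the first approach) keeps it, and so depends on group size.

```python
import numpy as np


def aggregate_correlations(true_means, estimates):
    """Return both aggregate correlations for one simulation condition.

    true_means: shape (n_groups,), the generating factor means.
    estimates:  shape (n_replications, n_groups), one row of estimated
                factor means per replication.
    """
    # Approach 1: correlate true and estimated means within each
    # replication, then average the correlations over replications.
    per_rep = [np.corrcoef(true_means, est)[0, 1] for est in estimates]
    mean_of_correlations = float(np.mean(per_rep))

    # Approach 2: average the estimated means over replications first,
    # then compute a single correlation with the true means.
    averaged = estimates.mean(axis=0)
    correlation_of_averages = float(np.corrcoef(true_means, averaged)[0, 1])
    return mean_of_correlations, correlation_of_averages


def toy_condition(n_per_group, n_groups=18, n_reps=500, seed=1):
    """Hypothetical stand-in for one simulation condition (NOT alignment).

    Assumes true group means ~ N(0, 0.25^2) and estimation error with a
    standard error of 1/sqrt(n_per_group).
    """
    rng = np.random.default_rng(seed)
    true_means = rng.normal(0.0, 0.25, size=n_groups)
    noise = rng.normal(0.0, 1.0 / np.sqrt(n_per_group), size=(n_reps, n_groups))
    return aggregate_correlations(true_means, true_means + noise)


for n in (216, 300, 400):
    r1, r2 = toy_condition(n)
    print(f"n = {n}: mean of per-replication r = {r1:.3f}, r of averaged estimates = {r2:.3f}")
```

The same seed is reused for every group size. In this toy setup, the first correlation therefore rises with n while the second stays close to 1, which mirrors the logic of the additional simulations summarized in Table 6. The printed values depend on the assumed spreads and are not meant to match the reported ones.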

Discussion
Single-sample CFAs of the MLQ showed that, for several national samples, a two-factor model did not produce good fit. When the subscales were analyzed separately, the Presence subscale yielded good fit for all samples, especially after removal of item 9, the only reverse-phrased item. The Search subscale produced good fit for some but not all samples, suggesting that findings pertaining to Search in subsequent invariance analyses should be interpreted with caution. MGCFA applied separately to the Presence and Search subscales supported configural and metric, but not scalar, invariance. When alignment optimization was applied, items 4 (Presence) and 8 (Search) were highly invariant, while item 9 (Presence) was significantly noninvariant. Several samples yielded no noninvariant parameters, and the remaining ones very few. After removal of item 9, adequate levels of approximate measurement invariance supported comparison of the samples' estimated factor mean scores. Samples were then ranked by their estimated factor mean scores for Presence and Search, respectively.

Performance of the Presence and Search Subscales
While the Presence subscale holds promise for future use across countries, particularly after removal of item 9, the usefulness of the Search subscale across countries is questionable. As other researchers have also noted (cf. King & Hicks, 2021), the conceptualization of search for meaning needs further refinement. In this regard, Wong (2018) distinguishes between negatively oriented search for meaning, which prompts people to find explanations and solutions for problems, and positively oriented search for meaning, which encourages people to discover their true purpose in life. Further exploration of the construct of search for meaning and its manifestations in different cultural contexts is warranted. For example, the Indian perspective includes four major life purposes, called Purusharthas: Dharma (righteousness), Artha (wealth), Kama (pleasure), and Moksha (liberation). In such a worldview, meaning varies according to an individual's developmental stage and is grounded in a self-transcendent perspective.
In addition, the available literature on the cultural understanding and correlates of the two facets of meaning in life has provided equivocal evidence. Some studies on this topic suggested a recurrent negative association between presence of and search for meaning in individualistic countries (Negri et al., 2020; Steger et al., 2006, 2008b), whereas in collectivistic countries positive or absent correlations emerged between the two dimensions (Kiang & Fuligni, 2010; Steger et al., 2008c). Other studies, however, provided less straightforward results, with negative associations between presence of and search for meaning detected in participants from collectivistic Turkey (Boyraz et al., 2013) and India (Singh et al., 2016), and in a multicultural sample from South Africa (Temane et al., 2014). Variations also emerged in relation to age, with adolescents appearing more prone than adults to perceive both presence of and search for meaning (Brassai et al., 2012). Finally, some studies identified an interaction between gender and ethnicity in the association between presence of and search for meaning (Kiang & Fuligni, 2010): among participants with a Latin American background, women had lower levels of search for meaning than men, whereas the opposite trend was detected in European American and Asian samples. Altogether, these findings call for a more systematic comparison of the MLQ across countries and samples.
Furthermore, although Steger et al. (2006) used the words "meaning" and "purpose" as synonyms (e.g., "greater meaning or purpose in life" on p. 81 and "presence of meaning or purpose" on p. 83), participants might understand them differently. In the MLQ, the term "meaning in life" is used in four items, whereas "purpose" is used in five items. The associations and denotations of these two constructs may overlap or differ across cultural contexts. Reker et al. (1987) distinguished between meaning, which refers to "making sense, order, or coherence out of one's existence" (p. 44), and purpose, which refers to "intention, some function to be fulfilled, or goals to be achieved" (p. 44). More recently, Martela and Steger (2016) proposed a tripartite structure of "meaning in life", including coherence (the cognitive component), purpose (the motivational component), and significance (the existential component). Further exploration is therefore necessary to ascertain whether the equation of meaning and purpose in the MLQ influences the scale's performance across countries and cultures.
In addition, people may understand presence of meaning from two different perspectives: first, "having meaning", a more abstract and more cognitive sense of having an overall meaning in life, which includes being able to identify personal values and sources of meaning; and second, "experiencing meaning", a more concrete, experiential presence of meaning in life (cf. Russo-Netzer, 2019). Reflecting these nuances in the wording of a scale assessing presence of meaning may further enhance our understanding of the construct, its concomitants, and its dynamics.

Performance of Item 9
Consistent with previous evidence (e.g., Góngora & Solano, 2011; Hallford et al., 2018; Schutte et al., 2016), item 9 (Presence) displayed questionable fit in single-sample analyses, and it was the most noninvariant item across countries. It is the only reverse-phrased item in the MLQ ("My life has no clear purpose") and is the direct negation of item 4 ("My life has a clear sense of purpose").
Although reverse-phrased items may offer certain benefits (e.g., controlling for acquiescence and careless responding, or broadening the content domain covered), formulating an item as the mere negation of a positively phrased one does not broaden the content domain. It also carries inherent disadvantages, such as cross-cultural differences in response styles like acquiescence (Weijters & Baumgartner, 2012). Alternatively, item 9 could tap into meaninglessness, but equating meaninglessness with low levels of presence of meaning is questionable (King & Hicks, 2021). Further research is needed to clarify whether item 9 performs poorly because of methodological issues related to its reverse phrasing, or whether it reflects a conceptual difference between meaninglessness and the absence of presence of meaning. In the meantime, we recommend removing item 9 from the scale, particularly in cross-cultural and cross-country studies. One possibility might be to replace item 9 with a new positively phrased item referring to "significance". Significance is now recognized, together with understanding/coherence and purpose, as one of the three core components of meaning in life (George & Park, 2016; King & Hicks, 2021; Martela & Steger, 2016). In the current version of the MLQ, "significance" features only in the Search subscale ("I am always searching for something that makes my life feel significant", item 7).
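The scoring implied by this recommendation can be sketched as follows. The sketch assumes the standard MLQ item assignment and 1-7 response format: Presence consists of items 1, 4, 5, 6, and 9, with item 9 reverse-scored, and Search consists of items 2, 3, 7, 8, and 10. The function names and the example respondent are hypothetical. Dropping item 9 changes the Presence sum-score range from 5-35 to 4-28, so scores computed with and without the item are not directly comparable.

```python
# Assumed MLQ item assignment: Presence = 1, 4, 5, 6, 9 (item 9 reverse-phrased);
# Search = 2, 3, 7, 8, 10. Responses are assumed to lie on a 1-7 scale.
PRESENCE_ITEMS = (1, 4, 5, 6, 9)
SEARCH_ITEMS = (2, 3, 7, 8, 10)
REVERSED_ITEMS = {9}
SCALE_MIN, SCALE_MAX = 1, 7


def reverse(score: int) -> int:
    """Reverse-code a response on the 1-7 scale (1 <-> 7, 2 <-> 6, ...)."""
    return SCALE_MIN + SCALE_MAX - score


def _scored(responses: dict, item: int) -> int:
    """Validate one response and reverse-code it if the item is reverse-phrased."""
    value = responses[item]
    if not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(f"item {item}: response {value} outside {SCALE_MIN}-{SCALE_MAX}")
    return reverse(value) if item in REVERSED_ITEMS else value


def presence_score(responses: dict, drop_item_9: bool = True) -> int:
    """Presence sum score; item 9 is excluded by default (4 items, range 4-28)."""
    items = [i for i in PRESENCE_ITEMS if not (drop_item_9 and i == 9)]
    return sum(_scored(responses, i) for i in items)


def search_score(responses: dict) -> int:
    """Search sum score (5 items, range 5-35)."""
    return sum(_scored(responses, i) for i in SEARCH_ITEMS)


# Hypothetical respondent, for illustration only
responses = {1: 6, 2: 3, 3: 4, 4: 6, 5: 5, 6: 6, 7: 2, 8: 3, 9: 2, 10: 4}
print(presence_score(responses))                     # 23 (items 1, 4, 5, 6)
print(presence_score(responses, drop_item_9=False))  # 29 (adds 8 - 2 = 6 for item 9)
print(search_score(responses))                       # 16
```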

Comparison of Meaning Scores across Samples
A high level of invariance was attained when alignment optimization was applied, allowing for comparison of the estimated mean scores across samples. Invariance was particularly good for the Presence subscale (item 9 removed), which also displayed good fit in most single-sample CFAs. Various cultural factors may influence MLQ scores, one being the degree of a culture's individualism/collectivism. To explore this possibility, the average individualism scores (Hofstede-Insights, 2018, July 13) for the examined countries are provided in the Appendix. Comparisons should, however, be interpreted with caution, since the two sets of data were obtained from non-representative and non-overlapping samples.
On average, participants from countries with higher levels of individualism reported lower scores on Presence of meaning in life. Several studies have shown that family and other close interpersonal relationships are primary sources of meaning for people from various cultural contexts (e.g., Baumeister et al., 2013; Delle Fave et al., 2013; King & Hicks, 2021). It has also been argued that meaning is in essence a collective rather than an individual phenomenon (Baumeister & Landau, 2018). Meaning and relatedness have likewise been described as core components of positive functioning, rooted in and transmitted through the interaction between biology and culture (Delle Fave & Massimini, 2015; Wissing et al., 2019). Given how closely meaning and relatedness are intertwined, it is perhaps unsurprising that people from cultures placing more emphasis on interconnectedness may experience higher levels of meaning in life. However, the association between presence of meaning and collectivism is not consistent across countries. For example, Norway is relatively individualistic according to the scores presented in the Appendix, yet it scored relatively high on Presence of meaning, whereas the relatively collectivistic Portugal scored low compared with the other countries. Other contextual factors, such as the political system, resource distribution, and social stratification, may also play a role and should be investigated in future research. As concerns the Search subscale, no clear pattern emerged in its association with individualism/collectivism. This finding should, however, be interpreted with caution, given the lack of good fit of the Search subscale for several samples.
In line with previous inconclusive evidence on the association between meaning and individualism/collectivism (Boyraz et al., 2013; Kiang & Fuligni, 2010; Steger et al., 2008c), the cultural roots and correlates of meaning in life appear to be intricate. Additionally, the understanding and significance of Search for meaning could interact with the level of experienced Presence, and this interaction could, in turn, be associated with cultural values such as individualism/collectivism. In this regard, Search for meaning has been found to be associated with higher subjective well-being in more collectivistic countries (Li et al., 2021). Other studies suggested that Search for meaning contributes to well-being only when Presence of meaning is very high (Park et al., 2010) or when people who have already found meaning continue to actively search for it (Steger et al., 2011).
Overall, the dynamic interplay between meaning and culture deserves further attention. A shared network of meaning is being ceaselessly constructed, distributed, and reconstructed among the set of interconnected individuals constituting a given culture (Chiu & Hong, 2007). Culture, thus, represents a framework or a web of meaning that enables individuals to function in a given ecology (Fiske, 2000).

Limitations and Future Directions
Although this study adds to the body of literature on meaning in life, it is not without limitations. As concerns the samples, the geographically and culturally diverse countries were not selected randomly or representatively, and some geographical regions are underrepresented. Moreover, the participants were adults between 30 and 60 years of age, living in urban areas and holding at least a secondary education diploma. Future research should investigate the replicability of the findings among participants from rural areas, of different ages and education levels, and from other countries and regions. Each sample was relatively small, as the simulation study also suggested; larger samples might have yielded more reliable results. In addition, the MLQ was translated into the local language for many of the samples, and the words used in translated versions may differ from the original scale in nuance and connotation. More research is needed to address the cross-cultural complexities of how social constructs such as meaning in life are understood and expressed in language. This is particularly important for search for meaning, whose measurement was found to be largely noninvariant across countries. Qualitative approaches may be particularly informative in pursuing this goal. This study did not systematically compare the correlation between presence of and search for meaning scores across countries or across sociodemographic variables such as age and gender. Future investigations of this kind could shed light on this debated topic.
An exploratory examination of mean score comparisons across the countries based on aggregate collectivism and individualism scores was performed, but sample differences between local datasets may call comparability into question. Additional insight may derive from research involving representative samples and assessing meaning in life together with other facets of well-being, cultural values, and other sociodemographic determinants of human functioning. Linked to this, the current study reports on the cross-country rather than cross-cultural measurement invariance of the MLQ, since cultural values and dimensions were not assessed, and country was used as a proxy for culture. Including measures of cultural values and dimensions in the future will allow for a deeper understanding of how culture may influence the experience, expression, connotations, and correlates of different facets of meaning in life.
At the analytical level, the alignment optimization method, although promising, is relatively new and more research is needed to validate interpretation guidelines, such as the 25% cut-off criterion for noninvariance (Jang et al., 2017).

Conclusions
The Presence subscale of the MLQ displayed good psychometric properties across 18 samples from 17 countries, particularly after removal of item 9, the only reverse-phrased item. In contrast, the performance of the Search subscale varied more across samples. Based on these findings, we recommend removing item 9 from the Presence subscale. Search scores should only be used if model fit is supported for the specific country sample. The good psychometric performance of the Presence subscale suggests not only that the subscale holds promise for use across countries and contexts, but also that presence of meaning in life may be a universal phenomenon ingrained in human experience. The Search subscale seems less appropriate for cross-country studies, possibly because of greater heterogeneity of the construct and of its cultural understanding. More exploration is needed to better understand the meaning, manifestations, and correlates of search for meaning within and across countries and cultures, as well as the role of religion in finding meaning. Conceptually, the MLQ does not reflect the recently developed tripartite structure of meaning in life (Martela & Steger, 2016); therefore, the impact of equating the words "purpose" and "meaning" on the MLQ's performance across cultures remains to be studied.
Author Contribution
Lusilda Schutte played a lead role in the conceptualization of this paper, formal analysis, methodology, writing – original draft, and writing – review and editing. Ingrid Brdar played a lead role in the investigation – data collection in Croatia, funding for data collection in Croatia, and writing – review and editing, and a supporting role in the conceptualization of this paper and methodology. Marié P. Wissing played a lead role in the investigation – data collection in South Africa, funding the South African data collection, and writing – review and editing, and a supporting role in the conceptualization of this paper. Marko Tončić played a lead role in writing – review and editing, and a supporting role in the conceptualization of this paper, methodology, and formal analysis. Ulisses Araujo, Erik Carlquist, Alejandro Castro Solano, Teresa Freire, María del Rocío Hernández-Pozo, Paul E. Jose, Tamás Martos, Jeanne Nakamura, Pamela Nuñez del Prado Chaves, Pninit Russo-Netzer, Kamlesh Singh, Alena Slezackova, Lawrence Soosai-Nathan, Wenceslao Unanue, and Dianne A. Vella-Brodrick played a lead role in the investigation – data collection in their respective countries, funding data collection in the respective countries, and a supporting role in writing – review and editing. Antonella Delle Fave played a leading role in the coordination of the Eudaimonic and Hedonic Happiness Investigation (EHHI) project, and contributed to the present study with supervision, investigation – data collection in Italy, funding data collection in Italy, writing – review and editing, and a supporting role in the conceptualization of the specific manuscript. All authors reviewed and approved the final version of the manuscript.