Seligman (2011) proposed that flourishing comprises five components: Positive emotion, Engagement, Relationships, Meaning, and Accomplishment (i.e., PERMA). Seligman’s framework is commonly measured with the PERMA-Profiler (Butler & Kern, 2016), along with several variants (e.g., for the workplace and for childhood/adolescence; Kern et al., 2014, 2015) and translations of the original measure (e.g., Giangrasso, 2021; Lai et al., 2018). Although the PERMA-Profiler is only one of several measures of positive mental health, the model and the measure have been used in numerous studies, research programs, and applied programs.

Seligman (2018) noted that PERMA is not an exhaustive list of the building blocks of well-being and encouraged researchers to explore additional evidence-based building blocks that may improve the framework. Donaldson and Donaldson (2021a) suggested that there was a theoretical rationale for testing four additional building blocks of well-being (PERMA + 4): (1) physical health (i.e., I typically feel physically healthy), (2) mindset (i.e., setbacks are opportunities to grow), (3) environment (i.e., access to nature, natural light), and (4) economic security (i.e., stable resources).

First, physical health was included to measure the impact of health on well-being above and beyond the absence of disease (Seligman, 2008). This building block focuses on health assets, that is, factors that can lead to a variety of positive health outcomes, such as lower health care expenditures, a better prognosis when illness does strike, and higher quality of life.

Second, mindset was included to measure an open, developable, “future-oriented” construct characterized by prospection, a growth mindset, and a proclivity to persevere in the face of setbacks, especially over long periods of time (Duckworth et al., 2007; Dweck, 2006; Luthans et al., 2007).

Third, environment was added to measure physical, restorative features that have been found to help people function at their best at work (Hartig et al., 1997). Such features may include an abundance of natural light, access to nature, assurance of physiological safety, and an organized physical arrangement of the workplace (Hartig et al., 1997).

Fourth, scholars in positive psychology have long examined the impact of income on well-being. Diener and Seligman (2004) found that individuals who are well-off financially are, on average, happier than poor people. However, they also found that income differences had the most dramatic impact on well-being among people living in poverty, presumably because income at that level determines whether basic needs are met. To further examine the impact of income on well-being, economic security was added and defined as an individual’s perception of the dimensions they believe are critical to their economic security, such as income stability, job security, and buffers against medical spending shocks.

Donaldson and Donaldson (2021a) developed a 29-item measure of PERMA + 4 and found that PERMA + 4 predicted important work outcomes, such as individual, team, and organizational adaptivity, proactivity, and proficiency. A recent systematic review further supported this original work by showing strong associations between PERMA + 4 and desirable workplace outcomes (Cabrera & Donaldson, 2023). It has been suggested that PERMA + 4 may serve as a robust framework for the design, measurement, and evaluation of work-related well-being programs and interventions (Donaldson et al., 2022).

In the workplace, brief measures are critical, both because they allow repeated measurement (e.g., assessing the “pulse” of employees) and because the time employees have available to complete surveys is limited (Lang & Tay, 2021). Past research has shown that short surveys can yield more valid and reliable findings, and higher response rates, than longer surveys (Kost & Correa da Rosa, 2018). In addition, studies have suggested that short scales create opportunities for more advanced research designs (e.g., ecological momentary assessment) and for theory building (Ziegler et al., 2014).

To date, measurement studies on PERMA and PERMA + 4 have used Classical Test Theory (CTT), with methods such as factor analysis, to develop and validate PERMA measures (cf. Butler & Kern, 2016; Giangrasso, 2021; Iasiello et al., 2017; Kern & Khaw, 2015; Ryan et al., 2019; Umucu et al., 2020), including the research that validated the four additional building blocks (Donaldson & Donaldson, 2021b). While CTT is useful for understanding the reliability and validity of latent traits for new tests, interpretations of respondents’ abilities depend on the particular test used (Diener et al., 2018). In other words, if a construct is assessed with items that discriminate poorly between participants, findings may misclassify those who are high versus low on the latent trait. Item response theory (IRT) can help ensure that a scale is suitable for respondents at different levels of the construct under scrutiny (i.e., that it provides sufficient information for very high or very low scorers). Additionally, CTT assumes that measurement error is constant for all respondents and does not account for respondents who possess varying levels of the latent trait, and thus varying levels of error (Zanon et al., 2016). As a complement to CTT, IRT can provide information across the full range of the latent trait, which can strengthen evidence of predictive validity with desired outcome measures.

Item response theory is a scale development tool that has been used to shorten existing scales and to measure the well-being and performance of employees (Lang & Tay, 2021; Nima et al., 2020). It complements CTT by modeling the person-by-item interaction, showing the quality of measurement at specific points on the latent trait (Oishi, 2006). Unlike CTT, which applies a single measurement error to the entire sample, IRT allows measurement error to vary with the latent score (Oishi, 2006). Past research has shown that IRT has been successfully used to create short versions of existing instruments (Donaldson et al., 2021; Petersen et al., 2006; Sekely et al., 2018). However, to our knowledge, IRT has not been applied to the measurement of PERMA or PERMA + 4.
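The contrast between a single CTT error and a theta-dependent IRT error can be written out in standard textbook notation. These are general formulas rather than anything specific to this study, and the numbers in the last line are hypothetical values chosen only to show the arithmetic.

```latex
\begin{aligned}
\text{CTT:}\quad & \mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}} && \text{(one value for every respondent)} \\
\text{IRT:}\quad & \mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}} && \text{(changes with } \theta\text{)} \\
\text{e.g.}\quad & \sigma_X = 2,\ \rho_{XX'} = 0.90 \Rightarrow \mathrm{SEM} = 2\sqrt{0.10} \approx 0.63; && I(\theta_1) = 16 \Rightarrow \mathrm{SE} = 0.25,\quad I(\theta_2) = 4 \Rightarrow \mathrm{SE} = 0.50
\end{aligned}
```

Under IRT, respondents located where the items carry more information (larger I) are measured more precisely, which is the property the study exploits when comparing item sets.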

The current study used IRT to develop a short scale of PERMA + 4 and to evaluate its psychometric properties, including item difficulty and discrimination, test information, and a direct comparison of test information between the short and the longer versions of PERMA + 4. Findings from this study may help workplace well-being coaches and practitioners accurately screen for nine dimensions of well-being in the workplace to best inform programs and interventions.

1 Methods

1.1 Participants and Procedure

Participants were recruited from two independent data sources, representing full-time employees in Canada (n = 1,003) and Australia (n = 942). Table S1 provides a sociodemographic breakdown of the Australian and Canadian samples. Participants were recruited through an online panel agency, which collected a sample that was representative on sociodemographic characteristics. Participants were paid between $2.00 and $3.00 for completing the survey. They received a link directing them to the online survey. After providing informed consent, participants completed survey items assessing their work-related well-being. The last part of the survey asked participants to report demographic characteristics. The survey took approximately 20 min to complete. All research materials and procedures were approved by Claremont Graduate University’s Institutional Review Board.

1.2 Measures

PERMA + 4. One item from each of the PERMA dimensions of the PERMA-Profiler (Butler & Kern, 2016) was included, along with an additional item on physical health. Three items for each of the remaining building blocks (mindset, environment, and economic security) were adapted from the Positive Functioning at Work Scale (PF-W; Donaldson & Donaldson, 2021a), resulting in a 15-item measure (five PERMA items, one physical health item, and nine items for the remaining three building blocks). Past research has supported the internal consistency reliability and the convergent and discriminant validity of PERMA + 4 (Donaldson & Donaldson, 2021a). For each item, participants were asked to consider how well they had felt and functioned at work over the past two weeks. They were instructed to indicate the extent to which they agreed with each statement on a scale ranging from 0 (not at all, 0%) to 10 (completely, 100%).

Demographic variables. Gender was coded as male, female, non-binary, or not applicable. Age was coded as 25–34, 35–44, 45–54, 55–65, or 66–99. Education was coded as still at school; less than a high school diploma; high school diploma; some college, but no degree; vocational training; Bachelor’s; Master’s; Doctorate; or other. Race/ethnicity was measured as White/Caucasian, Asian, Indigenous, Black, Hispanic/Latinx, Middle Eastern or North African, Pacific Islander, Bi-Racial/Multi-Racial, other, or not applicable. Work sector was coded as privately funded organization, publicly listed organization, government-funded organization, not-for-profit, or other.

1.3 Analytic Strategy

All analyses were conducted in the R statistical program using the mirt, psych, lordif, mokken, and stats packages (Chalmers, 2012; Chambers et al., 1990; Choi et al., 2011; R Core Team, 2021; Revelle, 2017; van der Ark, 2007). Descriptive statistics, including means, standard deviations, response frequencies, skewness, and kurtosis, were computed. Byrne (2010) argued that data can be considered normal if skewness is between −2 and +2 and kurtosis is between −7 and +7. Missing data were handled using listwise deletion.
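The normality screen and listwise deletion described above can be sketched in a few lines. This is an illustrative reimplementation, not the psych code the authors ran: it uses the moment-based estimators g1 (skewness) and g2 (excess kurtosis, 0 for a normal distribution), and it assumes the Byrne (2010) cutoffs are applied to excess kurtosis. Packages may apply slightly different small-sample corrections.

```python
from typing import List, Optional, Sequence, Tuple


def listwise_delete(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Keep only respondents with no missing (None) item responses."""
    return [row for row in rows if all(v is not None for v in row)]


def skewness_kurtosis(xs: Sequence[float]) -> Tuple[float, float]:
    """Return (g1, g2): moment-based skewness and excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0.0:
        raise ValueError("item has zero variance")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0  # subtract 3 so a normal distribution scores 0
    return g1, g2


def within_byrne_bounds(xs: Sequence[float], skew_limit: float = 2.0,
                        kurt_limit: float = 7.0) -> bool:
    """True if |skewness| <= 2 and |excess kurtosis| <= 7 (assumed reading of the cutoffs)."""
    g1, g2 = skewness_kurtosis(xs)
    return abs(g1) <= skew_limit and abs(g2) <= kurt_limit
```

Applied item by item to the 0–10 responses left after `listwise_delete`, this gives one pass/fail flag per item.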

Prior to fitting a parametric IRT model, a Mokken Scale Analysis and a differential item functioning procedure were performed to examine the dimensionality and item invariance of PERMA + 4 (Choi et al., 2011; Mokken, 2011). To assess unidimensionality, scalability coefficients (H) were computed. A scale is considered strong when H ≥ 0.50, medium when 0.40 ≤ H < 0.50, and weak when 0.30 ≤ H < 0.40; item sets with H < 0.30 are not considered scalable (Mokken, 2011). An automated item selection procedure was used to examine inter-item covariances and the relationship between items and the latent trait (Meijer & Baneke, 2004). Latent monotonicity for each item was examined by visually plotting the item step response functions against rest-score groups, where an item’s rest score is the total score minus the score on that item (Junker & Sijtsma, 2000). Following the recommendation of Robinson et al. (2019), residual correlations below 0.25 were taken to indicate local independence between the PERMA + 4 items. One-way analyses of variance were conducted to check for balance across age and gender categories on PERMA + 4. Nonsignificant differences were found for age (p = 0.189) and gender (p = 0.450), suggesting that there were no pre-existing differences in PERMA + 4 by age or gender.
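The scalability coefficient and rest score can be illustrated with a short sketch. It follows the standard definition for polytomous items: H is the sum of the observed inter-item covariances divided by the sum of the largest covariances the items' marginal distributions allow, which are reached when both columns are sorted in the same order. This is an independent illustration, not the mokken package's `coefH` code, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def _cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def scale_h(data: List[List[int]]) -> float:
    """Scale coefficient H; data[r][i] is respondent r's score on item i."""
    items = list(zip(*data))
    observed = maximum = 0.0
    for x, y in combinations(items, 2):
        observed += _cov(x, y)
        maximum += _cov(sorted(x), sorted(y))  # max covariance given the marginals
    return observed / maximum


def rest_scores(data: List[List[int]], item: int) -> List[int]:
    """Total score minus the score on `item`, used to form rest-score groups."""
    return [sum(row) - row[item] for row in data]


def classify_h(h: float) -> str:
    """Conventional Mokken labels for a scale coefficient."""
    if h >= 0.5:
        return "strong"
    if h >= 0.4:
        return "medium"
    if h >= 0.3:
        return "weak"
    return "not scalable"
```

Restricting the same ratio to a single pair of items gives the item-pair coefficient; with the value reported in the Results, `classify_h(0.420)` returns "medium".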

To test for measurement invariance between the Canadian and Australian samples before performing parametric IRT on the combined sample, a differential item functioning procedure was used to detect uniform and non-uniform differential item functioning (Choi et al., 2011). A likelihood ratio chi-square test at an alpha level of 0.01 was used as the detection criterion, and McFadden’s pseudo-R² was used as the magnitude measure. Four plots were produced for each item to visualize the item response models for the two groups: the first showed the item characteristic curves (ICCs) for each group, the second showed the absolute differences between the two groups’ ICCs, the third showed the item response functions for each group, and the fourth showed the absolute differences between the ICCs weighted by the score distribution.
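The detection and magnitude criteria can be sketched under the nested ordinal logistic regression framework that lordif uses: model 1 contains the trait, model 2 adds group, and model 3 adds a trait-by-group interaction. The sketch takes the four log-likelihoods as inputs (the reader's own, hypothetical values in the test) and does not fit any models. Flagging an item when any of the three tests is significant is an assumption modeled on lordif's chi-square criterion; the closed-form p-values hold only for 1 and 2 degrees of freedom.

```python
import math
from typing import Dict


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2."""
    x = max(x, 0.0)  # guard against tiny negative values from rounding
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def dif_tests(ll0: float, ll1: float, ll2: float, ll3: float) -> Dict[str, float]:
    """Likelihood ratio tests and McFadden pseudo-R^2 changes for one item.

    ll0: intercept only; ll1: trait; ll2: trait + group;
    ll3: trait + group + trait-by-group interaction.
    """

    def mcfadden(ll: float) -> float:
        return 1.0 - ll / ll0

    chi12 = 2.0 * (ll2 - ll1)  # uniform DIF (1 df)
    chi13 = 2.0 * (ll3 - ll1)  # uniform + non-uniform DIF (2 df)
    chi23 = 2.0 * (ll3 - ll2)  # non-uniform DIF (1 df)
    return {
        "p12": chi2_sf(chi12, 1),
        "p13": chi2_sf(chi13, 2),
        "p23": chi2_sf(chi23, 1),
        "r2_change_12": mcfadden(ll2) - mcfadden(ll1),
        "r2_change_13": mcfadden(ll3) - mcfadden(ll1),
        "r2_change_23": mcfadden(ll3) - mcfadden(ll2),
    }


def is_flagged(result: Dict[str, float], alpha: float = 0.01) -> bool:
    """Flag the item if any of the three likelihood ratio tests is significant."""
    return min(result["p12"], result["p13"], result["p23"]) < alpha
```

A significant test paired with a very small pseudo-R² change is the pattern that supports pooling groups despite flagged items.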

Samejima’s graded response model was used to estimate item discrimination and ability on the latent trait of PERMA + 4 (Woods, 2006). The graded response model extends the two-parameter logistic model to items with ordered polytomous response categories (Samejima, 1997). Three parameters were estimated and used to evaluate each item of the 15-item measure of PERMA + 4: (1) the ability level on PERMA + 4, denoted theta (Θ); (2) difficulty, denoted b (thresholds b1 to b10, one for each boundary between adjacent response options), with items that represent the top end of PERMA + 4 considered more difficult; and (3) discrimination, denoted a. Items that better differentiate individuals who are low on PERMA + 4 from individuals who are high on PERMA + 4 have higher discrimination values. Test information was used to estimate the precision of the items across the latent trait, and item and option characteristic curves were used to screen the ability of items to progress along the latent trait given the response categories (i.e., 0–10). Conditional reliability was used to compare the reliability of the 15-item and 9-item measures of PERMA + 4 along the latent trait.

Based on past research (Sekely et al., 2018), two approaches were used to select the best item representing each of mindset, environment, and economic security: a theory-based approach (i.e., item content was evaluated by the research team) and an empirically driven approach (i.e., items with the highest test information, discrimination, and difficulty). Using both approaches allowed for the selection of items that possessed the most information along the latent trait (PERMA + 4) while maintaining the construct validity of each dimension as defined by Donaldson and Donaldson (2021a).

2 Results

2.1 Item Descriptive Statistics and Response Frequencies

Item means, standard deviations, and response frequencies for the 15-item measure of PERMA + 4 are presented in Table 1. Skewness and kurtosis values for the 15-item measure of PERMA + 4 were within the normal range. Mean scores ranged from 4.8 to 7.2. Response frequencies across PERMA + 4 items tended to favor the higher response options (i.e., above response option 5 rather than below it). However, the environment (nature) item and all three economic security items had a considerable number of responses at the lower end of the scale. All response options were included in item response modeling.

Table 1 15-item measure of PERMA + 4 with item means, standard deviations, and response frequency values

2.2 Dimensionality

A Mokken Scale Analysis of the 15-item measure of PERMA + 4, conducted separately for the Australian and Canadian samples, showed a medium scalability coefficient (H = 0.420). An automated item selection procedure and visual inspection of plots further supported the unidimensionality of the 15-item measure by showing a monotonic relationship between each item and its rest-score group. All residual correlations between items were below the recommended cutoff of 0.25 (Robinson et al., 2019), suggesting local independence. A differential item functioning analysis flagged seven items for uniform and non-uniform differential item functioning. However, graphical displays of differential item functioning for each item of the 15-item measure showed that the absolute differences between the item characteristic curves occurred mainly at low levels of PERMA + 4 (theta < −2), suggesting minimal impact. The two samples (i.e., Canadian and Australian) were therefore combined for item response modeling.

2.3 Item Response Modeling

Table 2 shows the graded response model item parameters for the 15-item measure of PERMA + 4. The discrimination parameters for the five PERMA items ranged from 2.16 to 3.34 and were classified as very high (Baker, 2001). The discrimination parameters for the items measuring the four additional building blocks ranged from 0.80 (moderate) to 2.80 (very high) (Baker, 2001). Each difficulty threshold is the point on theta at which a respondent has a 50% probability of choosing that response option or a higher one rather than a lower option. Thresholds between options 0 and 1 (b1) ranged from −2.96 to −1.64; 1 and 2 (b2), from −2.84 to −1.48; 2 and 3 (b3), from −2.63 to −1.25; 3 and 4 (b4), from −2.25 to −0.85; 4 and 5 (b5), from −1.91 to −0.53; 5 and 6 (b6), from −1.07 to 0.19; 6 and 7 (b7), from −0.65 to 0.74; 7 and 8 (b8), from −0.07 to 1.5; 8 and 9 (b9), from 0.7 to 2.42; and 9 and 10 (b10), from 1.19 to 3.21.

Table 2 Graded response model item parameters for the 15-item measure of PERMA + 4

Option characteristic curves were plotted for the 15-item measure of PERMA + 4 (see Fig. 1). The item discrimination parameter (a) reflects the steepness, or slope, of the curves. For items with higher discrimination (e.g., positive emotion [a = 3.06], engagement [a = 3.34]), successively higher response options became the most likely response as theta increased. In contrast, items with lower discrimination (e.g., economic security (medical) [a = 0.80], economic security (savings) [a = 0.86]) did not clearly distinguish between response options as theta increased.
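The link between a and steepness can be made concrete with one derivative. Assuming the logistic form of the graded response model's cumulative (boundary) curves with no scaling constant, the slope of each boundary curve at its own threshold is a/4. Applying that to the discrimination values reported above gives the following slopes, which are derived for illustration and are not values reported by the study.

```latex
\begin{aligned}
P^{*}_{k}(\theta) &= \frac{1}{1 + e^{-a(\theta - b_k)}}, \qquad
\frac{dP^{*}_{k}}{d\theta} = a\,P^{*}_{k}(\theta)\bigl[1 - P^{*}_{k}(\theta)\bigr] \\
\left.\frac{dP^{*}_{k}}{d\theta}\right|_{\theta = b_k} &= a \cdot 0.5 \cdot 0.5 = \frac{a}{4} \\
\text{engagement } (a = 3.34) &:\ 0.835, \qquad \text{positive emotion } (a = 3.06):\ 0.765 \\
\text{economic security (savings) } (a = 0.86) &:\ 0.215, \qquad \text{economic security (medical) } (a = 0.80):\ 0.20
\end{aligned}
```

A boundary curve roughly four times steeper separates adjacent response options over a much narrower band of theta, which is why the high-discrimination items show well-separated option curves.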

Fig. 1 Option characteristic curves for the 15-item measure of PERMA + 4

Note. P1 = the probability of respondents endorsing response option 0 (not at all); P11 = the probability of respondents endorsing the response option 10 (completely)

2.4 Test and Item Information Functions

Based on the results of the item response modeling and an examination of item content, a 9-item short version of PERMA + 4 (see Table 3) was tested against the 15-item measure. The short version comprises positive emotion, engagement, relationships, meaning, accomplishment, health, mindset (prospection work), environment (focus), and economic security (income). Figure 2 shows a conditional reliability plot comparing the 15-item and 9-item measures of PERMA + 4. Visual inspection of Fig. 2 shows that score estimates were most reliable in the −3 to +2 theta range. The item information functions for the 9-item measure show that the positive emotion, engagement, meaning, and accomplishment items provided the most information about the total PERMA + 4 score (Fig. 3).
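One common way to turn test information into a conditional reliability curve like the one in Fig. 2 is shown below. It assumes theta is scaled to a standard normal distribution (variance 1); the text does not state which formula produced Fig. 2, so this is an illustration rather than the authors' computation. Because item information is never negative, removing items can only lower test information at every theta when the remaining items keep their 15-item parameter estimates, so under that assumption the 9-item curve sits at or below the 15-item curve.

```latex
\begin{aligned}
I_{\text{test}}(\theta) &= \sum_{i=1}^{n} I_i(\theta) \\
r_{xx}(\theta) &= \frac{I_{\text{test}}(\theta)}{I_{\text{test}}(\theta) + 1} \\
I_{\text{test}} = 4 &\Rightarrow r_{xx} = 0.80, \qquad I_{\text{test}} = 9 \Rightarrow r_{xx} = 0.90
\end{aligned}
```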

Table 3 9-item short measure of PERMA + 4

Fig. 2 Conditional reliability plot comparing the 15-item and 9-item measures of PERMA + 4

Note. Θ = the level of PERMA + 4; rxx = reliability value from 0 to 1

Fig. 3 Item information functions for the 9-item measure of PERMA + 4

Note. Θ = the level of PERMA + 4; I = information; information is an arbitrary value estimated using the latent trait

3 Discussion

This study developed and tested a 9-item short scale of PERMA + 4 using item response theory in two large samples of employees in Canada and Australia. The findings showed that the 9 items, one representing each building block of PERMA + 4, had good item discrimination and provided test information comparable to that of the 15-item measure. Taken together, these findings may be useful to well-being researchers who are interested in using a short, robust, and reliable measure of PERMA + 4, and to practitioners who want to reduce participant burden while maintaining a comprehensive metric of well-being in “real world” settings.

The present study found that most respondents in Canadian and Australian workplaces reported relatively high levels of PERMA. However, scores on the additional PERMA + 4 building blocks of well-being, including economic security and environment, were much lower than those on the five PERMA elements. Employees’ perceptions of their economic security and immediate physical environment may have evoked feelings of uncertainty or negative emotions. Past research has shown that individuals fear economic losses, and when they experience such losses, their well-being also suffers (Hacker & Jacobs, 2008). Global crises, such as the COVID-19 pandemic and the war in Ukraine, may have exacerbated feelings of economic insecurity and threats to the physical environment for participants in the current study. Nonetheless, research indicates that perceptions of economic security and restorative features of the physical environment can affect employee well-being (Bellini et al., 2015; Diener & Seligman, 2004; Easterlin, 2003). Scholars and practitioners who design work-related well-being programs and interventions should consider economic and environmental factors to support well-being.

Numerous studies have validated measures of PERMA across countries and populations (Lai et al., 2018; Pezirkianidis et al., 2021; Watanabe et al., 2018). These studies have primarily relied on CTT methods, whereas the present study advanced measurement precision in this literature by using item response theory. The item response model showed that the 9-item measure of PERMA + 4 had good item discrimination. Threshold estimates and option characteristic curves also showed acceptable separation along the latent trait. Most respondents endorsed the higher response options (i.e., 6 and above), even when their scores were average on the latent trait. These findings suggest that the 9-item measure of PERMA + 4 may accurately reflect the workplace well-being of typical employees.

Past systematic reviews have shown that higher levels of PERMA + 4 among employees were associated with increased job satisfaction and job performance (Donaldson et al., 2019; Roll et al., 2019). In the present study, the 9-item measure of PERMA + 4 was conditionally reliable between theta values of −3 and +2, a range covering most of the sample. These findings suggest that practitioners and researchers may use the total score across the nine items to measure the overall construct of PERMA + 4 in the workplace. Future research should assess the predictive validity of the 9-item measure of PERMA + 4 for other work outcomes, such as work performance, and assess its convergent validity with similar well-being measures (e.g., psychological capital). Studies might also test the 9-item measure in various industries and validate the scale in other populations, such as college students or school children. Longitudinal studies are also needed to assess how PERMA + 4 varies within and between individuals and how these variations affect work performance.

3.1 Limitations

There were restrictions on the number of items that could be administered from the original 29-item PERMA + 4 measure developed by Donaldson and Donaldson (2021a). As such, the choice of the 15 items used to represent PERMA + 4 may have influenced the findings. The data were collected using a cross-sectional, self-report survey instrument and may be prone to social desirability bias (Donaldson & Grant-Vallone, 2002). Age was measured in bands (i.e., a range of ages such as 25–34), so specific age-related differences that a continuous measure could have revealed may have gone undetected.

4 Conclusion

The present study found support for a 9-item measure of PERMA + 4 using item response theory. The item-level information builds on past research that used CTT to develop and validate PERMA-related measures, adding an IRT analysis and reducing the number of items needed to adequately capture the PERMA + 4 construct. Scholars and practitioners are encouraged to use this short scale to design, measure, and evaluate workplace programs and interventions that assess well-being.