Development and Validation of the Awareness Outcomes Measure (AOM) Using Rasch Approach

Awareness is a key component of concepts related to well-being, such as mindfulness and authenticity. Similarly, interventions to enhance mindfulness and well-being often focus on developing awareness. But measuring the effect of awareness development represents a challenge due to the lack of reliable and valid measures focused specifically on awareness outcomes. This study aimed to develop and validate the Awareness Outcomes Measure (AOM) using modern Rasch methodology. The measure was developed from Self-Awareness Outcomes Questionnaire (SAOQ) items, drawn from previous research with awareness-development experts. A partial credit Rasch model was applied to examine the psychometric properties of the AOM with a combined sample of 713 participants from three English-speaking countries. The 21-item AOM met expectations of the unidimensional Rasch model. It is a reliable and psychometrically sound instrument, invariant across sex, country, and age, designed to measure the outcomes of awareness development. Person-item thresholds demonstrated excellent coverage of awareness outcomes, and we developed an algorithm for ordinal-to-interval transformations presented in a table to further enhance precision of the AOM. In this study, we have developed and validated the AOM, providing researchers and practitioners with a robust measure of awareness outcomes that is suitable for use in a range of populations.

The development of both mindfulness and authenticity is proposed to be important in the improvement of well-being (Querstret et al., 2020; Sutton, 2020), and awareness of the self is a key element of both of these constructs. Mindfulness, for example, is defined in terms of an awareness that operates on "thought, feeling, and other contents of consciousness" (Brown & Ryan, 2003, p. 823). Similarly, authenticity is commonly defined in terms of self-awareness and genuine self-expression (Kernis & Goldman, 2006). Interventions to improve awareness, such as mindfulness training or talking therapies, therefore have significant potential to improve individual well-being.
When researchers and practitioners wish to evaluate the effectiveness of such awareness-building interventions, they frequently attempt to measure changes in levels of awareness itself or choose related outcomes such as well-being, engagement, or work performance. While valuable, there are some drawbacks to these approaches. First, they involve the paradox of attempting to measure self-awareness by directing the individual's attention towards their awareness. Second, the different measures of awareness are not interchangeable and may be associated with different outcomes. Third, the measurement of non-overlapping outcomes in different studies reduces the possibility of effective comparisons between interventions. We explore each of these drawbacks in more detail next and demonstrate how the Self-Awareness Outcomes Questionnaire (SAOQ; Sutton, 2016), a measure focused on evaluating the outcomes associated with awareness rather than awareness itself, provides a potential solution.
A significant, but often overlooked, confounding issue with measuring awareness is that it necessarily involves metacognitive awareness of one's own self-awareness. The well-known Dunning-Kruger effect describes how individuals with less awareness of their own skills can overestimate their abilities, while developing more accurate awareness can paradoxically reduce their self-ratings (Kruger & Dunning, 1999). Recent research has shown that those individuals who show evidence of the Dunning-Kruger effect in the reasoning domain also report greater self-efficacy and overestimate their use of effective decision-making styles (Christopher et al., 2021). Thus, when practitioners attempt to aid people to develop their awareness, there is likely to be at least a sub-group who are less able to accurately judge their abilities. Grossman (2019) has suggested exactly this as a challenge to mindfulness studies using self-report questionnaires. Ironically, even if an awareness-developing intervention is successful, participants may report lower awareness in post- than in pre-intervention measures because their metacognitive abilities have improved. There is some evidence that this may be the case in mindfulness development, as Baer et al. (2008) showed that meditators scored higher on all scales of the Five Facet Mindfulness Questionnaire (FFMQ) except Acting with Awareness. A measure that assesses outcomes associated with greater awareness, rather than expecting participants to have accurate metacognitive awareness, can avoid this potential confound.
The second challenge facing measures of awareness is that commonly used self-rating instruments use facets or subscales drawn from a range of mindfulness or authenticity questionnaires. For example, the FFMQ (Baer et al., 2006) assesses awareness with its Acting with Awareness facet (Krägeloh et al., 2019) drawn from the Mindful Attention and Awareness Scale (MAAS; Brown & Ryan, 2003). The Authenticity Scale evaluates the extent to which a person's awareness and expression of their states, emotions, and cognitions is congruent (Wood et al., 2008) while the Integrated Authenticity Scale assesses self-awareness as a subscale (Knoll et al., 2015). While each measure or index of awareness is of course valuable in its own right, the distinct definitions and conceptualizations mean that they are not equivalent and thus the ability to compare the effectiveness of different interventions to improve awareness is diminished. A focus on outcomes that are associated with general increases in awareness across a range of interventions and conceptualizations would enable this kind of comparison.
A third drawback of these common approaches to evaluating awareness-building interventions is that they do not measure "downstream" changes in a coherent or standardized manner. That is, they do not evaluate the effect of increased awareness on a specified range of individual outcomes that are theoretically and demonstrably linked to awareness. Instead, the effectiveness of mindfulness training may be evaluated in terms of general outcomes such as individual well-being, stress, and physical symptoms through work-related outcomes such as job satisfaction or client satisfaction (Jamieson & Tuckey, 2017). Similarly, workplace awareness-building interventions, such as 360-degree feedback, may be evaluated solely in terms of their impact on work performance measures (Fletcher & Bailey, 2003). There are very few measures available to researchers or practitioners who wish to assess the effect of awareness on a standardized range of individual outcomes and thereby enable comparative evaluations of different interventions.
The SAOQ was designed to address these issues in awareness measurement (Sutton, 2016). It avoids the metacognitive issues of measuring awareness by focusing on reporting the frequency with which individuals experience certain states or engage in specific behaviors. Second, as a single measure for a wide range of outcomes, it allows for the testing of theoretical distinctions between existing awareness concepts by considering their differential effects on sets of outcomes. And third, it provides a means to evaluate awareness-building interventions using a standard array of changes people experience when awareness improves.
The SAOQ was developed through a three-part process. First, qualitative analysis of responses to awareness-building workshops identified an initial list of outcomes (Sutton et al., 2015). Second, expert focus groups expanded on this list and developed thematic categories with example items. The resulting list of items was administered to a stratified sample of respondents with higher and lower awareness as measured by several awareness-related questionnaires. Finally, items were revised through factor analysis, resulting in a 38-item measure assessing awareness outcomes on four subscales. The reflective self-development subscale (RSD, 11 items, α = 0.87) captures the development of continuous attention to the self, with a focus on conscious, reflective, and balanced learning. The acceptance of self and others subscale (Acceptance, 11 items, α = 0.83) measures outcomes such as improved self-image and a deeper understanding of other people. A third scale was developed specifically to assess proactivity at work (Proactive at Work, 9 items, α = 0.81) and measures the extent to which an individual has an objective and proactive approach to dealing with work. And finally the Emotional Costs scale (7 items, α = 0.77) includes items to capture the potential negative impacts identified by people engaged in awareness development, such as guilt, vulnerability, and fear.
The SAOQ has proven valuable in distinguishing between conceptualizations of dispositional awareness, identifying their unique patterns of associations with theoretically relevant constructs (Sutton, 2016). It provided further confirmation that dispositional rumination (as measured by the RRQ, Trapnell & Campbell, 1999) is a generally damaging form of awareness as it was found to predict reduced acceptance and proactivity, as well as increased emotional costs. The SAOQ subscales also helped to distinguish between the concepts of dispositional reflection as measured by the RRQ and reflection as measured by the Self-Reflection and Insight Scale (SRIS; Grant et al., 2002): While the former was not associated with any of the 4 SAOQ subscales, the SRIS self-reflection scale positively predicted both reflective self-development and proactivity outcomes. Insight into the self, the second subscale of the SRIS (Grant et al., 2002), was a positive predictor of acceptance only. Mindfulness, as measured by the MAAS (Brown & Ryan, 2003), was associated with increased proactivity as well as emotional costs. Somewhat similarly, increased frequency of engagement in awareness-building practices (such as meditation, talking therapy, or mindfulness practice) was positively associated with reflective self-development and acceptance but also increased emotional costs. In evaluating the effects of awareness-building interventions, therefore, the SAOQ provides a measure of both positive and negative outcomes associated with different conceptualizations of awareness.
The SAOQ is unusual in its inclusion of a subscale measuring the potentially negative outcomes associated with improved awareness. These negative outcomes have received some attention in the literature (Brown et al., 2007; Zhang et al., 2013), but researchers have rarely tried to measure them and practitioners have rarely attempted to mitigate them. The SAOQ has demonstrated some utility in providing a balanced assessment of positive and negative outcomes in longitudinal research designed to evaluate interventions to increase participant awareness. For example, pre-post-intervention research to evaluate the impact of group coaching (Sutton & Crobach, 2022) and emotional intelligence workshops (Mounce & Culhane, 2021) both showed that carefully designed interventions are indeed able to improve the positive outcomes of awareness while protecting participants from the emotional costs.
Despite these promising results, there has also been some confusion in the research literature regarding the use of the SAOQ, with some studies attempting to use it to measure awareness itself rather than the anticipated outcomes of awareness. For example, items from the SAOQ were adapted for a study of students building a personal brand through Instagram (Wiryananta et al., 2021) and to measure student self-awareness in the use of reference management software (Avidiansyah & Kurniajaya, 2020). Although these interpretations of the measure were flawed, psychometrically, the subscales demonstrated good convergent and discriminant validity and Cronbach alphas above 0.70.
Thus far, the SAOQ has proven useful from both a theoretical and practical standpoint, addressing a need for psychometrically sound instruments to measure the effects of awareness-building practices. However, the SAOQ is fairly lengthy, with a total of 38 items. Many research projects seek to measure several variables at once, and the need for short yet robust measures is well-known (Stewart-Brown et al., 2009). This becomes even more critical in research designs using longitudinal measures, where participants are asked to complete the same measures two or more times. To continue the development of our understanding of awareness, especially with regard to the effect of awareness-building activities, a robust instrument that can precisely measure the latent construct is needed.
A further challenge in using the SAOQ is that, in common with many psychological measures, it uses ordinal-level measurement and hence does not satisfy fundamental assumptions of many parametric tests. Specifically, each item in the questionnaire contributes a different amount of information about the latent trait being measured, meaning that a total or mean of ordinal item scores may not be an accurate measure of that latent construct (Stucki et al., 1996). In addition, conversion of ordinal scores to interval-level scales is advantageous in comparing scores across different instruments.
Rasch analysis uses a probabilistic model that accounts for the unique contribution of individual items to the overall latent construct and is well suited to address these challenges (Rasch, 1960; Tennant & Conaghan, 2007). A Rasch model can improve the precision of measurement by allowing researchers to derive interval-level scores from ordinal-level responses, taking account of both the item difficulty and the person's ability (Bond & Fox, 2013). These models are increasingly used in analyzing mindfulness measures (Medvedev et al., 2016, 2017): researchers were able to refine and improve widely used measures of mindfulness by using a combination of item subsets to resolve local dependencies and discarding items that do not fit the model adequately. The increased precision of these kinds of measures has been demonstrated by comparing ordinal with Rasch-transformed scores (Norquist et al., 2004). In addition, Rasch models typically have lower standard deviations for person-parameter estimates, allowing for a more fine-grained estimate of changes (Tennant & Conaghan, 2007).
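For readers less familiar with the model, the partial credit form of the Rasch model (the variant applied in this study) is conventionally written as below. The notation is an assumption introduced here for illustration and does not appear in the original text: \theta_n is the location (ability) of person n, \delta_{ik} is the k-th threshold of item i, and m_i is the maximum score on item i (m_i = 4 for a 5-point response format scored from 0).

```latex
P(X_{ni} = x) =
  \frac{\exp\left( \sum_{k=1}^{x} (\theta_n - \delta_{ik}) \right)}
       {\sum_{j=0}^{m_i} \exp\left( \sum_{k=1}^{j} (\theta_n - \delta_{ik}) \right)},
  \qquad x = 0, 1, \dots, m_i ,
```

where the empty sum (x = 0 or j = 0) is taken to be zero. Because each item has its own set of thresholds, items can differ in how much a given response category contributes to the person estimate, which is why Rasch-based person estimates can differ from simple ordinal totals.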
In this study, therefore, we used Rasch analysis of the SAOQ to develop a short, unidimensional measure of awareness outcomes (the Awareness Outcomes Measure or AOM). The use of Rasch analysis will ensure that this measure is robust across contexts and cultures and well suited to a range of applications, including intervention and longitudinal studies. In addition, the study aims to produce ordinal-to-interval transformation tables to increase precision of measurement in intervention research.

Participants
Researchers who had contacted the first author to request permission to use the SAOQ were emailed with a request to share their data for the purposes of improving and validating the measure. Datasets which recorded participant age, sex, and country were then included in this study, resulting in a total sample of 713 participants from three countries (New Zealand, Philippines, and UK). To enable investigation of differential item functioning (DIF) in Rasch analysis, original age categories in the datasets were combined into two approximately equal-sized age categories: 18-24 and > 25 years old. Demographics are summarized in Table 1.

Measures
As noted in the introduction, the SAOQ items were developed in a three-stage process (Sutton, 2016). First, qualitative responses describing the effects of a self-awareness development intervention were analyzed to produce a pool of 61 items. Second, expert focus groups categorized and expanded on this pool of items to produce a total of 83 potential outcomes of developing awareness. Finally, these items were arranged into a questionnaire, refined through factor analysis and tested for reliability and validity, to produce the 38-item SAOQ.
The SAOQ employs a 5-point Likert scale format from 1 = never to 5 = almost always, with an N/A option. There are no negatively scored items and subscale scores are calculated by taking the mean of each subscale, excluding the N/A items. Higher scores correspond to a higher frequency of experiencing the self-awareness outcomes. For the purposes of this Rasch analysis, all N/A responses were coded as missing and cases with > 5% missing data were excluded. In the current dataset, Cronbach's α = 0.86 [95% CI 0.85, 0.88] and McDonald's ω = 0.85 [95% CI 0.83, 0.87], but items from the emotional costs subscale correlated negatively with the scale overall, indicating further need for development and refinement.
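The scoring and data-screening rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used to prepare the dataset: N/A responses are assumed to be represented as None, and the function names are hypothetical.

```python
import math

NA = None  # assumption: an "N/A" response is stored as None


def subscale_mean(responses):
    """Mean of the answered items (scored 1-5), ignoring N/A responses.

    Returns NaN when no item in the subscale was answered.
    """
    answered = [r for r in responses if r is not NA]
    return sum(answered) / len(answered) if answered else math.nan


def missing_fraction(responses):
    """Proportion of a participant's responses that are N/A (missing)."""
    return sum(r is NA for r in responses) / len(responses)


def keep_case(responses, max_missing=0.05):
    """Retain a case only if no more than 5% of its responses are missing."""
    return missing_fraction(responses) <= max_missing
```

Under this rule, a participant with one N/A response out of 38 items (about 2.6% missing) would be retained, whereas two N/A responses (about 5.3%) would lead to exclusion.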

Data Analyses
RUMM2030 software (Andrich et al., 2009) was used to conduct Rasch analysis following the main steps outlined elsewhere (Medvedev et al., 2016, 2017). An acceptable Rasch model fit requires that individual item fit residuals fall within the range of −2.50 to +2.50 and that chi-square statistics for the interaction between items and the latent variable, reflecting deviation of the scale data from the Rasch model, should be non-significant (p > 0.05). In addition, individual items should not be locally dependent, which requires evaluation of a residuals correlation matrix. Typically, a residual correlation that exceeds the mean of all residual correlations by 0.20 or more is interpreted as a sign of local dependency (Christensen et al., 2016). Local dependency is often referred to as a method effect and can be addressed by creating super-items by combining individual item scores (Wainer & Kiely, 1987). Rasch models also require scale invariance across personal factors such as age, gender, or country, which is reflected by having no DIF.
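The fit and local dependency criteria above can be expressed as simple checks. The following Python sketch is illustrative only and does not reproduce RUMM2030's internal calculations; the function names are hypothetical, and the residual correlation matrix is assumed to be supplied as a square list of lists.

```python
from itertools import combinations


def item_fit_ok(fit_residual, bound=2.50):
    """True if an item's fit residual lies within -2.50 to +2.50."""
    return abs(fit_residual) <= bound


def flag_local_dependency(resid_corr, margin=0.20):
    """Return item pairs whose residual correlation exceeds the mean of all
    off-diagonal residual correlations by more than `margin`."""
    pairs = list(combinations(range(len(resid_corr)), 2))
    mean_r = sum(resid_corr[i][j] for i, j in pairs) / len(pairs)
    cutoff = mean_r + margin
    return [(i, j) for i, j in pairs if resid_corr[i][j] > cutoff]


def super_item(item_scores):
    """Combine locally dependent items into one super-item by summing scores."""
    return sum(item_scores)
```

For example, if items at positions 0 and 1 of a four-item residual matrix correlate at 0.35 while all other pairs correlate at about −0.10, only that pair is flagged, and the two item scores would then be summed into a single super-item before the model is re-estimated.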
Reliability in Rasch analysis is examined using the person separation index (PSI), which assesses how well a construct defined by its items discriminates between individuals whose scores are spread out along the scale continuum. Dimensionality in Rasch analysis is typically investigated using the method developed by Smith (2002), which involves independent-samples t-test comparisons of person estimates between the groups of items with the lowest and highest factor loadings on the first principal component of the residuals. Unidimensionality requires that the percentage of significant t-test comparisons does not significantly exceed the 5% expected by chance. In other words, the percentage of significant t-test comparisons should be less than 5%, or the binomial confidence interval calculated for that percentage should have a lower bound at or below 5% (Tennant & Pallant, 2006).
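The decision rule for unidimensionality can be sketched as follows. This is a minimal illustration that assumes a normal-approximation binomial confidence interval; the exact interval computed by any particular software package may differ, and the function name is hypothetical.

```python
import math


def unidimensionality_check(n_significant, n_tests, alpha=0.05, z=1.96):
    """Apply the t-test rule for unidimensionality.

    Returns (proportion significant, lower 95% bound, supported?), where
    unidimensionality is supported if the proportion is below 5% or the
    lower bound of its confidence interval is at or below 5%.
    Assumption: normal-approximation (Wald) binomial interval.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    p = n_significant / n_tests
    half_width = z * math.sqrt(p * (1 - p) / n_tests)
    lower = max(0.0, p - half_width)
    return p, lower, (p < alpha or lower <= alpha)
```

With hypothetical counts, 40 significant comparisons out of 700 (5.7%) would still support unidimensionality because the lower bound falls below 5%, whereas 130 out of 700 (18.6%) would not.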

Results
Rasch model fit statistics from the initial analysis, including item location, fit residual, and chi-square for goodness of fit for the original 38 SAOQ items, are shown in Table 2, and the overall model fit is summarized in Table 3 (Analysis A1). Sixteen items demonstrated significant misfit to the Rasch model and the overall fit was also unacceptable, as evidenced by a significant chi-square for the item-trait interaction and a lack of unidimensionality. In Step 2, 6 items (4, 8, 16, 20, 24, and 28) showing significant misfit with extremely high fit residuals (> 4.00) were deleted. As this represented 6 of the 7 items contributing to the emotional costs (EC) scale, the final item from this scale (Item 12) was also deleted to ensure internal construct validity. This resulted in increased reliability of the scale and improvement of the overall model fit, but the interaction chi-square was still significant, indicating a deviation from the expectations of the Rasch model (Analysis A2 in Table 3). In Step 3, further misfitting items were identified and removed (3, 6, 7, 11, 14, 23, 25, 26, 27, and 29). This resulted in acceptable overall fit and good reliability but no evidence of unidimensionality, as the percentage of significant t-tests was 18.5% (Analysis A3 in Table 3). The residual correlation matrix was therefore examined, and residual correlations between several items that exceeded the cut-off point of 0.20, indicating local dependency, were identified. These locally dependent items were combined into 3 super-items: 17, 31, and 36; 5 and 30; and 9 and 18. After this minor modification, the final analysis (A4) demonstrated the best fit to the Rasch model with a non-significant interaction chi-square, excellent reliability, and unidimensionality. DIF was also evaluated and the scale proved invariant across individual factors including sex, age, and country.
Finally, the ordinal-to-interval conversion table was constructed based on person estimates of the Rasch model to convert ordinal scale scores into interval-level data (Table 4). Figure 1 presents the person-item plot from the final analysis, showing good coverage of sample awareness outcomes with no significant ceiling or floor effects. There are no gaps between item thresholds, reflecting excellent reliability of the scale in discriminating between awareness outcome levels of the sample. The sample abilities are relatively high, as reflected by a person mean of 1.37 logits above the item mean (which is set to zero), as would be expected for the healthy population sampled for the current study.

Discussion
Awareness is a central component of several psychological concepts related to well-being, including mindfulness and authenticity. Development of awareness is often a key element of interventions and training programs aiming to improve well-being, yet measurement of the effects of awareness has been hampered by the lack of a suitable measure. This study used Rasch analysis to validate a robust, unidimensional measure of these awareness outcomes. The resulting Awareness Outcomes Measure (AOM; see Supplementary Information) is an internally structurally valid and reliable measure that allows researchers and practitioners to measure awareness outcomes with improved precision by using the transformation table provided here.
The AOM consists of 21 items measuring the original three positive areas of awareness outcomes (reflective self-development, acceptance of self and others, and proactivity). These outcomes are theoretically and empirically linked to higher awareness (Mounce & Culhane, 2021; Sutton, 2016; Sutton & Crobach, 2022) and the new AOM provides researchers with a short scale that can be used to evaluate the effectiveness of awareness-building interventions across contexts and cultures. In some recent studies, such as the Rasch analysis of the MAAS, disordered thresholds needed to be corrected by uniform rescoring to address limitations of this scale (Medvedev et al., 2016). However, among the AOM items, there were no significantly disordered thresholds, supporting the utility of the measure's current response format without the need for correction. The AOM is also invariant across sex, age, and three countries (New Zealand, Philippines, and UK), meaning that items do not function differently for these population groups. The AOM therefore provides a robust and promising instrument that can be used for cross-cultural comparisons and interventions with a range of populations.
While awareness is central to constructs such as mindfulness, and interventions to improve mindfulness or well-being often attempt to develop individual awareness, until now researchers have had little recourse should they wish to evaluate the range of outcomes associated with awareness. Similarly, the development of awareness is central to many therapeutic interventions, such as person-centered counselling (Rogers, 1951) or workplace coaching, and is particularly important as a pre-requisite of self-determined behavior (Deci & Ryan, 1980). Yet the effect of awareness-building efforts on awareness outcomes themselves is rarely considered. Instead, researchers and practitioners usually seek to measure changes in mindfulness itself or in more distal outcomes such as well-being or performance. The AOM offers a robust measure of a range of outcomes associated with improved awareness, providing immediate insight into the beneficial impacts of these types of interventions.
The use of a measure focused on outcomes, i.e., the frequency of experiences or behaviors, rather than self-report of awareness, avoids the potential confounding effects of low metacognitive awareness (e.g., Dunning-Kruger effect; Dunning, 2011). It also allows researchers to conduct comparative evaluations of interventions designed to improve awareness from a range of theoretical backgrounds, whether that be mindfulness training or talking therapies. Mindfulness training, for example, is often badged and delivered in different ways, meaning that it becomes difficult to identify the mechanisms by which mindfulness may have its effects. Training may be pitched as a way to develop mindfulness, reduce anxiety and stress, or improve well-being, and the training itself can vary in terms of the time commitment or delivery methods (Bartlett et al., 2019). The AOM can provide a means to evaluate the effectiveness of different mindfulness training programs in improving the key outcomes associated with increased awareness.
Model fit for the revised questionnaire was achieved by exclusion of some items (notably, the whole emotional costs subscale). These items were also noted as correlating negatively with other items when the overall internal consistency was calculated for this dataset. In the original SAOQ scale development, the emotional costs subscale showed substantially different relationships with measures of mindfulness and reflection (Sutton, 2016). For example, emotional costs were strongly positively correlated with both mindfulness and ruminative self-focus, while the positive SAOQ subscales showed the opposite relationships. Of some concern was the finding that increased frequency of awareness practices (such as meditation or journaling) was also associated with increased emotional costs. It has been important in previous research to identify these costs in order to ensure that interventions can demonstrate they do not cause harm and that individuals undergoing awareness-development are adequately supported by practitioners (Mounce & Culhane, 2021; Sutton & Crobach, 2022). It should therefore be noted that the AOM is a measure of the positive outcomes of awareness only and future work could consider the need for a similarly robust measure of the potential drawbacks or costs of developing awareness.
The AOM demonstrated internal structural validity, unidimensionality, and excellent reliability. Because these psychometric properties met the expectations of the Rasch model, ordinal-to-interval transformation tables could be produced to further increase the precision of the measure. Individual AOM items were found to have varying degrees of "difficulty," meaning that they contribute differently to the total score. This difference in individual item contributions is a common limitation of ordinal measures, which do not take account of it when total scale scores are calculated, thereby increasing measurement error (Bond & Fox, 2013). Therefore, using the conversion table presented in Table 4 enables the researcher to overcome limitations of ordinal scales, transform ordinal scores into interval-level data, and thus meet the assumptions for parametric statistical tests.
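In practice, applying an ordinal-to-interval conversion table amounts to a lookup from the raw total score to its Rasch-based equivalent. The Python sketch below illustrates the idea under stated assumptions: the function name is hypothetical, and the values in the example table are placeholders chosen purely for illustration, not the values published in Table 4.

```python
def to_interval(raw_total, conversion_table):
    """Look up the interval-level score for an ordinal raw total.

    `conversion_table` maps each possible raw total to its interval-level
    equivalent, as would be read from a published conversion table.
    """
    try:
        return conversion_table[raw_total]
    except KeyError:
        raise ValueError(f"raw total {raw_total} is not in the table") from None


# Placeholder values for illustration only; NOT taken from Table 4.
PLACEHOLDER_TABLE = {21: -5.0, 22: -4.2, 23: -3.7}
```

A researcher would replace the placeholder dictionary with the full set of raw totals and converted scores from the published table before using the converted scores in parametric analyses.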

Limitations and Directions for Future Research
As noted above, the AOM does not include items from the original SAOQ which measure the emotional costs associated with increased awareness, and this should be borne in mind when evaluating awareness-building interventions. While the original SAOQ has been shown to be useful in distinguishing the outcomes associated with different conceptualizations of awareness and self-insight, the AOM instead focuses on evaluating a wide range of positive outcomes in a coherent manner. It includes items measuring outcomes as diverse as confidence, acceptance of others, and proactive behavior. The relationship of these outcomes with psychological concepts such as self-efficacy or conflict resolution strategies is a promising direction for future research. Additionally, as many of the items assess behaviors that may be observed by others, equivalent other-report measures may also be developed, giving researchers the opportunity to triangulate their data.
There was no ceiling or floor effect on the AOM for the sample used in this analysis, but the person mean was relatively high compared to the item mean (1.37 logits above the item mean set to zero). This could reflect an underlying interest in awareness on the part of the people who volunteered for this study, resulting in them reporting a relatively high level of experience with the positive outcomes associated with awareness. It would be beneficial to explore the properties of the scale in a clinical sample or a sample that perhaps does not equally value the importance of awareness.
Overall, the AOM has excellent reliability and meets the expectations of the unidimensional Rasch model. In addition, item-person thresholds indicate thorough coverage of awareness outcomes, reflecting excellent discrimination between awareness outcome levels. Given that the AOM was based on a thorough and wide-ranging conceptual development and has demonstrated good psychometric properties and invariance across three English-speaking samples, further work could be carried out to translate and validate the measure in other languages and populations and thereby extend its utility.