Validation of the Comprehensive Inventory of Mindfulness Experiences (CHIME) in English Using Rasch Methodology

Although mindfulness has been studied for multiple decades, psychometric research has yet to agree upon the optimal way to measure the mindfulness construct. Prior research has identified eight distinct aspects of mindfulness that were not adequately captured by any of the available measures. Hence, the Comprehensive Inventory of Mindfulness Experiences (CHIME) was developed. The CHIME contains 37 items and was originally developed in the German language. The CHIME has demonstrated excellent psychometric properties in both German and Dutch, but so far, no English version has been validated. The purpose of the present study was to investigate the psychometric characteristics of the translated English-language CHIME scale using Rasch methodology. The current study utilized Partial Credit Rasch analysis to evaluate the psychometric characteristics of the English CHIME. The sample included responses from 620 participants from the general population residing in the USA. The validity of the English CHIME was examined by correlating its scores with various measures of mindfulness and psychological functioning. Initial Rasch analysis of the English CHIME showed poor model fit, local dependency, and evidence against the assumption of unidimensionality. Several minor modifications, which involved creating super-items, were required to fit the Rasch model (χ2(45)=31.99, p=0.93). This model displayed evidence of unidimensionality, invariance across personal factors, and a high reliability (PSI=0.92). Ordinal-interval transformation tables were produced, which increase the English CHIME’s precision of measurement. The English CHIME’s external validity was established by moderate–high correlations with other measures of mindfulness and various measures of psychological functioning. The results of this study provide evidence for the validity of the English CHIME scale, which can be used to assess the overarching construct of mindfulness.

Attention and Awareness Scale (MAAS; Brown & Ryan, 2003), and other available mindfulness measures. The FFMQ includes five subscales: Describing, Observing, Non-judging, Non-reacting to inner experience, and Acting with awareness. However, despite the well-established efficacy of the FFMQ, an analysis of mindfulness measures conducted by Bergomi et al. (2013) revealed that there were aspects of mindfulness that were not adequately captured by the FFMQ or any other available mindfulness assessment. Hence, to address this issue and create a complete measure of mindfulness, the Comprehensive Inventory of Mindfulness Experiences (CHIME) was developed (Bergomi et al., 2014). The CHIME includes all the components of mindfulness highlighted by Bergomi et al. (2013) and is rooted within the relevant theoretical frameworks. The CHIME comprises eight subscales, which measure awareness of internal experiences, awareness of external experiences, acting with awareness, accepting non-judgmental attitude, nonreactive decentring, openness to experience, awareness of thoughts' relativity, and insightful understanding.
Since its development, the CHIME has been shown to have excellent psychometric properties. Specifically, the CHIME has been validated using participants from a Mindfulness-Based Stress Reduction (MBSR) intervention in addition to participants from a community sample (Bergomi et al., 2014). The CHIME demonstrated high internal consistency (α ranging from 0.70 to 0.90) in addition to high reliability (test-retest reliability, with r ranging from 0.70 to 0.90). Confirmatory factor analysis (CFA) was run on a different sample, which confirmed that the 8-factor structure was appropriate. Additionally, each CHIME item demonstrated invariance of measurement across the personal factors included in the study, indicating that there was no reliable difference in the participants' ability to understand the CHIME items. The external validity of the scale was established by a high positive correlation with the FFMQ (r=0.85), as well as moderate negative correlations with depression (−0.46), anxiety (−0.39), and stress (−0.40) (Bergomi et al., 2014).
The CHIME scale has two main advantages over other existing mindfulness measures. Firstly, it was developed with a strong theoretical grounding in traditional conceptualizations of mindfulness (Bergomi et al., 2014; Krägeloh et al., 2019). This contrasts with the FFMQ, which was constructed by applying factor analysis to other available mindfulness measures (Baer et al., 2006). Consequently, most of the FFMQ's items are derived from the MAAS and the KIMS; hence, the FFMQ inherits the methodological flaws inherent in these scales. Specifically, the MAAS has been heavily criticized in mindfulness literature as not being a measure of mindfulness, but rather a measure of mindlessness/inattention (Bergomi et al., 2013). Additionally, the KIMS was developed using a conceptualization of mindfulness as it is outlined in Dialectical Behavior Therapy (DBT; Linehan, 1993), which is an intervention primarily used to treat symptoms of borderline personality disorder. To address these issues, the CHIME was developed using a conceptualization of mindfulness as it is outlined in Eastern spiritual traditions. Secondly, the CHIME incorporates a wider range of characteristics, such as awareness of internal and external experiences, acting with awareness, accepting and non-judgmental attitude, nonreactive decentring, openness to experience, awareness of thoughts' relativity, and insightful understanding, that define and operationalize mindfulness more comprehensively than previous measures (Bergomi et al., 2014). Most of the prior mindfulness measures incorporate the mindfulness factors of awareness, attention, and non-judgmental attitude. However, in traditional conceptualizations, mindfulness is described as a path that develops an understanding of reality which can be used to alleviate suffering and enhance well-being (Harvey, 2013).
Hence, it has been argued that a complete measure of mindfulness must incorporate these wisdom factors in addition to the non-judgment and attitude factors. These wisdom factors are incorporated into the CHIME as insightful understanding and awareness of thoughts' relativity.
The scale was originally developed in the German language and has been validated using both traditional methods and Rasch analysis (Bergomi et al., 2014; Medvedev et al., 2019). These analyses confirmed that the scale has excellent psychometric properties. Additionally, a Dutch translation of the CHIME, as well as a short version, has been developed and validated using classical test theory methods (Cladder-Micus et al., 2019). However, at present, there is only an English-language CHIME version adapted for children and adolescents (Johnson et al., 2016), whereas no English version for adults has been developed and validated to date. Even though the facets of the CHIME are based on relevant theoretical frameworks and translated using appropriate methodology, it cannot be assumed that the facets would have the same psychometric properties as in the validated German version. Such an assumption is weak because it is subject to sampling error and variations between populations. Hence, further investigation is needed to evaluate the psychometric characteristics of the CHIME after it has been translated into English.
The goal of developing an ordinal measure of mindfulness with the same properties as height or temperature is highly desirable but debated in the current literature. First, height, temperature, and related measures are considered ratio-level scales in Stevens' classic system for variables (due to a known intercept for the scale representing the absence of the variable). This is rarely the case for psychological variables, including mindfulness. Second, treating psychometric variables like mindfulness as scales (interval or ratio) is highly debatable, with relevant contributions arguing that these variables are ordinal (Beaujean et al., 2018). Therefore, Rasch analysis is especially suitable for evaluating and improving the properties of ordinal scales due to several key advantages over classical test theory methods (Rasch, 1961; Tennant & Conaghan, 2007).
The Rasch model accounts for both the underlying ability of the person and the difficulty of each item (Bond & Fox, 2007), allowing interval scores to be derived from ordinal questionnaires. This ordinal-interval conversion increases the precision of measurement and allows for parametric statistics to be conducted (Brogden, 1977; Rasch, 1961). Sandham et al. (2019) used the process of squeezing vitamin C from different fruits as a metaphor to explain how the Rasch model works. In this analogy, different fruits, such as apples, blueberries, and bananas, represent different items on a scale. Each fruit (item) contains a different level of vitamin C (the latent trait being measured) and therefore contributes a different amount of vitamin C to the overall (trait) smoothie being produced. In the same way, different items of a scale contribute different amounts of the latent trait (which in the present study is mindfulness ability) to the overall score. Rasch analysis allows for the measurement of a latent construct (vitamin C) to the extent that it is contained within each item (fruit) while also filtering out other constructs (other vitamins and minerals), manifesting as fit residuals in the model. The increased precision of measurement is demonstrable by comparing the accuracy of the original ordinal-level scores with their Rasch-converted interval equivalents (Norquist et al., 2004).
In fact, the Rasch model (Rasch, 1960) was proposed 8 years before the first introduction of item response theory (IRT) in 1968, and uses the same computational algorithms as the one-parameter IRT model (Lord & Novick, 1968; Rasch, 1960). This basic Rasch/IRT model accounts for both a person's ability and item difficulty and complies with the fundamental measurement principles outlined by Thurstone (1931). These principles require equal units of measurement across the continuum of a scale, scale invariance by personal factors, and evidence of unidimensionality in measuring the overarching latent trait as operationalized by scale items and facets. The key difference between multi-parameter IRT models (i.e., models with two or more parameters) and Rasch models is that the latter strictly adhere to the principles of fundamental measurement, whereas IRT models with two or more parameters use additional parameters that potentially inflate model fit by adjusting the model for the sample data (Hobart & Cano, 2009). For this reason, the Rasch model is more suitable for transforming ordinal responses into an interval-level scale.
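For orientation, the dichotomous form of the model can be written as below. This is the standard textbook expression rather than anything specific to the present study, and the symbols (person location θ_n, item location b_i, and response X_ni, all in logits) are notation chosen here for illustration only.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Because the probability depends only on the difference θ_n − b_i, persons and items are placed on one common logit scale, which is what makes an ordinal-to-interval conversion possible.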
While latent variable techniques such as CFA can also be used to estimate the ability of persons and difficulty of items using intercepts and thresholds, these techniques do not provide accurate conversion of ordinal scores to interval-level data, which is a unique advantage of one-parameter IRT and Rasch models (Hobart & Cano, 2009). Another benefit of Rasch modelling is reduced measurement error, where participant parameter estimations usually have a lower standard deviation. The lower standard deviation also allows for more precise assessment of change in individual scores, making the measurement more accurate and reliable.
As opposed to item parcelling used in traditional methods such as CFA and exploratory factor analysis (EFA), the super-item models (or testlets) used in Rasch analysis are generally based on residual correlations and aim to resolve the local dependency caused by spurious correlations or method effects (Oyler et al., 2022). Items are considered to be locally dependent if they have residual correlations above 0.20, which is identified by examining a residual correlation matrix. Local dependency affects the overall fit of the model and may lead to spurious correlations between items, which interferes with unidimensionality (Christensen et al., 2016; Wainer & Kiely, 1987). When items share variance that is not directly related to the central construct that is being measured, they are usually locally dependent, for example, facets of the FFMQ. Hence, using super-items helps to solve the problem of local dependency between the items and achieve an acceptable fit of the Rasch model, which increases reliability. It should be noted that multidimensional scales cannot achieve an acceptable fit to the unidimensional Rasch model, and meeting the strict criteria of the Rasch model is taken as evidence for measuring one overarching latent trait regardless of how the items/super-items are combined within the model (Balalla et al., 2019; Mitchell-Parker et al., 2017). In more traditional techniques such as CFA, the psychometric properties of analyzed scales remain unmodified, whereas Rasch analysis can transform the ordinal data into true interval-level scores, similar to those used to measure physical phenomena such as temperature, weight, or height. In Rasch analysis, all of the super-item modifications are included and accounted for in the conversion algorithms.
As is evident from the current literature, the application of Rasch analysis is making a growing contribution to the field of mindfulness assessment (Medvedev & Krägeloh, 2022). For example, Rasch analysis is increasingly being used to explore the psychometric characteristics of mindfulness measures (Goh et al., 2017; Sauer et al., 2013). Sauer et al. (2013) used Rasch analysis to examine the properties of the Freiburg Mindfulness Inventory (FMI-14, Walach et al., 2006). After discarding one item, they achieved a Rasch model fit. However, a better fit was obtained after the two dimensions of "presence" and "acceptance" had the Rasch model applied to them individually. On a similar note, both Medvedev, Siegert, Feng, et al. (2016) and Goh et al. (2017) applied Rasch analysis to assess the characteristics of the MAAS (Brown & Ryan, 2003). Interestingly, their approaches to fit the Rasch model were quite different. Goh et al. (2017) removed 5 items that were misfitting, whereas Medvedev, Siegert, Feng, et al. (2016) only discarded 2. Medvedev, Siegert, Feng, et al. (2016) were able to reduce the number of items they discarded by using super-items to resolve issues of local dependency. In contrast, a Rasch analysis conducted on the KIMS (Baer et al., 2004) was unable to find a fit for the overall scale, even after 5 misfitting items had been discarded. Medvedev, Siegert, Feng, et al. (2016), Medvedev, Siegert, Kersten et al. (2016) ended up resolving this issue and achieved an adequate fit by applying the Rasch model to each of the KIMS subscales individually. Rasch analysis was also used to evaluate the psychometrics of the FFMQ. To fit the Rasch model, two misfitting items of the FFMQ were discarded. Additionally, each of the individual FFMQ subscales was summed to form super-items. After these modifications, the best model fit was obtained.
Finally, although it is not solely a measure of mindfulness, the Self-Compassion Scale (SCS; Neff, 2003) was also scrutinized using Rasch methodology (Finaulahi et al., 2021). Although no SCS items were significantly misfitting the Rasch model, the individual items were found to be locally dependent, which affected the overall model fit. Once again, this problem was solved by summing the SCS items to form four higher-order super-items.
By utilizing modern Rasch methodology, the present study aimed to evaluate the psychometric characteristics of the CHIME, after it was translated into English. A product of this analysis is to develop ordinal-interval conversion algorithms, increasing the CHIME's accuracy of measurement, which will aid in future mindfulness research. To establish construct validity of the measure, the present study correlated the CHIME scores with other measures that are expected to be related to the construct of mindfulness. If the CHIME successfully measures the construct of mindfulness, then it should be positively correlated with other validated measures of mindfulness (i.e., the FFMQ). As mindfulness has been shown to promote well-being (Bennett & Dorjee, 2016), mindfulness measures should be positively correlated with measures of well-being (Satisfaction with Life Scale and Positive Affect) and negatively correlated with measures of ill-being (Negative Affect, Depression Anxiety Stress Scales). Informed by previous studies utilizing Rasch analysis, we hypothesized that, if the individual items and subscales of the English CHIME adequately represent the underlying mindfulness construct, then a Rasch model fit will be obtained if the subscales are treated as super-items. Additionally, positive relationships are expected between the CHIME and the FFMQ, as well as measures of well-being. Negative relationships are expected to be found between the CHIME and measures of ill-being.

Participants
Out of the 760 English-speaking participants from the general population residing in the USA who responded to the survey, 140 did not complete all the study measures and were therefore not included in the subsequent analyses. The resulting sample of 620 participants was chosen to minimize the probability of type-1 and type-2 errors. Typically, overfitting occurs with larger sample sizes (e.g., n=1000), while a smaller sample size (n<250) may be insufficient to calibrate items (Hagell & Westergren, 2016). The sample comprised 301 (48.5%) males, and 238 (38.4%) participants indicated that they meditated or engaged in some form of contemplative practice (such as Yoga or Tai-Chi). In this sample, 503 participants (81.1%) were White American, 69 (11.1%) were African American, 26 (4.2%) were Asian American, and 22 (3.6%) identified as other. All participants were proficient in English and were living in the USA at the time of data collection. The ages of the participants ranged from 18 to 70 years (M=41.65, SD=13.07). To determine whether there was any differential item functioning (DIF), participant data were sorted into different age categories. The categories utilized were 18-34, 35-49, and 50-70.

Procedure
Data were collected through an online survey using Qualtrics Research Services in March 2018 and targeted participants of both sexes (male/female split of 50/50). The average survey completion time was estimated by the work of five volunteers who filled out the survey and reported the length of their work. An average completion time of 15 min was used as a parameter for data collection. The online data collection lasted up to 10 days, and each participant was compensated with US$5 after completing the survey. Each participant had their IP address recorded, preventing the same participant from filling out the survey multiple times. The IP address collection confirmed that the survey was distributed throughout the country, guaranteeing good sample representation. This research was approved by the author's institutional ethics review board. All participants who were involved in the study provided informed consent to participate in the research.

Measures
The Comprehensive Inventory of Mindfulness Experiences (CHIME) is a self-report questionnaire containing 37 items. It includes 8 subscales that measure different aspects of mindfulness: awareness of internal experiences, awareness of external experiences, acting with awareness, accepting non-judgmental attitude, nonreactive decentring, openness to experience, awareness of thoughts' relativity, and insightful understanding (Bergomi et al., 2014). The measure utilizes a 6-point Likert scale format, ranging from almost never=1 to almost always=6, and negatively worded items (7,10,17,19,22,26,30,33, and 36) require reverse coding before subscale scores can be calculated. The total score is computed by summing the answers from each individual CHIME item. Higher scores on the CHIME correspond to higher levels of the underlying mindfulness construct. A routine check of reliability was conducted before the Rasch analysis. The overall scale had a strong internal consistency reflected by both Cronbach's alpha and McDonald's Omega of 0.95, suggesting that it is a good candidate for a unidimensional Rasch model.
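The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than an official scoring script: the function name `chime_total` and the list-based input format are assumptions made here, while the reverse-keyed item numbers and the 1 to 6 response range are taken from the description above.

```python
# Reverse-keyed CHIME items as listed in the text (1-based item numbers).
REVERSED_ITEMS = {7, 10, 17, 19, 22, 26, 30, 33, 36}
N_ITEMS = 37
MIN_SCORE, MAX_SCORE = 1, 6  # almost never = 1 ... almost always = 6


def chime_total(responses):
    """Return the ordinal CHIME total score for one respondent.

    `responses` is a list of 37 integers (item 1 first), each from 1 to 6.
    The name and input format are illustrative assumptions.
    """
    if len(responses) != N_ITEMS:
        raise ValueError("expected 37 item responses")
    total = 0
    for item_number, score in enumerate(responses, start=1):
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"item {item_number}: score {score} out of range")
        if item_number in REVERSED_ITEMS:
            # Reverse coding on a 1-6 scale maps 1->6, 2->5, ..., 6->1.
            score = MIN_SCORE + MAX_SCORE - score
        total += score
    return total
```

With this rule, a respondent answering 3 on every item receives 28 × 3 for the regular items plus 9 × 4 for the reverse-keyed items, a total of 120.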
The translation of the original German CHIME scale into English was completed in multiple stages, adhering to the recommendations outlined by Hambleton (2005) and the guidelines laid out by the International Test Commission (2017). The original CHIME authors (German native speakers who also spoke English fluently) translated the CHIME items into English and then the translated items were reviewed by two English speakers who were familiar with meditation. The translated items were then given to a professional translator, who translated the items back into German. The back-translated items were compared with the original CHIME items to assess whether the translation was successful. Considering the cultural and language differences between the two populations, the CHIME items were further adapted before running the validation study. Twenty graduate psychology students originally from the USA (8 males, 12 females) rated the CHIME items according to their readability and understandability. Participants were also asked to rewrite the items according to their understanding. This small-sample survey was then subjected to a qualitative analysis in which a group of six mindfulness researchers and practitioners analyzed the items one by one, adapting them as needed according to the participants' feedback.
The SCS is a measure which contains 26 items and assesses a person's underlying amount of self-compassion (Neff, 2003). The measure includes 6 subscales: Self-Kindness, Self-Judgement, Common Humanity, Isolation, Mindfulness, and Over-Identification. Questions are answered on a 5-point Likert scale, from almost never=1 to almost always=5. Half of the items are negatively worded (1, 2, 4, 6, 8, 11, 13, 16, 18, 20, 21, 24, and 25) and require reverse coding before total and subscale scores can be calculated. There are multiple versions of the SCS and the abbreviated version containing 12 items, the Self-Compassion Scale Short Form (SCS-SF; Raes et al., 2011), was used in this study.
The Satisfaction with Life Scale (SWLS) is a brief inventory including five items to measure a person's self-reported life satisfaction (Diener et al., 1985). Items are scored on a 7-point Likert scale, from strongly disagree=1 to strongly agree=7.
The Depression Anxiety Stress Scales (DASS) is a self-report measure that assesses the negative emotion facets of depression, anxiety, and stress (Lovibond & Lovibond, 1995). The measure includes 42 items, 14 for each subscale. Scoring of the items takes place on a 4-point Likert scale, ranging from Did not apply to me at all=0 to Applied to me very much or most of the time=3. In addition to the 42-item questionnaire, a shorter version of the scale (the DASS-21) containing 21 items is also available (Antony et al., 1998). Both versions of the scale have been shown to have high internal consistencies and robust psychometric properties. The DASS-21 was utilized in the present study.
The Positive and Negative Affect Schedule (PANAS; Watson et al., 1988) is a short list of adjectives describing different emotions and feelings. Participants are instructed to indicate the degree to which they have felt the emotions/feelings over the past week, and answers are recorded on a 5-point Likert scale. Scores range from Very slightly or not at all=1 to Extremely=5. Once the survey is complete, answers from all the positive emotion adjectives are added to create the Positive Affect scale, and answers from all the negative emotion adjectives are added to create the Negative Affect Scale.

Data Analyses
Prior to Rasch analyses, we examined the psychometric properties and reliability of the CHIME facets using IBM SPSS v27. Rasch analysis was conducted using RUMM2030 software (Andrich et al., 2009). The Rasch analysis was performed following the procedure outlined elsewhere. A likelihood-ratio test was computed on the initial analysis output for the CHIME scale prior to the main analysis. The likelihood-ratio test supported the suitability of the unrestricted partial credit version of the Rasch model (p<0.001) (Masters, 1982). Rasch analysis was conducted for the full scale, where all the subscales were treated as super-items, following the methodology of Lundgren-Nilsson et al. (2013).
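To make the partial credit model mentioned above more concrete, the following sketch computes category probabilities for a single polytomous item. It follows the standard partial credit formulation and is not the estimation routine used by RUMM2030; the function name `pcm_probabilities` and the example threshold values are hypothetical.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta      -- person location in logits
    thresholds -- threshold locations for the item in logits, one per step
                  between adjacent categories (m thresholds, m + 1 categories)
    Returns a list of m + 1 probabilities for categories 0..m.
    """
    # Category x has exponent sum_{k<=x} (theta - threshold_k); category 0 has 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative use: a 6-category item with made-up thresholds (not CHIME estimates).
example = pcm_probabilities(0.5, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

With a single threshold the function reduces to the dichotomous Rasch model, so a person located exactly at the threshold has a probability of 0.5 for each category.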
Unlike classical test theory methods, the model fit in Rasch analysis indicates that the scale complies with the fundamental principles of measurement, such as invariance across personal factors, unidimensionality, and the same measurement units across the scale continuum (Oyler et al., 2022). Therefore, obtaining the best model fit in Rasch analysis is desirable as this will minimize potential deviations from the interval scale parameters (Hobart & Cano, 2009). The ideal Rasch model will have a mean close to 0.00 and a standard deviation close to 1.00 for the overall and person fit residuals (see Balalla et al., 2019, for an overview of these criteria). Individual item fit residuals are expected to range from −2.50 to +2.50. Trait-item interaction, reflected by an overall and individual Chi-square fit statistic, should be non-significant (p>0.05, Bonferroni adjusted). The residuals correlation matrix should display no evidence of local dependency between individual items. A residual correlation that exceeds the mean residual correlation by 0.20 or more indicates local dependency (Christensen et al., 2016), an issue which can be resolved by combining the dependent items into super-items (Lundgren-Nilsson et al., 2013; Wainer & Kiely, 1987). In the Rasch model, personal factors should not produce any significant DIF. The person separation index (PSI) reflects how precisely subjects have been spread out along the measurement construct defined by the items and is used to test the reliability of subscales in Rasch analysis. PSI is interpreted in a similar way to Cronbach's alpha (Tennant & Conaghan, 2007).
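The local dependency criterion described above can be illustrated with a short sketch that flags item pairs whose residual correlation exceeds the mean off-diagonal residual correlation by more than 0.20. The function name `flag_local_dependency` and the nested-list matrix format are assumptions for illustration; in practice the residual correlations would be exported from the Rasch software.

```python
def flag_local_dependency(residual_corr, margin=0.20):
    """Return index pairs (i, j) whose residual correlation exceeds the
    mean off-diagonal residual correlation by more than `margin`.

    residual_corr -- square, symmetric matrix as a list of lists
    """
    n = len(residual_corr)
    if n < 2:
        raise ValueError("need at least two items")
    # Use each unordered pair once (upper triangle, diagonal excluded).
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    mean_r = sum(residual_corr[i][j] for i, j in pairs) / len(pairs)
    cutoff = mean_r + margin
    return [(i, j) for i, j in pairs if residual_corr[i][j] > cutoff]


# Made-up 3-item example: only the first two items are flagged.
toy = [
    [1.0, 0.5, -0.1],
    [0.5, 1.0, -0.1],
    [-0.1, -0.1, 1.0],
]
flagged = flag_local_dependency(toy)
```

Flagged pairs are the candidates for combining into a super-item, as described in the text.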
In Rasch analysis, dimensionality is typically investigated by using independent-samples t-tests to compare person estimates for two item groups. In this method, the two groups are formed from the items with the highest positive and the highest negative loadings on the first principal component of the residuals after the latent factor has been removed (Smith, 2002). If the percentage of significant t-test comparisons does not exceed 5%, or if the 5% cutoff point is overlapped by the lower bound of a binomial confidence interval (computed for the number of significant t-tests), then the scale is considered unidimensional (Tennant & Pallant, 2006).
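A rough sketch of this decision rule is given below. It assumes that each person has two location estimates with standard errors (one per item group), that each comparison uses the statistic (a − b) / sqrt(SE_a² + SE_b²) with a critical value of 1.96, and that the binomial confidence interval is approximated with a normal (Wald) interval. These are simplifying assumptions made here, and the function name `unidimensionality_check` is hypothetical; dedicated Rasch software may compute the interval differently.

```python
import math


def unidimensionality_check(est_a, se_a, est_b, se_b, critical=1.96, cutoff=0.05):
    """Proportion of significant person-level t-tests and a unidimensionality flag.

    est_a, est_b -- person estimates (logits) from the two item groups
    se_a, se_b   -- matching standard errors
    Returns (proportion_significant, ci_lower_bound, is_unidimensional).
    """
    n = len(est_a)
    if n == 0 or not (n == len(se_a) == len(est_b) == len(se_b)):
        raise ValueError("inputs must be non-empty and of equal length")
    significant = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            significant += 1
    proportion = significant / n
    # Normal-approximation (Wald) interval; an assumption, not the exact binomial CI.
    half_width = critical * math.sqrt(proportion * (1 - proportion) / n)
    lower = max(0.0, proportion - half_width)
    is_unidimensional = proportion <= cutoff or lower <= cutoff
    return proportion, lower, is_unidimensional
```

The flag is true when either the proportion of significant tests is at most 5% or the lower confidence bound falls at or below 5%, mirroring the two conditions stated above.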

Results
To ensure that the English CHIME version is suitable for application of the unidimensional Rasch model, we conducted an overall unidimensionality assessment using parallel analyses (Timmerman & Lorenzo-Seva, 2011), which supported the overarching mindfulness construct operationalized by the English CHIME items. The mean of item residual absolute loadings (MIREAL = 0.28) was below 0.30, while unidimensional congruence (0.87) and explained common variance (0.77) were approaching the benchmarks set for strict unidimensionality. Similarly, all facets of the CHIME demonstrated unidimensionality as evidenced by McDonald's omega (ω=0.70-0.91) and high loadings on each single facet factor (0.42-0.86), with the exception of Accepting nonjudgmental attitude (ω=0.56) and Item 12 from the Acting with awareness facet (<0.40), which were retained due to their importance for the overall construct validity of the scale. Table 1 shows the fit statistics of the Rasch model from the main analysis. The English CHIME scale displayed poor model fit in the initial analysis, with a significant Chi-square indicating that there was a deviation from the expectations of the Rasch model (χ2(185)=1111.39, p<0.001). Although the reliability of the scale was strong (PSI=0.95), there was evidence against the assumption of unidimensionality (Table 1). Table 2 shows the fit statistics for each individual item. Notably, several items were misfitting, with fit residuals outside the acceptable range of −2.50 to +2.50.
The misfitting items included Items 1 ("When my mood changes, I notice it right away"), 10 ("I break or spill things because I am not paying attention or am thinking of something else"), 11 ("I see my mistakes and difficulties without judging myself"), 17 ("In everyday life, I get distracted by many memories, images, or daydreams"), 19 ("I try to stay busy to avoid specific thoughts or feelings from coming to mind"), 26 ("When I read, I have to reread paragraphs because I was thinking of something else"), and 36 ("I resent my own mistakes and weaknesses").

The residual correlation matrix indicated that there were high residual correlations (above 0.20) between items representing the 8 mindfulness facets. Hence, the individual English CHIME items were combined to form 8 super-items, which were then treated as individual items in the Rasch analysis, a method that is now becoming common practice (Lundgren-Nilsson et al., 2013; Medvedev et al., 2018). After creation of the 8 super-items, the fit improved considerably, the reliability remained high (PSI=0.90), and strict unidimensionality was confirmed (Table 1, 8 super-items). However, the Chi-square remained significant (χ2(72)=170.25, p<0.001), indicating deviation from the Rasch model.
Examination of the correlation matrix between the 8 super-items revealed local dependency between 6 out of the 8 super-items (above 0.20). To address this issue, the locally dependent super-items were reconfigured to form 3 new super-items as follows: 2 (awareness of external experiences) was combined with 4 (accepting non-judgmental attitude), 3 (acting with awareness) was combined with 8 (insightful understanding), and 5 (nonreactive decentring) was combined with 6 (openness to experience). After creating these higher-order super-items, an excellent fit was obtained (χ2(45)=31.99, p=0.928). This model displayed high reliability (PSI=0.92), and unidimensionality was confirmed (Table 1, Final). DIF was also examined for the best-fitting model, and the scale was found to be invariant across personal factors, including meditation practice, gender, and age. All the individual super-items displayed excellent fit to the overall Rasch model (Table 3). The distribution of person-item thresholds of the English CHIME is illustrated in Fig. 1. The plot demonstrates excellent targeting of the sample by item thresholds (M=0.31, SD=0.50), with the person mean slightly above the item mean. There were no detectable ceiling or floor effects, with person abilities perfectly covered by the scale items. Overall, Fig. 1 indicates that there is a good combination of both easy and difficult items in the English CHIME scale. Table 4 shows the conversion algorithms that allow for ordinal scores to be converted into interval scores for the English CHIME scale. The conversion algorithms are based on the final analysis using 5 super-items. In the conversion table, the ordinal scores are presented in the first column, logit unit interval scores are presented in the second column, and in the third column, the interval scores are presented in the original scale metric.
To use the conversion table, (1) reverse code the negatively worded items, (2) compute the total score by summing the individual scores of items, and (3) find the equivalent interval level scores in logits in the second column and the original scale metric in the third column. Note that these conversions are not able to be performed for participants who have responses that are missing.
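The lookup in steps (2) and (3) can be sketched as a simple dictionary lookup. The function name `ordinal_to_interval` is hypothetical, and the small table in the example contains made-up placeholder values for a toy three-item scale; it is not Table 4 and must not be used for scoring. The sketch assumes that reverse coding in step (1) has already been applied.

```python
def ordinal_to_interval(item_scores, conversion_table):
    """Convert reverse-coded item scores to interval scores via a lookup table.

    item_scores      -- list of item scores, with None marking a missing response
    conversion_table -- dict mapping ordinal total -> (logit, scale_metric)
    Returns None when any response is missing, because the conversion is only
    defined for complete data.
    """
    if any(score is None for score in item_scores):
        return None
    total = sum(item_scores)
    if total not in conversion_table:
        raise KeyError(f"ordinal total {total} is not in the conversion table")
    return conversion_table[total]


# PLACEHOLDER values for a toy three-item scale; NOT taken from Table 4.
PLACEHOLDER_TABLE = {3: (-1.0, 3.5), 4: (0.0, 4.0), 5: (1.0, 4.5)}

complete = ordinal_to_interval([1, 2, 1], PLACEHOLDER_TABLE)
incomplete = ordinal_to_interval([1, None, 1], PLACEHOLDER_TABLE)
```

Returning None for incomplete responses, rather than imputing a value, reflects the note above that conversions cannot be performed when responses are missing.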
To compare the original ordinal English CHIME scores with the Rasch-transformed interval scores, a paired-samples t-test was run. The data used for t-test comparisons met the common assumptions of normality with skewness and kurtosis values within the acceptable range from −2.00 to +2.00. The t-test revealed that the difference between ordinal scores (M=145.95, SD=29.16) and the interval scores (M=136.26, SD=19.59) was significant, t(619)=−18.93, p<0.001, d=−0.76. Note that the standard deviation of the interval scores was noticeably smaller than that of the ordinal scores, providing additional evidence that the measurement error was significantly reduced by using super-items. Independent-samples t-tests were then conducted comparing meditators' and non-meditators' scores, using the original ordinal scores and scores that had undergone the Rasch interval transformation. Meditators' ordinal scores (M=153.66, SD=29.62) were found to be significantly higher compared to non-meditators' ordinal scores (M=146.18, SD=29.62),   00, p=0.047, d=0.26. These results provided evidence that the English CHIME accurately differentiates between mindfulness levels, as people who practice meditation tend to score higher on the scale than those who do not practice meditation. Table 5 shows the correlations between the English CHIME total interval scores and other relevant psychometric measures. All the additional measures included in Table 5, excluding the SWLS, were converted into interval scores using the available Rasch transformation algorithms. Data from all scales met the standard assumptions of normality for Pearson's correlation, with skewness and kurtosis values within the acceptable range from −2.00 to +2.00. All correlations were in the expected directions, except for the relationship between the English CHIME total score and DASS-Anxiety, which remained non-significant.
The external validity of the English CHIME was demonstrated by high positive correlations with both the FFMQ and SCS, and a moderate positive correlation with positive affect. As expected, the English CHIME correlated negatively with stress, depression, and negative affect.
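As a worked check on the paired-samples comparison reported above, the effect size can be recovered from the t statistic and the number of pairs. This sketch assumes that d was computed as the standardized mean difference for paired data (t divided by the square root of the number of pairs); the text does not state which formula was used, and the function name is illustrative.

```python
import math


def paired_effect_size(t_statistic: float, n_pairs: int) -> float:
    """Effect size for a paired-samples t-test, computed as t / sqrt(n)."""
    return t_statistic / math.sqrt(n_pairs)


# t(619) = -18.93 implies 620 paired observations (df = n - 1).
d = paired_effect_size(-18.93, 620)
print(round(d, 2))
```

Under this assumption, the result agrees with the reported value of d=−0.76.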

Discussion
Using modern Rasch methodology, the present research validated the English CHIME measure and developed ordinal-to-interval transformation tables to be used in future mindfulness research. The English CHIME instrument displayed evidence of internal structural validity, unidimensionality, high reliability, and external validity. Several items in the initial analysis misfit the Rasch model. This issue was addressed by summing the individual items to form super-items. The use of super-items effectively reduced the measurement error due to spurious correlations between items and possible method effects (Finaulahi et al., 2021), resulting in the best fit. After the best fitting model was achieved, ordinal-to-interval transformation algorithms were developed, which may be used to measure mindfulness and the effects of MBIs more precisely in future research.
Rasch transformation is highly desirable in the case of multifaceted measures because it reduces the error of measurement, while also accounting for the different contributions of items and subscales to the overarching construct (i.e., mindfulness as measured by the English CHIME). A major limitation of ordinal measures is that they do not consider the difficulty (location) of items when calculating the total scores. Not considering item difficulty decreases the accuracy of assessment and increases the amount of measurement error (Bond & Fox, 2007; Norquist et al., 2004). As a result of the Rasch transformation, ordinal English CHIME scores may be converted into interval scores by using the transformation algorithms displayed in Table 4, allowing parametric statistics to be used. This ordinal-to-interval transformation is achieved by accounting for unwanted measurement error unrelated to the overarching mindfulness construct, increasing the accuracy of the English CHIME, without modification to the original scale format. This conversion can be conducted for the full scale if there are no missing data.
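To illustrate how item difficulty enters a Rasch-type model, the sketch below computes category probabilities under the Partial Credit Model for a person located at `theta` (in logits). The function names and all threshold values are hypothetical and chosen only for illustration; they are not estimates from the present analysis.

```python
import math
from typing import List, Sequence


def pcm_category_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Partial Credit Model probabilities of scoring 0..m on one item.

    theta: person location in logits.
    thresholds: the m category thresholds of the item in logits.
    """
    # Cumulative sums of (theta - threshold); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    normaliser = sum(weights)
    return [w / normaliser for w in weights]


def expected_score(theta: float, thresholds: Sequence[float]) -> float:
    """Model-expected item score for a person at theta."""
    probabilities = pcm_category_probabilities(theta, thresholds)
    return sum(k * p for k, p in enumerate(probabilities))


# Hypothetical thresholds for an "easier" and a "harder" item.
easy_item = [-1.5, -0.5, 0.5]
hard_item = [-0.5, 0.5, 1.5]

easy_expected = expected_score(0.0, easy_item)
hard_expected = expected_score(0.0, hard_item)
print(easy_expected, hard_expected)
```

For the same person, the expected score is higher on the easier item than on the harder one, which is the information that a simple sum of ordinal responses ignores.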
The Rasch transformation algorithm allows for the measurement of mindfulness to the extent to which it is reflected by each individual item or super-item, while also filtering out the irrelevant influences associated with other constructs (e.g., personality) along with methodological errors. For example, Item 10 ("I break or spill things because I am not paying attention or I am thinking of something else") appears to express mindfulness to a larger extent, as reflected by a positive location of 0.65, than Item 27 ("I notice sounds in my environment, such as birds chirping or cars passing"), which has a negative location of −0.55. The same is true at the super-item level. Using the vitamin C example by Sandham et al. (2019), a super-item could be compared to a smoothie made up of fruits that have differing levels of vitamin C. The super-item (smoothie) thresholds are estimated by using the combined responses of all the included items (fruits). This example is useful to illustrate that the super-items and their combinations used in the current study refine the measurement of the overarching mindfulness construct (as captured by the English CHIME items) and have no implications for either the factor structure of the English CHIME or the use of its individual subscales. A reader should be mindful that, by using individual subscales, they are only able to measure a specific facet that is relevant to mindfulness.
The initial exploration of the English CHIME scale revealed that the baseline model showed promising psychometric properties. However, the significant Chi-square showed a Rasch model misfit, with several items exhibiting issues of local dependency. Hence, the individual English CHIME items were summed to form 8 super-items, following the methodology previously utilized by Lundgren-Nilsson et al. (2013). A significant advantage of this approach is that the issue of local dependency can be dealt with without the deletion of any items. Deleting items can affect the construct validity of a scale and therefore should be avoided whenever possible. Even though the English CHIME retained its excellent properties after these modifications, the Chi-square remained significant. After some investigation, it was determined that six of the 8 super-items showed issues of local dependency. Hence, the model was adjusted, and instead the English CHIME items were summed to form 5 super-items. After this adjustment, the Chi-square became non-significant, and the best Rasch model fit was obtained.
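The formation of super-items described above amounts to summing the scores of locally dependent items into a single polytomous item before the Rasch model is fitted. The sketch below shows the summation step only; the item labels, scores, and groupings are hypothetical, since the allocation of the 37 items to the 5 super-items is not specified in this section.

```python
from typing import Dict, Sequence


def form_super_items(
    item_scores: Dict[str, int], groups: Dict[str, Sequence[str]]
) -> Dict[str, int]:
    """Sum the scores of the items in each group into one super-item score."""
    return {
        name: sum(item_scores[item] for item in members)
        for name, members in groups.items()
    }


# Hypothetical responses of one participant to six items.
item_scores = {"i1": 4, "i2": 5, "i3": 2, "i4": 6, "i5": 3, "i6": 1}

# Hypothetical grouping of locally dependent items.
groups = {"A": ["i1", "i2"], "B": ["i3", "i4"], "C": ["i5", "i6"]}

super_items = form_super_items(item_scores, groups)
print(super_items)
```

Because every item is retained inside a super-item, the participant's total score is unchanged, which is why this approach handles local dependency without deleting items.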
Compared with other mindfulness measures, the English CHIME required only marginal modification to fit the Rasch model. This is in contrast with the Rasch exploration of the KIMS (Medvedev, Siegert, Kersten, et al., 2016), where the fit was only achieved at subscale level after the deletion of five items. On a similar note, the FFMQ required the deletion of two items to fit the Rasch model, with the five facets summed to form super-items. The English CHIME, on the other hand, did not require the deletion of any items to obtain the best fit. This result is consistent with the Rasch analysis conducted on the original CHIME scale in German, which also did not require the deletion of any items. However, in the German scale, the adequate fit was obtained by forming 8 super-items, whereas in the present study, the adequate fit was achieved after forming 5 super-items. This discrepancy is likely an artifact of translating the scale from German to English. Although both the German and English scales displayed exceptional psychometric properties, the results of the present research indicate that the CHIME items may present with a different pattern of local dependency when translated into different languages. However, in terms of its overall structure and scoring as a unidimensional profile, there is no difference between the German and English CHIME versions.
In line with the original validation study, the English CHIME showed correlations in the expected directions with the other psychometric measures, apart from the correlation with DASS-Anxiety, which remained non-significant. Unsurprisingly, the English CHIME showed convergent validity with the FFMQ. Although the magnitude of the correlation was high (0.66), indicating that the scales reflect overlapping constructs, the English CHIME explains a unique amount of variance not covered by the FFMQ. The English CHIME was found to be negatively correlated with negative affect, stress, and depression. These results are consistent with the substantial literature demonstrating that higher mindfulness levels are linked with better psychological functioning (Fumero et al., 2020; Morton et al., 2020; Thomas et al., 2020). The non-significant correlation between the English CHIME total score and DASS-Anxiety is unexpected and may be explained by the relatively low anxiety levels in the current sample. However, this will require further investigation to verify whether the effect is consistent across different samples.
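The overlap between the English CHIME and the FFMQ can be made concrete by squaring the reported correlation. This is a small arithmetic sketch using the standard interpretation of the squared Pearson correlation as the proportion of shared variance; the resulting percentages are not reported in the source, and the unshared portion also includes measurement error.

```python
def shared_variance(correlation: float) -> float:
    """Proportion of variance shared by two measures (squared Pearson r)."""
    return correlation ** 2


r_chime_ffmq = 0.66  # correlation reported in the text
shared = shared_variance(r_chime_ffmq)
unshared = 1.0 - shared
print(round(shared, 2), round(unshared, 2))
```

Under this interpretation, a little under half of the variance is shared between the two scales, leaving the remainder as variance not accounted for by the FFMQ.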
A significant difference was found between the English CHIME total scores of meditators and non-meditators. This effect was consistent when examining the original ordinal data and the Rasch-transformed interval data. Although the overall effect sizes were similar, this result does not undermine the benefits of conducting the ordinal-to-interval conversion because the transformation increases the precision of measurement. Furthermore, this result demonstrates that the English CHIME maintains its excellent psychometric properties, even without the ordinal-to-interval conversion.
No differences were found in the functioning of the English CHIME items across the personal factors of meditation practice, gender, and age. Additionally, the person-item plots show that the English CHIME items discriminate near perfectly between different levels of meditation experience, meaning that the scale functions just as well for meditators and non-meditators. These results are identical to those found in the original validation of the CHIME scale, and taken together, provide strong evidence that the CHIME maintains its reliability and validity across different populations.

Limitations and Future Directions
The primary limitation of the present research stems from the use of online survey data collection. Although the use of online data collection is becoming more prevalent, it can present researchers with unique challenges. In the present research, participants voluntarily signed up to participate in the study. Even though the proportion of the population engaging in mindfulness practice is growing, the sample may still have been affected by self-selection bias and therefore skewed towards individuals who were familiar with the concept of mindfulness. This effect is evidenced by the relatively high number of participants who indicated that they practiced mindfulness or engaged in some form of contemplative practice (38.4%). The higher-than-average level of mindfulness in this sample could distort the correlations with other psychometric measures, an effect which may explain the non-significant relationship between the English CHIME total score and DASS-Anxiety. Future research could address this issue by selecting participants based on different sampling characteristics. In addition, we could not provide solid evidence on the nature of the residual correlations, which are assumed to be a phenomenon commonly observed in research based on self-report measures. When multiple constructs are measured using common methods (e.g., multiple-item scales presented within the same survey), this may result in spurious correlations due to the assessment tools rather than to the constructs being measured (Podsakoff et al., 2012).
The present study included participants from the general population; therefore, future research should seek to replicate this research in different population types, such as people diagnosed with affective conditions or other psychological disorders. Both samples in this study consisted largely of participants who identified themselves as White American. Future research could use more diverse samples to generalize this study's findings to other cultural and ethnic groups. Nonetheless, Rasch analysis is less dependent on sample characteristics compared with other traditional methods (e.g., factor analysis) (Tennant & Conaghan, 2007). Additionally, the sample utilized in the present study was large enough to ensure the robustness and generalizability of these results. For its German-language version, the CHIME has been shown to be stable over time, with little evidence that items are interpreted differently before and after a mindfulness intervention (Krägeloh et al., 2018). Such evidence is important as it demonstrates that the questionnaire is evaluated using the same standard at both measurement times. Future work will need to confirm the absence of such a response shift in the English CHIME version.
Perhaps the most interesting distinction between the validation of the original German scale and the translated English scale is the different super-item structure required to obtain the best Rasch model fit. In the original German validation, adequate fit was confirmed when the individual CHIME items were summed to form 8 super-items, which is in line with the subscales of the original measure. However, in the present study, the fit was achieved after summing the items to form 5 super-items. Taken alone, this result suggests that the CHIME may be affected by spurious residual correlations that are different in English compared to German. The question then is, do the linguistic conceptualization and understanding of mindfulness itself differ significantly across different languages and cultures? Further research is required to address this question. Replications of the psychometric analyses for all of the available languages of the CHIME are also necessary to ensure that results are generalizable and not due to overfitting (Nosek et al., 2022). Although some of the differences found between the German- and English-language versions are relatively minor, future work could explore to what extent such differences may be replicable when the two versions are directly compared in a pooled psychometric analysis.
The facets of the CHIME represent the relevant aspects of the overarching mindfulness construct as developed by Bergomi et al. (2014) and are theoretically supported in Bergomi et al. (2013). The relevance of these facets to the overarching mindfulness construct was demonstrated by Rasch analysis of the German CHIME version by Medvedev et al. (2018). This study aimed to validate the CHIME as a measure of the overarching mindfulness construct that can be measured using one single interval level score, like physical phenomena such as height, weight, temperature, and speed. We acknowledge that measuring individual facets included in the CHIME might be beneficial for specific purposes; however, none of these facets would represent a valid construct of mindfulness by themselves. Therefore, in this study, we focused on developing an overarching mindfulness scale that produces a single interval level mindfulness assessment score.
As our understanding of the benefits of mindfulness continues to grow, it is becoming increasingly valuable for researchers and clinicians to be able to measure the mindfulness construct accurately. The present study utilized Rasch analysis methods to explore the psychometric characteristics of the English CHIME scale. The results demonstrate that the CHIME maintains its structural validity and reliability after being translated from German to English. However, a slightly different super-item structure was used for the English CHIME compared to the German CHIME, which may be related to error variances associated with linguistic differences or other unknown method effects and requires further investigation.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.
Data Availability The data used for this study are available from the corresponding author upon reasonable request.

Declarations
Ethical Approval This study was approved by the Pacific University Institutional Research Board (protocol number 131-17), which follows internationally recognized ethical standards.
Informed Consent All participants provided informed consent to participate in the study.

Conflict of Interest
We declare no conflicts of interest in connection with this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.