Introduction

The INSPIRE questionnaire (Williams et al., 2015) was designed to assess service users’ experiences of support from health professionals for their personal recovery. The questionnaire consists of two sub-scales, Support and Relationships. It is based on two studies that provide the theoretical foundations for a measure of staff support for personal recovery. In the first study (Leamy et al., 2011), the authors performed a systematic review and narrative synthesis of 97 publications describing relevant models and frameworks. This resulted in five domains of the recovery process: Connectedness, Hope, Identity, Meaning and purpose, and Empowerment (CHIME). The second study (Le Boutillier et al., 2011), based on a qualitative analysis of 30 international documents, provided a theoretically justifiable classification of domains of recovery support. The INSPIRE was developed in four steps: drafting the questions, refining and reducing the initial list of questions, piloting the questionnaire, and finally evaluating it psychometrically (Williams et al., 2015). In the final version, the Support sub-scale covers the five CHIME domains of the recovery process with four questions each, while the Relationship sub-scale has a one-dimensional structure.

Convergent validity of the questionnaire was assessed by correlating each INSPIRE sub-scale with a validated questionnaire. The Support sub-scale was correlated with the Satisfaction Index-Mental Health, a 12-item service user-rated measure of satisfaction with services. The Relationships sub-scale was correlated with the Recovery-Promoting Relationship Scale, a 24-item measure of the recovery-promoting competencies of mental health staff. The Relationship sub-scale showed adequate convergent validity (correlation 0.69), but convergent validity was rather low for the Support sub-scale (correlation 0.47). The five-factor (CHIME) structure of the Support sub-scale was assessed by performing exploratory factor analysis separately within each domain, with the aim of demonstrating that each domain is unidimensional. For the Relationship sub-scale, a parallel analysis was performed to assess whether a single underlying factor could be identified. Internal consistency of each domain of the Support sub-scale and of the Relationship sub-scale was measured by Cronbach’s alpha and found to be adequate (Cronbach’s alpha 0.82–0.95). Test-retest reliability of the Support sub-scale was assessed by analysing changes in ratings of importance, which were small. For the Relationship sub-scale, test-retest reliability was assessed by correlating scores obtained at two time points, which showed good reliability (intra-class correlation coefficient 0.75).

The INSPIRE questionnaire has been translated into a number of languages (INSPIRE manual, 2015). However, to our knowledge, only the Swedish translation has been assessed psychometrically (Schön et al., 2015). The Norwegian version (available at NAPHA, 2017) has not been evaluated psychometrically. Williams et al. (2015) appear to have applied most of the established methods for evaluating psychometric properties appropriately. However, their assessment of the dimensionality and internal consistency of the domains of the Support sub-scale raises some concerns. When we attempted to validate the Support sub-scale in a Norwegian context, a number of problems emerged, particularly in assessing the dimensionality of the sub-scale. CHIME seems to be the most widely accepted framework for assessing personal recovery (van Weeghel et al., 2019). A better understanding of its domains, and of the studies claiming to measure them, is therefore important. This paper focuses on the Support sub-scale, not the Relationship sub-scale.

Hence, the aim of this paper is twofold. The first aim is to present and discuss methods for assessing the dimensionality of a scale. For clarity, these are presented in a separate section of the manuscript (“Methodological considerations”). The second aim is to illustrate these methods by applying them to data collected with a Norwegian translation of the INSPIRE Support sub-scale.

Methods

The INSPIRE Support sub-scale consists of 20 questions. For each item, the participant first answers “yes” or “no” to whether the item is important for his or her personal recovery. If the answer is “no”, the participant proceeds to the next question. If the item is important, the participant rates how much support he or she receives from the mental health care worker. Ratings are given on an ordinal scale from 0 to 4, where 0 means “not at all”, 1 “not much”, 2 “somewhat”, 3 “quite a lot”, and 4 “very much”.
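Because of this two-step format, a rating exists only for items marked as important, and this is what later produces the many missing ratings. The sketch below shows how one participant's answers might be encoded for analysis. It assumes a hypothetical record of `(important, rating)` pairs per item, which is not the official INSPIRE data format. A "no" or blank importance answer, or a blank rating, becomes a missing value (`None`).

```python
from typing import Optional

# Hypothetical encoding of one INSPIRE Support item answer (not an official format):
# important is True ("yes"), False ("no") or None (left blank);
# rating is an integer 0-4, or None if no rating was given.
Answer = tuple[Optional[bool], Optional[int]]


def rating_or_missing(answer: Answer) -> Optional[int]:
    """Return the 0-4 support rating, or None when no usable rating exists."""
    important, rating = answer
    if important is not True:
        # "No" or no answer on importance: the rating is skipped by design.
        return None
    if rating is None or not 0 <= rating <= 4:
        # The item is important, but the rating is blank or out of range.
        return None
    return rating


def is_complete(answers: list[Answer]) -> bool:
    """True only if every item has a usable rating (what listwise deletion requires)."""
    return all(rating_or_missing(a) is not None for a in answers)
```

Under this encoding, a participant who marks even one of the 20 items as "not important" has an incomplete rating vector. This is consistent with the small number of complete cases reported in the Results.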

Participants and Data

The INSPIRE questionnaire was completed by patients as part of a patient outcome sub-study of the Norwegian research project “Implementation of Guidelines for Treatment of Psychoses” (NCT03271242). Altogether, 39 community mental health centers and inpatient wards participated in the project. Inclusion criteria were age 16 years or older and a diagnosis from the F20–29 section of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) (World Health Organization, 1992). The only exclusion criterion was inability to understand and answer the questionnaires in Norwegian. Patients were included at the level of the mental health clinics of six health trusts, and clinicians administered the questionnaires to the patients. Inclusion started at the beginning of June 2016 and was completed in March 2017. Only baseline data were used in this study. In total, 325 patients were included (mean age 40 years, SD 12.7; 41% female; 21% with higher education). Among the patients, 54% had a diagnosis of schizophrenia, 20% had a diagnosis of schizoaffective disorder, and 27% had another diagnosis from the F20–29 section of ICD-10. Further patient characteristics and study details have been reported earlier (Skar-Fröding et al., 2021).

Methodological Considerations

When developing a new questionnaire, the presumed dimensionality is typically assessed by exploratory factor analysis. In general, the items belonging to one dimension are assumed to reflect the same underlying construct. After the dimensions are extracted, the internal consistency for each dimension (reliability across the items) is evaluated. These issues are elaborated on next.

Exploratory factor analysis is a tool for analysing the correlations among a large number of items in order to group those items into constructs (factors, in statistical language). Items within each factor are expected to be highly correlated. To conclude that there are multiple dimensions, items from different factors should not be correlated to any substantial degree. In other words, exploratory factor analysis identifies the number of dimensions needed to represent the data. According to Armor (1973–1974), correlation analysis and factor analysis should be the first step in scale development, as they determine the dimensionality of the scale. The multidimensionality of the scale is confirmed if factor analysis generates multiple factors and the correlations among items contributing to different factors are nearly zero. A multidimensional measure thus consists of a number of unidimensional measures (Whidhiarso & Ravand, 2014).

However, it is problematic to conclude that a measure is multidimensional from multiple unidimensional solutions without performing factor analysis on all items simultaneously. Performing factor analysis on each dimension separately to demonstrate its unidimensionality is not enough. Such an analysis cannot identify items that belong to different dimensions but still overlap to a substantial degree according to their factor loadings (i.e. high cross-loadings). According to Costello and Osborne (2005), “a ‘crossloading’ item is an item that loads at 0.32 or higher on two or more factors”. Ziegler and Hagemann (2015) emphasize that, when constructing a scale, it should at least be ensured that the items of one construct do not load on other constructs. Crutzen and Peters (2017) also call for factor-analytic evidence before assessing the internal consistency of a scale.
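The cross-loading rule can only be applied to a loading matrix from one factor analysis of all items together. When each hypothesized domain is analysed separately, every item has a single loading, so the rule can never flag anything. The sketch below applies the 0.32 rule of thumb cited above. The loading values are hypothetical and are not taken from Table 3.

```python
# Flag cross-loading items using the 0.32 rule of thumb (Costello & Osborne, 2005).
CROSS_LOADING_CUTOFF = 0.32


def cross_loading_items(loadings: list[list[float]],
                        cutoff: float = CROSS_LOADING_CUTOFF) -> list[int]:
    """Return indices of items whose absolute loading reaches the cutoff on two or more factors."""
    flagged = []
    for i, row in enumerate(loadings):
        n_salient = sum(1 for value in row if abs(value) >= cutoff)
        if n_salient >= 2:
            flagged.append(i)
    return flagged


# Hypothetical rotated loadings for four items on two factors.
example = [
    [0.78, 0.10],  # loads clearly on factor 1 only
    [0.65, 0.41],  # cross-loads on factors 1 and 2
    [0.12, 0.81],  # loads clearly on factor 2 only
    [0.35, 0.33],  # weak, but reaches 0.32 on both factors
]
print(cross_loading_items(example))  # [1, 3]
```

A matrix with a single column, as produced by a per-domain analysis, always returns an empty list. This is the limitation discussed in the paragraph above.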

Once the dimensionality of the scale is established, its internal consistency can be assessed. One of the most popular measures of internal consistency is Cronbach’s alpha (Cronbach, 1951). An alpha value close to 1 indicates good internal consistency. Cronbach’s alpha is a function of the number of items, so adding items tends to increase alpha. Cortina (1993) has shown that for scales with more than 14 items, Cronbach’s alpha may exceed 0.70 even when the scale consists of two orthogonal dimensions. Alpha is higher still if the dimensions are correlated, which is often the case in practice. Moreover, if the covariances among the items are very high, the resulting alpha will also be high, often above 0.90. Such a value may indicate redundancy of some items, so a high alpha is not necessarily a sign of a consistent scale (Streiner, 2003).
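The effect of test length can be checked numerically. The sketch below computes Cronbach's alpha from a correlation matrix for two orthogonal dimensions of equal size. Items correlate 0.30 within their own dimension and 0 across dimensions. The value 0.30 is an illustrative assumption, not an estimate from the INSPIRE data.

```python
def cronbach_alpha(cov: list[list[float]]) -> float:
    """Cronbach's alpha from an item covariance (or correlation) matrix:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of the sum score)."""
    k = len(cov)
    item_var = sum(cov[i][i] for i in range(k))
    total_var = sum(sum(row) for row in cov)  # variance of the unweighted sum score
    return k / (k - 1) * (1 - item_var / total_var)


def two_orthogonal_dimensions(k: int, r_within: float = 0.30) -> list[list[float]]:
    """Correlation matrix for an even number k of items split into two halves:
    r_within inside each half and 0 between halves (two orthogonal dimensions).
    r_within = 0.30 is an illustrative assumption."""
    half = k // 2
    return [[1.0 if i == j else (r_within if (i < half) == (j < half) else 0.0)
             for j in range(k)] for i in range(k)]


for k in (8, 14, 16, 20):
    print(k, round(cronbach_alpha(two_orthogonal_dimensions(k)), 3))
# 8 0.541
# 14 0.692
# 16 0.723
# 20 0.768
```

Alpha crosses 0.70 between 14 and 16 items, even though no item correlates at all with the items of the other dimension. A high alpha therefore says little about how many dimensions are present.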

An important feature of alpha is that it is not a test of unidimensionality (Streiner, 2003). Rather, it assumes that the items are unidimensional, and it may underestimate reliability if this assumption is violated (Tavakol & Dennick, 2011). Even if the other assumptions required for alpha to be a valid measure hold, high values may occur in scales with multiple dimensions. Thus, alpha can only be used to confirm unidimensionality, not to assess it (Cortina, 1993; Green et al., 1977). If factor analysis results in more than one dimension, Cronbach’s alpha should be calculated within each dimension separately (Whidhiarso & Ravand, 2014).

Alpha is based on the so-called assumption of essential tau-equivalence. This assumption requires that all items have the same variances, that the covariances between items are equal, that the items measure the same underlying construct, and that they have equal factor loadings (Trizano-Hermosilla & Alvarado, 2016). If the assumption of essential tau-equivalence is violated, scale reliability may be underestimated, and alpha can only be used as a lower-bound approximation of scale reliability (Cronbach, 1951).

In practice, a congeneric model is more realistic, as it only requires the items to measure one underlying construct (i.e. unidimensionality). For a congeneric model, the omega reliability coefficient is a better measure of scale consistency (McDonald, 1999). The omega coefficient is suitable both when essential tau-equivalence holds and when the items constitute congeneric measurements (Trizano-Hermosilla & Alvarado, 2016). For essentially tau-equivalent measurements (a single-factor model with equal loadings), omega equals alpha. For congeneric measurements, omega is higher than alpha (Whidhiarso & Ravand, 2014).
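The relationship between omega and alpha can be shown with a single-factor model. The sketch below assumes standardized items, uncorrelated errors, and known loadings. The loading values are hypothetical, whereas in practice the loadings are estimated from data. Omega is computed as (Σλ)² / ((Σλ)² + Σθ), where θ = 1 − λ² is each item's error variance. Alpha is computed from the covariance matrix that the same model implies.

```python
def omega_and_alpha(loadings: list[float]) -> tuple[float, float]:
    """McDonald's omega and Cronbach's alpha implied by a one-factor model with
    standardized items: item_i = loading_i * factor + error_i, var(error_i) = 1 - loading_i**2.
    Assumes uncorrelated errors; the loadings passed in are treated as known."""
    k = len(loadings)
    errors = [1 - lam ** 2 for lam in loadings]
    common = sum(loadings) ** 2   # sum-score variance due to the common factor
    total = common + sum(errors)  # total variance of the unweighted sum score
    omega = common / total
    item_var = sum(lam ** 2 + e for lam, e in zip(loadings, errors))  # equals k here
    alpha = k / (k - 1) * (1 - item_var / total)
    return omega, alpha


print(omega_and_alpha([0.7, 0.7, 0.7, 0.7]))  # equal loadings: omega equals alpha
print(omega_and_alpha([0.9, 0.8, 0.5, 0.3]))  # unequal (congeneric) loadings: omega > alpha
```

With the unequal loadings, omega is about 0.74 and alpha about 0.70, although both describe the same set of items. This illustrates how alpha can understate reliability when tau-equivalence does not hold.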

Statistical Analyses

Table 1 presents, for each item, the frequencies and percentages of “Not important” answers, the means and standard deviations (SDs) of the support ratings, and the number of missing values.

Table 1 Descriptive statistics for ratings of the INSPIRE Support sub-scale, N = 325

Exploratory factor analysis was performed to assess the dimensionality of the Norwegian translation of the INSPIRE Support sub-scale. Exploratory rather than confirmatory factor analysis was chosen because the sub-scale has not previously been assessed psychometrically in a Norwegian context. A second, and more important, reason was that the approach Williams et al. (2015) used to assess the dimensionality of the sub-scale is not valid. There is therefore, as yet, no factor structure to confirm.

In a factor analysis of the raw item data, cases with at least one missing value on the included items are excluded. Such listwise deletion would have substantially reduced the sample size, and thus the statistical power. The exploratory factor analysis was therefore performed on a matrix of polychoric correlations among the items. Each correlation was calculated on all available pairs of observations (pairwise deletion). Polychoric rather than Pearson correlations are preferred in factor analysis of ordinal variables, because Pearson correlations tend to be underestimated, which in turn results in smaller factor loadings (Holgado-Tello et al., 2010).
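The difference between listwise and pairwise deletion can be illustrated on a tiny data set. In the sketch below, each correlation uses every participant who answered both items. The data are hypothetical. Pearson correlation is used only as a simple stand-in, because polychoric correlations require estimating thresholds for the ordinal categories, which is not shown here.

```python
import math
from typing import Optional

Row = list[Optional[float]]


def complete_cases(data: list[Row]) -> int:
    """Number of participants with no missing value on any item (listwise deletion)."""
    return sum(1 for row in data if all(v is not None for v in row))


def pairwise_pearson(data: list[Row], i: int, j: int) -> tuple[float, int]:
    """Pearson correlation of items i and j using every participant who answered both.
    Pearson is a stand-in here; polychoric correlations need threshold estimation."""
    pairs = [(row[i], row[j]) for row in data if row[i] is not None and row[j] is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical ratings for five participants on three items (None = missing).
data = [
    [1, 2, 3],
    [2, None, 4],
    [3, 4, None],
    [4, 5, 6],
    [None, 1, 2],
]
print(complete_cases(data))          # 2: only two participants survive listwise deletion
print(pairwise_pearson(data, 0, 1))  # r of about 1.0, computed from 3 participants
```

Each pairwise correlation can rest on a different subset of participants. A known side effect is that the resulting matrix is not guaranteed to be positive definite, which some factor-analysis routines require.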

Three extraction methods (iterated principal factors, principal factors, and principal-component factors) and two rotation methods (orthogonal varimax and oblique promax) were applied. The number of factors was assessed in several ways. First, Kaiser’s criterion (eigenvalue > 1) was applied in combination with the scree test, followed by parallel analysis. Next, to partially mimic the analyses by Williams et al. (2015), a five-factor solution was requested. Finally, purely for comparison, a factor analysis was performed separately for each dimension anticipated by Williams et al. (2015).
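Kaiser's criterion and parallel analysis can be contrasted in a self-contained sketch. The code below is a minimal version of Horn's parallel analysis on the eigenvalues of a Pearson correlation matrix, using complete, simulated data. It retains leading components whose observed eigenvalue exceeds the mean eigenvalue of the same rank from uncorrelated normal data of the same size. Eigenvalues are computed with a plain Jacobi rotation method to avoid external dependencies. The simulated loadings, sample size, seeds and number of iterations are all hypothetical. The study itself used polychoric correlations and Stata, and this sketch does not reproduce that analysis.

```python
import math
import random


def correlation_matrix(data: list[list[float]]) -> list[list[float]]:
    """Pearson correlation matrix of the columns of a complete data matrix (rows = participants)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) for b in range(k)] for a in range(k)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(k)] for a in range(k)]


def eigenvalues(sym: list[list[float]], tol: float = 1e-12, max_sweeps: int = 100) -> list[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in sym]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A J (columns p and q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- J^T A (rows p and q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(eigs: list[float]) -> int:
    """Kaiser's criterion: number of eigenvalues greater than 1."""
    return sum(1 for v in eigs if v > 1.0)


def parallel_analysis(data: list[list[float]], n_iter: int = 50, seed: int = 1) -> int:
    """Horn's parallel analysis: count leading components whose observed eigenvalue exceeds
    the mean same-rank eigenvalue from uncorrelated normal data of the same size."""
    rng = random.Random(seed)
    n, k = len(data), len(data[0])
    observed = eigenvalues(correlation_matrix(data))
    mean_random = [0.0] * k
    for _ in range(n_iter):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
        for rank, value in enumerate(eigenvalues(correlation_matrix(noise))):
            mean_random[rank] += value / n_iter
    retained = 0
    for obs, ref in zip(observed, mean_random):
        if obs <= ref:
            break
        retained += 1
    return retained


def simulate_items(n_obs: int, item_factor: list[int], loading: float, seed: int) -> list[list[float]]:
    """Hypothetical data: item = loading * factor + sqrt(1 - loading^2) * noise,
    with independent standard normal factors (one per distinct index in item_factor)."""
    rng = random.Random(seed)
    n_factors = max(item_factor) + 1
    rows = []
    for _ in range(n_obs):
        f = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        rows.append([loading * f[g] + math.sqrt(1 - loading ** 2) * rng.gauss(0.0, 1.0)
                     for g in item_factor])
    return rows


one_factor = simulate_items(300, [0] * 6, 0.8, seed=11)
two_factor = simulate_items(300, [0, 0, 0, 1, 1, 1], 0.8, seed=12)
for name, d in (("one factor", one_factor), ("two factors", two_factor)):
    print(name, kaiser_count(eigenvalues(correlation_matrix(d))), parallel_analysis(d))
```

This sketch uses Horn's original comparison against the mean random eigenvalue. A common, stricter variant compares against the 95th percentile of the random eigenvalues instead.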

Other approaches for dealing with missing values were also considered. Given the algorithm for calculating the sum score of the Support sub-scale (INSPIRE, 2015), missing values were imputed with the mean of each patient’s existing items, although this practice is not recommended (Eekhout et al., 2014). In addition, the covariance matrix was estimated by maximum likelihood with the expectation-maximization algorithm (mi estimate function in STATA), as suggested by Graham (2009).
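Person-mean imputation fills each missing rating with the mean of that participant's observed ratings. The sketch below shows that the imputed sum then equals the participant's mean rating multiplied by the number of items, i.e. a prorated sum score. It also shows one way the procedure shrinks variation: every imputed value sits exactly at the person mean. The ratings are hypothetical, and the sketch is not the official INSPIRE scoring algorithm.

```python
from typing import Optional


def person_mean_impute(row: list[Optional[float]]) -> list[float]:
    """Replace each missing rating with the mean of the participant's own observed ratings."""
    observed = [v for v in row if v is not None]
    if not observed:
        raise ValueError("no observed ratings: person mean is undefined")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in row]


def variance(values: list[float]) -> float:
    """Sample variance (n - 1 denominator)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


ratings = [4, None, 2, None, 3]  # hypothetical 0-4 ratings, None = missing
imputed = person_mean_impute(ratings)
print(imputed)                   # [4, 3.0, 2, 3.0, 3]
# The imputed sum equals (number of items) x (mean of rated items), i.e. a prorated score.
print(sum(imputed), 5 * 3.0)     # 15.0 15.0
# Imputed values sit exactly at the person mean, so the spread of the row shrinks.
print(variance([4, 2, 3]), variance(imputed))  # 1.0 0.5
```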

To assess the internal consistency of the identified dimensions of the Support sub-scale, Cronbach’s alpha and the omega coefficient were calculated.

The statistical analyses were performed with STATA v16 and SPSS v26.0.

Results

Description of the Support Sub-Scale

The number of missing answers to whether an item of the Support sub-scale is important varied between 4 and 20 (Table 1). The item “Having positive relationships with other people” was most often rated as important, by 290 (89.2%) of the 325 participants. The item least often reported as important was “Having my ethical/cultural/racial identity respected” (N = 207, 63.7%).

Among participants who rated an item as important, the mean ratings varied between 2.3 and 2.8 across items, with SDs between 0.9 and 1.1 (Table 1). Because some participants answered “Not important”, or did not answer whether an item was important, the number of missing ratings per item varied between 35 and 118. As a result, only 66 (20.3%) of the 325 participants had no missing values on any of the 20 ratings.

Exploratory Factor Analysis

The polychoric correlations between pairs of items among the 325 participants are presented in Table 2. If the INSPIRE Support sub-scale clearly comprised five dimensions, one would expect strong correlations within the 4 × 4 blocks along the diagonal and substantially weaker correlations elsewhere. However, no clear pattern can be seen in the correlation matrix in Table 2. Nevertheless, the matrix of polychoric correlations was used as input to an exploratory factor analysis.
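The visual inspection described above can be summarized numerically. The sketch below compares the mean correlation within the diagonal blocks with the mean correlation between blocks. It assumes that items are ordered so that each consecutive group of items forms one hypothesized domain (for INSPIRE, `block_size=4` over 20 items). The small matrix is hypothetical, and this descriptive check does not replace a factor analysis of all items.

```python
def block_means(corr: list[list[float]], block_size: int) -> tuple[float, float]:
    """Mean off-diagonal correlation within consecutive diagonal blocks and between blocks.
    Assumes items are ordered so that each run of `block_size` items is one hypothesized domain."""
    within, between = [], []
    k = len(corr)
    for i in range(k):
        for j in range(i + 1, k):
            if i // block_size == j // block_size:
                within.append(corr[i][j])
            else:
                between.append(corr[i][j])
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical 4-item correlation matrix with two 2-item "domains".
corr = [
    [1.00, 0.70, 0.20, 0.10],
    [0.70, 1.00, 0.15, 0.20],
    [0.20, 0.15, 1.00, 0.60],
    [0.10, 0.20, 0.60, 1.00],
]
within, between = block_means(corr, block_size=2)
print(within, between)  # within-block mean of about 0.65, between-block mean of about 0.16
```

A clear block structure shows up as a large gap between the two means. A near-zero gap corresponds to the absence of a pattern reported for Table 2.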

Table 2 Polychoric correlations among the INSPIRE Support sub-scale items calculated between pairs of items using all available observations. Item numbering is the same as in Table 1

According to Kaiser’s criterion (eigenvalue > 1), combined with the scree plot and followed by parallel analysis, the 20 items formed a three-factor solution. The factor structure was stable regardless of the extraction method and rotation applied. Therefore, only the results of the iterated principal factors extraction method with varimax rotation are presented (Table 3, columns 1–3 under three-factor solution). Among the three factors identified, the first factor had by far the largest eigenvalue (11.5), while the second and third factors contributed little (eigenvalues of 1.1 and 0.9, respectively). The three factors explained 66.2%, 6.3% and 5.0% of the total variance, respectively, providing no strong evidence of more than one dimension. Relative to the expected dimensions, the items appeared to be distributed randomly across the factors. Many strong cross-loadings (0.32 or higher) were present, indicating no CHIME-like structure in the Support sub-scale. When a five-factor solution was requested, the exploratory factor analysis led to a similar conclusion (Table 3, columns 1–5 under five-factor structure).

Table 3 Results of exploratory factor analysis; iterated principal factors extraction method with varimax rotation. The strongest loading for each item is highlighted in bold, while italics highlight high cross-loadings (loadings of 0.32 or higher; Costello & Osborne, 2005)

Other Ways of Handling Missing Values

Exploratory factor analysis of a data set in which missing values were imputed with the mean of each patient’s existing items produced a one-factor solution. When the covariance matrix was estimated by the expectation-maximization method, a one-factor solution was also obtained, implying that the Support sub-scale has no separate dimensions. Results of these two analyses are not presented but are available on request.

Assessing Internal Consistency

Although the SDs were similar for all 20 items, the means varied to some extent. The correlations among the items and the factor loadings in all factor solutions also varied considerably. This indicates a violation of the assumption of essentially tau-equivalent measurement, which makes Cronbach’s alpha an inadequate measure of internal consistency. The omega coefficient should therefore be prioritised. However, we also report Cronbach’s alpha so that our results can be compared with those of Williams et al. (2015).

According to Cronbach’s alpha and the omega coefficient for the three- and five-factor solutions (Table 3), internal consistency is good in all dimensions of both solutions. However, Cronbach’s alpha appears to underestimate the internal consistency of the identified dimensions. Many items loading on factors 2 and 3 in the three-factor solution load together in the five-factor solution. Most of the items in factor 1 of the three-factor solution remain in factor 1 of the five-factor solution, and only five items load on factor 2. This also suggests a one-factor solution.

For comparison, we also assessed internal consistency in the same way as Williams et al. (2015). That is, we assumed the presence of five domains, performed exploratory factor analysis, and calculated Cronbach’s alpha within each of those domains. Five unidimensional solutions were obtained, and the alpha coefficients for the five domains were 0.79, 0.87, 0.79, 0.85, and 0.86, respectively. As the assumption of an essentially tau-equivalent model is likely violated, omega coefficients were also calculated: 0.85, 0.90, 0.84, 0.89, and 0.88, respectively. Both the alpha and omega coefficients indicated good internal consistency.

Interestingly, Cronbach’s alphas calculated for all possible (n = 4845) combinations of four items are similar (Fig. 1) and indicate fairly good internal consistency. They range from 0.70 to 0.88, with an average alpha of 0.79 and 95% CI of (0.791; 0.793). The alpha coefficient for all 20 items together was 0.95 (omega 0.96).
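Enumerating every four-item subset is straightforward, and the count is C(20, 4) = 4845. The sketch below computes standardized (correlation-based) alpha for each subset. This is a simplification, since the reported alphas were presumably computed from the item data. The sketch uses a hypothetical correlation matrix in which every pair of items correlates 0.5, i.e. a single common factor with equal loadings. Under that structure, every arbitrary "domain" of four items has exactly the same alpha. A good alpha within a hypothesized domain therefore cannot, by itself, distinguish that domain from any other grouping of items.

```python
from itertools import combinations
from math import comb


def standardized_alpha(corr: list[list[float]], items: tuple[int, ...]) -> float:
    """Cronbach's alpha for a subset of standardized items, computed from their correlations."""
    k = len(items)
    total_var = sum(corr[i][j] for i in items for j in items)  # variance of the sum of z-scores
    return k / (k - 1) * (1 - k / total_var)


def all_subset_alphas(corr: list[list[float]], subset_size: int = 4) -> list[float]:
    """Alpha for every possible subset of `subset_size` items."""
    return [standardized_alpha(corr, s) for s in combinations(range(len(corr)), subset_size)]


# Hypothetical 20-item correlation matrix with every inter-item correlation equal to 0.5.
n_items = 20
corr = [[1.0 if i == j else 0.5 for j in range(n_items)] for i in range(n_items)]
alphas = all_subset_alphas(corr)
print(len(alphas), comb(n_items, 4))                 # 4845 4845
print(round(min(alphas), 3), round(max(alphas), 3))  # 0.8 0.8
```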

Fig. 1 Distribution of Cronbach’s alphas calculated for all possible combinations of 4 out of 20 items, N = 4845

Discussion

The first step in assessing the dimensionality of a scale should be a correlation analysis followed by a factor analysis of all items. Only after the dimensionality is established can internal consistency be assessed. Measures of internal consistency do not measure dimensionality; they only confirm it. In other words, internal consistency is a necessary but not sufficient condition for unidimensionality in a sample of items (Green et al., 1977). When there are multiple dimensions, internal consistency should be assessed for each dimension separately. This is because the statistics used for this purpose assume that the items measure one underlying construct (unidimensionality). The alpha coefficient can only be used when the items comprise a tau-equivalent measure, an assumption that is often violated. In practice, congeneric models are more realistic, and other internal consistency statistics, for example the omega coefficient, should be preferred.

Williams et al. (2015) assessed the dimensionality of the INSPIRE questionnaire by performing factor analysis for each intended dimension separately, citing the small sample size. In doing so, they assumed that each item measures only its anticipated dimension. Their five exploratory factor analyses resulted in five unidimensional solutions, but this approach offers no way to quantify cross-loadings among the items. The correlation matrix could give some insight into cross-correlations among items, but it is not presented in the paper by Williams et al. (2015). Schön et al. (2015) skipped factor analysis entirely and assumed the five-factor structure proposed by Williams et al. (2015) without statistical evidence.

The correlations in the present study showed no clear pattern. There was neither a pattern corresponding to the five CHIME domains introduced by Williams et al. (2015) nor any other interpretable structure in the Support sub-scale. The factor analyses pointed to a single dimension, which differs substantially from the proposed five-factor structure. This calls for more research assessing the dimensionality of the INSPIRE Support sub-scale in other data sets.

Both Williams et al. (2015) and Schön et al. (2015) assessed the internal consistency of each CHIME domain using Cronbach’s alpha. However, the essential tau-equivalence assumption was not tested or discussed in either paper. The means of the 20 items varied considerably in both studies. As neither paper reported the correlations among the items, the assumption was difficult to assess further. The omega coefficient, which is suitable for both essentially tau-equivalent and congeneric models, would seem to be a more appropriate statistic for internal consistency.

In the present study, the assumption of essential tau-equivalence was likely not met, and the omega coefficient was therefore estimated. For comparison, Cronbach’s alpha was also calculated. The omega coefficients and Cronbach’s alphas differed slightly, indicating that the items do not comprise an essentially tau-equivalent measure. Good internal consistency was demonstrated regardless of the number of domains, and also for all 20 items treated as one domain. As expected, both the alpha and the omega coefficients indicated redundancy of some items in the latter case. The alpha coefficients calculated for the five domains suggested by Williams et al. (2015) and “verified” by Schön et al. (2015) also showed good internal consistency in the present study. For illustration, the alpha coefficient was calculated for all 4845 possible combinations of four items out of 20. The variation in these alphas was strikingly small, with an average of 0.79 and 95% CI of (0.791; 0.793). This indicates rather good internal consistency for most of the 4845 possible “domains”.

Another important aspect not discussed by Williams et al. (2015) concerns the number of missing ratings on the Support sub-scale. A rating is missing every time a person answers “no” to the question of whether an item is important for his or her personal recovery. A rating may also be missing if the person does not rate an item even though it is important for his or her personal recovery. According to the INSPIRE Support sub-scale scoring manual (INSPIRE, 2015), the sum score can be calculated unless all 20 questions are answered “no” or left blank. It can thus be calculated for persons rating all 20 items or, for example, only one item, which implies that missing values are imputed with the person’s mean rating. The quality of a sum score based on only a few rated items is debatable, but this is beyond the scope of this paper. However, the way the sum score is calculated suggests that a person’s mean rating may have been used to impute missing values on relevant items in previous studies. Because factor analysis struggles with cases that have at least one missing value, this may seem an appealing solution. However, mean imputation is not recommended in the context of factor analysis, as it reduces the variation considerably (Eekhout et al., 2014). Unfortunately, it is unclear how Williams et al. (2015) dealt with missing ratings.

The number of missing ratings was quite high in the present study. Excluding patients with at least one missing value among the 20 items would have substantially reduced the sample size, so several approaches for dealing with missing values were considered. The main results of this study are based on an exploratory factor analysis of polychoric correlations, the approach also used by Williams et al. (2015). Polychoric correlations were chosen because the ratings constitute an ordinal scale. Another, equally important, reason for using a correlation matrix as input to the exploratory factor analysis was the handling of missing values. The correlations were calculated between all pairs of items using all available observations. Two other approaches for handling a large amount of missing values were also applied: imputation with each person’s mean value, and estimation of the covariance matrix by the expectation-maximization method. Both resulted in a one-factor solution with very similar loadings.

The present study provides no clear support for the five-domain CHIME structure that Williams et al. (2015) proposed for the Support sub-scale; instead, the analyses suggest a single dimension. There is therefore a clear need for new and larger studies that use recognized statistical methods to gain more insight into the dimensionality of the Support sub-scale. However, single items and the sum score of the questionnaire still cover important aspects of the personal recovery concept. Previous research has shown that, in a large sample of service users with psychosis, a clear majority of the items in the INSPIRE Support sub-scale were rated as important for their personal recovery process (Skar-Fröding et al., 2021). Hence, the questionnaire can still be used to investigate which personal recovery areas are important and whether service users experience receiving support for them. Future research could address an important knowledge gap: how personal recovery processes take place, and how support for them is perceived, in patients with other common mental disorders such as mood or anxiety disorders.