Introduction

People with disabling conditions are often constrained in their performance of daily activities and in their social life, such as relationships, education and community involvement [1]. These restrictions in (social) participation are defined by the World Health Organization (WHO) as ‘problems an individual may experience in involvement in life situations’ [2]. The community can have a particularly negative effect on the participation of the person affected, for example through stereotyping, isolation and other discriminatory practices by community members [2]. Other causes of participation restriction include the absence of (assistive) equipment or policies, and disease-related financial problems [1, 3].

Several instruments have been developed to assess participation restrictions in people with a health condition. Examples include the Perceived Handicap Questionnaire (PHQ) [4], the London Handicap Scale (LHS) [5, 6], the Craig Handicap Assessment and Reporting Technique (CHART) [7, 8], the Assessment of Life Habits (LIFE-H) [9–11], the Impact on Participation and Autonomy questionnaire (IPA) [12] and the Keele Assessment of Participation (KAP) [13]. More recently, the International Classification of Functioning, Disability and Health (ICF) Measure of Activity and Participation-Screener (IMPACT-S) [14], the Participation Profile (PAR-PRO) [15], the Participation Survey/Mobility (PARTS-M) [16] and the Participation Measure for Post Acute Care (PM-PAC) [17] were developed.

All these instruments were developed in high-income countries such as the US, the UK and the Netherlands. Ten years ago, a large rehabilitation field programme in Nepal identified the need for an instrument specifically suited to low- and middle-income countries with which to evaluate the impact of its intervention [18]. The Participation Scale (P-scale) was developed to meet this need [18].

The P-scale is based on the nine participation domains of the ICF: learning and applying knowledge, general tasks and demands, communication, mobility, self-care, domestic life, interpersonal interactions and relationships, major life areas and community, social and civic life [2, 18]. According to Noonan et al., ICF-based instruments that aim to measure social participation cover 6–8 of these domains [19]. The P-scale covers 8 of the 9 domains; no item covers the domain ‘general tasks and demands’. The instrument measures perceived participation restriction and is intended to be generic [18]. Specific attention was paid to the cross-cultural validity of the scale, which was developed by an international team of experts simultaneously in six languages and three countries [18]. Another important strength of the P-scale is that it was designed to be administered by staff who are not professional interviewers, because specialized staff are scarce in low-income countries [18].

Since most of these instruments were developed, important changes have occurred in the field of health measurement. Psychometric methods used in instrument development and validation have evolved steadily, resulting in extensive quality criteria that indicate what constitutes good measurement properties [20]. This framework, proposed by Terwee et al. [20], defines quality criteria for content validity, criterion validity, construct validity, internal consistency, agreement, reliability, responsiveness, floor and ceiling effects, and interpretability.

We considered it useful to subject the instrument to this more rigorous testing protocol to determine whether it meets these newer standards. In addition, we aimed to validate the P-scale in a new area that is culturally very different from the hill region of Western Nepal, which was one of the settings of the original development study. Therefore, the purpose of the present study was to investigate the psychometric properties of the P-scale among people with various disabling conditions in the Eastern Region of Nepal.

Materials and methods

The study population consisted of people with a disability (PWD) from 6 Village Development Committees (VDCs) in Morang District, Nepal, who had previously participated in a large household survey conducted to assess the prevalence, pattern and severity of disabilities. Systematic random sampling was applied to select the PWD from the 6 VDCs, using the lists of PWD per VDC as the sampling frame. PWD were considered for inclusion if they had been identified as having a disability, were between 16 and 65 years of age and were willing to provide verbal informed consent. PWD were excluded if they had been diagnosed with another health condition (e.g., tuberculosis, HIV/AIDS) that might influence their social participation. Furthermore, we aimed to concurrently select at least 50 controls using convenience sampling, in order to identify the cut-off point for ‘normal’ participation in the local population. Two trained and experienced interpreters who were native speakers of the local language conducted the data collection in the 6 VDCs.
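The sampling step can be illustrated with a short sketch. The paper does not report the sampling interval, the random start or the number of PWD drawn per VDC, so the helper name `systematic_sample`, the (possibly fractional) interval k = N/n and the uniform random start below are assumptions, not the study's documented procedure.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic random sample of size n from an ordered sampling frame.

    Hypothetical helper: uses an interval k = len(frame) / n (may be fractional)
    and one random start in [0, k); every k-th entry after the start is taken.
    """
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    rng = random.Random(seed)
    k = len(frame) / n           # sampling interval
    start = rng.random() * k     # random start within the first interval
    # min() guards against floating-point rounding pushing the last index past the end
    return [frame[min(int(start + i * k), len(frame) - 1)] for i in range(n)]


# Example with a hypothetical list of 100 PWD identifiers for one VDC
vdc_list = [f"PWD-{i:03d}" for i in range(100)]
print(systematic_sample(vdc_list, 10, seed=42))
```

Applied separately to each VDC's list, this spreads the sample evenly over the list order while keeping a single random element (the start).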

The main instrument, the P-scale (v.6.0), is an 18-item scale designed to assess participation restrictions in PWD [18]. The scale is interviewer-administered and has six response options: the same as everyone else (0 points), not relevant (0), no problem (1), small problem (2), medium problem (3) and large problem (5). The total score is the sum of the individual item scores, with higher scores indicating greater participation restriction [18]. The adapted 14-item version of the Explanatory Model Interview Catalogue (EMIC) stigma scale was used to assess perceived stigma [21, 22]. This scale has a 4-point response format: yes (3), possible (2), uncertain (1) and no (0). A sum score was calculated, with higher scores reflecting greater perceived stigma [21, 22]. Missing items on the EMIC and the P-scale were assigned the mean score of the respondent’s completed items. To assess the self-reported health status of the PWD, a Visual Analogue Scale (VAS) for self-reported health was administered: participants were asked to rate their quality of life at that moment on a line ranging from 0 (bad) to 10 (good).
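The scoring rules above can be expressed as a short function. This is a sketch under stated assumptions: the response labels and point values are transcribed from the text, the helper name `p_scale_total` is hypothetical, missing items are marked `None`, and imputed totals are left unrounded because the paper does not say whether they were rounded. The same person-mean imputation would apply to the EMIC items with their 0–3 coding.

```python
from statistics import mean

# Points per response option, as described for the P-scale (v.6.0)
P_SCALE_POINTS = {
    "same as everyone else": 0,
    "not relevant": 0,
    "no problem": 1,
    "small problem": 2,
    "medium problem": 3,
    "large problem": 5,
}


def p_scale_total(responses, n_items=18):
    """P-scale sum score (0 to 90) with person-mean imputation.

    responses: one response label per item, or None for a missing item.
    Each missing item is assigned the mean of the respondent's completed items.
    """
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    observed = [P_SCALE_POINTS[r] for r in responses if r is not None]
    if not observed:
        raise ValueError("cannot impute: no completed items")
    n_missing = n_items - len(observed)
    return sum(observed) + n_missing * mean(observed)


# Hypothetical respondent: 17 answered items, item 18 missing
answers = ["small problem"] * 9 + ["no problem"] * 8 + [None]
print(p_scale_total(answers))  # 26 observed points plus 26/17 imputed
```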

The P-scale and the EMIC showed good validity and reliability in previous studies in Nepal and India [18, 23, 24]. The VAS for self-reported health was also used in the initial development study of the P-scale. Cronbach’s alphas ranged from 0.87 to 0.93 for the P-scale and from 0.76 to 0.88 for the EMIC. For the P-scale, an intraclass correlation coefficient (ICC) of 0.83 was found for intra-tester reliability and 0.80 for inter-tester reliability [18, 23, 24]. The weighted kappa for the EMIC was 0.70 [18, 23, 24]. Socio-demographic data were collected on age, religion, residency, income and education. In addition, a question on self-reported health was included, with five response levels: excellent, very good, good, fair and poor.

Approval for this study was obtained from the Nepal Health Research Council at Kathmandu. Participants gave verbal informed consent.

Data management and analyses

The data were analyzed using the Statistical Package for the Social Sciences (SPSS) (v.16.0; Chicago, IL) and Mplus (v.6.11). The chi-square test was used to test for differences in demographic variables between the controls and the PWD, and an independent samples t test was used to compare age. A cut-off point for ‘normal’ participation was set at the 95th percentile of the P-scale scores in the control group. Furthermore, item-total correlations were investigated, and the mean and standard deviation (SD) of each item were calculated.
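The cut-off rule can be sketched as follows. The paper does not state which percentile definition was used; this sketch assumes interpolation at rank (n + 1)p, which is what Python's `statistics.quantiles` computes with `method="exclusive"`. The helper names are hypothetical.

```python
from statistics import quantiles


def participation_cutoff(control_scores):
    """95th percentile of the control group's P-scale sum scores.

    quantiles(..., n=20) returns the 5th, 10th, ..., 95th percentiles, so the
    last element is the 95th; method="exclusive" interpolates at rank (n + 1) * p.
    """
    return quantiles(control_scores, n=20, method="exclusive")[-1]


def has_participation_restriction(score, cutoff):
    """A PWD scoring strictly above the cut-off is classified as restricted."""
    return score > cutoff


# Example using the cut-off of 12 reported in the Results
print(has_participation_restriction(30, 12))  # True
print(has_participation_restriction(12, 12))  # False
```

Other percentile definitions (for example nearest-rank) can shift the cut-off by a point or so in a sample of about 50 controls, which is why the definition is worth reporting.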

Psychometric properties were tested by using several statistical methods based on predefined quality criteria [20].

Internal consistency

Internal consistency was assessed by calculating Cronbach’s alpha. A Cronbach’s alpha of ≥0.70 and ≤0.95 was classified as good [20].
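For reference, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σσ²ᵢ/σ²ₜ), where σ²ᵢ are the item variances and σ²ₜ is the variance of the sum score. A minimal sketch with hypothetical helper names, assuming complete data (for example after the person-mean imputation described in the methods):

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent (no missing data)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = sum(variance(item) for item in zip(*rows))  # per-item sample variances
    total_variance = variance([sum(row) for row in rows])         # variance of the sum score
    return k / (k - 1) * (1 - item_variances / total_variance)


def alpha_is_good(alpha):
    """Quality criterion used in this study: 0.70 <= alpha <= 0.95."""
    return 0.70 <= alpha <= 0.95


# Hypothetical item scores for four respondents on three items
data = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [5, 4, 5]]
a = cronbach_alpha(data)
print(round(a, 2), alpha_is_good(a))
```

The upper bound of 0.95 is there because very high alphas suggest redundant items rather than better measurement.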

The dimensionality of the P-scale was assessed using confirmatory factor analysis (CFA). Based on previous studies, we hypothesized that CFA would show one main factor, namely ‘participation’ [18]. Model fit was assessed with the Comparative Fit Index (CFI), the Tucker–Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA). Cut-off levels for adequate model fit were set at >0.95 for the TLI and CFI and <0.08 for the RMSEA [25, 26]; an RMSEA <0.06 indicates close model fit [26]. Because the CFA results suggested insufficient model fit, exploratory factor analysis (EFA) was used to examine the dimensionality of the item set measuring the underlying construct [27, 28]. An oblique geomin rotation was applied because we expected the factors to be correlated. The number of factors was determined from the break point of the successive eigenvalues in the scree plot, the item factor loadings (r > 0.30) and interpretability [27, 28]. CFA was then used to confirm the findings of the EFA.
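To show how the CFI, TLI and RMSEA relate to the model and baseline chi-square statistics, here is a sketch using the standard maximum-likelihood formulas. It is an assumption that these match the study's Mplus output: with categorical indicators Mplus computes the indices from an adjusted chi-square, and some programs use N rather than N − 1 in the RMSEA. The chi-square values in the example are hypothetical; the degrees of freedom (135 and 153) are those of a one-factor model and an independence baseline model for 18 continuous indicators.

```python
from math import sqrt


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """CFI, TLI and RMSEA from model (m) and baseline/independence (b) chi-squares."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)            # i.e. max(chi2_b - df_b, chi2_m - df_m, 0)
    cfi = 1.0 - d_m / d_b if d_b > 0 else 1.0
    ratio_b, ratio_m = chi2_b / df_b, chi2_m / df_m
    tli = (ratio_b - ratio_m) / (ratio_b - 1.0)
    rmsea = sqrt(d_m / (df_m * (n - 1)))     # N - 1 convention
    return cfi, tli, rmsea


def adequate_fit(cfi, tli, rmsea):
    """Cut-offs used in this study: CFI and TLI > 0.95, RMSEA < 0.08."""
    return cfi > 0.95 and tli > 0.95 and rmsea < 0.08


# Hypothetical chi-square values; N = 153 as in this study
cfi, tli, rmsea = fit_indices(chi2_m=200.0, df_m=135, chi2_b=2000.0, df_b=153, n=153)
print(round(cfi, 3), round(tli, 3), round(rmsea, 3), adequate_fit(cfi, tli, rmsea))
```

Applied to the one-factor values reported in the Results (CFI 0.98, TLI 0.98, RMSEA 0.11), `adequate_fit` returns False because of the RMSEA alone, which is the pattern the authors describe.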

Construct validity

Construct validity was investigated by correlating the P-scale score with the perceived stigma score and the VAS self-reported health score, using hypotheses formulated in advance. Construct validity was rated sufficient if at least 75% of the a priori hypotheses were confirmed [20]. Because only two hypotheses were formulated in our study, both had to be confirmed. The first hypothesis was based on research findings suggesting a reciprocal relationship between participation and perceived stigma [23]. Second, we hypothesized an association between health status and participation: the higher the level of participation restriction, the poorer the self-reported health status of the respondent. This resulted in the following hypotheses:

Hypothesis 1

A moderate positive correlation (0.4–0.8) between the P-scale score and the EMIC score (Pearson correlation) [23].

Hypothesis 2

A moderate negative correlation (−0.4 to −0.8) between the P-scale score and the self-reported health score (Pearson correlation).
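The decision rule for construct validity can be written out explicitly. In this minimal sketch the Pearson coefficient is computed directly from its definition; the hypothesis labels and helper names are hypothetical, and treating the range limits as inclusive is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# A priori expected ranges (inclusive bounds assumed)
HYPOTHESES = {
    "H1: P-scale vs EMIC": (0.4, 0.8),
    "H2: P-scale vs self-reported health": (-0.8, -0.4),
}


def confirmed(r, bounds):
    low, high = bounds
    return low <= r <= high


def construct_validity_sufficient(outcomes, threshold=0.75):
    """Sufficient if at least 75% of the hypotheses are confirmed."""
    return sum(outcomes) / len(outcomes) >= threshold


# The correlations reported in the Results
observed = {"H1: P-scale vs EMIC": 0.55, "H2: P-scale vs self-reported health": -0.51}
outcomes = [confirmed(observed[h], b) for h, b in HYPOTHESES.items()]
print(outcomes, construct_validity_sufficient(outcomes))  # [True, True] True
```

With only two hypotheses, one confirmation gives 50%, so the 75% rule indeed requires both.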

Reliability

The test–retest reliability of the P-scale was assessed by calculating the ICCagreement (two-way random effects model). PWD were interviewed twice within 2 weeks; the second interview was conducted by a different interviewer who had no knowledge of the scores obtained in the first P-scale interview [20]. The minimum acceptable level for test–retest reliability was set at 0.70 [20].
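ICCagreement for a two-way random effects model corresponds to McGraw and Wong's ICC(A,1), computed from the two-way ANOVA mean squares for respondents (MSR), occasions (MSC) and error (MSE): ICC = (MSR − MSE) / (MSR + (k − 1)MSE + (k/n)(MSC − MSE)). That the single-measure form was used here is an assumption. The sketch below (hypothetical helper name, complete n × k data) also shows how a systematic shift between visits lowers agreement even when respondents keep their rank order.

```python
def icc_agreement(data):
    """ICC(A,1): two-way random effects, absolute agreement, single measurement.

    data: list of n rows (respondents), each with k scores (visits/interviewers).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between respondents
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical test-retest P-scale scores for three respondents
print(icc_agreement([[1, 1], [2, 2], [3, 3]]))  # identical visits: 1.0
print(icc_agreement([[1, 2], [2, 3], [3, 4]]))  # second visit 1 point higher: about 0.67
```

A consistency-type ICC would give 1.0 for the second example; the agreement version penalizes the shift, which is why it is preferred for test–retest reliability.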

Floor or ceiling effects

Floor and ceiling effects were considered present if 15% or more of the respondents obtained the lowest or highest possible P-scale score, respectively [20].
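The floor and ceiling criterion can be checked as below. The score range 0–90 follows from 18 items with a maximum of 5 points each; the helper name and the example scores are hypothetical.

```python
def floor_ceiling_effects(scores, lowest=0, highest=90, threshold=0.15):
    """Share of respondents at the lowest and highest possible score,
    and whether each share reaches the 15% criterion."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share >= threshold,
        "ceiling_effect": ceiling_share >= threshold,
    }


# Hypothetical sample of 100 sum scores: 2 at the minimum, none at the maximum
example = [0] * 2 + [36] * 98
print(floor_ceiling_effects(example))
```

For a subscale, `highest` would be 5 times its number of items (15 for the three work-related items).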

Results

Participant characteristics

A total of 153 PWD and 55 controls were included in the study. Socio-demographic characteristics of the PWD and controls are described in Table 1. Significant differences were identified in educational status and (family) income: PWD were more often illiterate than the controls (p = 0.039) and earned less (p = 0.001 and p < 0.001 for income and family income, respectively). Moreover, PWD rated their health status worse than the control group (p < 0.001). The majority of the PWD had a physical disability (61%), followed by vision-related (22%), mental (7%), multiple (7%), hearing (3%) and voice/speech-related disabilities (1%). The P-scale sum scores of the PWD ranged from 0 to 85 (median 30, mean 36, SD 23). The 95th percentile of the P-scale score in our control sample was 12; PWD scoring higher than 12 were therefore categorized as having a ‘participation restriction’.

Table 1 Characteristics of people with a disability (N = 153) and the control group (N = 55)

The mean item score was 2.0, ranging from 0.58 (SD 1.59) to 3.86 (SD 1.71); see Table 2 for a complete overview. The item-total correlations ranged from 0.30 for item 10 (‘start or maintain a relationship’) to 0.85 for item 6 (‘take part in social activities’).

Table 2 Descriptive statistics of the items (range 0–5)

Internal consistency

A Cronbach’s alpha of 0.93 was found for the whole P-scale. However, CFA did not confirm the expected unidimensionality of the P-scale. The fit indices were CFI 0.98, TLI 0.98 and RMSEA 0.11: the CFI and TLI indicate adequate fit, but the RMSEA suggests insufficient fit between the unidimensional model and the observed data. Factor loadings for the one-factor CFA model are shown in Table 3.

Table 3 Confirmatory factor analysis of the 18 items (one factor) (N = 153)

Based on these results, we performed EFA with oblique geomin rotation, without restricting the number of factors. This revealed four factors with eigenvalues greater than 1 (10.82, 1.77, 1.35 and 1.01). However, factor 4 explained only 5.5% of the variance, and the items that made up factor 3 also showed adequate loadings (r ≥ 0.32) on factor 2 [29]. Furthermore, the scree plot supported a one- or two-factor solution.

The two factors identified were named ‘work-related participation’ (items 1–3) and ‘general participation’ (items 4–18). CFA of this two-factor model yielded a CFI and TLI of 0.99 and an RMSEA of 0.069, indicating good model fit. The two factors were moderately correlated (r = 0.57), and the factor loadings of the items were adequate (Table 4). The internal consistency of both subscales was sufficient (α = 0.78 and α = 0.93, respectively).

Table 4 Confirmatory factor analysis of the 18 items (two factors) (N = 153)

Construct validity

We found a moderate positive correlation between the P-scale and the EMIC (r = 0.55, p < 0.001) and a moderate negative correlation between the P-scale and the self-reported health scale (r = −0.51, p < 0.001). These correlations confirmed both a priori hypotheses.

Reliability

For the whole scale, test–retest reliability was high, with an ICCagreement of 0.90 (CI 0.85–0.94).

Floor or ceiling effects

No floor or ceiling effects were identified for the whole scale: only 2% of the respondents obtained the lowest possible score of 0, and none of the PWD obtained the highest possible score of 90 points on the P-scale.

A summary of the findings for the whole scale and the subscales can be found in Table 5.

Table 5 Summary of the psychometric properties of the Participation Scale

Discussion

The purpose of this study was to investigate the psychometric properties of the P-scale. The results show that the psychometric properties of the P-scale were good in the present context.

The psychometric properties found in the initial development study of the P-scale are comparable to those found in the present study [18]. A Cronbach’s alpha of 0.92 was found during the development study, while ICCs for inter- and intra-interviewer reliability were 0.80 and 0.83, respectively [18]. Furthermore, construct validity was confirmed by demonstrating significant correlations with expert opinions, self-assessment and impairment scores of the Eyes Hands Feet system [30]. Other studies in Nepal, Brazil and India also showed good results [23].

In the development study, factor analysis suggested one factor, ‘participation’, which accounted for 90% of the variance [18]. In the present study, CFA did not reconfirm this factor structure of the P-scale. EFA identified two factors, named ‘work-related participation’ and ‘general participation’, and the two-factor structure showed the best model fit. Factor loadings were higher than in the one-factor model, and the Cronbach’s alphas of the work-related and general participation subscales were 0.78 and 0.93, respectively. Several explanations are possible for this difference in factor structure. There may be local cultural differences in the experience of participation restrictions, with work-related restrictions playing a different role in the current study population than in the population of the development study. Additionally, the difference may be due to the different type of factor analysis used in the two studies: the development study used EFA without rotation, whereas the current study used EFA with oblique geomin rotation.

The two-factor structure found in this study may have implications for the use and statistical analysis of the P-scale. If the two-factor structure were confirmed, internal consistency and other psychometric properties, such as the ICCs and possible floor and ceiling effects, would have to be calculated per subscale as well as for the whole scale. This may also affect how the P-scale score is calculated. However, before the description of the structure of the scale is changed, more research is required to determine the optimal factor structure in other, larger data sets. Currently, the scale is used as a general measure of participation. If the work-related participation subscale is replicated consistently, these three items may be used as a separate indicator of specific work-related problems. However, the subscale analysis showed ceiling effects, so caution is necessary.

According to current international standards, our findings indicate that the P-scale has good measurement properties in South-East Nepal. However, these findings cannot be generalized to the use of the scale in other countries. The P-scale has been used successfully in many other languages, and only a few problems have been reported [23, 24, 31–33]. Nevertheless, re-validation is necessary in every new cultural setting where the instrument is to be used.