Background

Multiple system atrophy (MSA) is a rare, progressive neurodegenerative disease characterized by autonomic dysfunction, parkinsonism, and ataxia [1, 2]. Patients with MSA generally require a wheelchair within five years and die within ten years of disease onset. Although some underlying mechanisms of MSA have been identified, such as the aggregation of α-synuclein in oligodendroglia, the complete pathogenesis of the disease remains to be elucidated [3]. Because quantitative biomarkers for MSA have not yet been developed for use in clinical trials, clinicians must rely on evaluating changes in symptoms. However, the usefulness of such evaluations varies with the scale used, and because MSA trials require large numbers of patients, redundant and unresponsive scales are impractical. Therefore, a brief yet sensitive scale is desirable for clinical trials involving patients with MSA.

In a previous study, we compared the ability of the following five scales to assess symptoms of MSA [4]: the Unified MSA Rating Scale (UMSARS) [5], the Scale for the Assessment and Rating of Ataxia (SARA) [6], the Berg Balance Scale (BBS) [7], the MSA Health-Related Quality of Life scale (MSA-QoL) [8], and the Scales for Outcomes in Parkinson’s Disease–Autonomic Questionnaire (SCOPA-AUT) [9]. We subsequently devised a simple pilot scale composed of the eight items exhibiting the largest standardized response means (SRMs): handwriting, finger taps, transfers, standing with feet together, turning trunk, turning 360°, gait, and body sway [4].

Our prior study revealed that the UMSARS Part II (motor examination), UMSARS Part IV (global disability scale), SARA, and BBS are effective in evaluating MSA progression over 12 months, indicating their potential to assess rapid changes in MSA symptoms. Detailed item-by-item analyses showed that the largest SRMs were obtained for the eight items listed above. Further analyses revealed that our eight-item semi-quantitative pilot scale (total score, 36 points; Table 1) exhibited a larger SRM than the UMSARS Part II, UMSARS Part IV, SARA, and BBS [4], suggesting that the pilot scale was the most effective in detecting rapid changes in MSA symptoms. In the present study, we aimed to investigate the validity and reliability of the pilot scale for assessing symptoms in patients with both the cerebellar and parkinsonian subtypes of MSA.
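The SRM used to rank items and scales is commonly defined as the mean change in score between two time points divided by the standard deviation of that change, so a larger value means a change that is large relative to its between-patient variability. The sketch below computes it with Python's standard library only. It assumes paired baseline and follow-up scores for the same patients; the function name and the example scores are hypothetical, and the exact computation used in [4] is not restated here.

```python
from statistics import mean, stdev

def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change) for paired scores (hypothetical helper)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired per patient")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    # stdev() is the sample SD (n - 1 denominator)
    return mean(changes) / stdev(changes)

# Hypothetical scores for five patients at baseline and at 12 months
baseline = [18, 20, 22, 15, 25]
follow_up = [21, 24, 25, 17, 30]
print(round(standardized_response_mean(baseline, follow_up), 2))  # prints 2.98
```

Computing the SRM for each item and for each whole scale on the same patients makes the values directly comparable, which is the kind of comparison summarized above.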

Table 1 The items of the pilot scale

Methods

This prospective observational study included inpatients and outpatients treated in the Departments of Neurology at Hokkaido University Hospital and Obihiro Kosei Hospital between January 1, 2014 and February 28, 2015. Included patients had been diagnosed with probable or possible MSA according to the criteria of the 2008 consensus statement [10]. The study was approved by the institutional review board of Hokkaido University Hospital, and written informed consent was obtained from all patients before participation. Patients who declined to participate and those with severe cognitive impairment (e.g., inability to understand explanations or to follow instructions during examination) were excluded.

The study design was informed by previous reports that used both the SARA and BBS [11, 12]. Each patient was evaluated separately by two independent neurologists. Rater 1 first evaluated patients using the UMSARS, SARA, and pilot scale, and Rater 2 evaluated them on the same day using the pilot scale alone. Within one month, Rater 1 re-evaluated the patients using the pilot scale. Each assessment was performed blinded, under the same conditions, and outside acute phases of illness to eliminate the influence of sudden changes in symptoms. No interventions were made in the present study, and patients were allowed to continue ongoing treatments (mainly medication and rehabilitation). The collected data were anonymized in a linkable manner before statistical analysis.

Statistical analysis

JMP® Pro version 12.0.1 (SAS Institute Inc., Cary, NC, USA) was used for statistical analysis. Correlations between scores on the UMSARS, SARA, and pilot scale were evaluated using Spearman’s rank correlation coefficients. Inter-rater reliability of the pilot scale was assessed between Rater 1 and Rater 2, and intra-rater (test-retest) reliability between the first and repeat assessments by Rater 1. The total score and individual item scores on the pilot scale were analyzed using Cronbach’s α coefficients and intraclass correlation coefficients (ICCs). Items with Cronbach’s α coefficients greater than 0.8 were considered to exhibit high internal consistency. ICCs were interpreted according to reference [13] as slight (0.000 to 0.200), fair (0.201 to 0.400), moderate (0.401 to 0.600), substantial (0.601 to 0.800), or almost perfect (0.801 to 1.000). Mean values are presented with standard deviations (SD).
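The ICC computation and the interpretation bands described above can be illustrated with a short sketch. The text does not state which ICC model was used, so the code assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in the Shrout and Fleiss notation, which is one common choice when the same raters score every patient. Function names and example scores are hypothetical, and this is not the JMP implementation.

```python
from statistics import mean

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] is the score given to patient i by rater (or session) j.
    The model choice is an assumption; the study does not specify it.
    """
    n = len(scores)      # number of patients
    k = len(scores[0])   # number of raters or sessions
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Interpretation bands quoted in the text: (upper bound, label)
ICC_BANDS = [(0.200, "slight"), (0.400, "fair"), (0.600, "moderate"),
             (0.800, "substantial"), (1.000, "almost perfect")]

def interpret_icc(value):
    if not 0.0 <= value <= 1.0:
        raise ValueError("value lies outside the 0-1 range covered by the bands")
    for upper, label in ICC_BANDS:
        if value <= upper:
            return label

# Hypothetical: Rater 2 scores every patient exactly one point higher than Rater 1
paired = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc = icc_2_1(paired)
print(round(icc, 3), interpret_icc(icc))  # prints: 0.769 substantial
```

In the example the two raters rank patients identically, yet the absolute-agreement ICC falls below 1 because of the systematic one-point offset; a consistency-type ICC would ignore that offset.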

Results

A total of 32 patients (15 males, 17 females; mean age, 63.4 ± 9.7 years; range, 41 to 80 years) were enrolled in the present study. Demographic information is presented in Table 2. Twenty patients had been diagnosed with the cerebellar subtype of MSA (MSA-C) and 12 with the parkinsonian subtype (MSA-P). The average time required for assessment was 16.4 ± 5.2 (range, 11–25) minutes for the UMSARS, 3.8 ± 1.0 (2–6) minutes for the SARA, and 5.0 ± 1.5 (2–7) minutes for the pilot scale (Fig. 1a).

Table 2 The characteristics of the study patients
Fig. 1

Comparison of scales. (a) Average time required to administer the UMSARS, SARA, and pilot scale. The SARA and the pilot scale required less time than the UMSARS (*p < 0.05, Wilcoxon’s rank test). (b) Correlations between scale scores. Scores on the pilot scale correlated significantly with those on UMSARS Parts I, II, and IV and on the SARA (ρ, Spearman’s correlation coefficient). UMSARS, Unified Multiple System Atrophy Rating Scale; SARA, Scale for the Assessment and Rating of Ataxia.

Total scores on each scale are presented in Table 2. Average scores for the first assessment were as follows: UMSARS Part I, 21.3/48; UMSARS Part II, 21.3/56; UMSARS Part IV, 3.0/5; SARA, 19.3/40; and pilot scale, 20.8/36. The total pilot scale score did not differ significantly between MSA-C and MSA-P (average total score: MSA-C, 21.2; MSA-P, 20.2), and no significant subtype differences were found for individual item scores either. Both total and individual item scores on the pilot scale correlated significantly with scores on UMSARS Parts I, II, and IV and with SARA scores (Fig. 1b). Spearman’s correlation coefficients (ρ) ranged from 0.8780 to 0.9392.

No significant differences were observed among the three assessments of the pilot scale (Wilcoxon’s rank test: p = 0.898 to 0.973). Table 3 shows the distribution of scores assigned by Rater 1 during the first assessment; scores for the second and third assessments showed similar tendencies. Many items had high item-total correlation coefficients (Spearman’s correlation coefficients, 0.525 to 0.937). ICCs and Cronbach’s α coefficients are presented in Table 4. For total pilot scale scores, inter-rater and intra-rater ICCs and Cronbach’s α coefficients were all greater than 0.9. Inter-rater and intra-rater ICCs above 0.6 (substantial) were obtained for almost all items; only item 2 exhibited a moderate inter-rater ICC. Cronbach’s α coefficients were greater than 0.9 for all items.
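Spearman’s ρ is Pearson’s correlation computed on ranks, with tied scores given the average of the ranks they span; tie handling matters for ordinal items that take only a few values. The sketch below computes item-total correlations this way using only the Python standard library. The text does not say whether the item itself was included in the total, so the sketch assumes the uncorrected form (item included); a corrected version would subtract the item from each total first. All names and scores are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined if either variable is constant

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def item_total_correlations(item_scores):
    """item_scores[p][i] = score of patient p on item i (hypothetical layout)."""
    totals = [sum(row) for row in item_scores]
    return [spearman([row[i] for row in item_scores], totals)
            for i in range(len(item_scores[0]))]

# Hypothetical scores: 5 patients x 3 items (not study data)
scores = [[1, 2, 1], [2, 2, 3], [3, 4, 2], [4, 3, 4], [4, 4, 4]]
print([round(r, 3) for r in item_total_correlations(scores)])
```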

Table 3 Distribution of scores for Rater 1 (first assessment) on the pilot scale (n = 32)
Table 4 Reliability of the pilot scale

Additionally, we considered prototype scales of five to seven items, created by excluding from the original pilot scale either a single item or a combination of the three items with relatively low inter-rater ICCs (item 1, handwriting; item 2, finger taps; item 5, turning trunk). After these exclusions, total scores, intra-rater and inter-rater ICCs, and Cronbach’s α coefficients all remained high.
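Cronbach’s α for k columns is k/(k − 1) × (1 − sum of column variances / variance of the row totals). The sketch below computes it, together with the α obtained after dropping items, mirroring the exclusion of items 1, 2, and 5 described above. It uses only Python’s standard library; the helper names and toy data are hypothetical. The same function could also be applied with rater columns (e.g., Rater 1 and Rater 2 totals) instead of item columns.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores[p][j] = score of patient p on column j (an item, or a rater)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two columns")
    item_var = sum(variance(col) for col in zip(*scores))  # sum of column variances
    total_var = variance([sum(row) for row in scores])     # variance of row totals
    return k / (k - 1) * (1 - item_var / total_var)

def alpha_without(scores, excluded):
    """Alpha after dropping the 0-based column indices in `excluded`."""
    kept = [[v for j, v in enumerate(row) if j not in excluded] for row in scores]
    return cronbach_alpha(kept)

# Hypothetical 8-item scores for four patients (not study data)
scores = [
    [1, 2, 1, 1, 2, 1, 1, 1],
    [2, 2, 2, 3, 2, 2, 3, 2],
    [3, 3, 4, 3, 2, 4, 3, 3],
    [4, 3, 4, 4, 4, 4, 4, 4],
]
full = cronbach_alpha(scores)
reduced = alpha_without(scores, {0, 1, 4})  # items 1, 2 and 5 in 1-based numbering
print(round(full, 3), round(reduced, 3))
```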

Discussion

Patients in the present study exhibited characteristics similar to those reported in previous studies of Asian/Japanese populations (Table 2) [14,15,16]. The distribution of UMSARS Part IV scores indicated that the study included a relatively unbiased sample of patients with mild to severe symptoms.

Scores on the pilot scale correlated significantly with those on the UMSARS and SARA (Fig. 1b), supporting the criterion-related validity of the pilot scale. The short time required to administer the pilot scale further suggests its usefulness in evaluating MSA symptoms (Fig. 1a). In addition, ICCs and Cronbach’s α coefficients were high (Table 4), indicating high inter-rater reliability, intra-rater (test-retest) reliability, and internal consistency. When either one or all three items with low inter-rater ICCs were excluded from the pilot scale (Table 4), ICCs and Cronbach’s α coefficients remained largely unchanged, suggesting that a scale consisting only of gait- and standing-related items would be equally useful for assessing MSA symptoms.

The present study has several limitations. First, the pilot scale items with low inter-rater ICCs (handwriting, finger taps, and turning trunk) were ambiguous with respect to differentiating between scores. These items require further refinement to assess changes in MSA symptoms more accurately; one possibility is a combined assessment using both the pilot scale and a gait accelerometer to record quantitative data. Second, semi-quantitative scales such as the one used in the present study often exhibit a ceiling effect [17]. This pilot scale may also show a ceiling effect in patients at an advanced stage and may therefore be unsuitable for patients with advanced MSA. However, because many participants in clinical trials are likely to be early cases with mild to moderate symptoms, a ceiling effect was considered less likely to influence results. Further investigation of this point is required to examine more fully how the utility of the pilot scale changes over the disease course. Finally, this study included patients with a wide range of symptom severity, from mild to severe, and relatively few patients with MSA-P. The reliability of the pilot scale should therefore be confirmed in a larger cohort.
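A ceiling effect can be screened for by checking what share of patients score at the scale maximum. The sketch below does this for pilot-scale totals (maximum 36 points). It assumes that higher pilot-scale scores indicate more severe symptoms, and it uses a 15% cut-off, a commonly cited convention in scale-validation work rather than a value from this study. Function names and the example totals are hypothetical.

```python
def ceiling_fraction(total_scores, max_score=36):
    """Share of patients whose total equals the scale maximum.

    Assumes higher pilot-scale scores mean more severe symptoms.
    """
    if not total_scores:
        raise ValueError("no scores supplied")
    at_max = sum(1 for s in total_scores if s >= max_score)
    return at_max / len(total_scores)

def has_ceiling_effect(total_scores, max_score=36, threshold=0.15):
    """Flag a ceiling effect when more than `threshold` of patients sit at the maximum.

    The 15% cut-off is a commonly used convention, not a value from this study.
    """
    return ceiling_fraction(total_scores, max_score) > threshold

# Hypothetical totals for an advanced-stage subgroup
advanced = [36, 35, 36, 33, 36, 34]
print(ceiling_fraction(advanced), has_ceiling_effect(advanced))  # prints: 0.5 True
```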

In this study, SARA scores were similar in the MSA-C and MSA-P groups. Note that SARA scores may be influenced by symptoms other than ataxia, such as parkinsonism. The pilot scale reflects both parkinsonism and ataxia and can therefore be applied to both groups without modification. Moreover, the pilot scale showed a larger SRM than the SARA and UMSARS [4], indicating that it can sensitively capture changes in symptoms among patients with MSA. Although it took slightly longer to administer than the SARA (5.0 ± 1.5 vs 3.8 ± 1.0 min), the pilot scale was superior in sensitivity. It may therefore be useful in clinical trials for determining whether a treatment suppresses deterioration in the items that show the most rapid symptomatic change, which are the items included in the pilot scale.

Conclusions

The results of the present study indicate that the eight-item pilot scale for assessing MSA symptoms is both valid and reliable and may be useful for evaluating patients in the early stages of MSA. However, given the limitations of the present study, including its small sample size, further research involving improved scales and larger patient populations is required.