Cognitive impairment (CI) and associated neurobehavioral symptoms (e.g., fatigue, depression) are frequent and often highly debilitating in multiple sclerosis (MS) [1]. In particular, cognitive processing speed, executive functions such as working memory capacity, and verbal and figural episodic memory show a disease-related decline, with adverse effects on patients' vocational status and quality of life [2, 3]. CI has been shown to be present in the earliest stages of MS as well as in clinically isolated syndrome (CIS) [4, 5]. Several studies suggest that CI can occur independently of physical disability and that its development and progression are most pronounced during the first years after disease onset [6, 7]. Despite its increasingly recognized clinical relevance for patients with early MS, little is known about the risk factors that contribute to CI, its short-term course, and its potential progression after the initial diagnosis of MS [3, 8, 9]. Large cross-sectional cohort studies have reported group-level associations between clinical markers of disease severity (e.g., EDSS, number of relapses, disease duration), conventional MRI parameters of disease burden (e.g., number and/or site of lesions, degree of atrophy), and both the severity and the profile of CI [8, 10,11,12]. However, these associations were less evident in patients at early disease stages [13]. A range of studies have also investigated risk factors and the prediction of long-term CI outcomes longitudinally in patients with MS, mainly on the basis of clinical and MRI parameters [5, 6, 14,15,16]. Consistent with results from cross-sectional studies, baseline brain volume [14, 15] and, to a lesser degree, lesion metrics [6, 16] usually contribute to the long-term prediction of CI, but predictive ability was generally low and inconsistent for short follow-up periods and early disease stages [5, 14].
Both cross-sectional and longitudinal studies, moreover, show substantial heterogeneity regarding (i) assessments and definitions of CI, (ii) selection and measurement of predictor variables, (iii) homogeneity of sample characteristics (e.g., disease severity, medication intake), and (iv) the MRI techniques employed and the length of follow-up periods. These methodological issues currently impede the integration of results and their extrapolation to individual patients with newly diagnosed MS [6, 8, 10, 11, 14,15,16,17]. This gap in key knowledge hinders the incorporation of cognitive monitoring into standard clinical care, which in turn hampers the development and evaluation of specific programs for the prevention and rehabilitation of CI in MS [1].

Here, we aimed to investigate whether CI and its short-term progression can be effectively predicted by a single marker or by combinations of conventional demographic, clinical and MRI parameters that are readily available to clinicians at the time of MS diagnosis. We were further interested in the relative importance of these potential risk factors both for CI and for its longitudinal change. To this end, we analyzed cognitive screening data from the German National MS cohort (NationMS) of patients with an initial diagnosis of either MS or CIS [18]. We hypothesized that standard sociodemographic data, established clinical markers of MS disease burden and/or conventional MRI parameters at baseline would be predictive of CI. We further analyzed whether changes in cognitive test performance during the first year after diagnosis could be effectively predicted from these baseline parameters.

Materials and methods

NationMS cohort study

The German National MS cohort is a prospective longitudinal observational study comprising (a) detailed assessment of patients with a first diagnosis of MS or CIS and (b) yearly follow-up assessments with a standardized protocol across 22 centers in Germany. It was approved by the ethics committee of Ruhr-University Bochum (Registration no. 3714-10) and subsequently by the local committees of all participating centers. All patients provided written informed consent. Inclusion and exclusion criteria as well as assessment plans are described in detail elsewhere [18]. In short, inclusion required a recent diagnosis of either CIS or RRMS according to the Barkhof [19] or the 2005 McDonald [20] criteria, respectively; exclusion criteria were previous intake of disease-modifying therapies (DMTs), other neurological or psychiatric conditions, and progressive courses of MS. Assessments comprised sociodemographic data, a detailed neurological status, medication status regarding DMTs, standardized cranial MRI evaluation of signs of disease burden, collection of biomaterial, neuropsychological screening and self-report questionnaires. Datasets from N = 1123 patients were included in the baseline statistics. Data from N = 958 patients were available at follow-up, on average 12.13 (SD = 1.54) months after baseline.

Cognitive screening data

MUSIC: Multiple Sclerosis Inventory for Cognition

The MUSIC is a brief multi-domain cognitive screening test geared towards rapid assessment of the cognitive domains most frequently impaired in MS [21]. It is widely used to screen for CI in German-speaking countries and consists of six subtests, administered in the following order: (1) Word List Learning (number of words learned over two consecutive trials from a list of 10 words), (2) Interference Word List Learning (number of words learned from a 10-word interference list), (3) Category Fluency Switch Condition (number of correct words produced within 1 min while continuously alternating between two semantic categories), (4) Modified Stroop Task (speed of correctly naming animal silhouettes printed with either congruent or incongruent animal names), (5) Word List Recall (number of correctly recalled words from the initially learned word list after a short delay). For easier inter-test and inter-subject comparisons, individual test scores were z-standardized based on normative data from N = 158 German-speaking healthy young adults, as described in detail elsewhere [21].

PASAT: Paced Auditory Serial Addition Test

The 3-s version of the PASAT is a widely used cognitive screening test in MS that taps into processing speed, divided attention and working memory. PASAT data were extracted from the Multiple Sclerosis Functional Composite (MSFC) [22]. Participants are asked to add each number to the one immediately preceding it (a 1-back-like task) during continuous auditory presentation (one number every 3 s) and to state each sum aloud. The outcome measure is the number of correct sums during a fixed time period. Administration followed the manual, including a preceding training trial and the use of a parallel version at follow-up. As with the MUSIC data, individual PASAT scores were z-standardized, stratified for age and education, based on normative data from a German sample of N = 241 healthy controls [23].
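The z-standardization described above can be sketched as follows. This is a minimal illustration, not the procedure of [21] or [23]: the normative table `PASAT_NORMS`, its age and education strata, all numbers in it and the helper names are hypothetical placeholders. The sketch also assumes that for timed measures (such as seconds on the Stroop task) the sign is inverted so that a negative z always means worse-than-normative performance; the text does not state how this was handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormCell:
    """Mean and SD of a raw score in one normative stratum."""
    mean: float
    sd: float


# HYPOTHETICAL normative table keyed by (age band, education band);
# placeholder values, not the published norms of [23].
PASAT_NORMS = {
    ("<40", "<=12y"): NormCell(mean=45.0, sd=9.0),
    ("<40", ">12y"): NormCell(mean=49.0, sd=8.0),
    (">=40", "<=12y"): NormCell(mean=42.0, sd=10.0),
    (">=40", ">12y"): NormCell(mean=46.0, sd=9.0),
}


def age_band(age: int) -> str:
    # Hypothetical cut-off at 40 years, for illustration only.
    return "<40" if age < 40 else ">=40"


def education_band(years: int) -> str:
    # Hypothetical cut-off at 12 years of education, for illustration only.
    return "<=12y" if years <= 12 else ">12y"


def z_score(raw: float, norm: NormCell, higher_is_better: bool = True) -> float:
    """Standardize a raw score against its normative stratum.

    For timed measures (higher raw score = slower = worse) the sign is
    flipped so that a negative z always means below-norm performance
    (an assumption; the text does not specify this step).
    """
    z = (raw - norm.mean) / norm.sd
    return z if higher_is_better else -z


def pasat_z(correct_sums: int, age: int, education_years: int) -> float:
    """z score of a PASAT result, using the stratum matching age and education."""
    norm = PASAT_NORMS[(age_band(age), education_band(education_years))]
    return z_score(correct_sums, norm)


# Hypothetical patient: 30 years old, 10 years of education, 36 correct sums.
print(pasat_z(36, age=30, education_years=10))  # -1.0
```

Stratifying the lookup key, rather than regressing age and education out, mirrors the "stratified for age and education" wording; a regression-based norm would replace the table with a fitted formula.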

Across all cognitive tests (i.e., the MUSIC subtests and the PASAT), a normative z score of −1.645 was used as the cut-off for “impaired performance”, as this value corresponds approximately to the 5th percentile. Following the criterion put forth by Amato et al., impaired performance in two or more subtests was required to classify an individual patient as having CI [6]. Additionally, an unweighted mean z score across all cognitive tests was calculated for each patient as a proxy for overall severity of CI.
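The classification rule above can be written as a short function. The sketch assumes each patient's normative z scores are available in a dictionary keyed by test name; the key names and the example patient are hypothetical. It also checks that −1.645 lies at roughly the 5th percentile of a standard normal distribution.

```python
from statistics import NormalDist, fmean

IMPAIRMENT_CUTOFF = -1.645  # per-test cut-off for "impaired performance"
MIN_IMPAIRED_TESTS = 2      # >= 2 impaired tests -> overall CI (criterion of Amato et al.)


def impaired_tests(z_scores: dict[str, float]) -> list[str]:
    """Names of tests whose normative z score lies below the cut-off."""
    return [name for name, z in z_scores.items() if z < IMPAIRMENT_CUTOFF]


def classify_patient(z_scores: dict[str, float]) -> tuple[bool, float]:
    """Return (has_ci, unweighted mean z score across all tests)."""
    has_ci = len(impaired_tests(z_scores)) >= MIN_IMPAIRED_TESTS
    return has_ci, fmean(z_scores.values())


# The cut-off sits at roughly the 5th percentile of the standard normal distribution.
print(round(NormalDist().cdf(IMPAIRMENT_CUTOFF), 3))  # 0.05

# Hypothetical patient; test names chosen for illustration only.
patient = {
    "word_list_learning": -0.5,
    "interference_list": 0.2,
    "category_fluency": -1.8,
    "stroop_interference": -2.1,
    "word_list_recall": 0.0,
    "pasat": -1.0,
}
print(impaired_tests(patient))  # ['category_fluency', 'stroop_interference']
has_ci, mean_z = classify_patient(patient)
print(has_ci, round(mean_z, 2))  # True -0.87
```

Note that the binary CI label and the continuous mean z score can diverge: a patient with one very low score and otherwise average results has no CI by the two-test rule but may still have a below-average mean.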

Prediction parameters

The a priori-selected predictors of CI and its longitudinal change are listed in Table 1. Besides general sociodemographic factors known to influence cognitive status, we examined a range of previously discussed disease-specific risk factors for CI in MS [9]. In total, we considered 17 predictor variables assessed at baseline, pertaining to the domains of demographics, clinical disease severity markers, MRI ratings of disease burden and self-reported psychopathology (depressive symptoms and fatigue).

Table 1 Baseline predictors and sample characteristics (total N = 1123)


SPSS 25 (IBM Corporation) was used for data preparation and R 3.3.0 (R Foundation, Vienna, Austria) for statistical computations. Descriptive statistics (means and SDs as well as frequencies (%) of impaired cases) were computed for baseline and follow-up cognitive data. Changes in cognitive test scores from baseline to follow-up were evaluated using paired t tests. Linear multilevel models were applied to predict baseline cognitive test values as well as baseline-to-follow-up changes in cognitive test values, and to control for possible dependency between observations gathered in the same participating center. All predictors were entered into each multiple regression model simultaneously, so that covariance between predictors was controlled for. Models were fitted with a Bayesian multilevel approach using the brms package [24], which relies on the probabilistic programming language Stan. For all analyses, a 5% significance level was used, and Bonferroni correction was applied within each regression model (i.e., across the 18 regression coefficients of each model). Prior to analysis, dichotomous variables (e.g., sex, presence of brain atrophy) were dummy-coded for inclusion in the regression models. Missing values in predictor variables were imputed by means of 20-fold multiple imputation by chained equations using the mice package [25]. The full analysis is available within the Open Science Framework (
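Two of the steps above, the paired t test on baseline and follow-up scores and the Bonferroni-adjusted threshold within one model, reduce to simple arithmetic, sketched here in plain Python. This is an illustration only: the actual models were fitted with brms in R, and how significance was derived from the Bayesian posteriors is not described here, so the p values below are hypothetical inputs, as are the example scores.

```python
import math
from statistics import fmean, stdev


def paired_t(baseline: list[float], follow_up: list[float]) -> tuple[float, int]:
    """Paired t statistic (follow-up minus baseline) and its degrees of freedom."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need two paired samples of equal length, n >= 2")
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def bonferroni_alpha(alpha: float = 0.05, n_coefficients: int = 18) -> float:
    """Per-coefficient threshold when alpha is split across all coefficients of one model."""
    return alpha / n_coefficients


def significant_after_bonferroni(p_values: dict[str, float], alpha: float = 0.05,
                                 n_coefficients: int = 18) -> list[str]:
    """Names of coefficients whose p value falls below the Bonferroni threshold."""
    threshold = bonferroni_alpha(alpha, n_coefficients)
    return [name for name, p in p_values.items() if p < threshold]


print(round(bonferroni_alpha(), 5))  # 0.00278

# Hypothetical p values for three of the 18 coefficients.
print(significant_after_bonferroni({"age": 0.0001, "edss": 0.001, "t2_lesions": 0.01}))
# ['age', 'edss']

# Hypothetical mean z scores of four patients at baseline and follow-up.
print(paired_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 5.0]))  # (5.0, 3)
```

`n_coefficients` is passed explicitly rather than inferred from the dictionary, because the correction is over all 18 coefficients of the model even when only a few are inspected.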


Frequencies of patients with and without CI are depicted in Fig. 1a for baseline and follow-up, separately for each cognitive subtest/domain. At baseline, a total of 245 patients (22%) were classified as having CI, with the highest frequencies of impairment observed in the interference subscore of the MUSIC Modified Stroop Task (N = 185; 17%), followed by the PASAT (N = 135; 12%). Other subtests (e.g., verbal learning and memory) were impaired substantially less frequently. At follow-up, the general profile of relatively frequent impairments in processing speed and executive functions compared with other cognitive domains was similar to baseline. However, impairments were substantially less frequent across all tests at follow-up (overall CI in N = 120; 14%).

Fig. 1

a Frequencies of patients with overall CI (≥ 2 tests impaired compared with age- and education-corrected normative data) and of patients with impairments (z score < −1.645) in single cognitive tests for baseline (BL) and follow-up (FU) assessments. b Mean normative z scores, stratified for age and education, for overall CI (mean z score of all tests) and for each cognitive test separately at baseline (BL, left) and follow-up (FU, right).

Regarding the severity of deficits, normative z scores of the baseline cognitive tests and the significance of changes from baseline to follow-up are presented in Fig. 1b and Table 2.

Table 2 Mean (SD) of unstandardized raw scores and mean normative z scores of cognitive tests for baseline, follow-up and longitudinal change

Additionally, spaghetti plots depicting individual cognitive changes from baseline to follow-up can be found in Supplementary Figure 1 for each subtest.

Compared with normative data, the sample's average overall cognitive ability was not pathological, with a mean z score across all cognitive tests of −0.06 at baseline. Consistent with the frequency data, processing speed (PASAT, z = −0.20) and executive functions (Modified Stroop Task interference in seconds, z = −0.40) were the domains with the lowest average performance. At follow-up, patients performed significantly better on the mean cognitive z score (z = 0.16, p < 0.0001). Likewise, significant gains from baseline to follow-up were observed in the majority of subtests, with the exception of the Stroop Inhibition Quotient and the learning trial of the interference word list, for which no significant change was observed.

Results of the multilevel linear regression models are presented for the mean z score of all cognitive tests representing a proxy for overall CI. Regression coefficients of the model including all predictors for baseline CI are provided in Table 3.

Table 3 Regression coefficients for baseline mean cognitive test scores

The proportion of variance explained by this model was R² = 0.27 when including the variance explained by participating center and R² = 0.21 without it. The predictors that remained significant after Bonferroni correction were age (“more CI in older patients”), years of education (“more CI in patients with fewer years of academic education”), EDSS score (“more CI in patients with higher EDSS”), BDI-II score (“more CI in patients with more self-reported depressive symptoms”) and sex (“more CI in males”). Other MS-specific clinical or MRI characteristics did not contribute significantly to the prediction of baseline CI. Regression coefficients of the model including all predictors for the baseline-to-follow-up changes in cognitive test scores are provided in Table 4.
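The two R² values can be read as the share of outcome variance explained with and without the center-level intercepts. A minimal sketch, assuming the "Bayesian R²" definition reported by brms's `bayes_R2()` (variance of the predicted values divided by that variance plus the residual variance, computed per posterior draw); whether the values above were computed exactly this way is an assumption. The outcomes and predictions below are hypothetical, with two patients per center.

```python
from statistics import pvariance


def bayes_r2(y: list[float], y_pred: list[float]) -> float:
    """Var(y_pred) / (Var(y_pred) + Var(residuals)) for one set of predictions.

    In a Bayesian model this would be computed per posterior draw and then
    summarized; here a single hypothetical set of predictions is used.
    """
    residuals = [obs - pred for obs, pred in zip(y, y_pred)]
    var_fit = pvariance(y_pred)
    var_res = pvariance(residuals)
    return var_fit / (var_fit + var_res)


y = [0.0, 2.0, 4.0, 6.0]                # hypothetical outcomes; patients 1-2 in center A, 3-4 in B
pred_with_center = [1.0, 1.0, 5.0, 5.0]  # fixed effects + center intercepts (A: -1, B: +1)
pred_fixed_only = [2.0, 2.0, 4.0, 4.0]   # fixed effects only
print(bayes_r2(y, pred_with_center))            # 0.8
print(round(bayes_r2(y, pred_fixed_only), 2))   # 0.33
```

The gap between the two values is the portion of explained variance attributable to systematic differences between centers rather than to patient-level predictors.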

Table 4 Regression coefficients for baseline to follow-up changes in mean cognitive test scores

No predictor remained significant after Bonferroni correction, indicating that longitudinal cognitive change could be effectively predicted neither by the considered baseline variables nor by the additional variable of DMT initiation after baseline (yes vs. no). The proportion of variance explained was R² = 0.06 when including the variance explained by participating center and R² = 0.05 without it. Likewise, results for each separate cognitive subtest were non-significant regarding the prediction of cognitive change from baseline to 1-year follow-up. These and other additional analyses are provided as supplementary material on


Despite the increasingly recognized burden of CI in MS, little is known about which patients are at increased individual risk of CI after the initial diagnosis of MS, which hampers research on early prevention and treatment. In the current study, we aimed to characterize CI and to identify risk factors for its severity and short-term course in a large, clinically homogeneous cohort of patients with a first diagnosis of MS or CIS. To this end, neuropsychological screening data from N = 1123 patients enrolled in the multicenter German National MS cohort study were analyzed. We used linear multilevel regression models to predict CI and its short-term progression from conventional MRI characteristics and other clinical and demographic parameters that are usually available to clinicians at the time of diagnosis.

Frequency, severity and profile of CI

Adopting conventional criteria for overall CI, we found 22% of patients to be impaired at baseline, with the largest deficits in subtests of processing speed and executive function and the fewest impairments in verbal learning and memory. The finding of relatively greater impairment in attention and processing speed than in other cognitive domains is well in line with previous studies on the cognitive profile of patients with early MS [6, 8] and CIS [1, 3]. Overall frequency and mean severity of CI were lower in our sample than commonly reported: the majority of previous studies found approximately one-third of patients with early MS or CIS to have CI [3, 6, 7], although reported frequencies range from < 15 to > 50% [5, 26]. One explanation for this discrepancy may be that the current sample is uniquely homogeneous and at a very early disease stage, with a median disease duration of only 0.33 years [18]. Compensatory mechanisms such as cognitive reserve may attenuate the direct measurability of CI, specifically in young patients with low overall disease burden and high formal education, resulting in lower frequencies [17]. Patients with greater cognitive reserve may thus be able to compensate for brain pathology despite suffering from clinically relevant CI [13]. An additional explanation for the lower prevalence of CI in our patients with early MS and CIS may be that the screening tests employed are less sensitive to CI at these early disease stages, which might extend beyond executive and speed-related domains. Reported prevalences of CI in MS depend on (a) the tests employed (e.g., screening tests only or extensive test batteries), (b) the formal definition of CI (e.g., one or two standard deviations below the norm; comparison with a control group) and (c) the composition of the sample (e.g., patients with progressive MS show a different degree of CI than patients with early MS or CIS [27]).
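Point (b) can be made concrete: how often a person from the normative population would be classified as impaired depends strongly on the per-test cut-off and on how many impaired tests are required. The sketch below assumes, unrealistically, that test scores are independent and normally distributed; real subtests are correlated, so the numbers are only indicative. The battery size of six tests is a hypothetical choice.

```python
from math import comb
from statistics import NormalDist


def p_false_positive(cutoff_z: float, n_tests: int, min_impaired: int) -> float:
    """P(at least `min_impaired` of `n_tests` scores fall below `cutoff_z`)
    for a person from the normative population, assuming independent,
    normally distributed test scores (a simplifying assumption)."""
    p = NormalDist().cdf(cutoff_z)  # per-test probability of an "impaired" score
    return sum(
        comb(n_tests, j) * p**j * (1 - p) ** (n_tests - j)
        for j in range(min_impaired, n_tests + 1)
    )


# Six tests is a hypothetical battery size chosen for illustration.
print(round(p_false_positive(-1.645, 6, 2), 2))  # 0.03  (~5th percentile cut-off, >= 2 tests)
print(round(p_false_positive(-1.645, 6, 1), 2))  # 0.26  (same cut-off, >= 1 test)
print(round(p_false_positive(-1.0, 6, 2), 2))    # 0.24  (1 SD cut-off, >= 2 tests)
```

Under these assumptions, requiring two impaired tests keeps the expected misclassification rate in healthy people low, whereas a one-test rule or a 1 SD cut-off alone would label roughly a quarter of them as impaired.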
Internationally accepted standards for CI screening have been proposed in the form of the Brief International Cognitive Assessment for MS (BICAMS) battery, which may offer higher sensitivity for detecting relevant CI in MS across the different disease stages [28]. For instance, the Symbol Digit Modalities Test (SDMT) has been shown to be a more reliable and sensitive measure of cognitive processing speed than the PASAT employed in this study [29, 30]. Moreover, more specific cognitive functions such as calculation skills may influence individual PASAT results. Similarly, while the Modified Stroop Task of the MUSIC was able to detect early deficits in processing speed and executive function, the brief ten-item word list might be insufficient to reveal subtle memory changes that might unfold in a more extensive multiple-trial learning paradigm.

Predictors of CI and its progression

We found baseline CI to be significantly associated with three general demographic characteristics: male sex, fewer years of education and higher age. These factors have previously been linked to lower (verbal) cognitive test performance in healthy adults, suggesting influences that are not specific to MS or CIS but may nevertheless be of clinical importance for interpreting MS patients' test performance [31, 32]. Among clinical characteristics, only EDSS (a marker of mainly physical disease burden) and severity of depressive symptoms (BDI-II) were associated with the severity of CI at baseline. These results are in line with previous evidence from large patient samples showing that higher EDSS and more depressive symptoms negatively influence cognitive status [10, 27, 33]. Surprisingly, none of the conventional MRI predictors (e.g., visual inspection of atrophy, number of T2 lesions) or other clinical predictors (e.g., disease type CIS/RRMS, total number of relapses) that have previously been linked directly to CI and its long-term course contributed to prediction. This result may again cast doubt on the sensitivity of the screening tests employed for reliably detecting CI in early disease stages. In the current sample, however, brain pathology and disease severity were also homogeneously low, and relationships between CI and conventional markers of structural brain damage may be generally weak in early MS, even when more sophisticated neuropsychological assessments are used. In a recent large cohort study, the lack of association between brain pathology (as measured by voxel-based morphometry) and performance in the BICAMS test battery was termed a “clinico-radiological paradox” and attributed both to stronger compensatory mechanisms (e.g., cognitive reserve) and to a statistical restriction of range within a homogeneous sample of patients in early disease stages [13].
Despite the large sample size and the numerous clinical, demographic and conventional MRI baseline parameters considered, as well as the variable of DMT initiation after baseline, the longitudinal change in cognition over 1 year could not be sufficiently predicted. One explanation may be that the categorization of some predictors was too broad, for instance for the MRI parameters and DMT initiation (e.g., dichotomization of DMT start yes vs. no, visible MRI atrophy yes vs. no). Alternatively, the follow-up interval of 1 year may be too short to detect clinically relevant changes. However, significant average gains in cognitive performance were observed in most cognitive subtests. This strongly suggests that performance in both the MUSIC and the PASAT was substantially influenced by practice effects, potentially masking clinically relevant longitudinal changes after 1 year. A recent review estimated the average effect size of cognitive retesting over a 12-month interval to be as high as 0.25, with some standard neuropsychological tests reaching effect sizes of 0.73 [34]. Likewise, in patients with MS, carryover effects from one testing session to another are a frequent problem in longitudinal designs and are common to a range of neuropsychological tests, including the PASAT and, to a lesser degree, the SDMT [29, 35, 36]. Although alternate test versions matched for difficulty and modern regression-based normative data (including estimates of retest effect sizes) may attenuate the influence of practice effects, few standardized cognitive tests used with patients with early MS provide these features. Moreover, despite the use of an alternate PASAT version in this study, patients on average performed significantly better at follow-up, highlighting a likely influence of familiarity that does not depend on the particular stimuli.
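One way to use retest estimates of the kind mentioned above is a practice-adjusted reliable change index, which subtracts an expected retest gain from an individual's observed change and scales the result by the standard error of a difference score. This technique was not part of the analyses reported here; the sketch and all of its inputs (expected gain, test-retest reliability, SD) are hypothetical placeholders, with the expected gain of 0.25 SD merely echoing the order of magnitude cited above.

```python
import math


def practice_adjusted_rci(baseline: float, follow_up: float, expected_gain: float,
                          sd_baseline: float, retest_reliability: float) -> float:
    """Practice-adjusted reliable change index for one patient.

    expected_gain       mean retest gain in a reference sample (assumed known)
    sd_baseline         SD of the test at baseline in that reference sample
    retest_reliability  test-retest correlation of the measure
    """
    sem = sd_baseline * math.sqrt(1.0 - retest_reliability)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem                            # standard error of a difference score
    return ((follow_up - baseline) - expected_gain) / se_diff


# Hypothetical patient on a z-scaled test (SD = 1) with reliability 0.8 and an
# assumed average retest gain of 0.25 SD: the raw gain of +0.25 vanishes after adjustment.
print(practice_adjusted_rci(-0.50, -0.25, expected_gain=0.25,
                            sd_baseline=1.0, retest_reliability=0.8))  # 0.0

# Same patient with an unchanged score: below the expected trajectory,
# but well within measurement error.
rci = practice_adjusted_rci(-0.50, -0.50, expected_gain=0.25,
                            sd_baseline=1.0, retest_reliability=0.8)
print(round(rci, 2))  # -0.4
```

An unchanged score thus becomes a (small) relative decline once practice is accounted for, which is the sense in which practice effects can mask real change.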
Practice effects may persist for approximately 1 year after a baseline assessment and are most pronounced between the first and second evaluations [37, 38]. This is particularly true for tests assessing memory, learning and executive functions, whereas visuo-perceptual tasks are less prone to practice effects [39]. Hence, differences between cognitive tests in their susceptibility to practice effects have to be considered more rigorously when planning re-evaluation schedules. Moreover, additional cognitive testing (performed outside the study or through patient self-assessment and training) needs to be controlled for.


In patients first diagnosed with MS or CIS, demographic characteristics (male sex, higher age, lower education) as well as more severe depressive symptoms (BDI-II) and higher physical disability (EDSS) are significantly associated with the severity of CI. In patients with these characteristics, neuropsychological monitoring and potentially cognitive rehabilitation should be considered. No other disease-specific clinical or conventional MRI parameters from clinical routine were significantly related to the presence of CI in this large cohort of patients at the earliest disease stages. Moreover, prediction of short-term cognitive change over 1 year was insufficient despite the large number of patients and the inclusion of numerous conventional, disease-specific and previously discussed predictor variables. These findings indicate that three lines of research are urgently needed to improve our understanding of CI, its clinical relevance and its risk factors in early MS, and to pave the way for early interventions: (1) establishment and evidence-based validation of sensitive and change-sensitive cognitive outcome measures with freely available longitudinal normative data; (2) evidence that these assessments detect disease-specific and clinically relevant CI (i.e., through validation against patient-centered outcomes) from the earliest to advanced disease stages; and (3) improved prediction of these measurements through refined clinical scales and standardized automated MRI parameters for use in clinical routine [40].