Introduction

Multiple sclerosis (MS) is a chronic autoimmune, inflammatory neurological disease of the central nervous system [1]. MS can lead to a range of physical, psychiatric and cognitive impairments [2]. Cognitive difficulties have a negative impact over and above physical impairments. Cognition affects employment, disease management, personality, and many aspects of psychosocial function [3,4,5]. Patient self-report of cognitive performance is confounded by psychosocial variables, most markedly depression [6], and health professionals are poor at detecting cognitive impairment in routine clinical consultations [7]. Objective evaluation of cognitive status in MS is therefore required.

Traditional Cognitive Batteries for MS

Over the last 30 years, a handful of batteries have been developed to assess MS cognition. The seminal work of Rao and colleagues resulted in the Brief Repeatable Battery (BRB), which comprises the Selective Reminding Test (SRT, verbal memory), the Symbol Digit Modalities Test (SDMT, information processing speed), the Paced Auditory Serial Addition Test (PASAT, information processing speed), the 10/36 Spatial Recall Test, and the Word List Generation Test (WLG, executive skills) [8]. The battery takes about 45 min to complete. Until recently, the Rao battery was the most widely used assessment of MS cognition, with validation in several countries and frequent use in pharmacological trials. In an effort to develop a more comprehensive battery to assess MS cognition, an expert group was convened and recommended the Minimal Assessment of Cognitive Function in MS (MACFIMS) [9]. The MACFIMS replaces the 10/36 with the Brief Visuospatial Memory Test–Revised (BVMT-R) for visual memory, replaces the SRT with the California Verbal Learning Test-II (CVLT-II) for verbal memory, and adds the Judgment of Line Orientation (spatial skills) and the Delis–Kaplan Executive Function System (D-KEFS) Sorting Task (executive skills–flexibility). This battery takes about 90 min to complete. The MACFIMS has been validated in a few countries and used quite extensively in research. However, both of these batteries must be administered by a trained neuropsychologist, which limits their feasibility for widespread use in routine clinical assessment. In addition, the tests have been validated in few countries outside of the United States, which limits their validity for multinational trial use.

The Development of the Brief International Cognitive Assessment for Multiple Sclerosis (BICAMS)

Historical Background

The Brief International Cognitive Assessment for MS (BICAMS) was designed by an expert consensus group to facilitate the assessment of cognition in MS [9]. In June 2010, an expert committee of seven neurologists and five neuropsychologists convened to develop recommendations for a clinical tool for neurologists and healthcare professionals working with people with MS. The purpose of the monitoring instrument would be for baseline ratings and regular follow-up assessments, which could be incorporated into routine clinical practice. It was decided that the recommended battery should be completed in 15 min and should not require any special equipment (beyond paper, pen and stopwatch). It should also be appropriate for international use (i.e. translation).

Development and Content

The committee critically evaluated the available cognition scales, their psychometric properties and their feasibility in the clinical setting. The group agreed that the monitoring tool should assess the domains of information processing speed and immediate verbal and visual recall, and would be sufficiently specific and sensitive to measure significant cognitive impairment in large clinical samples. The following three tests were selected.

Symbol Digit Modalities Test (SDMT)

The Symbol Digit Modalities Test (SDMT) was selected to measure information processing speed. The test consists of a number of single digits, each paired with a particular abstract symbol. The patient is presented with rows of the nine symbols that are arranged pseudo-randomly, and must say the numbers that go with each symbol in turn. The SDMT can be completed within 5 min, including the delivery of instructions and time allocated for practice and testing. The SDMT has shown superior psychometric properties. In particular, it has been reported to have high sensitivity of 82% to cognitive impairment in MS [10,11,12] and cognitive change [13,14,15], and moderate specificity of 60% [16], and has been validated in several countries [10,11,12]. It is most closely linked to magnetic resonance imaging (MRI) parameters [17]. The SDMT also has good external clinical validity and is associated with current [18] and future employment status [15].

California Verbal Learning Test-II (CVLT-II)

The California Verbal Learning Test-II (CVLT-II), which comprises five learning trials, is an examination of immediate verbal recall. In the CVLT-II, patients are read a list of words at a slightly slower rate than one item per second, and asked to recall as many items as possible in any order, across five trials. The test is a 16-item word list, with four items belonging to each of four categories, arranged randomly. The first five recall trials of the CVLT-II have a high degree of interdependence compared to other sections [19, 20], and the committee decided that they had sufficient psychometric rigour to be sensitive to MS impairment when used alone [19, 20]. Total time to administer the CVLT-II five recall trials is 5–10 min including instructions, testing and responses. The full CVLT-II also has external clinical validity and has been shown to be able to differentiate employed MS patients from patients unemployed due to MS [20].

Brief Visuospatial Memory Test–Revised (BVMT-R)

The Brief Visuospatial Memory Test–Revised (BVMT-R) is an assessment of immediate visual recall. The first three recall trials of the BVMT-R were selected by the committee. The psychometric properties of the BVMT-R recall trials are good [21]. The BVMT-R requires the patient to observe a 2 × 3 stimulus array of abstract geometric figures. There are three learning trials, each 10 s in length. After each 10-s exposure, the array is hidden from view and the patient is required to draw the geometric figures in the correct position from memory.

International Validity

The BICAMS committee subsequently published an international validation protocol [22]. A number of countries have published national validations. The aim of the present systematic review and meta-analysis is to synthesise the relevant national validation literature regarding BICAMS.

Method

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was followed as a guide [23] for standardised conduct and reporting of the current systematic review and meta-analysis. According to the Cochrane Database (searched January 2018), there are no previously published reviews in the proposed area; therefore, the current review is the first to synthesise studies related to the validation of the BICAMS.

Systematic Literature Search

Search terms were developed and used to identify studies which were conducted as part of the BICAMS international validation protocol (Table 1). These keywords were searched within the title or abstract of the PubMed, PsycINFO, Medline and PsycARTICLES databases in January 2018.

Table 1 Search terms for systematic review

Eligibility Criteria

To restrict the review to studies conducted as part of the international validation of the BICAMS protocol, the following eligibility criteria were applied. The inclusion criteria were as follows: (a) peer-reviewed studies with no date restriction, written in the English language; (b) samples including adults with any clinical subtype of MS (or subtype combination); (c) studies which were undertaken as part of the international validation of the BICAMS protocol. Studies which included BICAMS but were not part of the international validation of the BICAMS protocol were excluded.

The additional criteria for inclusion in the meta-analysis were as follows: (d) studies included a healthy control (HC) comparison group; (e) they reported standard quantitative information based on the SDMT, CVLT-II and BVMT-R subscales (mean, standard deviation and sample size) of the MS cases and HC comparison group; and (f) there were a minimum of four studies, as specified by Rosenthal [24], which met the above criteria to be included in a meta-analysis.

To examine the eligibility criteria, all titles and abstracts which were returned were screened. The full texts of all studies considered to meet the eligibility criteria were accessed (see Fig. 1).

Fig. 1 PRISMA flowchart for selection process of studies in systematic review and meta-analysis

Data Extraction

A headed table was developed to guide the extraction of relevant information from full texts, and to assess their eligibility for the final review. Extraction was initially carried out by the authors (FC and DL), and studies were organised according to whether they met criteria for the systematic review, meta-analysis or both. Following data extraction, 21 studies were excluded from the final review according to the exclusion criteria. Extrinsic moderators, including participant characteristics, were extracted from the 16 short-listed studies, comprising recruitment selection, diagnostic criteria, age, subtype of MS, disease duration and time since diagnosis, education, and score on the Expanded Disability Status Scale (EDSS) [25]. Where appropriate, the characteristics of the control group were detailed, including sample size and age. The profession of the examiner administering the BICAMS was also recorded. Of the studies included in the systematic review, 14 met the criteria for the meta-analysis. For the meta-analysis, the standard quantitative information based on the SDMT, CVLT-II and BVMT-R subscales (mean, standard deviation and sample size) of the MS cases and HC comparison group was extracted for baseline assessments of the BICAMS. All of the relevant data for the present review and meta-analysis were obtained from numerical information in texts, tables and graphs, and statistical analysis.

Quality Assessment

The quality ratings for the studies included in the systematic review were derived from the Effective Public Health Practice Project (EPHPP) [26] quality assessment tool. This instrument assigns ratings in quantitative studies according to selection bias, study design, confounders, blinding, data collection methods, and withdrawals or drop-outs. A total quality rating can be derived from the individual ratings of the measures. The EPHPP is particularly useful for examining the quality of studies in health care settings and has previously demonstrated strong inter-rater reliability [27]. The two authors (FC and DL) independently examined the articles, and any disagreements were discussed and resolved.

Statistical Analysis

The meta-analysis was conducted using the Comprehensive Meta-Analysis version 3 software program [28]. Three individual analyses were performed based on the averages of the SDMT, CVLT-II and BVMT-R subscales, which measure information processing speed and immediate verbal and visual recall, respectively. Effect sizes were calculated as standardised mean differences with Hedges’ g using the following interpretation: 0.2 = small; 0.5 = medium; 0.8 = large. Hedges’ g was selected because it offers the same interpretation as Cohen's d while correcting for the bias that arises from small sample sizes; Cohen's d tends to overestimate the absolute value of the standardised mean difference in small samples.
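As an illustration of the effect size described above, the following minimal sketch computes Hedges' g and its variance from group means, standard deviations and sample sizes. It uses the common approximation for the small-sample correction factor and is not the routine used by the Comprehensive Meta-Analysis software; the function name `hedges_g` and the example values are hypothetical.

```python
import math


def hedges_g(mean_hc, sd_hc, n_hc, mean_ms, sd_ms, n_ms):
    """Return (g, variance of g) for healthy controls minus MS cases.

    Hypothetical helper for illustration; a positive g means that the
    healthy controls scored higher than the MS group.
    """
    df = n_hc + n_ms - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n_hc - 1) * sd_hc ** 2 + (n_ms - 1) * sd_ms ** 2) / df)
    # Cohen's d: standardised mean difference before correction
    d = (mean_hc - mean_ms) / sd_pooled
    # Approximate small-sample correction factor
    j = 1 - 3 / (4 * df - 1)
    # Approximate variance of d, then scaled by j squared for g
    var_d = (n_hc + n_ms) / (n_hc * n_ms) + d ** 2 / (2 * (n_hc + n_ms))
    return j * d, j ** 2 * var_d


# Made-up scores: controls 60 (SD 10), MS 50 (SD 10), 50 people per group
g, var_g = hedges_g(60, 10, 50, 50, 10, 50)
print(round(g, 3), round(var_g, 4))
```

With these made-up values Cohen's d is exactly 1.0 and the correction shrinks it slightly, which is the behaviour the paragraph above describes.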

The effect size was modelled using a random-effects model, which estimates the mean of a distribution of effects, to examine the degree of difference between the MS group and the HC based on their performance across the aforementioned BICAMS subscales. In the random-effects model, study weights are based on the inverse of the total variance, which includes both within- and between-study variance. Compared to the fixed-effect model, this model yields a wider confidence interval (CI) when there is significant heterogeneity among effect sizes. The predicted direction of results was that HC would have superior BICAMS subscale totals compared with adults with MS.
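The inverse-variance weighting described above can be sketched as follows. The between-study variance is estimated here with the DerSimonian-Laird method of moments, which is an assumption: the text does not state which estimator was used. The function name `random_effects_pool` and the input values are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes under a random-effects model.

    Returns (pooled effect, 95% CI as a tuple, tau squared).
    tau squared uses the DerSimonian-Laird estimator (assumed).
    """
    k = len(effects)
    # Fixed-effect weights: inverse of the within-study variance
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q around the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights: inverse of within- plus between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Two made-up, strongly disagreeing studies with equal precision
pooled, ci, tau2 = random_effects_pool([0.2, 0.8], [0.01, 0.01])
print(pooled, ci, tau2)
```

In this made-up example the two studies disagree, so the estimated between-study variance is positive and the interval is much wider than a fixed-effect interval would be.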

Heterogeneity was assessed using Cochran’s Q test, and the magnitude of heterogeneity using the I² statistic. A significant Q statistic indicates dissimilar effect sizes across the included studies, suggesting that methodological or sample differences might be introducing variance in the results. The I² statistic assesses the proportion of heterogeneity across studies not due to random error and is interpreted as a low (25%), moderate (50%) or high (75%) level of heterogeneity [29].
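The relationship between the two heterogeneity statistics can be sketched as below, using the standard definition of I² as the share of Q in excess of its degrees of freedom. The function names are hypothetical, and the worked value assumes that the CVLT-II analysis reported in the Results pooled 14 studies (13 degrees of freedom), which the text does not state explicitly.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    w = [1 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))


def i_squared(q, k):
    """I squared as a percentage, for Q computed from k studies."""
    df = k - 1
    if q <= df:
        # No heterogeneity beyond what chance alone would produce
        return 0.0
    return 100 * (q - df) / q


# Q = 36.07 is the CVLT-II value from the Results; k = 14 is an assumption
print(round(i_squared(36.07, 14), 2))
```

Under that assumption the result is approximately 64%, in line with the moderate heterogeneity reported for the CVLT-II.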

Forest plots were created to visually assess for the presence of outliers. Publication bias was assessed by visually inspecting funnel plots of standardised mean differences against standard error, followed by Egger’s test of funnel plot asymmetry [30] and Rosenthal’s fail-safe N [24]. If publication bias was indicated (Egger regression test: p < 0.1), then the trim-and-fill method [31] for random-effects models would be applied to impute “missing studies” and redress funnel plot asymmetry, with adjusted pooled effect sizes and 95% CIs reported after the addition of potential missing studies.
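The regression behind Egger's test can be sketched as follows: each study's standardised effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is a minimal ordinary least squares sketch with a hypothetical function name and made-up inputs. It returns only the intercept and its standard error; the p-value would come from a t distribution with k − 2 degrees of freedom, which is not computed here.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, standard error of intercept) for Egger's regression.

    Requires at least three studies.
    """
    k = len(effects)
    x = [1 / se for se in std_errors]                    # precision
    y = [g / se for g, se in zip(effects, std_errors)]   # standardised effect
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with k - 2 degrees of freedom
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = resid_ss / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + x_bar ** 2 / sxx))
    return intercept, se_intercept


# Made-up symmetric case: the same effect regardless of study precision
intercept, se_int = egger_intercept([0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.25, 0.4])
print(intercept, se_int)
```

When the effect does not depend on study precision, as in this made-up example, the intercept is essentially zero; effects that grow as standard errors grow push the intercept away from zero.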

Compliance with Ethics Statement

This article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors.

Results

Systematic Review of BICAMS Validation Studies

Systematic Review Overview

Sixteen studies met the criteria for inclusion in the systematic review (Table 2). The BICAMS has been validated across 14 countries: the United States, Argentina, Belgium, Brazil, Canada, the Czech Republic, Greece, Hungary, Iran, Ireland, Italy, Japan, Lithuania and Turkey, with two validation papers each from Argentina and Italy. From its original form in English, the instrument has been translated into 11 languages: Dutch, Czech, Greek, Hungarian, Italian, Japanese, Lithuanian, Persian, Portuguese, Spanish and Turkish. It is important to note the distinction between language and culture. BICAMS was validated separately in three English-speaking countries (USA, Canada and Ireland), since the cultural norms of these populations are markedly different, and this is likely to interfere with the reliability of cognitive testing (e.g. [32]). The proportion of individuals tested in the validation studies relative to the estimated total number of people with MS in each country [33] is presented in Table 3.

Table 2 Study design and participant demographics
Table 3 Comparison of number of people with MS in validation studies and total number in each country

Quality Rating

The overall quality of the studies ranged from “moderate” to “weak” on the EPHPP template [26]. It is interesting to note that high-quality validation studies are rated as “weak” on several dimensions (e.g. “blinding” and “data collection method”), because the requirements for stringent international validation do not coincide with the parameters typically applied to general studies. This was evidenced by the polarity in scoring, with other dimensions rated strongly (“selection bias” and “study design”). Sensitivity for assessing the quality of validation studies using this tool is likely to be unsatisfactory for this reason.

Recruitment Method

Adults with MS were recruited from a variety of locations, including university hospitals, MS centres, specialist clinics and tertiary referral centres. In comparison, HC were either recruited from the community or an established normative sample, or were known to the adults with MS.

Sample Size

Thirteen studies included a group of adults with MS and an HC group, while two studies did not. The studies which deviated from this sampling style included either an HC group or an MS group only. The sample size differed greatly between studies. In adults with MS, this ranged from 44 to 369, while for HC the samples ranged from 20 to 200. Three studies contained equal-sized groups.

Gender

The gender ratio in the majority of the samples of adults with MS and HC disproportionately favoured women. The proportion of women ranged from 61 to 80% in those with MS and from 55 to 86% in HC. It is likely that this trend is related to the gender difference in presentation, as MS is more common in women than in men [34]. However, there was one anomaly: the gender ratio favoured men in just one study (32% women with MS; 10% healthy women), which is likely to have been influenced by the relatively small sample size. Only two studies reported groups that were exactly matched on gender (61 and 75%).

Mean Age

There was a relatively consistent range of mean ages among adults with MS reported across the studies, from 34.0 (10.0) to 45.4 (9.9) years. The age of HC followed a similar spread, 33.7 (9.5) to 45.2 (9.9). The average age of adults with MS was 39 years, and among HC was 40 years.

Diagnosis and Selection Criteria

There was a great deal of disparity between the inclusion and exclusion criteria of the studies in the review. The most common form of diagnosis of MS was through the revised McDonald criteria [35] or later versions.

Type of MS

The most common subtype of MS represented across studies was relapsing-remitting (RRMS); other subtypes included secondary-progressive (SPMS), progressive-relapsing (PRMS), primary-progressive (PPMS) and clinically isolated syndrome (CIS). Three studies included only RRMS, while the remaining studies included a mix, although this was not uniformly represented in each study.

Disease Duration and Time Since Diagnosis

The average disease duration varied among studies from 6.07 (5.08) to 12.97 (7.16) years.

BICAMS Type and Mode of Delivery

All of the studies used the BICAMS paper version with the MS or HC groups. As noted above, various translations of the BICAMS have been created as part of the international validation protocol. Three studies reported the profession of the examiner: a neuropsychologist, MS nurse specialist and PhD student. Most studies did not report who administered BICAMS.

Meta-Analysis of BICAMS Validation Studies

The results from all three subscales of BICAMS—SDMT, CVLT-II and BVMT-R—showed that adults with MS performed significantly worse than controls. To examine the overall effect size for this, a meta-analysis was conducted.

SDMT

Figure 2 depicts a forest plot showing the effect size for each study which included SDMT. Overall, information processing speed was significantly lower in adults with MS than in HC, with a large effect size (g = 0.943, 95% CI 0.839, 1.046, p < 0.001). There was no evidence of outliers, heterogeneity (Q = 20.65, p > 0.050) or publication bias (Egger test: p > 0.050, two-tailed). The funnel plot (see Fig. 3) shows that the effect sizes were symmetrical. Trim-and-fill analysis [31] estimated that there were no studies missing from the analysis.

Fig. 2 SDMT forest plot

Fig. 3 SDMT funnel plot

CVLT-II

The effect size for each study which included the CVLT-II is shown in the forest plot in Fig. 4. Overall, verbal memory for immediate recall was significantly lower in adults with MS than in HC, with a medium effect size (g = 0.688, 95% CI 0.554, 0.822, p < 0.001). There was no evidence of outliers, although heterogeneity was indicated (Q = 36.07, p < 0.001) to a moderate extent (I² = 63.95%). Using the trim-and-fill method, one study would need to fall on the left of the mean effect size to make the plot symmetrical (see Fig. 5). Assuming a random-effects model, the new imputed mean effect size remained medium (g = 0.674, 95% CI 0.541, 0.808). The Egger test remained non-significant (p = 0.735, two-tailed), indicating that there was no publication bias.

Fig. 4 CVLT-II forest plot

Fig. 5 CVLT-II funnel plot

BVMT-R

Figure 6 displays the forest plot of effect sizes for the studies which included the BVMT-R. Overall, immediate recall of visual memory was significantly lower in adults with MS than in HC, with a medium effect size (g = 0.635, 95% CI 0.534, 0.736, p < 0.001). There was no evidence of outliers or heterogeneity (Q = 20.694, p > 0.050). The funnel plot presented in Fig. 7 shows that the effect sizes were symmetrical, and the trim-and-fill analysis estimated that there were no studies missing from the analysis. There was no evidence of publication bias according to the Egger test (p = 0.801, two-tailed).

Fig. 6 BVMT-R forest plot

Fig. 7 BVMT-R funnel plot

Sensitivity

Only one paper compared BICAMS to a longer battery [36]. In this large study, applying the operational criterion of “one or more abnormal tests”, BICAMS identified 58% of MS patients as cognitively impaired, while the MACFIMS [37], applying the MACFIMS operational criterion of “two or more abnormal tests”, identified 55% of MS patients as cognitively impaired. Interestingly, for the MS patients with disease duration over 21 years (n = 25), BICAMS identified a greater proportion as cognitively impaired (96 vs 76% on the MACFIMS).
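The operational criteria quoted above amount to counting abnormal tests and comparing the count with a minimum. The sketch below illustrates this under loudly stated assumptions: test scores are taken to be z-scores against normative data, and a test is treated as abnormal at or below a cutoff of −1.5. That cutoff, the function name `is_impaired` and the example scores are hypothetical and are not taken from the cited study.

```python
def is_impaired(z_scores, min_abnormal=1, cutoff=-1.5):
    """Classify a patient as cognitively impaired.

    z_scores: mapping of test name to z-score against normative data.
    min_abnormal: 1 mirrors the "one or more abnormal tests" criterion,
        2 mirrors the "two or more abnormal tests" criterion.
    cutoff: ASSUMED abnormality threshold, not reported in the text.
    """
    abnormal = sum(1 for z in z_scores.values() if z <= cutoff)
    return abnormal >= min_abnormal


# Made-up patient with one abnormal test out of three
patient = {"SDMT": -1.8, "CVLT-II": 0.1, "BVMT-R": -0.4}
print(is_impaired(patient, min_abnormal=1))
print(is_impaired(patient, min_abnormal=2))
```

The same made-up patient is classified differently under the two criteria, which illustrates why impairment rates from batteries using different operational criteria are not directly comparable.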

Discussion

Summary of Findings

The aim of the current paper was to synthesise the national BICAMS validations to date. A total of 16 studies were included in the systematic review, of which 14 articles were then assimilated into the meta-analysis. The systematic review showed that BICAMS had been widely applied in many different languages, cultures and locations, with a diverse range of clinical samples including different disease subtypes, durations and severity. Most studies had included an HC sample with a similar educational background. BICAMS was administered in person by a neuropsychologist in most cases, although this was underreported. BICAMS was able to identify cognitive difficulties in adults with MS compared to HC, with difficulties identified in all three areas of information processing speed and immediate verbal and visual recall. Cognitive impairment was most marked in the domain of information processing speed. These findings fit with established knowledge and opinion and show that the selected scales can detect cognitive dysfunction in a 15-min battery. BICAMS also relies on previous findings regarding the component scales for its psychometric provenance.

Strengths

Several strengths of this review must be considered. Countries reported that BICAMS could be feasibly administered in around 15 min, with minimal materials, and was recommended for routine clinical cognitive assessment in MS. Within the validation studies there was a wide representation of cultures, languages and locations involved in the initiative. With the success of the international validation protocol and the number of those estimated within the population [33], it is possible to generalise the findings beyond the review. The proportion of samples represented varied between 0.2 and 1.91%, with the largest representation of adults with MS recruited from Lithuania. This is a significant first step in translating an understanding of cognitive impairment in MS globally. The direction of the results was as expected on all three scales; however, of note was that the CVLT-II was the most heterogeneous. This is likely a result of the extra linguistic and cultural demands of the stimuli in the subscale.

Limitations

There are a number of limitations which need to be recognised. In terms of sampling, there was a great deal of heterogeneity in sample size and MS type, ranging from disproportionately mixed subtypes to the inclusion of only one subtype. It is possible this may have biased the size of effect found, as cognitive impairment is more common in progressive forms of the illness. The HC were recruited from the community or were related or known to the adult with MS. In cases where HC were related to the adult with MS, this would pose a threat to the statistical assumption of independence between the samples examined. There was also a disparity in inclusion criteria, with variations in the version of the McDonald criteria used [35] and in the control of comorbidity and medication, the details of which would have likely contributed to the observed heterogeneity. The unequal gender ratio, in favour of women, is likely to have added to this. It is important to consider the average age of those tested in the current review, and that the degree of cognitive impairment identified by BICAMS is likely to be negatively skewed, as the range of illness duration exceeded 6 years. The studies were cross-sectional, and baseline rather than re-test scores were included in the review; therefore, further longitudinal data are needed to understand the course of cognitive decline. Although BICAMS is reportedly able to be administered by qualified clinical health professionals, due to underreporting in the current studies, it appeared that most BICAMS were delivered by a neuropsychologist. Clinical thresholds of BICAMS scores have been proposed, but these were not recorded in the current review [38]. Few studies included accompanying neurocognitive assessments. Finally, this review provides only an interim analysis of the wider international validation of the BICAMS endeavour, since many countries have not yet published their data.

Conclusions

BICAMS has been shown to be relatively robust across a variety of cultures, languages and locations as a valid measure of cognition in MS. Working together with multiple sources of funding, the international MS community has participated in this endeavour with great success in optimising cognitive assessment. As a result of this international cooperation, a psychometrically sound assessment has become available, leading to increased awareness of MS cognition, which will improve cognitive symptom management.

Future Directions

A further dozen international validations are in progress. The range of countries with access to cognitive assessment for MS is increasing. Large combined datasets are facilitating the development of illuminating regression models [39]. It is anticipated that the new iPad version of the BICAMS, which is currently undergoing validation, will further extend the validity and reach of BICAMS within the MS community.