Introduction

The Mini-Mental State Examination (MMSE) is a test widely used to screen for cognitive impairment and to track changes in cognitive function over time (Molloy and Standish 1997). The test comprises tasks that examine various cognitive functions and is relatively easy to administer (Folstein et al. 1975).

Panel studies and other large data collections among older adults depend on efficient design to ensure a high response rate and continued participation in follow-ups. Restricting the time and scope of the interview may be crucial, as extensive interviews can be tiring for the participant (Lundberg and Thorslund 1996). Different methods of collecting data may therefore be required to obtain a representative sample (Kelfve et al. 2013). Such methods include face-to-face interviews, telephone interviews, and proxy interviews. Alternatives to direct interviews may enable participation by individuals who would otherwise be constrained by age or impairments (Fong et al. 2009).

Cognitive screening tests are commonly used in studies of older adults, and the MMSE is one of the most well-known and widely used. However, with the exception of the MMSE and a few others, such tests are rarely validated (Cullen 2007). Several short forms and versions of the MMSE have been found to correspond well to the original MMSE, which, even if concise, may sometimes be too extensive to include in multipurpose surveys. The accuracy of previous short versions has depended on the items included, where the cutoff was set and, to some extent, how and in what context the version was used (Davies and Larner 2013).

The two abridged versions tested in this study were developed for use in multipurpose studies. They were initially designed for the SWEOLD study: a Swedish panel study with a representative sample from the older population (≥77 years) (Lennartsson et al. 2014). Abridged versions were needed because the SWEOLD data collection was broad in scope, with very limited time allocated to cognitive screening. Moreover, the advanced age of the population created a concern that many participants might be exhausted by lengthy interviews. Items were selected from the standard Swedish full scale (Palmqvist et al. 2013). The selection drew in part on an earlier study by Braekhus et al. (1992) that identified the items most efficient at detecting cognitive impairment. Selection was further guided by theoretical considerations (e.g., to include most of the cognitive domains) and pragmatic ones (e.g., ease of administration, time constraints) (Parker et al. 1996). These abridged versions have been used in a range of research papers, using the average score (Fors et al. 2009; Parker et al. 2013), a cutoff (Meinow et al. 2011), or both (Andel et al. 2007, 2011).

The aim of this paper is to present a validation of the two abridged MMSE scales, using data from two large nationally representative studies that include both cognitive screening data and clinical diagnoses of dementia: the Study of Dementia in Swedish Twins (HARMONY) in Sweden and the Aging, Demographics, and Memory Study (ADAMS) in the US. One of the abridged scales is intended for use in face-to-face interviews (MMSE-SF); the other is a complementary version that does not require any physical engagement from the respondent, making it viable for telephone interviews (MMSE-SF-C).

Methods

The data used for the analyses come from two national panel studies: HARMONY (Gatz et al. 2005) and ADAMS (Langa et al. 2005). Both are substudies focused on cognitive impairment and dementia, with population samples derived from two large national cohort studies: the Swedish Twin Registry (Lichtenstein et al. 2002) and the Health and Retirement Study (Sonnega et al. 2014).

Participation in HARMONY included an initial screening phase and a subsequent clinical phase. The clinical phase consisted of both an in-home physical examination and neuropsychological testing. A final clinical diagnosis was made in accordance with DSM-IV, with three outcome categories: dementia, questionable dementia, and no dementia (Gatz et al. 2005). Questionable dementia corresponds to meeting two of the three DSM-IV diagnostic criteria for dementia (impaired memory, other cognitive disturbance, and difficulties in functioning). In the present study, we included participants aged 75 years or older who completed the clinical phase (n = 794).

The clinical assessment in ADAMS contains a large variety of neuropsychological tests. The clinical diagnosis was based on DSM-III-R and DSM-IV, with three outcome categories: dementia, cognitive impairment with no dementia (CIND), and no dementia. CIND was defined as self- or informant-reported cognitive impairment that did not meet the criteria for dementia or reach the threshold for impairment in each cognitive domain (Langa et al. 2005). In this study, we included only those who were 75 years or older (n = 648). Because completion varied across items, the sample size also differed slightly between the full MMSE (n = 576), the MMSE-SF (n = 594), and the MMSE-SF-C (n = 638). However, only subjects with no missing data on the full MMSE test were included in the analyses.

The full MMSE covers 11 domains: registration, orientation, recall, attention/calculation (serial sevens or spelling), naming, repetition, comprehension (verbal and written), writing, and construction. The items included in the abridged versions are shown in Table 1; they were compiled from the full versions available in both HARMONY and ADAMS. The scoring for registration and attention was adjusted to be in better proportion to the other items: for registration, a correct repetition of all three words was scored as one point; for attention (serial sevens), each correct answer was given 0.4 points, for a maximum of two points. The two abridged versions are identical except for one item: the MMSE-SF includes the task of copying a figure, whereas the MMSE-SF-C instead includes one additional orientation item.

Table 1 Overview of items included in the two abridged versions of MMSE (max score 11 in both abridged versions)
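
To make the adjusted scoring concrete, the following minimal Python sketch computes an abridged score on the 0–11 scale. The split into one registration item, one five-part serial-sevens item, and eight one-point items is our assumption for illustration (1 + 2 + 8 = 11); the actual item set is given in Table 1.

```python
# Minimal sketch of the adjusted abridged-MMSE scoring (assumed item split:
# 1 registration point + 2 attention points + 8 one-point items = 11).

def score_abridged_mmse(registration_words_correct: int,
                        serial_sevens_correct: int,
                        other_items_correct: int) -> float:
    """Return an abridged MMSE score on the 0-11 scale."""
    # Registration: one point only if all three words are repeated correctly.
    score = 1.0 if registration_words_correct == 3 else 0.0
    # Attention (serial sevens): 0.4 points per correct subtraction,
    # at most five subtractions, for a maximum of two points.
    score += 0.4 * min(serial_sevens_correct, 5)
    # Remaining items (orientation, recall, naming, etc.): one point each.
    score += float(min(other_items_correct, 8))
    return score

print(score_abridged_mmse(3, 3, 6))  # 1.0 + 1.2 + 6.0 = 8.2
```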

The abridged scoring algorithms were applied to the corresponding MMSE items from both HARMONY and ADAMS, such that each subject had scores on the full MMSE and on the two abridged versions. The orientation item State was not included in the HARMONY data collection and was replaced with the item Country. These two items had comparable percentages of correct responses (Country, 91 % in HARMONY; State, 90 % in ADAMS).

The MMSE was treated both as a continuous and as a dichotomous variable. In the dichotomous alternative, a cutoff was applied to separate the cognitively impaired from the cognitively nonimpaired. There is no conclusively defined cutoff for the MMSE, although a cutoff of <24 is commonly used for dementia (Bassett and Folstein 1991). Analyses were performed to estimate optimal cutoffs, based on the best combination of sensitivity and specificity obtained when testing the continuous scale against a dichotomous reference test. Optimal cutoffs were estimated using the roctg command in Stata. Initial analyses used the clinical diagnosis as the reference test in each dataset. Subsequently, analyses were performed using the full MMSE as the reference for the abridged scales. Based on the aggregated results from these analyses, cutoffs of <24 for the full scale and <8 for the two abridged scales were adopted for evaluating sensitivity and specificity.
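
The exact optimality criterion applied by roctg is not reproduced here; as an illustration only, the Python sketch below selects the cutoff that maximizes Youden's index (sensitivity + specificity − 1) for the rule "score < c" against a dichotomous reference, one common way of defining an optimal cutoff.

```python
# Illustrative cutoff search by Youden's index; not the exact roctg procedure.
# Assumes both reference-positive and reference-negative subjects are present.

def optimal_cutoff(scores, diseased):
    """Return the cutoff c maximizing Youden's J for the rule 'score < c'."""
    n_pos = sum(diseased)            # subjects positive on the reference test
    n_neg = len(diseased) - n_pos    # subjects negative on the reference test
    best_c, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(s < c and d for s, d in zip(scores, diseased))
        fp = sum(s < c and not d for s, d in zip(scores, diseased))
        j = tp / n_pos + (n_neg - fp) / n_neg - 1.0  # sens + spec - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c

# Toy data: impaired subjects tend to score lower on the scale.
scores = [18, 20, 22, 23, 25, 26, 27, 28, 29, 30]
diseased = [True] * 4 + [False] * 6
print(optimal_cutoff(scores, diseased))  # 25, i.e., the rule 'score < 25'
```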

The clinical diagnosis in each dataset was used as the reference test in the validity analyses. The three diagnostic categories (normal cognitive function, questionable dementia/CIND, and dementia) were recoded into a dichotomous variable in which both dementia and questionable dementia/CIND were coded as presence of disease.

Statistical analyses

Differences in the proportions classified as cognitively impaired by the abridged scales compared with the full MMSE were assessed with Chi-square tests and Fisher's exact test. To test the validity of the two abridged scales, sensitivity and specificity levels as well as receiver operating characteristics (ROC) were calculated. The analysis of sensitivity and specificity shows the agreement between the applied cutoffs and the clinical diagnosis. Sensitivity is the proportion of subjects with the condition who receive a positive test result; specificity is the proportion of subjects without the condition who receive a negative test result. From these, the positive predictive value (PPV) and the negative predictive value (NPV) can be estimated: the PPV is the probability that a subject with a positive test actually has the disease, and the NPV is the probability that a subject with a negative test does not.
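
As a compact illustration of these four measures, the Python function below computes them from a 2 × 2 table of test result versus diagnosis; the counts in the usage example are hypothetical and not taken from Table 3.

```python
# Sensitivity, specificity, PPV, and NPV from a 2x2 table of
# test result (positive = below cutoff) versus clinical diagnosis.

def validity_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),  # positive test among the diseased
        "specificity": tn / (tn + fp),  # negative test among the non-diseased
        "PPV": tp / (tp + fp),          # disease given a positive test
        "NPV": tn / (tn + fn),          # no disease given a negative test
    }

# Hypothetical counts for illustration only:
print(validity_measures(tp=270, fp=60, fn=20, tn=240))
```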

The ROC curve graphically shows a test's accuracy against the reference test across all possible cutoffs, allowing alternative tests to be compared. ROC analysis was carried out to test how well the abridged MMSE scales corresponded to the full MMSE scale in terms of accuracy against the reference test. The three tests were analyzed simultaneously against the clinical diagnosis, but separately for the two datasets. The area under the ROC curve (AUC) summarizes this accuracy as a single number: an AUC of 0.5 equals a random result, so an informative test will have an AUC between 0.5 and 1.0, where 1.0 is equivalent to full correspondence with the reference test (Fawcett 2006).
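
To make the AUC concrete, the sketch below uses its rank-based (Mann-Whitney) formulation for a scale on which lower scores indicate impairment: the AUC equals the probability that a randomly chosen diseased subject scores lower than a randomly chosen non-diseased subject, with ties counted as one half. This naive pairwise version is for illustration only; the covariate-adjusted AUCs reported in the Results require regression-based ROC methods not shown here.

```python
# AUC via the Mann-Whitney formulation (lower score = more impaired).

def auc_low_score_positive(scores, diseased):
    pos = [s for s, d in zip(scores, diseased) if d]      # diseased subjects
    neg = [s for s, d in zip(scores, diseased) if not d]  # non-diseased
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))  # 0.5 = random, 1.0 = perfect

# Toy usage with made-up scores:
print(auc_low_score_positive([18, 20, 25, 27, 29],
                             [True, True, False, False, False]))  # 1.0
```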

Results

Demographic characteristics of the HARMONY and ADAMS samples, together with mean scores and the proportions scoring below the cutoffs on the full MMSE and the two abridged scales, are presented in Table 2 for the total sample and for demographic and diagnostic subsets. The proportions classified as cognitively impaired differed significantly between each of the short forms and the full MMSE, with the exception of the MMSE-SF-C in the dementia category of the ADAMS sample. In general, the full MMSE classified slightly more participants as cognitively impaired than the abridged forms did.

Table 2 Descriptive statistics for HARMONY (n = 794) and ADAMS (n = 576), including the cognitive scales: the full MMSE (0–30), the MMSE-SF (0–11), and the MMSE-SF-C (0–11)

As the two datasets had different criteria for including participants in the in-home assessment for dementia, and different diagnostic criteria for the middle category, the proportions with dementia and questionable dementia/CIND differed. In the HARMONY data, 49.1 % were diagnosed as not having dementia, 14.0 % as having questionable dementia, and 36.9 % as having dementia. In the ADAMS data, 35.4 % were diagnosed as not having dementia, 30.9 % as having CIND, and 33.7 % as having dementia.

Mean MMSE scores and the proportions with cognitive impairment, based on the adopted cutoffs (<24 and <8), varied in relation to demographic factors. The MMSE cutoffs corresponded as expected to the categories of clinical diagnosis. As seen in the right-hand half of Table 2, among those with dementia, more than 95 % scored below the MMSE cutoff. Among those with normal cognitive function, about 25–33 % in the HARMONY sample and 12–18 % in the ADAMS sample scored below the MMSE cutoff.

Sensitivity and specificity rates, comparing the MMSE cutoffs to the clinical diagnosis, are presented in Table 3. In both datasets, the values of sensitivity, specificity, PPV, and NPV were similar for the full MMSE and for both abridged versions. In the analyses of the HARMONY data, sensitivity levels were overall high (>90 %), while specificity levels were lower. PPVs and NPVs were likewise consistent across tests. Sensitivity levels in the ADAMS analyses were moderate (<80 %) for all three versions of the test, while specificity rates were higher. The PPVs in the ADAMS data were comparatively high, while the NPVs were lower (<70 %); however, the levels did not differ much between versions. Significance testing within each dataset showed that the different versions did not have significantly different sensitivity or specificity levels (Table 4). Additional validity tests were performed on samples stratified by sex, education, and age group; these results did not indicate any marked differences in the measured validity values across strata. However, it should be noted that the statistical power for these tests was limited.

Table 3 Validity tests of the three versions of the test on data from HARMONY and ADAMS
Table 4 Significance tests of whether the sensitivity levels are equal (H0: p1 = p2)

The ROC curves for predicting dementia with the full version of the MMSE and the two abridged tests showed similar results for all three versions (Supplementary material). The three versions had the following unadjusted AUC values: full MMSE = 0.87, MMSE-SF = 0.89, and MMSE-SF-C = 0.89. After adjusting for gender, age, and education, the AUC values were: full MMSE = 0.85, MMSE-SF = 0.87, and MMSE-SF-C = 0.86.

Discussion

The aim of this study was to validate two abridged versions of the MMSE. The results show that both versions had validity comparable to the full MMSE in relation to the clinical diagnoses. These findings were consistent in both the Swedish (HARMONY) and the US (ADAMS) data.

A limitation of the study is that the two abridged tests were not administered independently to the participants in HARMONY and ADAMS; instead, the items were derived from the full original MMSE tests administered in those studies. Additional limitations stem from the restricted inclusion criteria in the ADAMS sample. Many participants were excluded due to missing items in the original MMSE and, more importantly, these exclusions were not random, as a large share of those excluded belonged to the dementia category.

However, a prominent feature of the abridged tests is that, in some contexts, they can achieve a higher response rate (Fong et al. 2009). This involves both actual participation in the test and the probability of completing all items; that is, the likelihood that a frail individual will participate in a test may depend on its length and scope. The complementary version (MMSE-SF-C) may therefore enable interviews with groups that are not available for face-to-face interviews and may also be appropriate for subjects with vision or other physical impairments. While the two tests may have slightly different applications, they were comparable and performed equally well. An additional benefit of these validated scales over other available short scales, e.g., TICS (Brandt et al. 1988) and COGTEL (Kliegel et al. 2007), is that their scores are comparable to those of the MMSE. Although the original MMSE has been shown to be imprecise in differentiating between clear-cut dementia cases and cases of questionable cognitive impairment (Mitchell 2013), it is still the most widely used test; this allows for comparisons both between studies and between countries. With the exception of the high sensitivity levels in HARMONY, the validity levels of the two abridged tests were moderate but comparable with those of the original MMSE. The lower sensitivity rates (for all versions of the test) in the ADAMS sample compared with the HARMONY sample can probably be attributed to the difference in proportions in the questionable dementia/CIND categories, reflecting differences in the criteria for questionable dementia and CIND. Adjusting for age, gender, and education lowered the validity of all tests somewhat, but there was no loss of predictive precision when using the abridged forms rather than the full MMSE. This ultimately means that the short scales are comparable with the full-length version.

Even if the original MMSE is relatively quick to administer, it might still be too demanding for older people taking part in an already lengthy study. Our findings suggest that these two abridged versions of the MMSE have adequate validity and perform well against the original MMSE. They may therefore be feasible alternatives that can help reach more participants and ensure that samples are more representative of the population, and they are worth considering in larger population studies where interview length is restricted and respondent burden is high.