Introduction

Age-related cognitive decline, potentially associated with decline in other neurological functions such as motor function (Baltes and Lindenberger 1997; Oh et al. 2011; Taylor et al. 2010; Vidoni et al. 2010), has been well documented in rodents, nonhuman primates, and humans (Bondi et al. 1995; Geinisman et al. 1995; Rapp and Amaral 1991). With the aging of the human population and the associated risk of developing dementia, it is important to detect age-related cognitive decline in the absence of frank dementia (Bondi et al. 1995; Jolles et al. 1995; Taylor et al. 1992). The most widely used cognitive task in the elderly is the mini-mental state examination [MMSE; (Folstein et al. 1975; Helkala et al. 2002; Soto et al. 2005)]. While useful for identifying frank dementia, the MMSE may not detect more subtle age-related cognitive changes (Berteau-Pavy et al. 2007; Haley et al. 2010; Leveille et al. 1998). Thus, different tests are needed to identify these changes. Other tests, including word recall and logical memory (Wechsler 1997), are becoming more widely used in the clinic and are sensitive to the effects of dementia (Der et al. 2010; Lautenschlager et al. 2008; Mikos et al. 2010; Sachdev et al. 2009; Villemagne et al. 2008), and these tests have unequivocal findings regarding age-related changes in the absence of dementia in people over 60 years of age (Nyberg et al. 2003). In addition, the clinical dementia rating (CDR) scale, CDR sum of boxes, collateral clinical dementia rating scale (CCDR), and CCDR sum of boxes are useful in identifying clinical cognitive decline at a very early stage in the elderly (Cedarbaum et al. 2010; Dreyfus et al. 2010; Lynch et al. 2006; Rossetti et al. 2010). These tests are crucial for the quality of life of the elderly and critical for considering the initiation of therapeutic options.

In contrast to these tasks, the novel image–novel location (NINL) object recognition task (Rizk-Jackson et al. 2006) is a sensitive test to assess cognition in nondemented elderly (Berteau-Pavy et al. 2007). NINL performance is sensitive to the effects of apolipoprotein E4 (Berteau-Pavy et al. 2007; Haley et al. 2010), a risk factor for developing age-related cognitive decline (Adak et al. 2004; Aggarwal et al. 2005; Brayne et al. 1996; Howieson et al. 2003; Wilson et al. 2002) and Alzheimer’s disease (Corder et al. 1995; Corder et al. 1993; Farrer et al. 1997; Poirier et al. 1995). Moreover, the NINL test does not assess verbal memory, as do the MMSE, word recall lists, and logical memory, but rather visuospatial memory, which may be more sensitive to the early effects of mild cognitive impairment or Alzheimer’s disease (Iachini et al. 2009).

As little information is available on NINL test–retest reliability and ability to detect cognitive changes, performance on the MMSE and the NINL tests was compared over a 4-year period. Furthermore, as low scores in word list recall and logical memory tests are reported to indicate cognitive decline, we assessed correlations between the NINL and those tests in a final session within a subset of the participants.

Methods

Study participants

Study participants were recruited from two neighboring retirement communities in Portland, OR, USA. Presentations of the planned research were given at the retirement communities and interested people were encouraged to sign up for the study. The inclusion criteria based on the first visit were: (1) older than 55 years and (2) stable health (i.e., no health issues that would interrupt testing). Study participants who were severely visually or hearing impaired were excluded. Baseline MMSE scores were required to be greater than 22. When the cognitive status of the participants was assessed using the MMSE (Folstein et al. 1975), all but one had an MMSE score greater than 23, which corresponds to the cutoff score for cognitively healthy people (Howieson et al. 2003). The participant with an MMSE score of 22 performed well on the other cognitive tests from the original screening (Berteau-Pavy et al. 2007) and was included in the testing sessions. All study participants provided informed consent for participation. All procedures were approved by the institutional review boards of Oregon Health and Science University and the retirement communities. The group of study participants consisted of 27 elderly individuals (age range 62–92; six men and 21 women; all Caucasian), all of whom completed sessions 1–4.

There were four longitudinal sessions: baseline (session 1), 6 months (session 2), 18 months (session 3), and 40 months (session 4). An additional session (session 5) was completed 12 months after session 4. Participants were invited for all sessions, but three did not return for the final session. The MMSE was administered at sessions 1, 3, and 4, and the NINL task was administered at all sessions. The sequence of testing was the MMSE, NINL (immediate), health questionnaire, and NINL [delay; (Berteau-Pavy et al. 2007; Haley et al. 2010)]. Logical memory and word recall were performed in the final session [session 5; (Wechsler 1997)] following the NINL test. CDR, CDR sum of boxes, CCDR, and CCDR sum of boxes scores of the subjects were obtained from a collaborative study with the Oregon Center for Aging and Alzheimer’s Disease Center during the same time period as session 5. The health questionnaire consisted of questions about changes in the participant’s health over the course of testing that could alter performance in the cognitive tasks, including any hospitalizations, surgeries, or major life-altering events (e.g., the death of a spouse or loved one).

The NINL has been previously described in detail (Berteau-Pavy et al. 2007; Haley et al. 2010; Rizk-Jackson et al. 2006). Briefly, a set of 12 panels, each containing three images in three of the four quadrants (one quadrant was empty), was presented to the participants to memorize. After no delay (immediate), participants were presented with a second set of panels that were either the same (no change), contained one novel image (novel image), or contained an image moved to the empty quadrant (novel location). Then, after a delay (5 min), the participants were presented with a third set of panels that were either the same (no change), contained one novel image (novel image), or contained an image moved to the empty quadrant (novel location). The sequence of the panels was different in the second and third sets. The participants were asked to identify whether there was a change and, if so, how the panel changed (novel image/novel location) and where the change was located (quadrant).

Analysis of cognitive tests

A total of 12 points could be earned for each of the NINL immediate and delay scores, for a total of 24 points per session. NINL subscores were calculated for no change (NC), novel image (NI), and novel location (NL). The NC subscore reflects the ability to correctly identify the panels from the reference set. The NI and NL subscores reflect the ability to identify the novel image or novel location of a familiar image, respectively. These scores were analyzed for each session (8 points maximum for each subscore) and compared across sessions.
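The scoring scheme above can be sketched in Python. This is an illustration only, not the scoring software used in the study. The `PanelResult` record and `score_session` function are hypothetical names. The sketch assumes one point per correctly answered panel and four panels of each type in each 12-panel set, a split inferred from the 8-point subscore maximum rather than stated in the text.

```python
from collections import Counter
from dataclasses import dataclass

# Panel types used by the NINL task: no change, novel image, novel location.
PANEL_TYPES = ("NC", "NI", "NL")
PHASES = ("immediate", "delay")


@dataclass(frozen=True)
class PanelResult:
    """One scored panel (hypothetical record; one point per correct panel is assumed)."""
    panel_type: str  # "NC", "NI", or "NL"
    phase: str       # "immediate" or "delay"
    correct: bool


def score_session(results):
    """Return total, per-phase, and per-type scores for one session."""
    by_phase = Counter()
    by_type = Counter()
    for r in results:
        if r.panel_type not in PANEL_TYPES or r.phase not in PHASES:
            raise ValueError(f"unexpected panel record: {r}")
        if r.correct:
            by_phase[r.phase] += 1
            by_type[r.panel_type] += 1
    return {
        "total": sum(by_phase.values()),      # maximum 24 per session
        "immediate": by_phase["immediate"],   # maximum 12
        "delay": by_phase["delay"],           # maximum 12
        "subscores": {t: by_type[t] for t in PANEL_TYPES},  # maximum 8 each
    }


# Made-up example: every NC and NI panel correct, half of the NL panels missed.
example = [
    PanelResult(t, phase, correct=(t != "NL" or i < 2))
    for phase in PHASES
    for t in PANEL_TYPES
    for i in range(4)  # assumed: four panels of each type per set
]
scores = score_session(example)
```

With this made-up input, the participant scores 10 points in each phase, and the NL subscore (4 of 8) carries the entire deficit, which is the kind of pattern the subscore analysis is meant to expose.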

Statistical analyses

Separate repeated measures ANOVAs were used to analyze MMSE score, NINL total score, and each NINL subscore over time (score × session), with Tukey’s post hoc tests. Because not all subjects completed all five sessions, only sessions 1–4 were used for the repeated measures, while correlations were made with the session 5 subset of participants. Pearson correlations were assessed for NINL individual scores over the different sessions as well as for MMSE and NINL within each session. To protect against multiple comparisons, an ANOVA was run on the session scores with Bonferroni’s corrections. Pearson correlations were also used to assess correlations between NINL and word list recall and logical memory in the final session. Spearman correlations were used to assess correlations between total NINL scores and CDR scores, CDR sum of boxes, CCDR, and CCDR sum of boxes. Student’s t test was used to compare changes in scores between sessions. SPSS 16.0 (SPSS Inc, Chicago, IL) and Prism (Graphpad Software, La Jolla, CA) software were used for all statistical analyses. Significance was considered at P < 0.05.
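As a rough illustration of the between-session Pearson correlations described above, the following Python sketch computes r and the coefficient of determination (r squared) for two lists of scores. It is not the SPSS or Prism procedure used in the study. The function name `pearson_r` and the score values are hypothetical, and no P value is computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up total scores for five hypothetical participants (not study data).
session_a = [20, 18, 22, 15, 19]
session_b = [19, 17, 21, 13, 18]
r = pearson_r(session_a, session_b)
r_squared = r ** 2  # proportion of variance shared between the two sessions
```

Squaring r discards its sign, so an r squared value alone does not show whether a correlation is positive or negative. The direction has to be read from r or from the accompanying text.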

Results

Mini-mental state examination

Across the first, third, and fourth sessions, there was no difference in MMSE scores of the participants (P = 0.4; Fig. 1a). Similarly, there was no difference in individual MMSE scores calculated as a percentage of the baseline score (P = 1.0; Fig. 1a inset). There were no significant correlations in MMSE scores between any of the sessions, consistent with other findings studying this time period in the elderly (Crivello et al. 2010; Mozaz et al. 2010; Schwingel et al. 2009).
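The percentage-of-baseline normalization used for individual scores can be sketched as follows. This is a minimal illustration assuming that each participant's session 1 score serves as the denominator. The function name `percent_of_baseline` and the scores are hypothetical.

```python
def percent_of_baseline(scores_by_session):
    """Express each session score as a percentage of the first (baseline) score."""
    if not scores_by_session:
        raise ValueError("at least one score is required")
    baseline = scores_by_session[0]
    if baseline == 0:
        raise ValueError("a baseline score of zero cannot be used as a denominator")
    return [100.0 * score / baseline for score in scores_by_session]


# Made-up scores for one hypothetical participant across three sessions.
participant_scores = [30, 27, 24]
normalized = percent_of_baseline(participant_scores)
```

Normalizing in this way puts participants with different starting scores on a common scale, so that a given percentage drop counts equally for each participant whatever the baseline score.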

Fig. 1

a Mini-mental state examination (MMSE) scores for session 1 (baseline), session 3 (18 months), and session 4 (40 months). Inset MMSE scores expressed as a percentage of individual baseline scores. b Novel image–novel location (NINL) scores for all sessions. Inset NINL scores expressed as a percentage of individual baseline scores. ***P < 0.0001 compared to session 1

Novel image–novel location

NINL total scores significantly decreased over the four sessions [F(3, 107) = 4.97; P = 0.003; Fig. 1b]. Similarly, when analyzed as a percentage of baseline, individual NINL total scores decreased [F(3, 107) = 10.31; P = 0.0002; Fig. 1b inset]. Moreover, each of the three subscores significantly decreased over the sessions: NC [F(3, 107) = 4.8, P = 0.004], NI [F(3, 107) = 4.2, P = 0.007], and NL [F(3, 107) = 13.07, P < 0.0001; see Table 1]. Similar results were found when the subscores were analyzed as a percentage of baseline subscores for NC [F(3, 107) = 5.42, P = 0.007] and NL [F(3, 107) = 11.18, P < 0.0001]; however, the NI subscore expressed as a percentage of baseline subscore did not change significantly over time (P = 0.37).

Table 1 NINL subscores in all sessions

One-way ANOVA, with Bonferroni’s corrections, of the sessions demonstrated a significant difference in session scores [F(3, 107) = 4.8, P = 0.003]. The NINL total scores in each of the sessions significantly correlated (Fig. 2). NINL scores in session 1 correlated with those in session 2 (r² = 0.50, P < 0.0001), in session 3 (r² = 0.43, P = 0.0001), in session 4 (r² = 0.44, P = 0.0001), and in session 5 (r² = 0.22, P = 0.01). In addition, NINL total scores in session 2 correlated with those in session 3 (r² = 0.50, P < 0.0001), session 4 (r² = 0.54, P < 0.0001), and session 5 (r² = 0.29, P = 0.004). NINL total scores in session 3 correlated with those in session 4 (r² = 0.67, P < 0.0001) and session 5 (r² = 0.29, P = 0.003). Finally, NINL total scores in session 4 correlated with those in session 5 (r² = 0.50, P < 0.0001).

Fig. 2

Representative correlations of total novel image–novel location (NINL) scores between sessions. a Session 1 NINL total scores correlated with those in session 2 (r² = 0.50, P < 0.0001). b Session 2 NINL total scores correlated with those in session 3 (r² = 0.50, P < 0.0001). c Session 3 NINL total scores correlated with those in session 4 (r² = 0.67, P < 0.0001). d Session 4 NINL total scores correlated with those in session 5 (r² = 0.50, P < 0.0001)

The NINL total scores decreased 2.1 ± 0.7 points from session 1 to session 3, whereas from session 1 to session 4 the scores decreased significantly more, by 5.1 ± 0.8 points (t = 2.6; P = 0.01). There were individual differences in the change in scores from session 1 to session 3 and from session 1 to session 4. Six of the 27 subjects did not change their score, while two of the 27 decreased their score substantially more than the mean change in score, with changes more than two standard deviations above the mean.
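The individual-difference analysis above can be sketched in Python: compute each participant's decline from baseline, then flag those with no change and those whose decline exceeds the mean by more than two standard deviations. This is only a sketch under stated assumptions. The function name `summarize_declines` is hypothetical, the scores are made up, and the use of the sample standard deviation is an assumption because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_declines(baseline, followup, n_sd=2.0):
    """Return per-participant declines plus indices of stable and outlying participants."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and follow-up lists must have the same length")
    # Positive values mean the score dropped between the two sessions.
    declines = [b - f for b, f in zip(baseline, followup)]
    # Assumption: sample standard deviation (n - 1 denominator).
    cutoff = mean(declines) + n_sd * stdev(declines)
    unchanged = [i for i, d in enumerate(declines) if d == 0]
    flagged = [i for i, d in enumerate(declines) if d > cutoff]
    return declines, unchanged, flagged


# Made-up total scores for ten hypothetical participants (not study data).
session_1 = [20] * 10
session_4 = [20, 20, 19, 19, 19, 19, 19, 19, 19, 8]
declines, unchanged, flagged = summarize_declines(session_1, session_4)
```

In this made-up cohort, the first two participants show no change and only the last participant exceeds the two-standard-deviation cutoff. With small groups, a single extreme value inflates the standard deviation itself, so a cutoff of this kind is conservative.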

Compared to NINL total scores, there were fewer correlations between NINL subscores in the different sessions (Table 2). The NC subscores in session 3 correlated with those in session 4 (r² = 0.27, P = 0.005; data not shown). For the NI subscores, correlations were observed between only a few sessions. The NI subscores in session 2 correlated with those in session 3 (r² = 0.34, P = 0.002) and session 4 (r² = 0.32, P = 0.002). In addition, the NI subscores in session 3 correlated with those in session 4 (r² = 0.37, P = 0.0008). The NL subscores in session 1 correlated with those in session 2 (r² = 0.22, P = 0.006; Table 2) and session 3 (r² = 0.37, P = 0.0004). The NL subscores in session 2 also correlated with those in session 3 (r² = 0.42, P < 0.0001).

Table 2 Correlations (r values) of NINL total scores and subscores between sessions

Novel image–novel location and mini-mental state exam

Significant correlations were observed between the MMSE and NINL scores at sessions 1 (r² = 0.28, P = 0.004), 3 (r² = 0.23, P = 0.02), and 4 (r² = 0.21, P = 0.02).

NINL scores and logical memory

NINL total score significantly correlated with logical memory recall delay (r² = 0.51, P = 0.0008, Fig. 3), but only showed a trend towards significance with logical memory recall immediate (P = 0.09). Logical memory immediate and delayed scores significantly correlated with NINL (r² = 0.43, P = 0.003).

Fig. 3

Scores from the final session (session 5). a Novel image–novel location (NINL) total score in session 5 correlated with logical memory (r² = 0.51, P = 0.0008). b NINL total score in session 5 correlated with word list acquisition (r² = 0.43, P = 0.003). c NINL total score in session 5 correlated with word list delayed recall (r² = 0.23, P = 0.02)

NINL scores and word list recall

Significant correlations were observed between NINL total score and word list acquisition (r² = 0.37, P = 0.002), as well as word list delayed recall (r² = 0.23, P = 0.02) (Fig. 3). Word list acquisition and delayed recall correlated as well (r² = 0.18, P = 0.04).

NINL scores and CDR measures

Significant negative correlations were observed between total NINL scores, NI subscores, and NL subscores and CDR, CDR sum of boxes, CCDR, and CCDR sum of boxes (Tables 3 and 4). The negative correlations between the CCDR and CCDR sum of boxes and the total NINL scores, NI subscores, and NL subscores were identical to those in the CDR and CDR sum of boxes and therefore are not repeated in the table. In contrast, the NC subscores did not significantly correlate with CDR, CDR sum of boxes, CCDR, and CCDR sum of boxes (P = 0.5).

Table 3 NINL total scores negatively correlate with clinical dementia rating scales
Table 4 CDR and sum of boxes scores compared to NINL scores

Discussion

This study shows the NINL test–retest reliability in elderly nondemented humans over a 5-year period. Although the sessions were not equally spaced, the scores between the sessions correlated significantly, indicating that the performance over time was measured similarly among all participants. In addition, total NINL scores significantly decreased over this period while the MMSE scores did not. These results were also found when individual NINL and MMSE scores were analyzed as a percentage of baseline scores. The frequency of testing (four times in 4 years) could have inflated the test scores, especially the MMSE, yet this type of testing occurs readily in the clinic to assess cognitive function in elderly patients. Although there were significant correlations between MMSE and NINL within the sessions, the correlation was weak at session 1 and weakened further with each successive session. These results indicate that the NINL test is potentially more sensitive than the MMSE in detecting age-related cognitive decline. In addition, the NINL total score correlated with logical memory recall and word list recall, two tests in which low scores are associated with dementia (Der et al. 2010; Sachdev et al. 2009; Villemagne et al. 2008). For both the logical memory and word list recall tests, the immediate and delayed scores significantly correlated, indicating that the group of participants had a similar transition from the “learning” or immediate parts to the “memory” or delayed recall parts of the test. Moreover, CDR, CDR sum of boxes, CCDR, and CCDR sum of boxes correlated with NINL scores, indicating that the amount of cognitive decline as measured by multiple domains is not all related to neuropsychological testing (Berg 1988). Taken together, these results support the building hypothesis that the NINL test could be a valuable tool for identifying preclinical cognitive decline.

While there were some significant correlations between the sessions, NINL subscores did not correlate across sessions as strongly as NINL total scores did. One potential reason for this might be the difference in spread between the total scores and the subscores. While for each session the maximal total score possible was 24 points, the maximal subscore was 8 points. The narrower range of possible subscores might have contributed to the weaker correlations for NINL subscores than for NINL total scores. Within the subscores, the NL and NI subscores correlated more strongly than the NC subscores did. This might be because the NC response might be a default when the participants do not recall the correct answer (Haley et al. 2010).

Although there was a significant decrease in the NI subscores, the significance disappeared when the data were analyzed as a percentage of baseline score, suggesting an inconsistent decrease of NI score across subjects. Moreover, the NI subscores in session 1 did not correlate with those in later sessions, but significant correlations were observed between sessions 2, 3, and 4. However, the longer the time between the sessions, the weaker the correlation was. Similarly, the NL subscores correlated between sessions 1, 2, and 3, but there was no correlation with session 4. Examining the scores across the sessions, the NL subscores showed a greater decline between sessions 3 and 4. This decline might contribute to the fact that no correlations were found between these two sessions. Novel location recognition is particularly interesting, as it involves a spatial component and distinguishes NINL from other object recognition tests. Since spatial memory shows a rapid decline in aging in humans (Iachini et al. 2009), nonhuman primates (Haley et al. 2009; Rapp et al. 1997), and rodents (Benice and Raber 2009; Benice et al. 2006; Siegel et al. 2011), it might have contributed to the sharp decrease in the NL subscore between sessions 3 and 4.

In the cohort tested, the change in NINL total score from session 1 to session 3 differed significantly from the change from session 1 to session 4. This could be attributed to individual differences, which might provide insight into how the NINL might predict cognitive decline. For instance, in session 4 six of the 27 participants had no difference in their score, suggesting they are “super-agers” and not experiencing cognitive decline. On the other hand, in session 4 two of the 27 participants had a change in score that was more than two standard deviations above the mean, suggesting they could be at a higher risk for mild cognitive impairment.

Some cognitive tasks are sensitive to level of education, as many of the tasks require verbal memory, creating a bias towards highly educated individuals. Although education level does affect cognitive decline, the effect is not as straightforward as once thought (Leibovici et al. 1996). Many tests used have a language or a reading component. Lower educated individuals and non-native English speakers might have more difficulty with such tests and therefore score lower. Even though the scores might be lower, depending on the highest education attained, the “lower” score might be deflated. The NINL task is based on object recognition tests in rodents and does not necessarily require verbal memory, but rather visuospatial memory. Visuospatial memory declines rapidly in early Alzheimer’s disease stages and is hypothesized to be a predictor of progression of mild cognitive impairment to Alzheimer’s disease (Iachini et al. 2009). Using visuospatial memory instead of verbal memory, the NINL task is less biased towards highly educated individuals and native speakers and might be a valuable tool to assess cognition in a variety of individuals and populations in a clinical setting. Also, although the MMSE has been used for over 30 years in the clinic, it might not be the best determinant of normal cognitive aging, as little difference was observed in MMSE score over the 4-year period assessed here. Prior to any clinical manifestations or ability to detect a cognitive change using the MMSE, NINL might help identify those at risk of developing major cognitive decline, including mild cognitive impairment, Alzheimer’s disease, or other dementias. Earlier detection with tests such as the NINL would allow therapeutic strategies to be started earlier, which is critical for beneficial treatment effects (Rountree et al. 2009).