Computer-based tools for assessing micro-longitudinal patterns of cognitive function in older adults

Brown, Laura J. E.; Adlam, Tim; Hwang, Faustina; Khadra, Hassan; Maclean, Linda M.; Rudd, Bridey; Smith, Tom; Timon, Claire; Williams, Elizabeth A.; Astell, Arlene J.

doi:10.1007/s11357-016-9934-x

Computer-based tools for assessing micro-longitudinal patterns of cognitive function in older adults

Original Article
Open access
Published: 29 July 2016

Volume 38, pages 335–350, (2016)
Cite this article

Download PDF

You have full access to this open access article

AGE Aims and scope Submit manuscript

Computer-based tools for assessing micro-longitudinal patterns of cognitive function in older adults

Download PDF

Laura J. E. Brown¹,
Tim Adlam²,
Faustina Hwang³,
Hassan Khadra⁴,
Linda M. Maclean⁵,
Bridey Rudd⁶,
Tom Smith⁷,
Claire Timon⁸,
Elizabeth A. Williams⁹ &
…
Arlene J. Astell^10,11

2861 Accesses
12 Citations
Explore all metrics

Abstract

Patterns of cognitive change over micro-longitudinal timescales (i.e., ranging from hours to days) are associated with a wide range of age-related health and functional outcomes. However, practical issues of conducting high-frequency assessments make investigations of micro-longitudinal cognition costly and burdensome to run. One way of addressing this is to develop cognitive assessments that can be performed by older adults, in their own homes, without a researcher being present. Here, we address the question of whether reliable and valid cognitive data can be collected over micro-longitudinal timescales using unsupervised cognitive tests.In study 1, 48 older adults completed two touchscreen cognitive tests, on three occasions, in controlled conditions, alongside a battery of standard tests of cognitive functions. In study 2, 40 older adults completed the same two computerized tasks on multiple occasions, over three separate week-long periods, in their own homes, without a researcher present. Here, the tasks were incorporated into a wider touchscreen system (Novel Assessment of Nutrition and Ageing (NANA)) developed to assess multiple domains of health and behavior. Standard tests of cognitive function were also administered prior to participants using the NANA system.Performance on the two “NANA” cognitive tasks showed convergent validity with, and similar levels of reliability to, the standard cognitive battery in both studies. Completion and accuracy rates were also very high. These results show that reliable and valid cognitive data can be collected from older adults using unsupervised computerized tests, thus affording new opportunities for the investigation of cognitive.

Effects of Excessive Screen Time on Neurodevelopment, Learning, Memory, Mental Health, and Neurodegeneration: a Scoping Review

Article 16 December 2019

Eliana Neophytou, Laurie A. Manwell & Roelof Eikelboom

Relationship of device measured physical activity type and posture with cardiometabolic health markers: pooled dose–response associations from the Prospective Physical Activity, Sitting and Sleep Consortium

Article Open access 13 March 2024

Matthew N. Ahmadi, Joanna M. Blodgett, … Emmanuel Stamatakis

The impact of social activities, social networks, social support and social relationships on the cognitive functioning of healthy older adults: a systematic review

Article Open access 19 December 2017

Michelle E. Kelly, Hollie Duff, … David G. Loughrey

Introduction

Age-related changes in cognitive function can be examined over various timescales. Most commonly, change is considered over relatively long periods, such as months or years, for instance when monitoring the rate of decline associated with neurodegenerative conditions (Patterson et al. 1999) or to examine improvements in function following an intervention (Antunes et al. 2015). At the other end of the continuum, moment-to-moment variability in performance (i.e., over seconds or minutes) can be assessed using indices such as the standard deviation of reaction times in speeded response time tasks (Jensen 1992). Such moment-to-moment variability is known to increase with age (Li et al. 2004) and has been associated with diverse health and functional outcomes, including increased risk of falling (Graveson et al. 2015), everyday behavioral mistakes (Steinborn et al. 2015), as well as future mortality (Shipley et al. 2007) and cognitive decline (MacDonald et al. 2003).

Variability in cognitive function can also be considered at intermediate or “micro-longitudinal” (Palmier-Claus et al. 2011) timescales, such as over hours or days. For instance, diurnal variability in cognitive processing is a robust phenomenon in all age groups (Baddeley et al. 1970). Importantly, the extent of this variability is known to increase with age (May et al. 1993) and in people with cognitive impairment (Paradee et al. 2005), indicating its relationship with health status. Fluctuations in cognitive performance over periods of hours or days are also characteristic of some acute health conditions, such as delirium (American Psychiatric Association 2013), and with physiological changes, such as altered levels of ammonia (Balata et al. 2003) and glucose (Somerfield et al. 2004) in the blood, demonstrating the importance of micro-longitudinal changes as general health indicators. Furthermore, the physiological mechanisms underlying micro-longitudinal patterns are believed to differ from those underpinning shorter-term, moment-to-moment variability (Schmiedek et al. 2013), and so may offer unique information about the mechanisms underlying age-associated changes in cognition and health (Gamaldo and Allaire 2015).

Compared with research over macro (i.e., months-years) or moment-to-moment timescales, knowledge of the nature and relevance of micro-longitudinal patterns of function are relatively limited (Gamaldo and Allaire 2015; Schmiedek et al. 2013). One reason for this is the practical difficulties associated with performing repeated cognitive assessments. First, there are the general issues associated with all repeated cognitive testing, such as accounting for practice effects (Bird et al. 2004) and producing multiple sets of equivalent stimuli (Sullivan 2005). However, more challenging is the high density of assessments needed to track patterns of performance over multiple sessions, which can result in high levels of burden and inconvenience to participants, and cost to the researcher or clinician. These issues are further multiplied in studies that involve the monitoring of additional health or behavioral variables, such as when investigating their temporal associations with cognitive change. To advance our understanding of micro-longitudinal patterns of function, there is therefore a need for assessment methods that enable repeated measures of cognition to be taken over periods of hours and days, and that place low burden on participants, researchers, and clinicians.

In order to address this, we developed the Novel Assessment of Nutrition and Ageing (NANA) toolkit, which is touchscreen-based software for tracking cognitive function, as well as other health and behavioral domains, across micro-longitudinal timescales (Astell et al. 2014). To minimize the cost and burden of micro-longitudinal assessment, the NANA system was specifically developed with older participants, for them to use in their own homes, without a researcher being present. The cognitive tasks were designed to be particularly sensitive to cognitive processing speed, which is known to be indicative of a broad range of health and well-being outcomes in later life (Lara et al. 2013). Age-related declines in processing speed have also been shown to account for a large proportion of variance in other cognitive tasks (Tucker-Drob 2011), and thus provide an efficient way of gathering informative indicators of cognitive change.

Self-administered computerized tests have already shown promise as a feasible way of collecting valid, single assessments of cognitive function in older adults in experimental situations (Tierney et al. 2014). However, the performance of such tests has not yet been examined over micro-longitudinal time periods, in unsupervised settings. In this paper, we therefore address the question of whether it is possible to collect reliable and valid cognitive data over micro-longitudinal timescales, without a researcher being present. We do this by assessing the performance of two NANA cognitive tasks under both controlled and naturalistic conditions. In study 1, we assessed the usability, validity, and reliability of the NANA cognitive tasks when administered in a supervised, laboratory-based environment, but with minimal researcher involvement. In study 2, we assessed the performance of these tasks when used by older adults, unsupervised, in their own homes to collect data over micro-longitudinal timescales. The validity of the tasks as measures of age and health-relevant cognitive function was assessed by examining the extent to which performance on the NANA cognitive tasks correlated with performance on standard tests of cognitive processing speed, as well as tests of higher cognitive functions (episodic memory and executive function), and participant age. Reliability was determined by examining correlations and changes in performance over time.

Study 1

Methods

Participants

Forty-eight community-living adults (17 males) aged 65–89 years (mean = 72 years) provided written informed consent to participate in this study, which had been approved by the Fife and Forth Valley Committee on Medical Research Ethics (Ref: 08/S0501/104) and the University of St Andrews Teaching and Research Ethics Committee.

NANA cognitive function tasks

Two touchscreen tasks (the Shopping List task and the Squares task) were programmed in Embarcadero Delphi 2010 and administered on a 15″ touchscreen computer (Asus EeeTop, model ET1610PT). The Shopping List task was designed to draw on a broad range of cognitive functions that are known to be markers of age and health. In particular, the task was modeled on principles of symbol substitution tasks that require participants to use a digit-symbol pairing key to identify the corresponding symbols for a series of stimuli as quickly as they can (Lezak et al. 2012). Performance on these tasks is believed to depend on a range of cognitive functions, including attention (Strauss et al. 2006) and processing speed (Deary et al 2010).

At the start of the Shopping List task, the instruction to “Report what is on the shopping list as quickly as you can” was presented on the screen. The instruction remained on the screen until the participant touched a box containing the word “start.” Following this, a screen containing a “shopping list” in the top right quadrant of the screen, and four response boxes (containing the numbers 2–4) along the bottom of the screen, was presented. The shopping list was a white box containing the names of four items (apples, carrots, lemons, onions), each preceded by one of the numbers 2, 3, 4, or 5. The order of the four items, and of the numbers preceding them, was randomly determined each time the task was administered, and then remained the same for the duration of the task.

After a 1000-ms delay, the first trial was presented. For this, a white box containing the question stem “How many” with an empty box below it was first presented in the top left quadrant of the screen for 1000 ms. The name of one of the items on the shopping list (appended with a question mark) was then presented in the box underneath the question stem. This display remained on screen until the participant touched one of the response boxes at the bottom of the screen (or for a maximum duration of 10 s if no response was made). Following a response, the question text and surrounding box were removed from the screen for 1000 ms, and then, the second trial began. The shopping list and response boxes remained on screen for the duration of the task. An example of a Shopping List task trial is shown in Fig. 1.

As there have been no previous published examples of symbol substitution tasks that require touchscreen responses, we created two different response option formats so that we could determine which format led to the best psychometric test properties. In one version of the task, the response options were presented in an ascending order (i.e., 2, 3, 4, 5), and in the other version, they were presented in a random order. For the random order version, a new random order was created each time the task was administered, and the same random order was then retained for the duration of the task. Each participant completed 20 trials of each version of the task. The order in which participants completed these two versions was counterbalanced between participants so that psychometric properties of each version could be compared with one another. In each 20-trial iteration of the task, each of the four items on the shopping list (apples, carrots, lemons, onions) was presented five times. The order in which the items were presented was randomly determined, with the restriction that the same item was never presented twice in succession. This was to minimize confusion to participants from being asked the same question twice in a row.

Performance on the Shopping List task was assessed according to the accuracy of responses (i.e., the proportion of correct responses made) and average response time of correct responses in each session. Median rather than mean response times were calculated for each participant in order to minimize the effects of extreme values (Jensen 1992).

The Squares task was a speeded choice response time task, a measure of cognitive processing speed (Deary et al. 2010). This task was designed to be simpler to understand than the Shopping List task in case participants struggled to complete the Shopping List task in unsupervised settings.

At the start of the Squares task, the instruction “Touch the boxes as quickly as you can” was presented on the screen. The instruction remained on screen until the participant touched a box containing the word start. The first trial then began. For this, a black fixation cross was presented in the center of the screen for 1500 ms. The fixation cross then disappeared, and a gray box (containing a black square) was presented in one of the four locations along the bottom of the screen. The four possible locations and the sizes of the response boxes were the same as those presented in the Shopping List task. The response box disappeared after it had been touched (or after a maximum duration of 10 s, if no response was made). The next trial then began with the fixation cross again being presented. Figure 2 shows a schematic example of a trial in the Squares task.

Each participant performed 20 trials of the Squares task. In each 20-trial session, each of the four locations was presented five times. The order in which the response boxes were presented was randomly determined, with the restriction that the same location was not presented more than three times in succession. Performance was assessed by calculating median response times for each session.

Battery of standard measures

A battery of standardized cognitive tests was also administered so that the concurrent validity of the NANA tasks could be established. Four of these tests provided measures of processing speed, which is considered fundamental to many other higher-order cognitive functions (Tucker-Drob 2011) and to be sensitive to age-related change (Lara et al. 2013). These were as follows:

(1)
A computerized Speeded Reaction Time task adapted from the PEBL battery (Mueller 2009), in which participants were asked to make a speeded keyboard key press each time they saw a black cross in the middle of the computer screen. After two short practice blocks, participants performed two blocks of 15 trials each. The time between a response being made and the next stimulus being presented varied from 1400 to 3200 ms. Performance was measured as the median response time of responses made within the valid time window of 150–3000 ms across the two blocks.
(2)
The Symbol Digit Modalities Test (SDMT: Smith 1982), in which participants were required to write the corresponding number for each of a series of abstract symbols, according to a number-symbol key printed at the top of the page. The number of correct responses made in a 90-s period was recorded.
(3)
A Number Copy Task, in which participants were asked to simply copy randomly generated sequences of the digits 1–9. This task was scored according to the number of correct responses made in 30 s.
(4)
Part A of the Trail Making Test (Spreen and Strauss 1998), in which participants are asked to join together numbered circles as quickly as they can. This task was scored as the time taken to correctly join all 25 circles, with any mistakes being called to the participant’s attention by the researcher during task performance.

Executive functions were assessed using three measures.

(1)
Part B of the Trail Making Test (Spreen and Strauss 1998) was used to assess the task-switching component of executive function. In this task, participants are asked to join together a series of numbered and lettered circles, alternating between numbers and letters. This task is scored in the same way as part A, with shorter completion times indicating better performance.
(2)
A forwards and backwards digit span task (Lezak et al. 2012) was used as a measure of working memory. For this, the length of the string started at two digits, and then increased by one digit every two trials to a maximum length of nine digits for the forward span task, and eight for the backward span task. The task was discontinued if the participant failed both items of a given string length. The number of correct responses made to the forwards and backwards task was summed together to give a total digit span score.
(3)
A Stroop task (Stroop 1935) was administered to assess inhibitory executive functions. This task was administered in three parts: first participants were given a sheet containing 16 rows of 6 rectangles, each colored red, blue, or green, and were asked to name the color that each rectangle was printed in as quickly as they could. For the second part, the rectangles were replaced with the neutral words “when,” “and,” and “hard,” and participants had to name the color that the words were printed in. In the third part, the neutral words were replaced with the color words “red,” “blue,” and “green,” which were always incongruent with the color that the words were printed in. The number of correct responses made in 30 s was recorded for each part. A measure of interference was then calculated for each participant by dividing the number of correct responses made in the third part by the number made in the second part. Lower interference scores therefore indicate a higher amount of interference.

Verbal episodic memory was assessed using a word recall task. For this, 15 words from the Rey Auditory-Verbal Learning Test (Lezak et al. 2012) were read aloud three times, and the participant was asked to recall as many words as possible each time. A score for immediate recall was calculated by summing the number of words correctly recalled on each of the three occasions. After a delay of approximately 20 min, the participant was again asked to recall as many of the words as possible. The number of words correctly recalled on this occasion was recorded as the delayed recall score.

Two additional tests were included as measures of global cognitive function and prior cognitive ability, respectively. The Mini Mental Status Examination (MMSE: Folstein et al. 1975) contains a series of brief tasks designed to screen for cognitive impairment. It is scored out of 30, with scores below 24 generally taken as an indicator of potential impairment (Iverson 1999). The National Adult Reading Test (NART: Nelson, 1982) requires participants to read aloud a series of 50 words with irregular pronunciations, providing a proxy measure of reading ability that is indicative of prior intellectual functioning (Crawford et al. 2001). The task is scored according to the number of errors made, and performance has been shown to be relatively resistant to dementia (McGurn et al. 2004) and short-term cognitive disturbance (Brown et al. 2011).

Procedure

Each participant was invited to attend three separate, individual testing sessions, over a week-long period, so that the validity and reliability of the tasks over micro-longitudinal timescales could be established. In the first session, participants provided demographic details, as well as details about their current use of computers, and completed the short-form Geriatric Depression Scale (Sheik and Yesavage 1986). Participants then completed the NANA tasks and standard measures of cognitive function. The order in which participants completed the NANA and standard measures was counterbalanced between participants in order to allow performance on the two sets of tasks to be fairly compared with one another.

Prior to starting the NANA tasks, participants completed a brief process of familiarization with the touchscreen by undergoing a series of practice operations that involved making touchscreen responses to on-screen instructions. When completing the NANA tasks, participants were asked to follow the simple instructions on screen and make their responses by touching the appropriate part of the screen. As the tasks were being developed for future unsupervised use, the researcher who administered the tasks minimized additional contact with the participant while they completed the tasks, and only provided additional clarification or reassurance when absolutely necessary.

In addition to the NANA tasks described above, each participant also completed a number of other short touchscreen measures of cognition, mood, and appetite that were also being considered for inclusion in the NANA system during the session. The additional cognitive tasks were not selected for further development, and so are not reported here. The validation of the mood and appetite measures is reported elsewhere (Brown et al. 2016).

The NANA tasks and a subset of the standard measures that were suitable for repeated testing (see Table 3) were repeated on each of the subsequent two testing sessions. All participants received a commemorative study mug at the end of their first session, as well as a £5 (approximately $7.5) expense payment for each session they completed.

Data analysis

As a number of cognitive tasks produced data that were ordinal, not normally distributed, and/or had outliers, non-parametric tests of correlation and difference were used for all analyses. Kendall’s Tau tests were used rather than Spearman’s rank to assess correlations as the former are better suited to data containing several tied ranks (Field 2013), which was the case with a number of the variables. Correlation values produced by Kendall’s Tau tests tend to be lower than those of Spearman’s rank due to the different way in which they are calculated (Capéraà and Genest 1993).

In order to assess the concurrent validity of the NANA tests, the degree of correlation between participants’ performance on each of the NANA tasks and the standard cognitive battery in the first testing session was calculated. In order to assess test-retest reliability of the NANA tasks, the degree of correlation between performance across sessions was calculated. As we were expecting cognitive function to vary over micro-longitudinal timescales, it is not possible to assess reliability from these values alone. Therefore, for comparability, between-session correlations were also calculated for each of the standard cognitive function tasks that were administered on each testing session. Friedman tests of difference were also performed for each task to determine whether any change in performance occurred over the three testing session, for instance due to practice effects.

Results

Participant characteristics

A procedural error meant that two participants used a different model of computer to perform the NANA tasks, and so, their data were excluded from these analyses. The remaining 46 participants (16 males) had a mean age of 72 years (SD = 5.9). Their mean MMSE score was 28.1 (SD = 1.98), and their mean GDS score was 1.32 (SD = 1.73), indicating low levels of cognitive impairment and depression. Two of these participants (both female) were not able to attend a third testing session within the time period of the study, and so only contributed data to the first and second testing sessions.

Education levels among participants were generally high: 46 % were educated to degree level or above, a further 22 % held professional or semi-professional qualifications, 15 % were educated up to the equivalent of A-level, 11 % up to the equivalent of GCSE, and just 7 % held no educational qualifications. Self-reported levels of computer use were also high, with the majority of participants (74 %) reporting using them on most days, and a further 15 % using them up to 5 days per week. In response to a question asking how competent they felt when using computers without assistance, five participants (11 %) selected the option “very,” 23 (50 %) selected “fairly,” 11 (24 %) selected “a little,” and just seven (15 %) selected “not at all.”

NANA task performance

Only two participants failed to respond to a single trial of the Shopping List task (one during the second testing session and one during the third), and no participant failed to respond to any trials in the Squares task. The accuracy of participants’ responses was very high in both versions of the Shopping List task and did not differ across testing sessions (Table 1).

Table 1 Accuracy rates for the ascending and random order versions of the Shopping List task in each testing session of study 1

Full size table

The results of the correlation analyses between performance on all of the NANA and standard cognitive tasks and participant age are shown in Table 2. They show that performance on each of the NANA tasks correlated significantly with almost all of the standard tasks of cognitive function. Of the NANA tasks, the random version of the Shopping List task showed the strongest pattern of correlation with the standard cognitive function tasks. As expected, performance on this task was particularly well correlated with performance on the Symbol Digit task, indicating high levels of similarity in the cognitive operations involved. Correlations between the NANA tasks and the NART measure of prior cognitive ability were generally lower than with the measures of current cognitive function, indicating that the NANA tasks were better measures of current, rather than prior, cognitive ability. As with most of the standard cognitive tasks, all of the NANA tasks also correlated with age, showing their sensitivity to age-related change.

Table 2 Kendall’s Tau correlation coefficients between session 1 performance on the NANA tasks, standard cognitive battery, and participant age in study 1

Full size table

The results of the reliability analyses are shown in Table 3. They show that cross-session correlations in performance were significant for all of the NANA and standard cognitive tasks. The strength of the correlation coefficients for the NANA tasks were similar in magnitude to those of the speeded reaction time task, indicating comparable levels of reliability.

Table 3 Mean performance levels for the NANA and standard cognitive tasks that were administered in all three testing sessions in study 1

Full size table

Some of the NANA tasks and standard cognitive tasks also showed evidence of significant improvements in performance across the sessions. Pairwise Wilcoxon signed rank tests (one-tailed, uncorrected for multiple comparisons) showed that, for the ascending version of the Shopping List task and the Squares task, there were significant improvements in performance between sessions 2 and 3 (Z = 3.18, p = 0.001; Z = 3.20, p < 0.001, respectively), but not between sessions 1 and 2 (Z = 1.21, p = 0.12; Z = 0.14, p = 0.45, respectively). For the Stroop task, significant reductions in interference were seen between sessions 1 and 2 (Z = 2.03, p = 0.02), but not between sessions 2 and 3 (Z = 1.43, p = 0.08). For the Symbol Digit task, significant improvements were seen between sessions 1 and 2 (Z = 3.67, p < 0.001) and between sessions 2 and 3 (Z = 3.25, p < 0.001). There were no significant changes in performance across sessions for the random version of the Shopping List task (Table 3).

Discussion

Both the Shopping List and the Squares task show validity as reliable tests of processing speed, which has been considered a biomarker of cognitive aging (Deary et al. 2010). There were also significant associations with measure of executive function and verbal episodic memory, perhaps reflecting the fundamental role of processing speed in these higher cognitive abilities (Tucker-Drob 2011), as well as with participant age. Although there was evidence of practice effects for the ascending version of the Shopping List task and the Squares task, no significant practice effects were observed for the random version of The Shopping List task. This task also showed larger correlations with the standard cognitive tasks and participant age than the other NANA tasks, making it the psychometrically strongest of the three.

Study 2

The aim of study 2 was to examine the performance of the two NANA cognitive tasks when used by participants, in their own homes, without a researcher being present, over micro-longitudinal timescales. This was done as part of a larger validation study of the entire NANA toolkit, which included computer-based measures of participants’ dietary intake, mood, appetite, grip strength, physical activity, and exhaustion (Astell et al. 2014).