Verbal and non-verbal recognition memory assessment: validation of a computerized version of the Recognition Memory Test

Background The use of computerized devices for neuropsychological assessment (CNADs) as an effective alternative to the traditional pencil-and-paper modality has recently increased exponentially, both in clinical practice and research, especially due to the pandemic. However, several authors underline that the computerized modality requires the same psychometric validity as "in-presence" tests. The current study aimed at building and validating a computerized version of the verbal and non-verbal recognition memory test (RMT) for words, unknown faces and buildings. Methods Seventy-two healthy Italian participants, with medium–high education and ability to proficiently use computerized systems, were enrolled. The sample was subdivided into six groups, one for each age decade. Twelve neurological patients with mixed aetiology, age and educational level were also recruited. Both the computerized and the paper-and-pencil versions of the RMT were administered in two separate sessions. Results In healthy participants, the computerized and the paper-and-pencil versions of the RMT showed statistical equivalence for words, unknown faces and buildings. In the neurological patients, no statistical difference was found between the performance at the two versions of the RMT. A moderate-to-good inter-rater reliability between the two versions was also found in both samples. Finally, the computerized version of the RMT was perceived as acceptable by both healthy participants and neurological patients at System Usability Scale (SUS). Conclusion The computerized version of the RMT can be used as a reliable alternative to the traditional version.


Introduction
In the past few decades, the use of computerized neuropsychological assessment devices (CNADs) has received increasing attention in both clinical and research settings, because it is considered as a more advantageous alternative to conventional examiner-based tests [1].The interest in CNADs has exponentially increased in response to the COVID-19 pandemic, which has forced significant changes in clinical practice.Indeed, one of the properties of CNADs is to allow remote encounters between clinicians and patients [see 2 for review].Furthermore, focusing on the fact that neuropsychological tests on digital platforms are sensitive and cost-effective, several studies have already used these tools with many clinical populations [3].However, to date, "paper-and-pencil" tests remain the gold standard for neuropsychological assessment [4,5].The American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology, in a joint document, defined CNADs as any tool that administers, evaluates, or interprets tests of brain function using a computer, portable devices (such as digital tablets or smartphones), or any other digital interface instead of a human examiner [3].However, such a position statement, that emphasizes the administration of cognitive tests by a computer "instead of" a human examiner, can be extended to all computerized neuropsychological platforms, including a broader spectrum of ever-growing technology and, therefore, include tools that may allow a better presentation of the stimuli and data recording through computer presentation (such as immersive and non-immersive Virtual Reality) [4].As aforementioned, CNADs allow to present with extreme accuracy a wider range of tasks including, for example, those involving multi-tasking, multiple measurements, divided attention, speed processing and response times for multiple measurements.Furthermore, CNADs provide new parameters, more easily and quickly available [as, for instance, recording of response latencies, see 6], leading to greater accuracy in detecting cognitive changes [7][8][9], in both the clinical and research fields [10,11].Finally, computerized tests can be administered to many subjects in shorter times and with lower costs; the evaluation of the responses can be more precise, enabling automatic administration and scoring procedures, and thus reducing the probability of human error [3].On the other hand, in a conventional neuropsychological assessment, there can be a great deal of variation between different examiners or between the same examiner along an assessment session, in the manual use of tools (such as a stopwatch), or in the manual presentation of material (such as stimulus cards) [12].Furthermore, while in an examiner-centred paperand-pencil assessment the patient interacts with an examiner who presents stimuli, records responses, notes key behavioral observations and also understands the patient's level of effort and motivation, in CNADs, the patient interacts with a computerized test station or a tablet through one or more alternative input/output devices (keyboard, voice, mouse, joystick, touchscreen, head-mounted display), sometimes without any supervision or observation.Additionally, computerized assessment does not permit examiners to introduce flexibility into their evaluation or provide any structured encouragements to the examinee.For those reasons, the use of CNADs may require familiarity with technical skills that can significantly impact on relevant evaluation parameters.Patients, especially the elderly, may have limited familiarity with computer interfaces, as well as negative attitudes and anxiety about computers [13]: as a result, their performance may differ significantly from pencil-and-paper measurements [14][15][16].On these assumptions, computerized tests are not simply the replacement of paper-and-pencil for a computer screen and electronic response capture and they could not be considered tout court directly comparable to an examiner-administered evaluation [17][18][19].Rather, a traditional test administered by computer becomes a new test, different from the conventional one both qualitatively and technically.Indeed, several studies have proved that there are substantial differences between the computerized measurements and the corresponding examiner administered tests in different samples [20][21][22][23].In addition, the "replication crisis" [24] has increased the need for appropriate and robust statistical inference: when examiner-based tests are proposed in a computerized version, new psychometric data should be provided according to the same standards of psychometric tests.It cannot be assumed a priori that the normative data for a test administered by the examiner apply equally well to a computerized version of the same test, due to changes in the administration method and patients' familiarity with the computer.Thus, if a computerized test adapted from a conventional test is not simply a slightly different format for an existing test, it becomes essential that equivalence testing between the paper-and-pencil and the computerized versions of that test is provided.New normative data for computerized tests must be collected when no equivalence with the paperand-pencil version of the tests is proved.Several tests have already been adapted for computerized presentation, such as the Montreal Cognitive Assessment -MoCA [25], the Clock Drawing Test, the Pentagon Drawing Test [26], the Rey Complex Figure copy task [27], the Bells Cancellation, Line Bisection and Five Elements Drawing Tests [28]: these tests are, of course, critical in the assessment of cognitive functioning.
Among neuropsychological patients, memory problems are one of the most reported symptoms and, due to growing life expectancy, differentiating early neurodegenerative disorders from normal aging is for sure one of the greatest challenges for the future.Notably, despite the extensive body of literature supporting the use of recognition memory paradigms to enhance differential diagnosis between neurological patients, depressed ones and healthy controls [29], its clinical assessment is still limited due to the paucity of validated, clinical tools available.As for the Italian population, only the verbal and non-verbal recognition memory test (RMT) [30] and the visual long-term recognition memory test [31] are available.The RMT represents a relatively fast screening tool, which may be useful for the early detection of memory disorders; two parallel versions are also available for multiple testing [32].However, despite its promising features, the original examiner-centred version of the RMT may be prone to human error: clinicians are indeed required to control for several factors, namely the duration of the stimuli presentation and the registration of patients' responses and behaviours [see 30, methods section].For all these reasons, we decided to provide a validation of a computerized version of the verbal and non-verbal recognition memory test (RMT-Form A) for words, unknown faces, and buildings [32], comparing it to the results of the original paper-and-pencil test.To this end, a computerized version of the RMT was built and we administered both the pencil-and-paper and the computerized version to different age-balanced groups of healthy participants, with medium-high education.Furthermore, in order to evaluate the possible impact of different neurological conditions on patients' ability to interact with the computer interface, the two version of the RMT were also administered to a small sample of neurological patients with mixed aetiology.

Materials
The previously published battery of verbal and non-verbal recognition memory tests (RMT) [32] and its computerized version were used.The battery evaluates different components of recognition memory for words, unfamiliar faces, and unknown buildings.Stimuli selection and test construction are described elsewhere [30].Briefly, the original, pencil-and-paper version had the same administration procedure for the three subtasks (i.e., for words, faces and buildings): during the study phase, the 30 target stimuli were individually displayed at a 3-s.interval per item and, for each stimulus, participants were asked to judge whether they like it or not to enhance an adequate level of attention.In the recognition phase, performed immediately after the end of the study phase, three stimuli were presented simultaneously: the target stimulus and two distractors.Participants were asked to recognize, among the three items, the previously seen one.Targets were distributed randomly to the top, bottom, or centre of the sheet, ensuring an equal number of placements in the three positions.The number of correct answers corresponds to the total score of each subtask (range: 0-30).Overall, the test lasts about 20 min.
Here we present a computerized version of the RMT.The test was built on PsychoPy and the stimuli were shown to the participants using a MacBook Pro 13" (with a set screen resolution of 1680 × 1050).The stimuli were the same as in Smirni et al.'s [30] original version and, in the study phase, each stimulus was displayed at a 3-s.interval.
To ensure an adequate level of attention on each stimulus, participants were asked to press "1" on the keyboard if it produced any emotion or feeling (e.g.happiness, sadness, etc.), "0" if it did not produce any particular emotion or feeling (Fig. 1).With respect to the original one, in this computerized version, administration and scoring procedures are fully automatic: indeed, instructions are shown on the PC/laptop monitor and participants are invited to provide responses using some keys on the keyboard.Furthermore, after each subtest completion, an electronic sheet with all the answers provided by the participant, the raw score and its conversion into adjusted and equivalent scores is available.The system automatically applies the regression formulas of Smirni et al. [30] to the raw scores obtained by the participants, even in the case of minimum or maximum raw score (0/30 and 30/30 respectively).In agreement with Novelli et al. [33], in the former case the equivalent score is always zero, in the latter, the equivalent score is equal to four, disrespectfully from the foreseen automatic correction.
In addition, the participants were administered the Montreal Cognitive Assessment (MoCA) [34] to assess the general cognitive level, and the System Usability Scale (SUS) [35], which evaluates the subjective opinion from a user regarding a particular software.The SUS consists of 10 items in which evennumbered questions are posed in negative form.All the items need to be rated on a Likert scale ranging from 1 (completely disagree), to 5 (completely agree).

Methods
The two versions of the RMT (i.e., the original pencil-andpaper and the computerized versions) were administered by one of the authors (BF) to a sample of 72 healthy participants, in two testing sessions over two separate days, at least one week apart and within 10 days.The examiner was always present in the same room during the computerized administration of the test, but no further instructions or clarifications were requested from the examinees.The administration order of the two versions (first pencil-and-paper; first computerized), was counterbalanced across subjects as well as the administration order of the three subtasks.Inclusion criteria for participants were: i) age between 20 and 79 years; ii) at least 5 years of education.Exclusion criteria were: i) a history or clinical evidence of neurological or neuropsychiatric disease; ii) subjective complaint of memory or other cognitive impairments; iii) a Montreal Cognitive Assessment (MoCA) adjusted score below the Italian cut-off score [34].The sample was then subdivided into six groups, one for each age decade; the number of males and females was matched across the six groups.
Furthermore, a continuous series of 12 neurological patients with mixed aetiology, age and educational level was recruited at the Centre for Cognitive Rehabilitation (CeRiN) between May and August 2023.The two versions of the test RMT were administered by two psychologists expert in neuropsychology (EB and ST), to evaluate the possible impact of different neurological conditions on patients' ability to interact with the computer interface.The administration order of the two versions was counterbalanced across the patients as well as the administration order of the three subtasks.
Written informed consent was obtained from all participants and the study was conducted in accordance with the Declaration of Helsinki, with the approval of the local ethics committee.
Socio-demographic characteristics of both healthy participants and neurological patients are shown in Table 1, together with the SUS scores.

Learning effect
A series of linear mixed effects models were run to check for possible effects of learning.The raw scores at the three RMT subtasks were considered as a continuous dependent variable.As a factorial variable, the version of test (Version, two levels: paper-and-pencil; computerized), the administration order of the two versions (Version order, two levels: first paper-and-pencil; first computerized), and the administration order of the three subtasks (Task order; six levels: faces-words-buildings; faces-buildings-words; buildings-faces-words; buildings-words-faces; words-facesbuildings; words-buildings-faces), were added to the model.A by subject random intercept was also added to account for inter-subject variability.The same analysis was separately performed for healthy participants and neurological patients.

Inter-rater reliability
Inter-rater reliability between the computerized and the paper-and-pencil versions of the RMT was separately assessed for healthy participants and neurological patients through intra-class correlations.Results were interpreted according to Koo and Li's guidelines [36].

Healthy participants
Equivalence between the two versions of the RMT for faces, buildings and words was assessed through a two one-sided test procedure (TOST) for dependent samples [37].Indeed, while the lack of a significant difference between scores on a paired t test cannot be taken as an evidence of agreement, equivalence tests demonstrate that scores from different tests are equal within a predetermined boundary.Normality assumptions on raw variables were checked by the evaluation of skewness and kurtosis values (judged as abnormal if ≥|1| and |3|, respectively [38]).Given the non-normality of the raw scores, data were reflected and log transformed prior to analyses with equivalence tests.The TOST procedure involves an a priori determination of an equivalence boundary (± delta), within which score difference can be considered trivial.Delta was set a priori at 1.1 raw score points, which log transforms to 0.1 and results in an equivalence interval of − 0.1 and 0.1.α level was set at 0.005.

Neurological patients
The presence of any difference in the performance between the two versions of the RMT for faces, buildings and words was assessed through paired samples t-tests.

Inter-rater reliability
In the healthy participants, inter-rater reliability between the two versions of the RMT was moderate for faces (ICC = 0.699), buildings (ICC = 0.644) and words (ICC = 0.746).In the neurological patients inter-rater reliability was moderate for buildings (ICC = 0.539), and good for faces (ICC = 0.756) and words (ICC = 0.836).

Healthy participants
When comparing the two versions of the RMT for faces, no significant differences were found (t (71) = 0.298, p = 0.767, d = 0.003).Consistently, they also showed statistical equivalence on the TOST procedure (both upper and lower equivalence bound yielding a p < 0.005; computerized version mean = 27.95 ± 2.11; paper-and-pencil version mean = 27.93 ± 2.20).Also when comparing the two versions of the RMT for buildings, no significant effect was found (t (71) = 895, p = 0.374, d = 0.105).Consistently, they also showed statistical equivalence on the TOST procedure (both upper and lower equivalence bound yielding a p < 0.005; computerized version mean = 27.87 ± 2.46; paper-and-pencil version mean = 28.23 ± 1.98).Finally, no significant differences were found even when comparing the two versions of the RMT for words (t (71) = 1.88, p = 0.064, d = 0.221).Consistently, they also showed statistical equivalence on the TOST procedure (both upper and lower equivalence bound yielding a p < 0.005; computerized version mean = 27.25 ± 2.56; paperand-pencil version mean = 27.66 ± 2.26) (Fig. 2).Table 2  We also enrolled a small sample of neurological patients with mixed aetiology, age (range: 20-76) and educational level (range: 8-18) to evaluate the possible impact of different neurological conditions on the patients' ability to interact with the computerized version of the RMT.Results showed a moderate-to-good inter-rater reliability between the two versions of the RMT.Furthermore, no difference was found in patient's performance between the two versions of the RMT for words, unknown faces and buildings.Finally, results on SUS showed that the system is perceived as "good" and "acceptable" [39] (SUS mean score: 84.4/100; range: 57.5-100).Moreover, none of the patients referred feeling of discomfort while interacting with the computer interface, nor did they need further instructions from the experimenter, others then those provided by the computerized test.These findings support the possibility to use, in clinical practice, the computerized version as an alternative to the traditional one.
The computerized version of the RMT can offer several advantages in the assessment procedures, such as, for example, the reduction of the examiner's influence, higher consistency administration, presentation of the stimuli, and scoring.It can also support a higher number of screening and follow-up activities on large populations [7,40,41], due to the ease of data collection and the scoring procedure, the immediate availability of results and a significant reduction in costs linked to administering the tests.Furthermore, another strength is that, in the spirit of Open Science, test materials and instructions are freely available on the web under a Creative Common licence (https:// osf.io/ yuqbh/?view_ only= 1edfc 3db31 b74ac 581f8 3dfbc 4b7cc ed).Therefore, interested neuropsychologists can use them in both clinical and research settings.Finally, this computerized version of the RMT may be suitable for online assessment, which has recently been acknowledge as a reliable alternative to the "in-presence", traditional, neuropsychological assessment for diagnosing neurocognitive and degenerative disorders [42,43], such as preclinical dementia [44], as well as in healthy population assessment and in Alzheimer's disease [45].Online assessment makes it certainly easier for patients to access medical care permitting to save time and money even if several potential limitations must be considered (e.g., technological devices malfunctioning during the session; video or audio quality variability during the assessment).
The computerized version of the RMT is not exempt from limitations.For example, this test cannot be administered to poorly educated people over 80 years, because they usually are not familiar with technological devices.In fact, our validation sample included participants aged from 20 to 79 years, with medium-high education, and their rating of usability for the computerized version of the RMT was almost at ceiling.However, technological devices are expected to become usual tools for the whole population.

Fig. 1
Fig. 1 Example of the study and recognition phases for the computerized version of the RMT

Fig. 2
Fig. 2 Mean and standard error of the scores obtained by the healthy participants (top of the image), and the neurological patients (bottom of the image), in the three subtasks of the two version of the RMT

Table 1
Socio-demographic characteristics, MoCA's adjusted score and SUS's raw score from the healthy participants (top of the table), and the neurological patients (bottom of the table) N number; M males; F females; MoCA Montreal Cognitive Assessment; SUS System Usability Scale Healthy participants Decade N M/F Age (years ± sd) Education (years ± sd) MoCA (adjusted score ± sd) SUS (raw score ± sd)

Table 2
Mean scores and standard deviations of the two versions of the RMT in the validation sample divided by decades of age (top of the table), and in the neurological patients (bottom of the table) Computerized Mean (sd) 22.83 (3.77) 20.91 (5.50) 21.83 (4.96) 21.41 (5.44) 19.66 (6.80) 20.50 (4.10)