Introduction

During and after the COVID-19 pandemic, increasing interest has been devoted to remote neuropsychology, i.e., tele-neuropsychology, as a way to provide widespread access to healthcare services. The restrictions associated with the pandemic increased the need for alternatives to in-person clinical neuropsychology, such as new tools and new ways of providing adequate assessment [1]. Tele-neuropsychology may reduce the barriers to assessing individuals who, for different reasons, cannot leave their homes to be evaluated. Previous studies highlighted the importance of implementing and improving tele-neuropsychological services and care for groups of individuals who cannot easily reach health services because of infection-related risks, as in the case of immunosuppressive diseases like multiple sclerosis (and other neurological disorders, such as Guillain-Barré syndrome) [2], showing that remote testing allows greater access to health care [3, 4]. Crucially, although the in-person modality is preferred for detecting clinical and non-clinical signs in a neuropsychological assessment, significant agreement between tele-neuropsychological and in-person assessment has been shown [5, 6].

Tele-neuropsychological assessment has been shown to optimize costs and reduce in-person evaluation expenses [5]. In this context, some researchers have proposed, as a feasible approach, the combination of tele-neuropsychological and in-person approaches in a kind of “hybrid” model [7].

Tele-neuropsychology can be used for repeated neuropsychological assessments in longitudinal studies. It is nowadays associated with the use of technologies that allow a video call between the clinician and the patient [8, 9]. Such communication devices are desirable, as some qualitative information, such as the patient’s posture or facial expressions, can be collected through video. This appears to be particularly relevant, given the spread of cheap and widely available technologies and software among the population of industrialized countries (e.g., smartphones and video-call applications) [6]. However, recent evidence also pointed to the limitations of such approaches. In particular, among the elderly population, the telephone is still a highly feasible and widespread technology that is useful for assessment purposes [10].

Telephone devices have already been used for remote cognitive assessment, in different cultures and with different languages [11,12,13]. As an illustrative example, a more extensive inclusion of telephone-based cognitive assessments has occurred in England to keep health services running [9]. In Portugal, telephone-based global screenings have been compared with traditional in-person tests, highlighting a high agreement between these two modalities [14], especially on executive and working memory functions, which are particularly vulnerable to healthy and pathological aging [14]. In Italy, some new tools have been developed [15]. For example, a remote telephone-based version of the MMSE has been developed and improved [Itel-MMSE; 16,17,18]. Outside Europe, for example, in the USA, telephone-based cognitive assessments have mainly been adopted and compared to in-person tools, indicating high agreement between scores, in both healthy and pathological populations [19]. Telephone-based tools have also been used in Asia, showing a high sensitivity in the differentiation between pathological and healthy conditions [see 20 for a study in Japan].

In this paper, we aimed to develop and investigate the psychometric properties of a new screening tool, Tele-GEMS (Tele-Global Examination of Mental State). Tele-GEMS is constructed as a follow-up of a previously developed in-person test, the Global Examination of Mental State [21]. Tele-GEMS contains several items tapping different aspects of cognition. It takes inspiration from other tests, such as MMSE [22] or MoCA [23]. However, it covers a broader range of cognitive domains that may be particularly susceptible to decline or impairment, e.g., pragmatic abilities [24]. As such, Tele-GEMS mirrors (as much as possible) the structure and the items of GEMS to allow a meaningful comparison of the in-person assessment with the remote one. It is not developed for specific pathological populations; we designed it to assess the global cognitive profile, to be used in longitudinal studies, with cut-offs that take into account not only age, sex, and education, but also a more comprehensive proxy of cognitive reserve (i.e., a cognitive reserve index called “CRI,” which better characterizes the cognitive resources potentially available to an adult) [25,26,27,28]. Indeed, previous findings have shown that, together with education, further cognitively stimulating activities in adulthood (or deprivation, on a continuum) are associated with efficiency of cognitive functioning along the lifespan [27, 29]. Moreover, we used a life-experience cognitive reserve proxy (“CRI”) based on results from previous studies showing how it can significantly improve the accuracy of normative data and allow a finer estimation of cognitive performance [29, 30], possibly leading to a more tailored approach to patient assessment.

Methods

Participants

A total of 601 healthy persons from different social backgrounds were recruited across the North, Centre, and South of Italy. Inclusion criteria were: age of 18 years or older, Italian as mother tongue, autonomy in the main activities of daily living, and absence of relevant neurological diseases or medical conditions that can affect cognition. Table 1 describes the distribution of the participants, stratified by age, education, and sex [31]. Since there was no particular expected effect size for normative data, we followed the rule of thumb of enrolling at least 500 participants, in line with other studies on normative data [32, 33].

Table 1 Distribution of the variables age, education, and sex values across participants

A preliminary remote interview was carried out before administering the tests, to collect general information about the physical and mental health of participants: individuals with a medical history of stroke, traumatic brain injury, or any other neurological or psychiatric disease requiring medical treatment were not included in the normative sample.

Tele-GEMS was also tested on a clinical group of patients. Some clinical conditions, in fact, may be characterized by a limited possibility of accessing in-person health care services due, for example, to immunosuppression and increased susceptibility to infection. Patients with multiple sclerosis (MS) were recruited for testing the feasibility and construct validity of Tele-GEMS [2].

Starting from the observed correlations found in the development of GEMS [1], we estimated, conservatively, that the expected correlation between GEMS and Tele-GEMS in MS patients was 0.5. Thus, to obtain a power of 90%, the estimated sample size was 28 (G*Power); we therefore recruited 30 clinical participants to account for possible dropouts. The subgroup of 30 patients with multiple sclerosis (23 females) was administered Tele-GEMS and also a pool of tests usually adopted for clinical purposes at the Multiple Sclerosis Centre, Department of Neurosciences-DNS (University of Padova, Italy). In this clinical group, the mean age was 48.56 years (range = 30–69; SD = 10.15) and the mean education was 10.14 years (range = 5–21; SD = 3.85). The MS group was also administered the Expanded Disability Status Scale (EDSS, mean = 2.81, SD = 1.69). The aim was to examine whether the Tele-GEMS score was satisfactory in summarizing the cognitive capacities of this clinical group.

Procedure

Each participant was informed about the purpose of the study and its duration, and provided their consent to participate. Informed consent was acquired via e-mail or phone (in this case, it was audio-recorded). The examiner ensured that participants were in a quiet and distraction-free room and had a stable telephone connection. The procedure for all participants started with the administration of CRIq, followed by Tele-GEMS and other tests for stratified subgroups of subjects. The study was approved by the Ethical Committee of the School of Psychology of the University of Padua (Italy) and conducted under the principles of the Declaration of Helsinki.

Materials

(1) Tele-GEMS was administered to all healthy participants and MS patients. (2) The Cognitive Reserve Index questionnaire [CRIq, 28] was administered to all healthy participants and MS patients. (3) A parallel form of Tele-GEMS, i.e., Tele-GEMS B, was administered to a sub-group of participants. (4) The in-person version of Tele-GEMS (i.e., GEMS, Global Examination of Mental State, Mondini et al., 2022) was administered to a stratified sample of 100 healthy participants, and (5) the Montreal Cognitive Assessment [MoCA, 23] to a stratified subgroup of 50 healthy participants. The protocols and the instructions have also been translated and made available in English for possible cross-cultural validation (https://osf.io/t3bma/).

  1. (1)

    Tele-GEMS is made up of ten tasks: orientation, immediate memory recall, backward months (working memory), spatial representation, naming, delayed memory recall, verbal comprehension, auditory attention, verbal fluency, and metaphor comprehension (pragmatics of language). The whole examination lasts about 10 min. The items of Tele-GEMS assessing verbal cognitive functions were selected considering psycholinguistic variables, such as frequency of use and lexical agreement. The semantic distance was controlled considering all possible combinations of the words, based on WEISS (i.e., Word-Embeddings Italian Semantic Space) [34].

    A more detailed description of the instructions for each task included in Tele-GEMS is available in Supplementary information (S1). The Tele-GEMS scores for each of the ten tasks are obtained by transforming the raw scores into proportions associated with the maximum obtainable in a task and then averaging these proportions (ranging from 0 to 1). In such a way, each task equally contributes to the final composite score [see also 35]. The materials are available online at the OSF link (https://osf.io/t3bma/) with instructions for the examiners, an Excel file to calculate Tele-GEMS total score, and cut-offs according to age, education, and Cognitive Reserve Index (CRI).

  2. (2)

    The CRIq [28] is a semi-structured interview to measure the potential reserve of cognitive resources available to a person, collected during the lifespan. In a single index (CRI), the CRIq conveys the three primary sources of CR: education, working activity, and leisure time activities. The CRIq assigns a score to each item based on frequency and number of years of practice. A full description of all the items included in the CRIq, the instructions for administration, and the scoresheets are available at https://www.cognitivereserveindex.org/. For automatic calculation of CRI, see https://www.cognitivereserveindex.org/calcolo/calcolo.html.

  3. (3)

    GEMS comprises 11 tasks that tap into a range of instrumental and executive skills such as orientation, memory, working memory, visuospatial, constructional, and planning abilities, perceptual and visual attention, language (naming, comprehension, verbal fluency), and pragmatics. GEMS is administered in-person and it requires about 10 min to be administered; it has been developed for the Italian population [21].

  4. (4)

    The MoCA test [23] is a widely used screening tool that assesses visuospatial, executive, memory, attention, language, abstraction, and orientation (time and place) abilities. The assessment consists of a 30-point test and can be administered in 10 min.

  5. (5)

    Neuropsychological tests for Multiple Sclerosis (MS) patients. The patients with MS were also administered three cognitive tests to examine typically impaired functions in this clinical condition:

    • the Symbol Digit Modalities Test (SDMT), which assesses high-level attentional and psychomotor skills [36];

    • the 15-word list recall test, which assesses immediate and delayed memory [37];

    • the Cognitive Estimation Task (CET), which assesses executive processing [38].
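The composite scoring rule described for Tele-GEMS in item (1) above (each task's raw score divided by the task maximum, then averaged) can be illustrated with a minimal Python sketch. The task names, maxima, and raw scores below are hypothetical placeholders and cover only three tasks for brevity; the actual maxima are those given in the Tele-GEMS materials, and the Excel file on OSF remains the reference calculator.

```python
from statistics import mean


def tele_gems_total(raw_scores, max_scores):
    """Average of per-task proportions (raw score / task maximum), range 0-1."""
    if raw_scores.keys() != max_scores.keys():
        raise ValueError("raw_scores and max_scores must cover the same tasks")
    proportions = []
    for task, raw in raw_scores.items():
        maximum = max_scores[task]
        if not 0 <= raw <= maximum:
            raise ValueError(f"score for {task!r} is outside 0..{maximum}")
        proportions.append(raw / maximum)
    # Every task contributes equally, whatever its raw maximum.
    return mean(proportions)


# Hypothetical maxima and raw scores (illustration only).
max_scores = {"orientation": 10, "naming": 5, "verbal_fluency": 20}
raw_scores = {"orientation": 9, "naming": 5, "verbal_fluency": 10}

total = tele_gems_total(raw_scores, max_scores)
print(f"composite score: {total:.2f}")
```

Because the proportions are averaged rather than the raw points summed, a task with a large raw maximum (such as verbal fluency in this toy example) does not dominate the total.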

Statistical analyses

We obtained the percentage of participants in the normative sample who scored at ceiling or at floor on Tele-GEMS. We investigated construct validity (convergent) by Pearson’s correlation between Tele-GEMS and MoCA. To further test construct validity, we calculated, in the sample of MS patients, the correlation between Tele-GEMS and the three clinical cognitive tests. To check whether the Tele-GEMS total was satisfactory in summarizing the cognitive performance of this clinical group, we performed a principal component analysis (PCA) on the three cognitive tests’ scores (after z-score transformation of their raw values). We calculated the correlation between the Tele-GEMS total and the first component extracted from this PCA, which can be interpreted as a summary score of all the tests. We also assessed the criterion validity of Tele-GEMS in terms of its capability to be predicted by its in-person version (GEMS), i.e., by administering both Tele-GEMS and GEMS to a subgroup of 100 participants. We assessed the reliability of Tele-GEMS through internal consistency, test-retest reliability, inter-rater reliability, parallel-form reliability, and significant change analyses. We evaluated internal consistency by intra-class correlations [39, 40; ICC function in the R psych package]. We analyzed test-retest reliability on a subgroup of 50 participants assessed at a 2-month interval, inter-rater reliability on a subgroup of 50 participants, and parallel-form reliability on a subgroup of 101 participants assessed at a 2-month interval, through Pearson’s correlation analyses. The significant change was calculated by a regression-based approach [41]. The practice effect was calculated through paired t-tests. All analyses were performed on the total Tele-GEMS score, except the exploratory factor analysis. Exploratory factor analysis was performed using the fa function in the psych R package, R software [42]. The number of factors was first set to one and then increased by one at a time (factor analysis results were considered acceptable when the chi-squared p-value was >0.05 and RMSEA was <0.06).

We assessed the relationship between age, sex, education, CRI, and Tele-GEMS by multiple regressions, with Tele-GEMS score as the dependent variable. As continuous predictors, age, education, and CRI were included in multiple regressions, whereas sex was included as a factorial variable. The best model among the regression models was then visually inspected following the procedure already used in [35] and [30].

We explored the possibility of improving the fit by allowing non-linear terms: for all the variables that showed a non-linear trend in the inspection of partial residuals, we tested whether adding quadratic terms yielded better models. The syntax of the models is reported below:

figure a

The model with the lowest AIC was then chosen as the best one, and clinical cut-offs were obtained by applying the regression-based method [i.e., 38] to that model. All the analyses, except for the parallel-form reliability, refer to Tele-GEMS A. All the correlation analyses showed the same pattern of results (significance) with both parametric and nonparametric methods. The analyses were performed with the free statistical software R [42].
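As an illustration of AIC-based selection between a linear model and one with an added quadratic term, the following Python sketch fits both by ordinary least squares and compares their AIC values. It is not the authors' code (the analyses were run in R): the data are synthetic, the variable names are hypothetical, and only a single predictor is used to keep the example short.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_aic(columns, y):
    """Fit y ~ intercept + columns by least squares; return (coefficients, AIC)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, design[i]))) ** 2
              for i in range(n))
    # Gaussian log-likelihood at the maximum-likelihood variance rss / n;
    # the parameter count is k coefficients plus the residual variance.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return beta, 2 * (k + 1) - 2 * log_lik


# Synthetic data (illustration only): centred age with a curved effect on the score.
age_c = [(age - 50) / 10 for age in range(20, 85, 5)]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.0, 0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
score = [80 - 2 * x - 1.0 * x * x + e for x, e in zip(age_c, noise)]

beta_linear, aic_linear = ols_aic([age_c], score)
beta_quadratic, aic_quadratic = ols_aic([age_c, [x * x for x in age_c]], score)
best = "quadratic" if aic_quadratic < aic_linear else "linear"
print(f"AIC linear = {aic_linear:.1f}, AIC quadratic = {aic_quadratic:.1f}, best = {best}")
```

The AIC penalizes each extra parameter by 2, so the quadratic term is retained only when it improves the likelihood by more than that penalty.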

Results

Among the enrolled participants, two were excluded from the data analysis because they presented significant subjective memory complaints in everyday life, a possible sign of alcohol abuse, and a reported history of neurological problems. The Tele-GEMS normative sample was made up of 53.24% women and 46.75% men. Neither ceiling effect nor floor effect was found. Descriptive statistics are reported in Table 2.

Table 2 Descriptive statistics of the normative sample

Construct validity

Tele-GEMS correlates with MoCA [r(48) = 0.63, p < 0.001]. In the clinical group with MS, Pearson’s correlation showed that Tele-GEMS is positively correlated with scores on SDMT [r(28) = 0.44, p = 0.01], as well as with the immediate [r(28) = 0.41, p = 0.02] and delayed [r(28) = 0.43, p = 0.01] memory recall tests (15-word list recall), and negatively, though not significantly, correlated with CET [r(28) = −0.23, p = 0.22]; this indicated that higher Tele-GEMS scores were associated with better performance on the other tests. The PCA showed that a single component could capture 60% of the variance in the z-transformed MS scores. This component also significantly correlated with Tele-GEMS [r(28) = 0.50, p = 0.004].

Criterion validity

We investigated the criterion validity by analyzing the capacity of Tele-GEMS to be predicted by its in-person version, GEMS [21]. The regression analysis showed that Tele-GEMS is significantly predicted by GEMS (B = 0.92, SE = 0.05, t = 17.04, p < 0.001, adjusted R² = 0.74) (Fig. 1).

Fig. 1 Construct validity of Tele-GEMS. This figure represents, on the X-axis, the total score of GEMS, which is the in-person version of Tele-GEMS. On the Y-axis the Tele-GEMS total score is reported. GEMS was administered to a subgroup of the normative sample made of N = 100 participants.

Internal consistency

The internal consistency of Tele-GEMS was calculated using Cronbach’s alpha. Tele-GEMS has an acceptable internal consistency of 0.74 (standardized alpha). The correlation of each item with the total was r = 0.49 for orientation, r = 0.68 for immediate memory, r = 0.45 for backward months, r = 0.48 for spatial representation, r = 0.59 for naming, r = 0.65 for delayed memory, r = 0.54 for comprehension, r = 0.49 for auditory attention, r = 0.62 for verbal fluency, and r = 0.46 for metaphor comprehension (std.r from the alpha function in the psych R package).
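For readers unfamiliar with the standardized form of Cronbach's alpha, it can be computed from the mean inter-item correlation as alpha = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items. The Python sketch below illustrates this formula on made-up item scores; it is not the authors' code (they report using the psych R package), and the three hypothetical items stand in for the ten Tele-GEMS tasks.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_alpha(items):
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    k = len(items)
    pairs = list(combinations(items, 2))
    mean_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return k * mean_r / (1 + (k - 1) * mean_r)


# Made-up scores of four participants on three items (illustration only).
items = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 3, 2, 4],
]
alpha = standardized_alpha(items)
print(f"standardized alpha = {alpha:.3f}")
```

Here the three pairwise correlations are 1.0, 0.8, and 0.8, so the mean correlation is about 0.87 and alpha is about 0.95.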

Test-retest reliability

The test-retest reliability of Tele-GEMS was assessed in a subset of 50 participants who were retested after a 2-month interval. Test-retest reliability was calculated by means of Pearson’s correlation, resulting in r(48) = 0.62 (p < 0.001). The practice effect of Tele-GEMS at two months was evaluated using a paired t-test comparing the total Tele-GEMS score at the two times of measurement, and it was significant [t(49) = 3.77, p = 0.001]: in the retest, participants had higher scores (mean = 78.44) than in the first test (mean = 74.18).
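A practice effect of this kind is tested on the within-person differences between the two sessions. The Python sketch below shows the paired t statistic, t = mean(d) / (sd(d) / √n) with n − 1 degrees of freedom, on made-up scores; the values are hypothetical and are not the study data. The p-value is omitted because it requires the t distribution, which is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for second minus first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up total scores of five participants (illustration only).
test_scores = [70, 75, 80, 72, 78]
retest_scores = [74, 78, 81, 77, 80]

t_value, df = paired_t(test_scores, retest_scores)
print(f"t({df}) = {t_value:.2f}")
```

A positive t indicates higher scores at retest, which is the direction expected for a practice effect.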

Significant change

A regression approach was used to calculate the thresholds to detect significant change [38]. The method tests whether the score on the second measurement is significantly “far” from the predicted one, thus classifying the outcome as stability, improvement, or worsening. More information and the thresholds for significant change are available in Supplementary Table S2.
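The general logic of a regression-based change score can be sketched as follows, under simplifying assumptions: the retest score is regressed on the baseline score in a reference sample, and an individual's observed retest score is compared with the predicted one in units of the standard error of estimate. This Python sketch is a simplified variant with made-up data and an assumed two-sided 90% band (±1.645); the method cited in the text may use a more exact standard error, and the actual thresholds are those in Supplementary Table S2.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus standard error of estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return intercept, slope, sqrt(rss / (n - 2))


def change_z(baseline, retest, model):
    """Standardized distance between observed and predicted retest score."""
    intercept, slope, see = model
    return (retest - (intercept + slope * baseline)) / see


def classify(z, threshold=1.645):
    """Label the change; the threshold is an assumption of this sketch."""
    if z > threshold:
        return "improvement"
    if z < -threshold:
        return "worsening"
    return "stability"


# Made-up reference sample (illustration only): baseline and 2-month retest scores.
reference_baseline = [60, 65, 70, 75, 80, 85]
reference_retest = [65, 68, 75, 78, 86, 88]
model = fit_line(reference_baseline, reference_retest)

z_decline = change_z(70, 68, model)  # retest well below the predicted score
z_stable = change_z(70, 75, model)   # retest close to the predicted score
print(classify(z_decline), classify(z_stable))
```

Because the prediction comes from people who were themselves retested, the expected practice effect is already built into the predicted score.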

Inter-rater reliability

We investigated inter-rater reliability by examining the intra-class correlation between two examiners who separately scored the same performance on the same participant. The intra-class correlation analysis, performed on a subgroup of 50 healthy participants, showed an ICC = 0.98, which indicates a robust inter-rater agreement.

Parallel forms

The “main” version of Tele-GEMS is Tele-GEMS A, while its parallel version is Tele-GEMS B. Parallel-form reliability was calculated using Pearson’s correlation between the score on Tele-GEMS A and the score on Tele-GEMS B, which was administered after Tele-GEMS A, resulting in r = 0.75 (p < 0.001). Practice effects between these versions were checked using a paired t-test, which showed a significant result [t(100) = −2.72, p = 0.01], indicating that people had significantly higher performance on Tele-GEMS B than on Tele-GEMS A.

Factor analysis

The factor analysis aimed to investigate the correlations among the subtests that compose Tele-GEMS and to see whether a meaningful pattern of relationships emerged (see above for more details about this analysis). The fit was statistically considered as satisfactory with 3 factors [p = 0.06 and RMSEA = 0.02]. An inspection of the loadings indicates that the first factor is presumably associated with memory demands. Indeed, the highest loadings are for both the immediate and the delayed memory tasks (that require short- and long-term memory capacity). The second factor is presumably associated with general language abilities: the highest loadings of this factor are for the naming, verbal comprehension, and verbal fluency tasks (that mostly require language capacity). The third factor is presumably associated with the mental representation of objects, as the highest loading of this factor is for spatial representation (that mostly requires visual-constructive representation); see Table 3.

Table 3 Results of the factor analysis for all the sub-scores of Tele-GEMS

Effect of demographic variables

Tele-GEMS score correlated with age [r(599) = −.53, p < .001], with education [r(599) = .51, p < .001], and with CRI [r(598) = .20, p < .001]. We entered age, education, CRI, and sex in a series of linear regression models. Inspection of the partial residuals of a full model, i.e., with age, education, and CRI, showed that such predictors were non-linearly related to Tele-GEMS. Thus, subsequent regressions were performed to check whether including non-linear terms would improve the fit (see Supplementary Table S3 for more details). The final model, i.e., the one that best fitted Tele-GEMS scores [model 6, F = 89.55, p < .001, adjusted R² = 0.47], included age, education, CRI, and sex, and non-linear terms for age and education (see Fig. 2). The results of this model can be summarized as follows: the higher the education, the better the Tele-GEMS score, with a decreased effect at the highest education levels, possibly due to the smaller number of observations in the highest range; the higher the CRI, the higher the Tele-GEMS score. Sex showed a non-significant effect on performance. All the predictors included in the final model were taken into account for the computation of cut-offs.

Fig. 2 Effect of age, education, and the Cognitive Reserve Index (CRI) on Tele-GEMS total score. Age, education, and CRI are reported on the x-axis, while Tele-GEMS total score is reported on the y-axis. Quadratic terms of age and education are included in this figure, according to the final regression model, which best fitted Tele-GEMS data.

Cut-offs

We calculated clinical cut-offs by applying the regression-based method [i.e., 38] to the results of model 6. To calculate these cut-offs, i.e., reference scores that help in the interpretation of an observed performance, Tele-GEMS scores associated with p = 0.05 were rounded to the nearest integer. Cut-offs of Tele-GEMS are reported in Supplementary Material (S3). The Tele-GEMS total score can be calculated with the materials at https://osf.io/t3bma/; the cut-offs can then be obtained with the R Shiny App (https://sonia-montemurro.shinyapps.io/Tele-GEMS_Shiny/).
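To make the idea of a regression-based cut-off concrete, the Python sketch below computes the score expected for a given demographic profile and then takes the value below which only 5% of comparable healthy people would be expected to fall. All coefficients and the residual standard deviation are invented for illustration and are not those of model 6; the sketch also ignores the uncertainty of the regression coefficients, which the cited method may account for. The actual cut-offs should be taken from the Supplementary Material or the Shiny App.

```python
from statistics import NormalDist

# Hypothetical coefficients (illustration only, NOT the published model).
COEFFICIENTS = {
    "intercept": 60.0,
    "age": 0.2,
    "age_squared": -0.004,
    "education": 2.0,
    "education_squared": -0.05,
    "cri": 0.05,
    "sex_male": 0.0,
}
RESIDUAL_SD = 8.0  # hypothetical residual standard deviation


def predicted_score(age, education, cri, male):
    """Expected score for a profile under the hypothetical model."""
    c = COEFFICIENTS
    return (c["intercept"]
            + c["age"] * age + c["age_squared"] * age ** 2
            + c["education"] * education + c["education_squared"] * education ** 2
            + c["cri"] * cri
            + c["sex_male"] * (1 if male else 0))


def cutoff(predicted, residual_sd, p=0.05):
    """Score at the p-th quantile of the predicted distribution, rounded."""
    z = NormalDist().inv_cdf(p)  # about -1.645 for p = 0.05
    return round(predicted + z * residual_sd)


expected = predicted_score(age=70, education=8, cri=100, male=False)
threshold = cutoff(expected, RESIDUAL_SD)
print(f"expected = {expected:.1f}, cut-off = {threshold}")
```

An observed score below the cut-off for the person's own profile would then be read as lower than expected for healthy people with the same age, education, CRI, and sex.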

Discussion

This study presents the normative data and the psychometric properties of the Tele-Global Examination of Mental State (Tele-GEMS), a new telephone-based tool developed for the brief assessment of cognitive state, through a set of 10 tasks tapping several cognitive processes.

Tele-GEMS showed satisfactory construct validity when considering its relationship with MoCA [23]. The construct validity was confirmed in the clinical sample of persons with MS, for whom a significant correlation was found between Tele-GEMS and a summary score derived from other neuropsychological tests.

The criterion validity of Tele-GEMS was measured by analyzing the ability of Tele-GEMS to be predicted by its in-person version [GEMS, 21]. Good criterion validity was shown, indicating that Tele-GEMS can be used when remote testing is preferable, for example, in follow-ups after a preliminary in-person assessment with GEMS. This result is in line with previous studies in which a significant agreement between tele-neuropsychological and face-to-face assessment has been shown [5, 6].

Tele-GEMS also shows acceptable internal consistency (Cronbach’s alpha > 0.60). Test-retest reliability with the same version (Tele-GEMS A) at two different points in time and parallel-form reliability with different versions (Tele-GEMS A and Tele-GEMS B) at two different points in time were also shown to be adequate. However, re-testing with Tele-GEMS A or re-testing with Tele-GEMS B, two months from the first measurement, can lead to a significant practice effect, albeit smaller when Tele-GEMS B is administered after Tele-GEMS A. Such results underline that, in general, parallel forms do not ensure the absence of a practice effect [see 31 for further considerations about the psychometric properties of cognitive screenings]. To allow the comparison across repeated measurements with Tele-GEMS, we calculated and provided the thresholds of significant change, which usually allow investigation of whether a change of relevance occurred, taking into account possible practice effects.

Tele-GEMS shows high inter-rater reliability, indicating that there is a low rate of arbitrariness in the scoring. The factor analysis showed a meaningful pattern of results, with three factors accounting for the variance across the Tele-GEMS sub-tasks and underlying memory, language, and spatial representation. This result is in line with previous findings in which some similar underlying factors explained the inter-subject score variability using a wider neuropsychological battery [43].

The effect of demographic variables on Tele-GEMS was investigated by using linear regressions. Age and education predicted performance on Tele-GEMS, with a negative effect of age and a positive effect of education, especially in the average range of education. Our analyses also considered the Cognitive Reserve Index (CRI), which showed a robust and positive relationship with Tele-GEMS scores. Such results are consistent with the literature showing a strong relationship between life-experience CR proxies and global cognitive functioning [30, 44, 45]. We used age, sex, education, and CRI as potential predictors of the clinical Tele-GEMS cut-offs. To the best of our knowledge, this work provides, for the first time, normative data for a telephone-based measure of the cognitive state for the Italian population, using a comprehensive CR proxy for calculating the cut-offs [for other tests with the same approach, see 21, 30].

In developing Tele-GEMS, we opted for a simple telephone-based evaluation (not including video calls) to ensure its feasibility for elderly patients who may have (at least in these years) little familiarity with technologies. However, Tele-GEMS can be easily adapted to be performed via video calls to ensure better interaction with the examinee; future studies could evaluate the impact of adding video support on the stability of the psychometric properties of Tele-GEMS.

Tele-GEMS is not exempt from limitations. A wider pool of cognitive tasks, for example, targeting processing speed and visual tasks, would have helped to assess more cognitive functions; however, as it is, Tele-GEMS can be widely used for clinically relevant areas and across a wide range of conditions affecting memory and language functioning (see results from the factor analysis). In the group of patients with MS, a wider pool of tests or batteries [for example, 46, 47] might have been helpful for integrating information about patients’ cognitive status; nevertheless, the available tests we adopted can be considered reliable tools for evaluating patients’ cognitive profiles, based on consensus and clinical guidance for this specific clinical population [48]. Concerning the pros and cons of the telephone modality compared to the video-based modality, telephone assessment may limit the possibility of engaging patients and using visual cues to get a comprehensive look into the patient’s condition (e.g., hygiene); however, it presents unique benefits, such as greater ease of use, which makes it critical for engaging with key populations, and it can serve as a backup when video is not an option [10].

To conclude, Tele-GEMS shows good psychometric properties, with adequate reliability and good validity as a screening tool to examine the global state of cognition remotely. In the spirit of open science, we shared all the materials and analysis code used for Tele-GEMS under a Creative Commons license to facilitate its diffusion among clinicians, as well as its adaptation and translation to different languages and sociocultural contexts. Tele-GEMS and its instruction form have also been provided in English (at the link: https://osf.io/t3bma/). Investigation in multiple languages will help to test the cross-cultural features of global cognitive performance assessed remotely.