Introduction

Developed by Philip Corsi in 1973, the Corsi Block Tapping Task has been used for decades as a measure of visuospatial memory span in clinical and experimental settings [1, 2]. In neuropsychological research it is the most relevant and widely used non-verbal task and, in clinical assessment, it has proved to be a valid and reliable instrument for measuring an individual’s visuospatial short-term and working memory [3,4,5].

In its classical version, the task consists of nine identical blocks on a board, and participants are asked to reproduce block-tapping sequences of increasing length. The examiner’s sequence can be repeated in the same order (forward condition) or in reverse order (backward condition). The longest sequence correctly reproduced in the forward condition is the span, a measure of short-term memory storage capacity, whereas the longest sequence reproduced in the backward condition is a measure of visuospatial working memory. Besides the tabletop version, the test is also available in modified forms included in neuropsychological batteries [6], in computerised [7] and touchscreen [8] formats, and as a “walking Corsi” [9].

The Corsi span (CS) in Italy has relatively recent norms [10] and is frequently used to assess patients with different neurological conditions: Alzheimer’s disease [11], Parkinson’s disease [12], psychotic disorders [8] and traumatic brain injury [13].

The majority of studies have focused on the backward CS, as a reliable measure of visuospatial working memory and one of the measures of executive functions [2, 14], although in the clinical setting both forward and backward conditions are administered to almost all neurological patients.

The same setup used for CS can be used to administer the Corsi SupraSpan Learning (CSSL) and Recall (CSSR), a test assessing visuospatial long-term learning and visuospatial long-term memory. Although this test is less used than CS in both clinical practice and research, it has clear advantages. First, it can be used to assess learning and long-term memory in patients with limited verbal abilities. Moreover, it can be used to identify a dissociation between visuospatial and verbal memory deficits. Finally, it is highly sensitive even to subtle long-term memory deficits [15,16,17], and it can be related to the individual span.

Various supraspan techniques have been developed to assess nonverbal material acquisition. The CSSL procedure, scoring system, and normative values were first introduced in Italy by Spinnler and Tognoni [18]. Subjects learned a fixed sequence of eight block touches, which were repeated until they achieved a specific criterion (three consecutive sequences repeated correctly) or 18 presentations.

The previously published procedure, however, had one limitation, as pointed out by Capitani et al. [19]: an individual with a low short-term memory span may find it more difficult to learn a fixed-length sequence, because performance on a long spatial sequence may depend on the short-term memory span (CS). Consequently, Capitani and colleagues [19] provided separate CSSL norms that take the CS length into account for values ranging from 4 to 6.

According to the authors, CSSL administration is not possible if the CS score is lower than three or equal to eight, since the sequence length itself is equal to eight cubes. In other words, CSSL cannot be administered when the CS is too low or too high. However, this criterion does not consider that the memory systems involved in CS and CSSL are different and doubly dissociated [20, 21]. As a consequence, a specific visuospatial short-term memory impairment may affect CS but not CSSL. Moreover, even with a CS score of eight, the CSSL procedure requires the sequence to be repeated correctly at least three times in a row to reach the highest score. Consequently, a participant with a CS of 8 could still fail CSSL.

This administration limit, reported by Capitani and colleagues [19], can be resolved by adjusting CSSL scores for CS. Specifically, the CSSL score can be adjusted for the CS score, in addition to the classical demographic variables (age, education and sex), within a single regression model instead of relying on separate sets of norms.

A further open issue in both previous CSSL studies concerns the recall of the learned sequence, named Corsi SupraSpan Recall (CSSR). The test procedure proposed by the authors included a recall part, but neither publication reported normative data for it [18, 19]. Normative data for recall would allow the task to provide a measure of long-term visuospatial memory, which is useful, for example, in patients with limited verbal abilities.

Therefore, the aim of this study was to update CSSL norms and, more importantly, to provide normative data for CSSR. Since CSSL seems to be influenced by CS, normative data for CS based on the same sample were also provided, together with the relationship between the two tasks. Finally, CS and CSSL normative data were compared with the available norms.

Materials and methods

Participants

A power analysis was performed to determine the minimum sample size required. Based on a regression model with four independent variables (age, education, sex and CS), with α = 0.05, power = 0.80 and a small effect size (f² = 0.04), a minimum sample of 304 participants was required. Participants were native Italians with no current or past history of neurological or psychiatric disorders (including stroke, brain injury, clinically diagnosed dementia, depression, alcohol or drug abuse) who obtained a normal score on the Mini-Mental State Examination (MMSE; adjusted score > 23.8) [22]. Each participant had normal or corrected-to-normal vision. A convenience sample of volunteers was recruited among people who could be contacted directly by the examiners. No compensation was provided. Initially, 342 participants were recruited, but two did not meet the inclusion criteria (low MMSE score); the final sample therefore comprised 340 participants (177 female, 163 male; age: mean = 51.6, SD = 19.4, range 21–89 years; education: mean = 13.1, SD = 4.6, range 4–25 years). The breakdown of participants by age, education and sex is reported in Table A1 of the supplementary materials at https://osf.io/jcn9d/. Participants signed informed consent before the evaluation. This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of Milano Bicocca University (RM-2022-493; 7 February 2022).

Corsi block tapping board

The apparatus for the task consisted of a set of nine cubes arranged irregularly. Since different versions of the apparatus have been described [1, 23] and performance depends on path configuration [24], the apparatus is described in detail. The block tapping board is a white wooden board of 320 × 250 mm with nine white cubes with 40 mm edges, positioned in a pseudo-random arrangement as shown in Fig. 1. The digits 1 through 9 were placed on the examiner’s side of the cubes, numbered as reported in the figure. The examiner’s side is at the bottom of the figure and the participant’s side at the top. The same board was used for all tests.

Fig. 1 Physical measures of the apparatus used for the Corsi Span block tapping test and the Corsi SupraSpan Learning and Recall test

Tests and procedure

Corsi span

The procedure is derived from the one presented by Orsini et al. [25]. The examiner sits at a desk in front of the participant and says: “Touch the cubes that I touch, in the same order”. With the forefinger of the dominant hand, the examiner touches in sequence the cubes indicated on the notation protocol (Table A2 at https://osf.io/jcn9d/) at a rate of one cube every two seconds, returning the hand to the space between himself/herself and the board after each touch.

Each sequence length is tested with a block of three sequences (trials), and each block contains one more cube to touch than the previous one. To pass a block, the participant must correctly reproduce two out of three trials. The span is the length of the longest block in which the participant gave at least two correct answers. Since performance on CS depends on path configuration [24], the paths used are standard and derived from the first three series of Orsini et al. [25], with one modification for the sequences of length 8: the sequence identical to the one used in CSSL was not used.
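The two-out-of-three rule described above can be illustrated with a minimal Python sketch. The function name `corsi_span`, the input layout, and the choice to stop at the first failed block and to return 0 when no block is passed are assumptions made for illustration; they are not stated in the protocol.

```python
from typing import Dict, List


def corsi_span(results: Dict[int, List[bool]]) -> int:
    """Return the longest sequence length with at least two correct trials.

    `results` maps a sequence length to the outcomes (True = correct) of the
    trials administered at that length. Assumption: administration stops at
    the first length that is not passed.
    """
    span = 0
    for length in sorted(results):
        if sum(results[length]) >= 2:  # two out of three trials correct
            span = length
        else:
            break  # assumed discontinuation rule
    return span


# Hypothetical protocol: lengths 2 and 3 passed, length 4 failed.
example = {2: [True, True], 3: [True, False, True], 4: [False, True, False]}
print(corsi_span(example))
```

In this hypothetical protocol the participant passes the blocks of length 2 and 3 and fails the block of length 4, so the span is 3.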

Corsi supraSpan learning and recall

The examiner presents the fixed series of eight touches (5-8-3-2-6-7-1-9) at the rate of one cube every two seconds, returning his or her hand to the space between the board and himself/herself after each touch. The examiner says: “Now we will do the same thing as before, but I will touch a greater number of cubes. You will observe carefully and when I have finished, I ask you to touch the same cubes that I touched, in the same order”. The examiner notes the sequence of cubes touched in each attempt, considering only the first eight touches. If the participant reproduces the sequence correctly, he/she is asked to reproduce it again without repetition from the examiner. The test ends when the criterion of three consecutive correct repetitions is reached, or after 18 attempts. The examiner does not disclose that the sequence is identical from trial to trial, nor that a further attempt will be required later. After five minutes, with no interfering visuospatial task, the participant is asked to recall the previously presented sequence (CSSR). The examiner says: “Now I ask you to reproduce the sequence of cubes that we have reproduced several times before”. The examiner notes the sequence of cubes touched, considering only the first eight. The scoring system was based on that published by Capitani et al. [19]. Each response was scored according to the probability that a correct, or partially correct, response would be given by chance. Table A3 at https://osf.io/jcn9d/ shows the partial score corresponding to each response. Participants who reached the criterion before the 18th trial were credited with the score of a correct performance (1.62) for each remaining trial up to the 18th. Consequently, the maximum score was 29.16 (18 × 1.62) for CSSL and 1.62 for CSSR.
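The way the CSSL total is built from the trial scores can be sketched as follows. This is a minimal illustration that takes the per-trial scores as given (the partial scores of Table A3 are not reproduced here, so the values 0.5 and 1.0 in the example are hypothetical); the function name `cssl_total` is also an assumption.

```python
from typing import List

MAX_TRIALS = 18    # maximum number of attempts
FULL_SCORE = 1.62  # score of a fully correct reproduction
CRITERION = 3      # consecutive correct reproductions that end the test


def cssl_total(trial_scores: List[float]) -> float:
    """Sum the administered trials; if the criterion was reached early,
    credit every remaining trial up to the 18th with the full score."""
    if len(trial_scores) > MAX_TRIALS:
        raise ValueError("at most 18 trials can be administered")
    total = sum(trial_scores)
    last = trial_scores[-CRITERION:]
    reached = len(last) == CRITERION and all(
        abs(score - FULL_SCORE) < 1e-9 for score in last
    )
    if reached:
        total += FULL_SCORE * (MAX_TRIALS - len(trial_scores))
    return round(total, 2)


# Criterion reached on the first three trials: maximum score.
print(cssl_total([1.62, 1.62, 1.62]))
# Two partially correct trials (hypothetical partial scores), then criterion.
print(cssl_total([0.5, 1.0, 1.62, 1.62, 1.62]))
```

A participant who reproduces the sequence correctly on the first three attempts is credited with 18 full-score trials, which gives the maximum of 29.16.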

Statistical analyses

The analyses were divided into three parts. In the first, descriptive analyses were conducted along with correlations between demographic variables and raw scores (Table 1). In the second, normative values were defined using a regression-based procedure, which comprises several consecutive steps, is in line with that adopted for other neuropsychological tests [26,27,28] and is described in detail elsewhere [29]. To find the most appropriate transformation of each demographic variable (age, education and sex) for each dependent variable (CS, CSSL and CSSR, taken separately), the general linear model was used: a series of bivariate regressions was compared, and the transformation with the lowest Akaike’s Information Criterion (AIC) [30] was selected. The selected transformations were then entered into a series of bivariate and multivariate regressions with one to four predictors. The model with the lowest AIC was retained as the most appropriate, provided it was significant (p < 0.05). The same model was then refitted on deviations from the mean for both the independent variables (in their selected transformations) and the dependent variable. Reversing the sign of the coefficients of this last regression yielded a correction regression equation, from which a correction grid was derived. Scores adjusted for demographic variables were obtained by adding the correction to the raw score. On these adjusted scores, the one-sided nonparametric 95% tolerance limits with 95% confidence were calculated. Percentile ranks and rank-based equivalent scores were also calculated on the adjusted scores [28, 31].
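The transformation-selection step can be illustrated with a minimal Python sketch that fits one bivariate least-squares regression per candidate transformation and keeps the one with the lowest AIC. The candidate set, the function names and the synthetic data are assumptions for illustration; the AIC is computed up to an additive constant, which does not change the ranking of models fitted to the same observations. The analyses reported here were run in R, so this is not the authors' script.

```python
import math
from typing import Callable, Dict, List, Tuple

# Candidate transformations of a predictor (assumed set for illustration).
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "inverse": lambda x: 1.0 / x,
}


def aic_simple_regression(x: List[float], y: List[float]) -> float:
    """AIC, up to an additive constant, of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    k = 3  # intercept, slope and residual variance
    return n * math.log(rss / n) + 2 * k


def best_transform(x: List[float], y: List[float]) -> Tuple[str, Dict[str, float]]:
    """Return the name of the transformation with the lowest AIC and all AICs."""
    scores = {
        name: aic_simple_regression([f(xi) for xi in x], y)
        for name, f in TRANSFORMS.items()
    }
    return min(scores, key=scores.get), scores


# Synthetic example: a score that declines with the logarithm of age.
ages = [float(a) for a in range(21, 90)]
scores = [8.0 - 1.2 * math.log(a) + 0.001 * ((i % 5) - 2) for i, a in enumerate(ages)]
print(best_transform(ages, scores)[0])
```

The same comparison extends to multivariate models by replacing the closed-form simple regression with a general least-squares fit.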

Table 1 Descriptive statistics (mean, SD and range) for demographic variables and the three scores of the tests (CS, CSSL and CSSR) together with their correlations. * p < 0.05; ** p < 0.01; *** p < 0.001

In the third part, to compare the diagnostic performance of the normative data defined here with the most recent norms available for CS and CSSL [10, 19], a simulation analysis was performed on a generated dataset of memory-impaired patients. This population was chosen because the comparison requires both CS and CSSL data. The same procedure was used in a previous comparison of norms via simulation [32]. With this approach, published means and standard deviations are sufficient to compare norms, which overcomes some of the problems encountered when different normative scoring systems are compared only descriptively. Different normative systems rely on different adjustment regression equations, derived from data collected at different times (2013 vs. 2023 for CS and 1990 vs. 2023 for CSSL), and cannot be compared directly. In addition, comparing the classification abilities of different normative systems requires a sample that is independent of the samples used to develop them. This allows predictions to be generalised to unseen data and, most importantly, the characteristics of the comparison sample better reflect those of a clinical sample. Specifically, classification accuracy and area under the curve (AUC) were used to compare diagnostic performance between normative scoring systems. Agreement was assessed using Cohen’s Kappa. All analyses were performed using the R statistical environment 4.3.2 and specific packages [33]. Supplementary materials are available at https://osf.io/jcn9d/.
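As an illustration of the agreement measures named above, the following Python sketch computes the proportion of identical classifications and Cohen’s Kappa for two binary classifications of the same cases (for example, impaired vs. not impaired under two sets of norms). The function names and the example labels are hypothetical, and the sketch does not reproduce the R code used for the analyses.

```python
from typing import Sequence


def accuracy(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Proportion of cases on which the two classifications agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Chance-corrected agreement between two binary classifications."""
    n = len(a)
    observed = accuracy(a, b)
    rate_a = sum(a) / n  # proportion classified as impaired by the first norms
    rate_b = sum(b) / n  # proportion classified as impaired by the second norms
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:  # both classifications are constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of four cases under old and new norms.
old_norms = [True, True, False, False]
new_norms = [True, False, False, False]
print(accuracy(old_norms, new_norms), cohens_kappa(old_norms, new_norms))
```

In this toy example the two classifications agree on three cases out of four (0.75), and the agreement expected by chance is 0.5, so Kappa is 0.5.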

Results

Descriptive statistics

Descriptive statistics for the three scores (CS, CSSL and CSSR) and their correlations with the demographic variables are reported in Table 1. Means and SDs of CS, CSSL and CSSR by decade of age are reported in supplementary Table A4, available at https://osf.io/jcn9d/. Results showed a moderate negative correlation between Age and CS, CSSL and CSSR, and a moderate correlation between Education and both CS and CSSL. More importantly, there was a moderate correlation between CS and CSSL, a moderate to low correlation between CS and CSSR, and a moderate to high correlation between CSSL and CSSR.

The CSSL and CSSR scores of participants with the lowest (2 and 3) and highest (8) CS scores in the sample were then examined. Two participants out of 340 obtained a raw score of 2 on CS (0.6%), and five out of 340 participants obtained a score of 3 (0.9%). The score of 8 was obtained by 3 participants out of 340 (0.9%), and none of these participants reached the maximum score of 29.16 on CSSL. Conversely, the maximum CSSL score was reached by two participants (0.6%). Consequently, ceiling and floor effects are marginal in this group of healthy participants. Moreover, according to the normative data defined here, none of the participants with a pathological CS score exhibited a pathological score on CSSL or CSSR. The reverse was also tested by examining the CS scores of participants with a pathological CSSL score: none exhibited a pathological CS score. Finally, 161 participants (47.4%) were at ceiling on CSSR and 23 (6.8%) at floor. These results indicate that performance on CS and CSSL is doubly dissociated, at least in the healthy participants tested.

Normative data definition

The selection among bivariate and multivariate regressions using the AIC method showed that the model that best describes the CS score includes Age in its logarithmic transformation and Education in its square root transformation. CSSL was influenced by Age and Education, both in their square root transformation, and, most importantly, by CS in its square root transformation. CSSR was also influenced by Age and Education, in square root and inverse transformation respectively, and, importantly, by CS. The AIC tables used for selecting the best models are shown in Table A5, available at https://osf.io/jcn9d/.

To obtain the correction values, the multiple regressions on the raw scores were re-estimated on deviations from the mean scores and the sign of their coefficients was reversed. The regression equations for obtaining adjusted scores are reported in the captions of Tables 2, 3 and 4 for CS, CSSL and CSSR respectively. The R² values of the three regressions were 0.24, 0.34 and 0.22 respectively.

The correction grids derived from these regression equations are available in Tables 2, 3 and 4 for quick and easy application in clinical settings. The adjusted score is calculated by adding to the raw score the correction value obtained from the table or from the regression equation.

Table 2 Correction grid for computing the adjusted scores of Corsi Span (CS). The adjusted score can be calculated by adding to the raw score the correction value reported in the table. Age and education should be selected based on the nearest values. If precise scoring is required, the correction regression should be used. Adjusted score = raw score + 0.7616*(log(Age)-3.864) − 0.4229*(sqrt(Education)-3.555). Logarithms are natural (base e)
Table 3 Correction grid for computing the adjusted scores of CSSL based on Age, Education and Corsi Span size. The adjusted score can be calculated by adding the raw score to the reported value obtained by the table. Age and education should be selected based on the nearest values
Table 4 Correction grid for computing the adjusted scores of CSSR based on Age, Education and Corsi Span size. The adjusted score can be calculated by adding the raw score to the reported value obtained by the table. Age and education should be selected based on the nearest values
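When precise scoring is required, the correction regression reported in the caption of Table 2 can be applied directly. The following Python sketch implements that equation for CS only; the function name `adjusted_cs` and the example values are illustrative assumptions, and age and education are expressed in years.

```python
import math


def adjusted_cs(raw_score: float, age: float, education: float) -> float:
    """Corsi Span adjusted for age and education (equation in Table 2 caption).

    The logarithm is natural; age and education are in years.
    """
    correction = (
        0.7616 * (math.log(age) - 3.864)
        - 0.4229 * (math.sqrt(education) - 3.555)
    )
    return raw_score + correction


# Hypothetical examples: an older participant with few years of schooling
# receives a positive correction, a young highly educated one a negative one.
print(round(adjusted_cs(4, age=80, education=5), 2))
print(round(adjusted_cs(6, age=25, education=18), 2))
```

The correction is zero when log(Age) and sqrt(Education) equal the centring values 3.864 and 3.555, grows with age and shrinks with education, consistent with the direction of the demographic effects reported above.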

None of the adjusted scores was normally distributed (all ps < 0.01); consequently, the one-sided inner (ITL) and outer (OTL) 95% tolerance limits with 95% confidence were calculated using a non-parametric approach. For a sample of 340 participants and scores in which higher values indicate better performance, they correspond to the 11th and 24th observations. Their values are reported in Table 5.
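The rank of the outer tolerance limit can be recovered with a short calculation, sketched below in Python under the assumption that the standard binomial (order-statistic) argument is used: the outer limit is the largest rank r, counted from the worst score, for which the probability that at least r of n observations fall in the lowest 5% of the population is at least 0.95. The function names are illustrative, and only the outer limit is computed; the inner limit is obtained analogously from the opposite tail.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def outer_tolerance_rank(n: int, proportion: float = 0.05, confidence: float = 0.95) -> int:
    """Largest rank r (from the worst score) with P(Bin(n, proportion) >= r) >= confidence."""
    rank = 0
    for r in range(1, n + 1):
        if 1.0 - binom_cdf(r - 1, n, proportion) >= confidence:
            rank = r
        else:
            break  # the probability decreases with r, so no larger rank qualifies
    return rank


print(outer_tolerance_rank(340))
```

For n = 340 this returns 11, in agreement with the 11th observation reported for the outer limit.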

Table 5 Equivalent scores together with outer (OTL) and inner (ITL) tolerance limits for the three tests based on the adjusted score. CS, Corsi Span; CSSL, Corsi SupraSpan Learning; CSSR, Corsi SupraSpan Recall

The cut-off scores provided by the outer tolerance limits, together with the median and other intermediate intervals, were subsequently transformed into rank-based Equivalent Scores (ES). Moreover, percentiles were calculated for each score. They are listed in Table 5 and in Table A6 (available at https://osf.io/jcn9d/), respectively. To facilitate the scoring process, an Excel spreadsheet is available at the same link.

Comparison between norms

To compare the norms defined here with those already available in the literature, a simulated sample of 100 memory-impaired participants was created. The means and standard deviations of the simulated sample were retrieved from Cosi et al. [34], considering 50 participants with mild memory impairment and 50 participants with memory impairment as defined in that study. Since CS and CSSL were related, their values were assigned pseudo-randomly so that they correlated positively (r = 0.22, p < 0.01), as reported in the study. Since the previous CSSL norms do not take into account lower CS scores (2 and 3), the comparison for CSSL was performed only on a subset of the simulated cases (n = 43). Confusion matrices of the classification of simulated patients are reported in Table A7 and performance metrics in Table A8 of the online supplementary materials, available together with the generated dataset and the R script at https://osf.io/jcn9d/. The comparison shows a high agreement between old and new norms for CS and CSSL (accuracy about 0.90), confirming the similarity of the two sets of norms, even though the previous CSSL norms were developed more than 30 years ago and with different procedures.
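One simple way to generate two positively correlated scores from published means and standard deviations is sketched below in Python. It is an assumption about how such a dataset could be built, not a description of the R script used here; the means and standard deviations in the example are placeholders, not the values of the cited study, and a large sample is used only to show that the target correlation is approached.

```python
import math
import random
from typing import List, Tuple


def simulate_pairs(n: int, mean_cs: float, sd_cs: float, mean_cssl: float,
                   sd_cssl: float, r: float, seed: int = 1) -> List[Tuple[float, float]]:
    """Draw n (CS, CSSL) pairs from normal distributions with correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Mix z1 with independent noise so that corr(z1, z2) equals r.
        z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        pairs.append((mean_cs + sd_cs * z1, mean_cssl + sd_cssl * z2))
    return pairs


def pearson_r(pairs: List[Tuple[float, float]]) -> float:
    """Sample Pearson correlation of the simulated pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Placeholder means and SDs; only the target correlation comes from the text.
sample = simulate_pairs(5000, 4.0, 1.0, 12.0, 5.0, r=0.22)
print(round(pearson_r(sample), 2))
```

With a sample as small as 100 cases, the observed correlation can deviate noticeably from the target, so the realised value should be checked after generation.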

Discussion

The aim of this study was to define updated normative data for CS and CSSL, together with new norms for CSSR, also considering the effect of CS on the latter two. For this purpose, the performance of a group of 340 healthy adults was analysed in relation to socio-demographic variables (age, sex and education); CSSL and CSSR were also analysed in relation to the CS score.

As expected from previous studies [10, 18, 19], in all tasks increasing age was associated with progressively lower performance, while higher education was related to better performance. Sex did not significantly influence performance on any task, as found by Capitani and colleagues [19] and in line with several other studies showing the absence of sex differences in visuospatial memory [9, 35]. Indeed, Piccardi and collaborators [36] suggested that correcting CS scores for education is more important than correcting for sex, since age and education regulate sex effects. More importantly, CSSL and CSSR are both influenced by CS. Adjusting CSSL and CSSR norms for the CS level addresses the issues highlighted by Capitani and colleagues [19].

In the comparison between norms, the old and new CS and CSSL norms showed strong agreement in detecting the absence or presence of a memory deficit, even though the previous CSSL norms were developed over 30 years ago. This result was not a foregone conclusion. While previous research has demonstrated satisfactory agreement between old and recent norms across various verbal memory tests [33], the level of concordance observed in this investigation is notably high.

However, our norms differ from the previous ones in two respects. Firstly, they incorporate the role of CS in CSSL within a single regression model. Secondly, they extend the admissible range of CS to values between two and eight, bypassing the limitations of the norms of Capitani and colleagues [19]. The updated normative data for CS and CSSL, and the newly computed norms for CSSR, which were not previously available, represent a step forward for neuropsychological practice; the latter, in particular, should allow the reliable assessment of visuospatial learning and long-term visuospatial memory in neuropsychological patients with limited verbal abilities. Using these tasks would also make it possible to assess how visuospatial memory deficits affect other cognitive functions.

Furthermore, considering the various applications of these instruments, such as clinical diagnosis, treatment monitoring and research, this contribution clarifies their administration methods, which should make their execution more uniform among clinicians and researchers and across different fields.

A limitation of this study, and of the CS/CSSL/CSSR paradigm in general, is the insufficient number of studies on validity and reliability. CS has been validated in WISC-III [37] and with a factor analysis [38]; its split-half reliability was 0.75 [39], and its test-retest reliability was between 0.70 and 0.79 [37]. CSSL demonstrated its ecological validity in a study in which it predicted navigation abilities as Alzheimer’s Disease progressed [40]. To our knowledge, only one study has reported the reliability of CSSL (r = 0.80) [18]. The lack of data on the validity and reliability of these tests was noted in 1998 and still holds today [3]. All these psychometric properties should be evaluated in future studies.

The average level of education of many populations, including Italians, has increased in recent years. Therefore, updated normative data are required to reflect not only evolving cognitive abilities but also demographic changes. The average educational level of the present sample differed from that of the previous norms: 13.1 years (current study) compared with 10.7 years [19]. Using outdated data could lead to incorrect interpretations of test results for individuals with higher education levels [41, 42]; neuropsychological tests should therefore be interpreted using current norms to allow a more accurate assessment of cognitive functions and, in this particular case, of visuospatial working memory.