Introduction

Working memory (WM) comprises a set of high-order, non-instrumental limited-capacity cognitive functions that allow “temporary storage and manipulation of information necessary for […] complex cognitive tasks” [2].

The original multi-component model [3] identifies a central executive (CE) component and two modality-specific sub-components: the phonological loop (PL) and the visuospatial sketchpad (VSS), which process verbal and visual information, respectively. The CE is a limited-capacity control system that supports complex cognitive activities by suppressing irrelevant information, allocating attentive resources, and allowing alternation between different tasks. The PL is a modular subsystem that keeps verbal information online and prevents it from decaying through both vocal and sub-vocal rehearsal. The VSS temporarily stores and processes visual and spatial information. More recent formulations [31] introduce a further component, the episodic buffer: a multi-modal, limited-capacity system integrating information from the other components into a unitary episodic representation. WM functioning emerges from the interaction between perceptual and attentive mechanisms and representations stored in the long-term memory system [11, 12].

Converging evidence from neuroimaging and brain injury studies hints at a widespread, bilateral fronto-parietal network, both cortical and sub-cortical, as the neural substrate of WM [18, 25, 27, 33].

WM deficits are thus often associated with focal or diffuse brain damage; hence, their neuropsychological assessment is a relevant aspect of cognitive rehabilitation [13, 20, 28]. Several studies have found that WM deficits impair the activities of daily living and affect rehabilitation outcomes [14, 23].

In clinical practice, the digit span (DS) [36] is one of the most widely used tasks to measure the capacity of the auditory-verbal component in WM. According to Baddeley’s model [31], the forward (FDS, forward DS) version evaluates the short-term, passive retention of verbal stimuli, whereas the backward version (BDS, backward DS) requires maintaining and actively manipulating information in order to reproduce in reverse order the sequence presented. Both forward and backward versions of the digit span have been validated and normed in Italy [24, 26].

Another instrument for assessing auditory-verbal WM is the Digit Ordering Test (DOT) [10, 17, 37], which requires clients to listen to a series of randomly ordered digits and then to recall items in ascending order immediately after their presentation. No Italian standardizations of the DOT are available so far.

Evidence regarding the influence of WM deficits on daily functioning highlights the relevance of ecological validity in cognitive testing [5]. Available tests for the assessment of WM may fail to detect its dysfunction in everyday life [35].

We aimed to develop a composite WM assessment battery (WoMAB) and, more specifically, to: (a) standardize a novel task that investigates auditory-verbal WM from an ecological perspective; (b) validate and norm the DOT in Italian healthy individuals; and (c) provide updated normative data for DS tasks. A novel scoring procedure is also proposed, yielding WM and total (T) outcomes, i.e., the number of elements in the longest sequence and the number of recalled sequences, respectively. The underlying hypothesis is that WM scores reflect a measure of the auditory-verbal WM capacity, whereas T scores provide insight into attentive monitoring abilities during task execution.

Methods

Participants

One hundred and seventy-three native Italian-speaking individuals were initially recruited from different regions of both Northern and Southern Italy, as well as from the Canton Ticino region of Switzerland. Sample stratification is displayed in Table 1.

Table 1 Sample stratification for age, education, and sex

Inclusion criteria were as follows: (a) age between 20 and 90 years; (b) years of education between 5 and 18; (c) an adjusted score on the Mini-Mental State Examination (MMSE) above the established cut-off [21, 22].

Participants were excluded if presenting with neurological disorders, traumatic brain injury, psychiatric disorders, previous brain surgeries, drug abuse, learning disabilities, psychotropic drug treatment, or visual/auditory impairments (participants with corrected-to-normal vision and hearing were included).

After applying inclusion/exclusion criteria, N = 168 individuals were included.

Participants provided written informed consent before being enrolled. The study was approved by the Ethics Committee of the University of Pavia and conducted in accordance with the Declaration of Helsinki.

Materials

Four auditory-verbal WM tests were administered; their order was counterbalanced across participants to avoid carry-over effects.

Each task started with warm-up trials. Mistakes on preliminary items could be corrected, although without providing execution strategies. Stimuli were pronounced at the rate of one per second, with neutral intonation. Participants were given 15 s to recall the items. Two lists of the same length were administered; the task was interrupted after two consecutive failures. No cues were provided, but self-corrections were accepted. Recalled sequences containing intrusions were scored as 0. Both WM and T outcomes were computed for each WM task.
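The scoring rules described above can be sketched as follows. This is a minimal illustration, not the authors' scoring software: the function name and the trial representation (pairs of sequence length and a correct/incorrect flag, in administration order) are assumptions, as is the reading of the discontinuation rule as "stop after any two consecutive failed trials". Sequences containing intrusions are simply recorded as failed.

```python
from typing import Iterable, Tuple


def score_task(trials: Iterable[Tuple[int, bool]]) -> Tuple[int, int]:
    """Return (WM, T) for one task.

    trials: (sequence_length, recalled_correctly) pairs in administration order.
    WM (S for the FDS) = length of the longest correctly recalled sequence.
    T = number of correctly recalled sequences.
    Assumption: administration stops after two consecutive failed trials,
    so anything recorded after that point is ignored.
    """
    longest = 0
    total = 0
    consecutive_fails = 0
    for length, correct in trials:
        if correct:
            longest = max(longest, length)
            total += 1
            consecutive_fails = 0
        else:
            consecutive_fails += 1
            if consecutive_fails == 2:
                break
    return longest, total


# Hypothetical protocol: two lists per length, from 2 to 5 items.
example = [(2, True), (2, True), (3, True), (3, False),
           (4, True), (4, False), (5, False), (5, True)]
wm, t = score_task(example)
print(wm, t)  # 4 4 (the last trial is never administered)
```

Keeping the two outcomes separate in the return value mirrors the distinction drawn in the text between capacity (longest sequence) and the number of recalled sequences.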

The Ice Cream Test (ICT) is a novel, ecologically valid tool investigating auditory-verbal WM. Participants were required to act as if they were waiters in an ice cream shop who have to keep track of customers’ orders. Each customer ordered a single ice cream flavor; participants were required to tell, within 15 s, how many ice creams had to be prepared for each flavor. The ICT-WM outcome ranges 0–10 (longest sequence) and the ICT-T outcome 0–16 (recalled sequences).
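To make the ICT response format concrete, the target answer for a trial is a tally of ice creams per flavor. The sketch below is only illustrative: the flavor names, the function names, and the way a response is compared with the target are assumptions, since the actual test protocol is available from the authors only upon request.

```python
from collections import Counter
from typing import Dict, List


def expected_tally(orders: List[str]) -> Dict[str, int]:
    """Count how many ice creams must be prepared for each ordered flavor."""
    return dict(Counter(orders))


def is_correct(orders: List[str], response: Dict[str, int]) -> bool:
    """True only if the reported tally matches the orders exactly.

    A flavor that was never ordered (an intrusion) makes the response wrong.
    """
    reported = {flavor: n for flavor, n in response.items() if n != 0}
    return reported == expected_tally(orders)


# Hypothetical trial: four customers, one flavor each.
orders = ["lemon", "chocolate", "lemon", "vanilla"]
print(expected_tally(orders))
print(is_correct(orders, {"lemon": 2, "chocolate": 1, "vanilla": 1}))     # True
print(is_correct(orders, {"lemon": 2, "chocolate": 1, "strawberry": 1}))  # False
```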

FDS and BDS tasks were adapted from Monaco et al. [24]. Additionally, the present version introduced one two-digit warm-up sequence. Unlike the other WM outcomes, the longest-sequence outcome of the FDS was named “Span” (FDS-S), as it does not unequivocally target WM abilities. FDS-S ranges 0–9 (longest sequence) and FDS-T 0–16 (recalled sequences). BDS-WM ranges 0–8 (longest sequence) and BDS-T 0–14 (recalled sequences).

The DOT [10, 17, 37] consists of presenting a list of randomly ordered digits that have to be recalled in ascending order. DOT-WM outcome ranges 0–8 (longest sequence) and DOT-T 0–12 (recalled sequences).
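The three digit tasks differ only in the order in which the presented digits must be reproduced. A minimal sketch of the target responses is given below; the function name and task labels are illustrative, and the handling of repeated digits in the DOT (kept as duplicates here) is an assumption not stated in the text.

```python
from typing import List


def target_response(task: str, digits: List[int]) -> List[int]:
    """Return the sequence a participant is expected to produce.

    FDS: same order as presented.
    BDS: reverse order.
    DOT: ascending order (duplicates, if any, are kept; an assumption).
    """
    if task == "FDS":
        return list(digits)
    if task == "BDS":
        return list(reversed(digits))
    if task == "DOT":
        return sorted(digits)
    raise ValueError(f"unknown task: {task}")


presented = [7, 2, 9, 4]  # hypothetical stimulus list
for task in ("FDS", "BDS", "DOT"):
    print(task, target_response(task, presented))
```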

Test protocols will be provided to interested practitioners upon request to the corresponding author.

Statistical analyses

By conservatively assuming a small-to-medium size (f² = 0.10) for the effects of background predictors (df_numerator = 3) [6, 24, 26], the minimum sample size was set at N = 146 via a power analysis for multiple linear regression analyses [32] (R package pwr) [9]. α was set at 0.05 and 1 − β at 0.9; N was obtained as df_numerator + df_denominator + 1.

Skewness and kurtosis statistics were regarded as suggestive of a violation of the assumption of normality if their absolute values exceeded 1 and 3, respectively [19].
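The screening rule above can be sketched in a few lines. This is an illustrative implementation under stated assumptions: it uses moment-based estimators and excess kurtosis (normal distribution = 0), whereas the text does not specify which estimators were used; function names are hypothetical.

```python
from typing import Sequence, Tuple


def skew_kurtosis(x: Sequence[float]) -> Tuple[float, float]:
    """Moment-based skewness and excess kurtosis (assumed estimators)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0
    return skew, kurt


def violates_normality(x: Sequence[float],
                       skew_limit: float = 1.0,
                       kurt_limit: float = 3.0) -> bool:
    """Flag a variable if |skewness| > 1 or |kurtosis| > 3."""
    skew, kurt = skew_kurtosis(x)
    return abs(skew) > skew_limit or abs(kurt) > kurt_limit


print(violates_normality([1, 2, 3, 4, 5]))   # symmetric toy data: False
print(violates_normality([0] * 9 + [10]))    # strongly right-skewed: True
```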

Associations of interest between quantitative variables were assessed by means of either Pearson’s or Spearman’s coefficients. Bonferroni correction for multiple comparisons was applied where appropriate.

Norms were derived by adopting the equivalent score (ES) method [8, 34], a regression-based approach that adjusts raw scores (RSs) for significant predictors of interest (or their transforms) and then allots adjusted scores (ASs) to a 5-level ability scale: ES = 0 (“abnormal”), ES = 1 (“borderline”), ES = 2 (“low-end normal”), ES = 3 (“normal”), and ES = 4 (“high-end normal”). Outer and inner tolerance limits (oTL; iTL) were computed to provide an interval estimate for the cut-off (ASs < oTL fall within ES = 0). Average ESs (AES) [7] were computed for both T and WM/S outcomes in order to provide a global estimate of attentive monitoring and WM capacity across tasks.
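Once thresholds are available, converting an adjusted score into an ES and averaging ESs into an AES is mechanical. The sketch below illustrates only this last step; it does not estimate tolerance limits or regression adjustments. The threshold values are made up for illustration, the function names are hypothetical, and treating a score equal to a threshold as belonging to the higher class is an assumption extrapolated from the rule that ASs below the oTL fall within ES = 0.

```python
from bisect import bisect_right
from statistics import mean
from typing import Sequence


def equivalent_score(adjusted: float, thresholds: Sequence[float]) -> int:
    """Map an adjusted score (AS) to an ES in 0..4.

    thresholds: four ascending lower bounds for ES = 1, 2, 3, 4;
    the first one is the outer tolerance limit (AS < oTL -> ES = 0).
    """
    if len(thresholds) != 4 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected four ascending thresholds")
    return bisect_right(list(thresholds), adjusted)


def average_es(scores: Sequence[int]) -> float:
    """Average ES (AES) across the measures of one outcome type."""
    return mean(scores)


# Hypothetical thresholds, NOT the normative values of this study.
limits = [3.0, 4.0, 5.0, 6.0]
es_values = [equivalent_score(a, limits) for a in (2.9, 3.0, 4.5, 5.5, 7.0)]
print(es_values)              # [0, 1, 2, 3, 4]
print(average_es(es_values))  # 2
```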

R 3.6.3 [30] was used for implementing the analyses. Regression studies and calculations of both TLs and ES thresholds were implemented as described in Aiello & Depaoli [1].

Results

Participants’ background features and cognitive scores are summarized in Table 2.

Table 2 Participants’ background features and WM measures

In line with Monaco et al. [24], the ratios between BDS and FDS tasks were computed (by dividing BDS measures by FDS ones) for both T and WM/S scores (DSR-T, DSR-WM/S).
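The ratio itself is a simple quotient of raw scores. A minimal sketch follows; the function name is hypothetical, and returning None when the FDS score is zero is an assumption, since the text does not say how such cases were handled.

```python
from typing import Optional


def digit_span_ratio(bds: float, fds: float) -> Optional[float]:
    """DSR = BDS raw score / FDS raw score.

    Returns None when the FDS score is 0 (undefined ratio; an assumption).
    """
    if fds == 0:
        return None
    return bds / fds


# Hypothetical participant: BDS-WM = 4, FDS-S = 6; BDS-T = 6, FDS-T = 8.
print(digit_span_ratio(4, 6))  # DSR-WM/S
print(digit_span_ratio(6, 8))  # DSR-T, 0.75
```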

Non-derived WM measures were highly inter-related (0.39 ≤ r(168) ≤ 0.96; p < 0.001), even when adjusting the significance threshold (α adjusted = 0.05/28 = 0.002).
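The adjusted threshold reported above follows from dividing α by the number of comparisons. The sketch below reproduces that arithmetic; reading the 28 comparisons as all pairwise correlations among the eight non-derived measures is an inference from the task descriptions, not something the text states explicitly.

```python
from math import comb


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Bonferroni-adjusted significance threshold."""
    return alpha / n_tests


# Eight non-derived measures (two outcomes for each of four tasks).
measures = ["DOT-T", "DOT-WM", "ICT-T", "ICT-WM",
            "FDS-T", "FDS-S", "BDS-T", "BDS-WM"]
n_pairs = comb(len(measures), 2)          # assumed source of the 28 tests
alpha_adj = bonferroni_alpha(0.05, n_pairs)
print(n_pairs, round(alpha_adj, 3))       # 28 0.002
```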

Ratios were associated with all remaining WM outcomes (0.23 ≤ r(168) ≤ 0.72; p ≤ 0.003), but not with FDS measures.

All WM/S and T measures were negatively associated with age (−0.59 ≤ r(168) ≤ −0.32; p < 0.001) and positively associated with education (0.2 ≤ r(168) ≤ 0.46; p ≤ 0.011). Males outperformed females on the BDS-T, BDS-WM, and DSR-T (2.17 ≤ t(166) ≤ 2.56; 0.011 ≤ p ≤ 0.035); no other sex differences were detected.

When simultaneously tested, age and education transforms proved predictive of DOT (-T and -WM), ICT (-T and -WM), and FDS (-T and -S) scores (age: 0.28 ≤ |β| ≤ 0.37; 3.03 ≤ |t| ≤ 4.22; p ≤ 0.003; education: −0.35 ≤ β ≤ −0.24; −3.97 ≤ t ≤ −2.7; p ≤ 0.008). Within a multiple regression model, sex, age, and transformed education were predictive of BDS-T (sex: β = −0.21; t = −3.48; p = 0.001; age: β = −0.47; t = −5.93; p < 0.001; education: β = −0.21; t = −2.71; p = 0.008) and BDS-WM (sex: β = −0.18; t = −2.87; p = 0.005; age: β = −0.46; t = −5.61; p < 0.001; education: β = −0.19; t = −2.38; p = 0.018). With respect to ratios, sex and transformed age were predictive of DSR-T scores (sex: β = −0.08; t = −2.49; p = 0.014; age: β = −0.16; t = −4.6; p < 0.001), whereas only inverse age significantly predicted DSR-WM/S scores (β = 4.08; t = 3.39; p = 0.001).

The mean AES scores were 3.06 ± 0.7 (0.6–4) and 3.11 ± 0.7 (0.6–4) for WM and T outcomes, respectively. Neither AES was associated with age or education. However, AES-T was significantly higher (t(166) = 2.4; p = 0.018) for males than for females.

Correction coefficients for selected co-occurrences of background predictors along with equations for adjusting RSs are reported in Table 3 (DOT and ICT), Table 4 (FDS and BDS), and Table 5 (DSR). Normative values for all measures are reported in Table 6. For AESs, only TLs are provided [7].

Table 3 Adjustment grids according to age and education for Digit Ordering Test (DOT) and Ice Cream Test (ICT) total (T) and working memory (WM) raw scores
Table 4 Adjustment grids according to age and education for forward and backward digit span (F/BDS) total (T) and span/working memory (S/WM) raw scores
Table 5 Adjustment grids according to age, education, and sex for the backward digit span (BDS) total (T) and Working Memory (WM) raw scores
Table 6 Equivalent scores for DOT, ICT, FDS, BDS and DSR adjusted scores

Discussion

This work provides Italian neuropsychologists with a novel standardized tool for the ecological assessment of auditory-verbal WM abilities (ICT), as well as with norms and validity evidence for the DOT. Both ICT and DOT measures proved to converge with widespread WM measures (FDS and BDS).

Moreover, updated normative data are provided for both the FDS and the BDS. Cut-offs here reported are substantially comparable to those found by Monaco et al. [24] (FDS: 2.78 vs. 2.65; BDS: 4.46 vs. 4.26). In contrast with previous normative studies [24, 26], as well as with contributions on sex differences in WM abilities [16], males scored higher than females on BDS and DSR measures. The present findings thus counterbalance those hinting at a prominent male–female difference being detectable on visuospatial but not on phonological WM tasks [29].

It is also worth noting that the DSR in the present study differs from that of Monaco et al. [24] since it was computed on raw rather than adjusted scores.

This work also introduced a novel scoring procedure that provides insights into different facets of phonological WM: WM capacity (WM/S) and attentive monitoring abilities during task execution (T) [4, 15].

AESs here reported further contribute to the adaptive nature of this composite battery. Indeed, the WoMAB allows an in-depth profiling of WM abilities by yielding both single-task-level (ESs) and global (AESs) standardized scores with respect to considered outcomes (T and WM/S). Although both AESs proved to be independent of age and education [7], practitioners should nonetheless exercise caution when interpreting AES-T measures due to sex differences.

Finally, a limitation regarding sample stratification has to be acknowledged: certain co-occurrences of age and education levels were poorly represented (e.g., highly educated individuals aged ≥ 75 years), possibly due to sampling biases. Caution is therefore warranted when adjusting the RSs of individuals with these background features. However, it is believed that the statistical power of the regression analyses allows sufficiently adequate predictions of adjustment factors for these co-occurrences too.

In conclusion, this study validated and normed the WoMAB, a multi-component, flexible battery for WM assessment in adult neurological populations. Its novel scoring procedure allows assessing both WM capacity (longest sequence) and task-related attentive processes (number of recalled sequences). Moreover, the WoMAB encompasses ecologically valid measures that can help practitioners evaluate the impact of WM deficits in patients’ daily activities.