Telephone-based Frontal Assessment Battery (t-FAB): standardization for the Italian population and clinical usability in neurological diseases

Background Despite the relevance of telephone-based cognitive screening tests in clinical practice and research, no specific test assessing executive functioning is available. The present study aimed at standardizing and providing evidence of clinical usability for the Italian telephone-based Frontal Assessment Battery (t-FAB). Methods The t-FAB (ranging 0–12), comprising two subtests, has two versions: one requiring motor responses (t-FAB-M) and the other verbal responses (t-FAB-V). Three hundred and forty-six Italian healthy adults (HPs; 143 males; age range = 18–96 years; education range = 4–23 years) and 40 participants with neurological diseases were recruited. To HPs, the t-FAB was administered along with a set of telephone-based tests: MMSE, verbal fluency (VF), backward digit span (BDS). The in-person version of the FAB was administered to both HPs and clinical groups. Factorial structure, construct validity, inter-rater and test–retest reliability, t-FAB-M vs. t-FAB-V equivalence and diagnostic accuracy were assessed. Norms were derived via Equivalent Scores. Results In HPs, t-FAB measures yielded high inter-rater/test–retest reliability (ICC = .78–.94), were internally related (p ≤ .005) and underpinned by a single component, converging with the telephone-based MMSE, VF, BDS (p ≤ .0013). The two t-FAB versions were statistically equivalent in clinical groups (ps of both equivalence bounds < .001). Education predicted all t-FAB scores (p < .001), whereas age predicted only the t-FAB-M score (p ≤ .004). t-FAB scores converged with the in-person FAB in HPs and clinical groups (rs = .43–.78). Both t-FAB versions were accurate in discriminating HPs from the clinical cohort (AUC = .73–.76). Discussion The t-FAB is a normed, valid, reliable and clinically usable telephone-based cognitive screening test to adopt in both clinical and research practice. Supplementary Information The online version contains supplementary material available at 10.1007/s40520-022-02155-3.


Introduction
Telephone-based cognitive screening (TBCS) is relevant to clinical and experimental telemedicine, as it allows first-level neuropsychological evaluations when in-person access to clinics is not possible for logistical [1,2], geographical [3], economic [4] or safety reasons [5]. TBCS also facilitates the implementation of epidemiological/population-based studies [6], decentralized clinical trials [7] and prevention campaigns [8]. Moreover, unlike other remote assessment media (e.g., videoconference), the telephone is well accepted by users of different socio-demographic backgrounds because it requires minimal digital and health literacy [9,10], and users also judge it adequate to meet clinical/research questions [9]. Finally, like other telehealth practices, TBCS has the potential to reduce the psychosocial burden related to in-person healthcare, which weighs not only on patients but also on their caregivers and professional carers, and which has significantly increased since the onset of the COVID-19 pandemic [11,12].
Although several TBCS tools for the assessment of global cognitive status have been developed in recent years [6], no standardized test selectively screening executive functions is available, especially in Italy [13,14]. Indeed, fully standardized Italian TBCS tests are currently limited to multi-domain screening tests for global cognition [2,15].
Nevertheless, executive deficits characterize a variety of neurological diseases affecting cortical and subcortical frontal circuitries [16] and convey relevant information towards differential diagnosis and prognosis [17]. Moreover, subclinical or mild dysexecutive features are highly prevalent in healthy older adults due to the anatomo-functional changes of frontal networks resulting from physiological aging [18,19], and they have been shown to negatively affect daily-living functional outcomes [17]. Screening for executive dysfunctions is thus crucial in both neurology and geriatrics, and standardized TBCS tests for frontal-executive dysfunctions should be part of telemedicine [14], especially when dealing with elderly clinical populations. Indeed, aging individuals often come with a range of frailties that would prevent or delay their access to outpatient in-person clinics, for both logistical (e.g., motor disabilities) and health safety reasons (e.g., an increased exposure to SARS-CoV-2 infection) [3,8,14]. Such instruments would also be useful as outcome measures/endpoints within large-scale studies or decentralized clinical trials in neurological and geriatric populations with frontal-executive impairments [20].
Among frontal-executive screeners, the in-person version of the Frontal Assessment Battery (FAB) [21] represents one of the most psychometrically and diagnostically sound tests worldwide [22]. In Italy, the FAB has been shown to come with optimal psychometrics, diagnostics and clinical usability evidence [18,19,23,24]. Therefore, this study aimed at standardizing, in the Italian population, a telephone-based version of the FAB (t-FAB), also exploring its clinical usability in different neurological diseases.

Participants
The normative sample comprised 346 Italian healthy participants (HPs) from different regions of Italy. Inclusion criteria were: (1) a negative history of neurological/psychiatric disorders, (2) not being under active psychotropic medication, (3) not being in uncompensated/severe metabolic/internal conditions, (4) not having system/organ failures and (5) not presenting with uncorrected hearing deficits. Medical history for all participants was collected through a semi-structured interview. HPs were recruited via the authors' personal acquaintances and advertising at the University of Milano-Bicocca and the University of Padova.
A total of 40 participants affected by neurological diseases were consecutively recruited at two neuropsychology services in Northern Italy. The inclusion criterion for clinical groups was a neurologist-posed clinical diagnosis of a given target disease based on current diagnostic criteria and supported by neurological, neuroradiological and neuropsychological examinations. Ten participants with ischemic/hemorrhagic stroke presenting with a unilateral, cortical/subcortical hemispheric lesion (5 right and 5 left brain-damaged) were recruited at the neuropsychology service of IRCCS Istituto Auxologico Italiano, Milano. Six participants affected by hypokinetic extra-pyramidal disorders (EPD; three with Parkinson's disease [25] and three with atypical parkinsonisms [26]), 20 with small vessel disease (SVD) [27] and 4 with a mixed, atrophic-vascular dementia (MD) [28] were recruited at the neuropsychology service of Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milano. The full selection process of participants with neurological conditions is reported below ("Results", paragraph "Standardization").

Materials
The t-FAB was adapted from Aiello et al.'s [24] version of the FAB. The in-person FAB total score ranges from 0 to 18 and comprises 3 subtests (each ranging from 0 to 6), each including two tasks: Conceptualization and Mental flexibility (language-mediated executive functions; FAB-1), Motor programming and Sensitivity to interference (motor-mediated executive functions; FAB-2), and Inhibitory control and Environmental autonomy (inhibition; FAB-3) [24]. In the t-FAB, the Conceptualization and Mental flexibility tasks were the same as in the in-person version, whereas the Motor programming and Environmental autonomy tasks were removed. The Sensitivity to interference and Inhibitory control tasks were modified to suit telephone administration (i.e., so that participants' responses are audible to the examiner through the telephone). Two versions of both the Sensitivity to interference and Inhibitory control tasks, differing in response modality, were developed: the first requires the examinee to provide a motor response (t-FAB-M; e.g., "tap once on the table when I tap twice"), the second a verbal one (t-FAB-V; e.g., "say «one» when I say «two»"). In case of lateralized sensorimotor deficits, participants with neurological diseases were asked to use their unaffected hand. Both the t-FAB-M and t-FAB-V versions of the Sensitivity to interference and Inhibitory control tasks were administered to all participants (order: t-FAB-M first). Each t-FAB version is thus divided into two subtests: t-FAB-1 (Conceptualization plus Mental flexibility) and t-FAB-2-M/-V (Sensitivity to interference plus Inhibitory control). The total t-FAB score of both versions ranges from 0 to 12.
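The score composition described above can be sketched in a few lines of Python. This is only an illustration, not the authors' scoring sheet: the class and field names are hypothetical, and the 0–3 range per task is inferred from each in-person subtest spanning 0–6 over two tasks.

```python
from dataclasses import dataclass

TASK_MAX = 3  # assumption: each task is scored 0-3 (two tasks per 0-6 subtest)


@dataclass
class TFabScores:
    """Task-level scores for one t-FAB version (motor or verbal)."""
    conceptualization: int
    mental_flexibility: int
    sensitivity_to_interference: int
    inhibitory_control: int

    def __post_init__(self):
        # Reject any task score outside the assumed 0..3 range.
        for name, value in vars(self).items():
            if not 0 <= value <= TASK_MAX:
                raise ValueError(f"{name} must be in 0..{TASK_MAX}, got {value}")

    @property
    def t_fab_1(self) -> int:
        # Language-mediated subtest, identical across the two versions.
        return self.conceptualization + self.mental_flexibility

    @property
    def t_fab_2(self) -> int:
        # Version-specific subtest (t-FAB-2-M or t-FAB-2-V).
        return self.sensitivity_to_interference + self.inhibitory_control

    @property
    def total(self) -> int:
        # Total t-FAB score, ranging 0-12.
        return self.t_fab_1 + self.t_fab_2
```

Because t-FAB-1 is shared, the t-FAB-M and t-FAB-V totals of the same examinee can differ only through the second subtest.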
The full t-FAB protocol is provided in Supplementary Material 1. All the above adaptations were jointly conceptualized and operationalized by an author board that comprised a neurologist (IA), three neuropsychologists (SZ, SM, NB), three neuroscientists (ENA, VP, LD) and two physiatrists expert in psychometrics (LT, SS), all with extensive expertise in the field of cognitive test standardization. No disagreement emerged in this first phase, and the protocol was subsequently approved without reservation by the remaining authors [30,31].
All the above HP subsamples were randomly selected through random number tables from the whole normative sample.
Participants with neurological diseases also underwent the in-person FAB, delivered 7-14 days before t-FAB administration.
Before telephone-based testing, all participants underwent a detailed sound-check to ensure a good quality of the call [15].
All tests were administered to both HPs and individuals with neurological diseases by licensed psychologists with thorough expertise in neuropsychological assessment and/or trainee neuropsychologists (ANP, ADL, GS, and TD). Testers underwent thorough administration training delivered by the corresponding author (ENA). Statistical analyses were performed by four neuroscientists with thorough expertise in cognitive data analyses (ENA, VP, LD, and ANP) and approved by the whole author panel.
Either Pearson's or Spearman's technique was adopted to test for construct validity in HPs based on data distributing normally or not, respectively (i.e., skewness and kurtosis values <|1| and |3|, respectively) [34]. Test-retest and interrater reliability in HPs were assessed through intra-class correlations. Factorial structure in HPs was explored through a principal component analysis (PCA) by entering each task separately for the t-FAB-M and t-FAB-V.
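The choice between the two correlation techniques can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text: the function names are hypothetical, moment-based (population) estimators are used, the kurtosis threshold is applied to excess kurtosis, and Pearson's coefficient is selected only when both variables pass the check.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population estimators).

    Requires non-constant data, otherwise the variance is zero.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def choose_correlation(x, y):
    """Return 'pearson' if both samples look normal, else 'spearman'."""
    for sample in (x, y):
        skew, kurt = skewness_and_excess_kurtosis(sample)
        # Normality is assumed when |skewness| < 1 and |kurtosis| < 3.
        if abs(skew) >= 1 or abs(kurt) >= 3:
            return "spearman"
    return "pearson"
```

For example, a score distribution with a marked ceiling effect would typically exceed the skewness threshold and therefore be analysed with Spearman's coefficient.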
According to the recommendations by Hobart et al. [35], which relate to health measurement studies for clinical neurology, the minimum sample sizes for all the aforementioned reliability and validity analyses in HPs were set at N = 20 and N = 80, respectively, except for the PCA, for which an N = 100 was deemed sufficient according to the guidelines delivered by Kyriazos [36].
Equivalence between the t-FAB-M and t-FAB-V was assessed in HPs through a two one-sided test procedure (TOST) for dependent samples [36]. Accordingly, a between-mean effect size is regarded as equivalent to 0 if it falls within the lower and upper equivalence bounds. In such a case, the tests against both equivalence bounds yield a p < 0.05. Sample size estimation for this procedure, performed via the R package TOSTER (https://cran.r-project.org/web/packages/TOSTER/TOSTER.pdf) [37], yielded a minimum of N = 52 observations for detecting equivalence with a 95% power, α = 0.05 and lower and upper equivalence bounds of -0.5 and 0.5, respectively.
Convergence between the t-FAB-M/-V and its in-person version was tested in both HPs and clinical groups via correlational techniques. Sample size estimation for such analyses, performed via the R package pwr (https://CRAN.R-project.org/package=pwr), yielded a minimum N of 20 for a one-tailed hypothesis test (i.e., positive correlation) at α = 0.05, an expected medium-to-large effect size of ρ = 0.5 and an 80% power.
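The sample-size estimate for the equivalence analysis can be reproduced with a short Python sketch based on the usual normal-approximation formula for a paired TOST with symmetric bounds, n = (z_{1-α} + z_{1-β/2})² / d², where d is the equivalence bound in standardized units. That this is the exact computation performed by the authors' TOSTER call is an assumption, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def tost_paired_sample_size(alpha: float, power: float, bound_dz: float) -> int:
    """Approximate N for a paired TOST with symmetric bounds of +/- bound_dz.

    Normal approximation assuming a true effect of zero:
    n = (z_{1-alpha} + z_{1-beta/2})^2 / bound_dz^2, with beta = 1 - power.
    """
    z = NormalDist().inv_cdf
    beta = 1.0 - power
    n = (z(1.0 - alpha) + z(1.0 - beta / 2.0)) ** 2 / bound_dz ** 2
    return ceil(n)


# Parameters reported in the text: alpha = 0.05, power = 95%, bounds of +/- 0.5
print(tost_paired_sample_size(alpha=0.05, power=0.95, bound_dz=0.5))  # 52
```

Under these assumptions the approximation returns 52, which matches the minimum N reported in the text.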
Norms were derived through the Equivalent Score (ES) method [38,39]; outer and inner tolerance limits (TLs), as well as ES thresholds, were computed on ranked scores adjusted for significant anagraphic-demographic confounders via a stepwise regression procedure. The quasi-continuous ES scale allows for clinical judgments as follows: ES = 0 (adjusted scores lower than the outer TL) indexes an "impaired" performance; ES = 1 indicates a "borderline" score; ES = 2, 3 and 4 indicate a "normal" score. To derive norms, consistently with previous studies [2] and by adopting the R package pwr (https://CRAN.R-project.org/package=pwr), a minimum sample size of 287 participants was deemed adequate, by addressing a small-to-medium effect size (f² = 0.05), with a 95% power and α = 0.05, within a multiple regression model (df numerator = 3).
Diagnostic accuracy was tested via receiver-operating characteristics (ROC) analyses by addressing t-FAB scores of the whole clinical group (stroke, EPD, SVD, MD) against the whole normative sample. For this single-test ROC analysis, the minimum sample sizes for the normative and clinical groups were estimated, according to Obuchowski [40], by means of easyROC (http://www.biosoft.hacettepe.edu.tr/easyROC/), at N = 190 and N = 19, respectively, with a case-control allocation ratio of 10, to detect an AUC = 0.7 with a power of 90% and α = 0.05.
Diagnostic accuracy was further explored for descriptive purposes (due to the small sample sizes addressed) by comparing (1) each clinical subgroup against the normative sample and (2), within the whole clinical cohort, t-FAB scores of participants performing defectively (i.e., ES = 0/1) on the in-person FAB against those performing within the normal range (i.e., ES ≥ 2). As to ROC analyses addressing the SVD group, due to the high heterogeneity in cognitive status as assessed by the MMSE, participants were subdivided into those obtaining an MMSE ES ≤ 1 (impaired/borderline performance; N = 6) and those obtaining an MMSE ES > 1 (normal performance; N = 14) [41].
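How an adjusted score maps onto the ES scale and its clinical judgment can be sketched as follows. The cut-off values are purely illustrative placeholders, not the thresholds of Table 4, and the function names are hypothetical; the only rule taken from the text is that adjusted scores lower than the outer TL receive ES = 0, with higher ES bands handled analogously.

```python
from bisect import bisect_right

# Clinical judgment attached to each Equivalent Score, as described in the text.
JUDGMENT = {0: "impaired", 1: "borderline", 2: "normal", 3: "normal", 4: "normal"}


def equivalent_score(adjusted_score: float, cutoffs) -> int:
    """Map an adjusted score to ES 0-4.

    `cutoffs` holds four ascending values; the first is the outer tolerance
    limit. A score lower than cutoffs[0] yields ES = 0, a score in
    [cutoffs[0], cutoffs[1]) yields ES = 1, and so on up to ES = 4.
    """
    if len(cutoffs) != 4 or list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be four ascending values")
    return bisect_right(cutoffs, adjusted_score)


# Hypothetical cut-offs, for illustration only (NOT the published norms).
EXAMPLE_CUTOFFS = [7.0, 8.5, 9.5, 10.5]

es = equivalent_score(6.9, EXAMPLE_CUTOFFS)
print(es, JUDGMENT[es])  # 0 impaired
```

In practice the adjusted score would first be obtained by applying the regression-based correction for education (and age, where relevant) to the raw score.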
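As a reminder of what the AUC expresses in this design, it can be computed directly as the probability that a randomly chosen clinical participant scores lower than a randomly chosen healthy participant, with ties counted as one half (the Mann-Whitney formulation). The sketch below is a generic illustration with made-up scores; it is not the software used in the study, and the function name is hypothetical.

```python
def auc_lower_is_impaired(clinical_scores, healthy_scores) -> float:
    """AUC for a test on which lower scores indicate impairment.

    Equals P(clinical < healthy) + 0.5 * P(clinical == healthy),
    estimated over all clinical-healthy pairs.
    """
    if not clinical_scores or not healthy_scores:
        raise ValueError("both groups must contain at least one score")
    favourable = 0.0
    for c in clinical_scores:
        for h in healthy_scores:
            if c < h:
                favourable += 1.0
            elif c == h:
                favourable += 0.5
    return favourable / (len(clinical_scores) * len(healthy_scores))


# Made-up scores, for illustration only.
clinical = [6, 8, 10]
healthy = [9, 11, 12, 12]
print(round(auc_lower_is_impaired(clinical, healthy), 3))  # 0.917
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.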

Standardization
Normative sample stratification is shown in Table 1. HPs' background and cognitive profiles are summarized in Table 2. Ceiling effects at the t-FAB (i.e., a score of 12/12) were detected in 35.8% (t-FAB-M) and 39.3% (t-FAB-V) of HPs. The acceptability rate was 100%. Construct validity data with the other telephone-based measures are displayed in Table 3. Each t-FAB score converged with the vast majority of the other telephone-based test scores, the highest associations being found with PVF, SVF and AVF scores. t-FAB total and subtest scores were all internally related at α_adjusted = 0.017 (t-FAB-M: 0.19 ≤ r_s(346) ≤ 0.9, p < 0.001; t-FAB-V: 0.15 ≤ r_s(346) ≤ 0.84, p ≤ 0.005).
Inter-rater and test-retest reliability were high for both t-FAB versions (ICC = 0.78–0.94). Education predicted all t-FAB scores (p < 0.001), whereas age predicted only t-FAB-M and t-FAB-2-M scores (p ≤ 0.004). Selected correction factors, TLs and ES thresholds are shown in Table 4. An automated correction sheet is provided in Supplementary Material 2.

Discussion
The present study provides Italian clinicians and researchers with a statistically sound standardization of a TBCS test for frontal-executive deficits, namely the t-FAB, along with evidence of its clinical usability in neurological populations. This study is unprecedented not only within the Italian literature but also within the international one, and it enriches the range of standardized TBCS instruments already available in Italy [2,15]. As for its standardization in HPs, the t-FAB comes with regression-based norms, as well as with evidence of convergent validity, internal validity among its scores, a solid underlying factorial structure, high inter-rater and test-retest reliability and invariance as compared to its in-person version. With respect to its clinical usability, the present preliminary evidence suggests that the t-FAB is able to accurately discriminate neurological cases from healthy controls, identifying frontal-executive disturbances in neurological individuals as the in-person FAB does.
The t-FAB comes with two administration versions of the Sensitivity to interference and Inhibitory control tasks, one requiring motor and the other verbal responses (t-FAB-M and t-FAB-V, respectively). Although HPs obtained lower scores on the t-FAB-M than on the t-FAB-V, the two versions proved statistically equivalent. Moreover, norms have been provided separately for each version, which allows a flexible administration of the two versions according to the participant's clinical features. For instance, the t-FAB-M is more appropriate for individuals with motor speech disorders (e.g., dysarthria/anarthria) but spared hand movements, and vice versa with respect to the t-FAB-V. For the same reasons, the two t-FAB versions could be administered as parallel versions when performing follow-ups.
The slight discrepancies in norms between the t-FAB-M and t-FAB-V are likely due to the fact that inhibition of motor responses is more demanding than that of verbal responses [42]. This proposal is supported by the finding that age negatively predicts the t-FAB-M scores only, in accordance with the notion of motor inhibition being more affected than verbal inhibition with advancing age [43]. Notably, the influence of response modalities on inhibitory-related tasks has been proposed to be negligible in clinical populations, where inhibitory process impairment is likely widespread [44]. This further supports the adoption of the t-FAB as a valid screener of frontal-executive functioning in different neurological diseases. In fact, findings herewith reported as to the diagnostic accuracy of the two t-FAB versions show that the t-FAB-M and t-FAB-V are both able to identify frontal-executive dysfunctions in different neurological populations.
The present study has some limitations that need to be acknowledged. First, the clinical usability of the t-FAB needs to be further verified in larger neurological samples, in particular focusing on diseases with predominant frontal involvement (e.g., motor neuron disease-frontotemporal degeneration spectrum disorders, acquired focal damage of frontal areas and networks due to traumatic, neoplastic, demyelinating, metabolic or infectious etiologies). This issue has to be particularly underlined for those ROC analyses performed on clinical subsamples (Table 6), which should be treated as purely descriptive and preliminary. Second, further in-depth investigations are needed to assess the diagnostic properties, sensitivity to disease severity, responsiveness and reliable change of the t-FAB in neurological and geriatric cohorts [13]. Finally, with respect to the present stroke cohort, the discrepancies found in diagnostic accuracy between the t-FAB-M and t-FAB-V based on lesion side cannot be properly explored due to the small number of individuals presenting with either right- or left-lateralized damage (N = 5 each).
In conclusion, the present work offers the first telephone-based version of the FAB, a normed, valid, reliable and clinically usable TBCS test for screening frontal-executive functioning, encouraging its adoption in both clinical practice and research settings in neurology and geriatrics.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.