Introduction

Mild cognitive impairment (MCI) often represents a transitional period between normal cognitive aging and early dementia1. Individuals meeting the criteria for MCI have a slight but noticeable and measurable decline in one or more cognitive domains, but these deficits are not severe enough to significantly interfere with daily life activities and warrant a diagnosis of dementia2,3,4,5. A person with MCI has an increased risk of developing Alzheimer's disease (AD) or another form of dementia6,7: the rate of yearly progression to dementia in MCI subjects, particularly in the amnestic subtype of MCI (aMCI), is higher than in non-MCI older adults4,8. Therefore, emphasis has often been placed on memory deficits. However, recent studies suggest that verbal fluency (VF) may be an accurate and efficient tool to discriminate between normal aging, aMCI, and AD9. Furthermore, some findings show that the VF task is predictive of incident cognitive impairment and the transition to dementia10,11. The predictive accuracy of fluency tests can vary depending on the length of the follow-up. This indicates that their predictive value changes with the disease stage, unlike other tests, such as verbal memory, for which the predictive accuracy remains high and stable over time12. This is consistent with other findings showing that the total immediate recall score on the Selective Reminding Test and the Digit Symbol Test coding score are stronger predictors of conversion over time within a comprehensive neuropsychological battery13. Therefore, the analysis of verbal fluency skills in preclinical forms of dementia, such as AD, is crucial both diagnostically and predictively.

VF is defined as the ability to produce words under specific constraints within a fixed time interval14. Usually, in clinical and experimental assessments, neuropsychologists use phonemic fluency, which requires the ability to rapidly name words that begin with a specific letter, and semantic fluency, which requires the capacity to produce words that belong to a specific category. The performance on the VF test is measured by counting the number of correct words said by the patient in one minute.

Both fluency tasks require strategic search and information retrieval from semantic memory: semantic fluency requires a constrained search of words from a superordinate category, whereas phonemic fluency may be accomplished with a less constrained search from a larger set of lexical exemplars15.

There is evidence in the literature that the verbal fluency task is multifactorial, and the total number of words is not enough to capture all aspects of performance, identify the underlying deficit, and facilitate diagnostic procedures16,17.

Qualitative aspects of lexical retrieval can offer additional insights into the processes that guide performance on VF tasks18. According to Troyer and colleagues, clustering and switching are the two main strategies used in VF tasks to search for, organize, and retrieve words16. In particular, clustering is an executive-linguistic subprocess related to semantic verbal memory and word storage, and refers to the generation of words within a given subcategory19; switching, on the other hand, refers to the ability to change from one subcategory to another and is related to cognitive flexibility and word search processes19,20. Both components involve complex cognitive processes such as cognitive flexibility, and they are equally relevant for adequate performance on VF tasks21.

Some studies have investigated the effect of age, gender, and educational level on these qualitative components. The age effect is not uniform, considering that different categories have different decreasing tendencies22, with young adults often producing more words and switching more frequently than older subjects23.

Regarding the effect of gender, according to some studies, males seem to have higher scores on the semantic VF test22 and produce larger clusters, whereas females generate a greater number of switches23. However, not all studies have replicated these results: other studies found no effect of gender on either verbal fluency task24,25. Significant differences have also been reported between groups with different levels of education: people with higher education had a larger mean cluster size on the phonemic verbal fluency (PVF) task and switched more often in category fluency19,22,24. Cognitive reserve is associated in some research with better performance on the verbal fluency test26,27. Overall, the prognostic role of this analysis is not yet clear.

Moreover, some studies have investigated various aspects of the VF task to distinguish patients with different levels of cognitive impairment, such as aMCI or AD patients, from healthy older adults with no cognitive decline28,29,30,31. Some studies showed that switching and clustering capacity in letter and semantic fluency could also help to predict conversion to dementia in MCI subjects. The word lists generated by the fluency test are a rich source of information to foresee outcomes in MCI32. In particular, shifting abilities could be involved in the early decline in semantic fluency performance in the prodromal phase of AD33, and clustering could be used to estimate dementia risk in cognitively normal individuals, especially when looking at the size of semantic clusters34.

Considering these observations, the qualitative analysis of verbal fluency tests could allow the examiner to detect some executive dysfunctions in prodromal disease conditions, but further research is needed to assess the applicability of such analysis in different categories and its predictive value. The aim of this paper is to examine the cluster size, the number of clusters, and the number of switches, in addition to the number of correct words, in two VF tasks, the phonemic and semantic fluency tests, with six subcategories, to determine which of these measures can sensitively distinguish aMCI subjects who convert to dementia (aMCI-converter group) from those who do not (aMCI-no converter group).

Methods

Participants

Subjects were recruited from the general population at the Center for Cognitive Disorders and Dementia in Pisa. The eligible population for the study included older individuals aged between 65 and 80 with at least five years of education and with Mild Cognitive Impairment (MCI) in accordance with the recommendations of the National Institute on Aging and the Alzheimer's Association on diagnostic guidelines for preclinical stages2,3; specifically, only amnestic single-domain MCI individuals with normal performance in other cognitive domains were included in the study. Exclusion criteria comprised neurological pathologies, such as dementia, epilepsy, advanced neoplasia, recent cranial trauma, drug addiction, clinical evidence of depression, or other psychiatric disorders. The subjects provided written informed consent for participation, along with information on privacy and the treatment of sensitive data. The study protocol received approval from the Regional Ethical Committee for Clinical Experimentation (Comitato Etico di Area Vasta Nord Ovest—CEAVNO) for the publication of aggregated and anonymously reviewed data collected from medical records without an experimental procedure. The study adhered to the ethical guidelines of the 1975 Declaration of Helsinki.

Cognitive assessment

We administered a comprehensive battery of neuropsychological tasks to assess performance in various cognitive domains. In particular, cognitive evaluation was conducted using the following tests: Mini Mental State Examination (MMSE)35, widely used for screening cognitive impairment and estimating its severity as well as the need for further evaluation; short-term memory was evaluated with Babcock Short Story (BSS)36 immediate recall (IR, immediate repetition after reading by the experimenter), Rey Auditory Verbal Learning Task (RAVLT)37 immediate recall (IR, immediate repetition after reading by the experimenter), the Rey-Osterrieth Complex Figure Test (ROCF)36 immediate recall (IR, 1 min after copy), digit span forward38 and Corsi block-tapping test forward38; retrospective memory was assessed using Babcock Short Story (BSS)36 delayed recall (DR, 20 min after the reading in the immediate recall), Rey Auditory Verbal Learning Task (RAVLT)37 delayed recall (DR, 15 min after the last reading in the immediate recall) and the Rey-Osterrieth Complex Figure Test (ROCF)36 delayed recall (DR, 20 min after the copy); Phonemic Verbal Fluency (PVF)37, Semantic Verbal Fluency (SVF)39 and Attentional Matrices (AM)40 were used to assess executive functions and attention; visuo-spatial abilities and constructional praxis were measured with the copy of the Rey-Osterrieth Complex Figure Test (ROCF)36. The same neuropsychological assessment was administered after 18 months to determine the clinical conversion. All subjects were monolingual native Italian speakers and were tested in Italian.

Verbal fluency qualitative analysis

Verbal fluency is a cognitive function that facilitates information retrieval from memory. Successful retrieval requires executive control over cognitive processes such as selective attention, mental set shifting, internal response generation, and self-monitoring. Tests of verbal fluency evaluate an individual's ability to retrieve as many words as possible from a given category, either semantic or phonemic, within a given time frame (typically 60 s). In these tasks, participants were asked to name as many words as possible that begin with the letters “f”, “a”, and “s” for PVF37, and words that belong to the categories “car brands”, “fruits”, and “animals” for SVF39. These are the three standard categories used to measure performance in the semantic fluency task for the Italian population. Participants were instructed to avoid using proper nouns (e.g., Roma, Roberto) and words that begin with the same sound but have different suffixes (e.g., love, lover, loving); errors of this nature were excluded from the total count of correct words.

In this paper, we analyzed several qualitative aspects of verbal production: the number of switches, which represents the number of transitions between clustered or non-clustered words, including single words; the number of clusters, which measures the number of multiword strings, with each cluster containing at least two successive words; and the mean cluster size, which is calculated as the total cluster size divided by the number of clusters. For each task, we also calculated the total number of words generated, including errors and repetitions.
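As a rough illustration of the three measures defined above, the following Python sketch scores a single word list. It assumes that each correct word has already been hand-labelled with one subcategory (the labels below are hypothetical), that consecutive words sharing a label form one run, and that cluster size is counted as the number of words in a cluster. It ignores overlapping clusters and the finer rules of published scoring guidelines, so it should be read as a simplification rather than as the procedure used in this study.

```python
from itertools import groupby


def score_fluency(labels):
    """Score one fluency trial from subcategory labels given in production order.

    Assumption: labels were assigned by a human rater; this function only counts.
    """
    # Length of each run of consecutive words that share a subcategory label.
    runs = [len(list(group)) for _, group in groupby(labels)]
    # A cluster is a run of at least two successive words.
    cluster_sizes = [n for n in runs if n >= 2]
    # A switch is a transition between adjacent runs (clusters or single words).
    n_switches = max(len(runs) - 1, 0)
    mean_cluster_size = (
        sum(cluster_sizes) / len(cluster_sizes) if cluster_sizes else 0.0
    )
    return {
        "n_clusters": len(cluster_sizes),
        "n_switches": n_switches,
        "mean_cluster_size": mean_cluster_size,
    }


# Hypothetical labels for seven words produced in the "animals" category.
example = ["pets", "pets", "farm", "wild", "wild", "wild", "pets"]
result = score_fluency(example)
print(result)
```

For the example list, the runs have lengths 2, 1, 3 and 1, which gives two clusters, three switches and a mean cluster size of 2.5.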

The qualitative aspects of verbal production for letter word fluency were scored using the exact system described by Ledoux and colleagues18. We utilized their letter and category scoring guidelines and customized the semantic categories “car brands”, “fruits” and “animals” to align with Italian culture and common usage. Refer to Appendix A for the adjusted subcategories for “car brands”, “fruits” and “animals”, which include frequently generated examples for each category (Supplementary file). Please note that these listings are not exhaustive; for example, we did not consider possible categorizations such as “electric cars” versus “fuel cars”, as brands that exclusively produce electric cars like “Tesla” were not mentioned. The clustering analysis for categories was constructed based on the country of origin according to the analyzed verbal production.

Verbal productions were scored by the same rater and by an additional trained rater, using the guidelines presented in Table 1 of Ledoux and colleagues18.

Table 1 Baseline characteristics of aMCI subjects; data are expressed as frequency (%) or mean (sd).

Statistical analysis

Continuous data were described by mean and standard deviation. To compare continuous variables between the two groups (aMCI-no converter, aMCI-converter), we applied a two-tailed t-test for independent samples. Additionally, ROC analysis was performed to identify the best cut-off point for the variables that were significant. Sensitivity and specificity were also calculated. Furthermore, we examined the influence of age, gender and education on the significant clinical outcomes (SVF number of clusters, PVF number of clusters and PVF number of switches) through a multiple linear regression analysis. A significance level of 0.05 was used, and all analyses were conducted with SPSS v.28.
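To make the ROC step concrete, the sketch below computes the AUC and the sensitivity and specificity at each candidate cut-off, using only the Python standard library. The scores are invented for illustration, lower scores are assumed to indicate conversion, and the cut-off is chosen by maximising Youden's J (sensitivity + specificity - 1). The text does not state which criterion was applied in SPSS, so this choice is an assumption.

```python
def roc_summary(converter_scores, stable_scores):
    """Return (auc, best_cutoff_info); lower scores are treated as predicting conversion."""
    # AUC as the probability that a random converter scores below a random
    # stable subject, with ties counted as one half.
    pairs = [(c, s) for c in converter_scores for s in stable_scores]
    auc = sum(1.0 if c < s else 0.5 if c == s else 0.0 for c, s in pairs) / len(pairs)

    # Candidate cut-offs: midpoints between adjacent observed values.
    values = sorted(set(converter_scores) | set(stable_scores))
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]

    best = None
    for cut in cutoffs:
        # Sensitivity: converters correctly flagged (score below the cut-off).
        sens = sum(c < cut for c in converter_scores) / len(converter_scores)
        # Specificity: stable subjects correctly not flagged.
        spec = sum(s >= cut for s in stable_scores) / len(stable_scores)
        j = sens + spec - 1  # Youden's J (assumed selection criterion)
        if best is None or j > best["youden_j"]:
            best = {"cutoff": cut, "sensitivity": sens, "specificity": spec, "youden_j": j}
    return auc, best


# Invented cluster counts, not data from this study.
converters = [3, 4, 5, 5, 6]
stable = [5, 6, 7, 7, 8, 9]
auc, best = roc_summary(converters, stable)
print(round(auc, 3), best)
```

With these invented scores, subjects scoring below the selected cut-off would be flagged as at risk. Other criteria, such as fixing a minimum sensitivity, would generally select a different threshold.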

Results

In this study, we included 61 individuals with single-domain amnestic MCI and analyzed demographic data (Table 1), neuropsychological test results (Table 2), and qualitative data from VF (Table 3).

Table 2 Cognitive tests at baseline of aMCI subjects; data are expressed as mean (sd)
Table 3 Verbal Fluency Qualitative Analysis at baseline of aMCI subjects; data are expressed as mean (SD).

After 18 months from baseline, the subjects underwent a second neuropsychological assessment to evaluate changes in cognitive status. Specifically, 14 individuals with aMCI progressed to a diagnosis of dementia, termed aMCI-converter (MCI-C), while 53 aMCI subjects maintained a stable diagnosis, referred to as aMCI-no converter (MCI-NC).

In detail, of the 14 subjects who progressed to a diagnosis of dementia, approximately 86% (n = 12) received a diagnosis of Alzheimer's disease based on recommendations from the National Institute on Aging-Alzheimer's Association workgroups41, one subject received a diagnosis of Behavioral Variant of Frontotemporal Dementia (bvFTD)42, and one of vascular dementia (VaD)43.

MCI-C and MCI-NC were similar in terms of age, education, and gender distribution (p > 0.05 for all p-values; Table 1).

Both groups displayed a similar neuropsychological profile, except for the MMSE test and the Rey Auditory Verbal Learning Task, in which MCI-NC had a better performance than MCI-C (p < 0.05 for both p-values; Table 2). Additionally, MCI-NC demonstrated a trend toward generating more words in both verbal fluency tests (p = 0.054 for PVF; p = 0.052 for SVF, Table 2).

Furthermore, the two groups showed substantial differences in the qualitative analysis of VF tests. In both fluency tests, PVF and SVF, we found a statistically significant difference in the number of clusters; in particular, MCI-NC generated more clusters during the tasks (p < 0.05 for all p-values; Table 3). In PVF, we observed a statistically significant difference in the number of switches, and in SVF, MCI-NC showed a trend toward producing more switches (p = 0.022 and p = 0.065, respectively; Table 3).

ROC analysis was conducted for each significant clustering and switching factor to determine the optimal cut-off value and provide information about the most sensitive and specific neuropsychological tools for predicting the development of dementia. In the PVF test, the number of clusters had the highest area under the curve (AUC = 0.80), also in comparison with other tests such as the MMSE, RAVLT-IR and RAVLT-DR (AUC = 0.788, 0.650 and 0.668, respectively). The threshold of the "PVF number cluster" subscore for identifying MCI-C at baseline was 5.510. The "SVF number cluster" provided a high sensitivity of 86%, and the "PVF number cluster" had the highest specificity of 74%. Sensitivity and specificity values are also presented in Table 4.

Table 4 ROC analysis was performed for each significant variable to determine the optimal cut-off value. We also calculated AUC, sensitivity and specificity to determine which variables were accurate to predict the clinical diagnosis.

We conducted a multiple linear regression analysis to examine factors influencing the significant dependent variables: SVF number cluster, PVF number cluster and switch. In particular, years of education significantly influenced all three dependent variables under examination (p < 0.05 for all p-values; Table 5).

Table 5 Multiple linear regression of the factors influencing several dependent variables.

Discussion

Tests of phonemic and semantic verbal fluency are commonly used in the assessment of individuals with memory complaints and in the clinical diagnosis of Alzheimer's disease (AD) and other types of dementia. Qualitative aspects of lexical retrieval can provide additional insights into the cognitive processes that underlie performance on verbal fluency tasks18,44. Merely counting the words generated is not sufficient to distinguish between dementia and other cognitive impairments16.

According to Troyer and colleagues29, clustering and switching are the two main strategies used in verbal fluency tests to search for, organize, and retrieve words. The objective of our current study was to determine if assessing switching and clustering behavior in verbal fluency tests could yield measurements associated with the risk of developing dementia. Specifically, we aimed to identify potential neuropsychological markers for conversion from amnestic MCI status, where fluency abilities and performance are not clinically impaired, to dementia over an 18-month follow-up period.

At baseline, the Mini Mental State Examination (MMSE) and, to a lesser extent, the Rey Auditory Verbal Learning Test (RAVLT) were the only tests that appeared to predict a change in diagnosis. Specifically, the aMCI-converter group (MCI-C) had lower scores than the aMCI-no converter group (MCI-NC) on the MMSE and on both the immediate and delayed recall scores of the RAVLT. These findings are in line with existing literature, as the MMSE is a useful tool for screening cognitive decline, and the RAVLT is a powerful test for detecting changes in episodic memory, which are characteristic of aMCI45.

The investigation of VF in preclinical forms of dementia, such as AD, is crucial for both diagnostic and predictive purposes. Changes in VF have been observed in individuals years before the clinical diagnosis of AD46, and patients with AD typically perform poorly on fluency tests compared to older adults without cognitive decline47,48. Moreover, a progressive decline in the number of words generated during fluency tasks has been observed in older populations in both tasks49,50 particularly during the transition from middle-age to early elderly51.

Verbal fluency's global score, which measures speeded access to lexical and semantic information, has limitations in providing insight into crucial aspects of a subject's performance and underlying cognitive mechanisms, such as cognitive flexibility, search strategy, executive function, working memory, processing speed, and verbal ability. Consequently, some studies have focused on analyzing performance based on clustering and switching. For example, one study found that the number of switches more sensitively distinguished MCI from a control group than the number of correct words20. Therefore, qualitative aspects of fluency performances like clustering and switching have been explored16.

The results showed that at baseline, aMCI subjects who later converted to dementia produced a lower number of clusters in both VF tasks compared to those who did not convert to dementia. These findings are consistent with previous research that found fewer clusters in MCI subjects compared to a control group28,30. MCI-C subjects produced fewer clusters, likely because they had deficits in strategic search or in the ability to initiate a search for new strategies or subcategories. VF tasks activate common cognitive processes at various levels that are part of a distributed system involved in verbal recall16,52. Both phonemic and semantic tasks are associated with frontal structures, but the latter also depends on temporal systems53,54.

We also observed a difference in the number of switches in the PVF task and only a trend in SVF, the latter possibly reflecting the small number of MCI subjects who converted to dementia.

This is in line with previous studies. For example, Weakley and colleagues31 found that multidomain aMCI subjects scored worse in total words and switching production compared with healthy controls on both fluency tasks. Raoux and colleagues33 also discovered that there is a significantly lower switching index in future AD subjects, even five years before the dementia diagnosis. Since switching is a measure of cognitive flexibility and involves the ability to access novel information in the semantic memory system29,55, difficulties with these processes could contribute to the progression to dementia.

Additionally, we did not find any differences in mean cluster size in either the PVF or the SVF task. Some research has in fact shown that AD subjects form smaller clusters than control groups28,29, although the mean cluster size may not effectively discriminate future AD subjects15,33.

The worse performance in VF at baseline among the MCI group that progressed to mild dementia in the follow-up suggests that phonemic and semantic processes (search and retrieval of information from memory) might be affected in the early stages. These changes could characterize the transition to full-blown disease56. In particular, the tendency of subjects who later developed dementia to form smaller clusters could support the hypothesis of disorganization of lexical representations57.

Educational level played a significant role in performance, underscoring the importance of considering sociocultural aspects in neuropsychological assessments21. Multiple linear regression of demographic variables highlighted that the years of education influenced the qualitative aspects of verbal production, particularly the number of clusters and switches in both VF tasks. This is significant because education is one of the most important predictors of dementia. Higher education is associated with a reduced risk of developing dementia and provides a cognitive reserve against the effects of aging and disease on brain function58. Additionally, subjects with higher education appeared to be more flexible in their search for new clusters, a strategy associated with educational stimulation, cognitive flexibility, and cognitive reserve (CR) capacity21. This study did not provide evidence of age and gender differences in the strategies used during VF tests.

This manuscript underlines the diagnostic and predictive importance of investigation of verbal fluency in MCI. In particular, the results highlight that a detailed analysis of verbal fluency may be helpful to distinguish patients with mild cognitive impairment who are at risk of developing a form of dementia in the near future. To the best of our knowledge, no studies have investigated two verbal fluency tasks, phonemic and semantic fluency tests, with six subcategories, to determine which of these measures significantly distinguishes patients who convert to a clinical diagnosis of dementia after 18 months. The present study had some limitations that need to be addressed in future research. First, the sample size should be increased to enhance the strength of the analysis and this would be our first aim in the future. Additionally, a longer follow-up period, exceeding 18 months, would provide more accurate results. Future studies could benefit from longitudinal designs that include measurements of brain pathology to investigate the relationship between VF and neurodegeneration.

In conclusion, our findings emphasize the value of conducting a detailed and qualitative analysis of neuropsychological tests, with a specific focus on verbal fluency. This approach can offer valuable insights into early cognitive changes and assist in predicting the progression from amnestic Mild Cognitive Impairment to dementia (Supplementary file). Such investigations are crucial for developing interventions that can effectively delay the onset of dementia.

Moreover, analyzing the performance of verbal fluency tasks in the MCI group provides a deeper understanding of not only the cognitive characteristics of MCI but also the cognitive mechanisms involved in the progression to dementia. This suggests that individual fluency tasks can reveal specific cognitive processes.

Overall, our results highlight the importance of including multiple verbal fluency tests in assessment batteries for preclinical dementia populations. Additionally, optimal thresholds for the number of clusters in both VF tasks, established through ROC analysis, together with the versatility, data richness, low cost, and quick administration of verbal fluency tests, could make this analysis a rapid and valuable screening instrument in neuropsychological assessments for the early detection of neurodegenerative diseases.