Longitudinal changes in qualitative aspects of semantic fluency in presymptomatic and prodromal genetic frontotemporal dementia

Background The semantic fluency test is one of the most widely used neuropsychological tests in dementia diagnosis. Research utilizing the qualitative, psycholinguistic information embedded in its output is currently underexplored in presymptomatic and prodromal genetic FTD. Methods Presymptomatic MAPT (n = 20) and GRN (n = 43) mutation carriers, and controls (n = 55) underwent up to 6 years of neuropsychological assessment, including the semantic fluency test. Ten mutation carriers became symptomatic (phenoconverters). Total score and five qualitative fluency measures (lexical frequency, age of acquisition, number of clusters, cluster size, number of switches) were calculated. We used multilevel linear regression modeling to investigate longitudinal decline. We assessed the co-correlation of the qualitative measures at each time point with principal component analysis. We explored associations with cognitive decline and grey matter atrophy using partial correlations, and investigated classification abilities using binary logistic regression. Results The interrater reliability of the qualitative measures was good (ICC = 0.75–0.90). There was strong co-correlation between lexical frequency and age of acquisition, and between clustering and switching. At least 4 years pre-phenoconversion, GRN phenoconverters had fewer but larger clusters (p < 0.001), and fewer switches (p = 0.004), correlating with lower executive function (r = 0.87–0.98). Fewer switches was predictive of phenoconversion, correctly classifying 90.3%. Starting at least 4 years pre-phenoconversion, MAPT phenoconverters demonstrated an increase in lexical frequency (p = 0.009) and a decline in age of acquisition (p = 0.034), correlating with lower semantic processing (r = 0.90). Smaller cluster size was predictive of phenoconversion, correctly classifying 89.3%. 
Increase in lexical frequency and decline in age of acquisition were associated with grey matter volume loss of predominantly temporal areas, while decline in the number of clusters, cluster size, and switches correlated with grey matter volume loss of predominantly frontal areas. Conclusions Qualitative aspects of semantic fluency could give insight into the underlying mechanisms as to why the “traditional” total score declines in the different FTD mutations. However, the qualitative measures currently demonstrate more fluctuation than the total score, the measure that seems to most reliably deteriorate with time. Replication in a larger sample of FTD phenoconverters is warranted to identify if qualitative measures could be sensitive cognitive biomarkers to identify and track mutation carriers converting to the symptomatic stage of FTD.


Introduction
Frontotemporal dementia (FTD) is a clinically and pathologically heterogeneous type of early-onset dementia, typically characterized by atrophy of the frontal and/or temporal lobes [1]. The clinical profile of FTD shows behavioural and language disturbances, with cognitive deficits in executive function and relative sparing of memory and visuospatial abilities [2]. Up to 40% of FTD cases have an autosomal dominant pattern of inheritance. Mutations in the progranulin (GRN), microtubule-associated protein tau (MAPT) and chromosome 9 open reading frame 72 (C9orf72) genes are the most common causes [3]. Early diagnosis, albeit difficult due to the heterogeneous symptoms and overlap with other forms of dementia and psychiatric disorders, is essential for proper patient management and planning, non-pharmacological treatment, and patient stratification in upcoming disease-modifying clinical trials [4].
Research in the genetic FTD field has been increasingly moving towards the presymptomatic and early prodromal stages, as the critical time-window for treatment most likely lies prior to overt symptom onset. With promising avenues opening for clinical trials, identifying robust biomarkers is of utmost importance [5]. Previous neuropsychological studies showed that subtle cognitive deficits and decline are present in the presymptomatic stage, and gene-specific cognitive profiles can be detected [6][7][8]. These findings suggest that neuropsychological assessment in the presymptomatic and early prodromal stages can provide sensitive cognitive markers for FTD.
The semantic fluency test is one of the most widely used tests in neuropsychological assessments. In this brief, easy-to-apply test, people have to generate items from a particular semantic category (e.g., animals, foods) in 1 min [9]. The semantic fluency test presents high sensitivity and specificity for dementia diagnosis [9], with impaired performance found in both symptomatic [10] and presymptomatic FTD [7,8]. Although the total number of items generated is commonly used to quantify test performance, qualitative, psycholinguistic information embedded in the output can also be investigated, including clusters (number of multiword strings), switches (number of transitions between clusters), age of acquisition (AoA; the age at which a word is learned), and lexical frequency (LF; how often a word occurs in daily language) [12,13]. Previous research demonstrated the prognostic value of qualitative fluency measures in cognitively healthy subjects at risk for, and in conversion from prodromal to overt, Alzheimer's Dementia (AD) [14,15]. This approach has been underexplored in presymptomatic and/or prodromal genetic FTD, although this psycholinguistic information may be able to detect the subtle development of FTD's characteristic language symptoms at an early stage.
The aim of this study was therefore to investigate longitudinal changes in five qualitative aspects of semantic fluency (i.e., number of clusters and switches, cluster size, AoA, and LF) in mutation carriers that developed FTD (phenoconverters), presymptomatic mutation carriers, and non-carriers from autosomal dominant GRN- and MAPT-FTD families. We were specifically interested in the inflection point (i.e., when in the disease trajectory) at which the qualitative measures start to deviate from normal. Additionally, we explored the co-correlation between the qualitative measures, and their associations with cognitive decline and grey matter (GM) volume loss, and the prognostic value of decline in qualitative measures in predicting symptomatic onset.

Participants
We included longitudinal data of 118 participants from the FTD Risk Cohort of the Erasmus MC University Medical Center (Rotterdam, the Netherlands). This is an ongoing study in which first-degree family members of FTD patients due to a pathogenic mutation are followed on a 1- or 2-year basis [16]. Participants were recruited between February 2010 and October 2019. DNA genotyping at study entry assigned participants to the mutation carrier (n = 63; MAPT n = 20, GRN n = 43) or non-carrier group (controls; n = 55). Upon study entry, all mutation carriers were presymptomatic according to clinical diagnostic criteria for FTD [2,17,18], and had global CDR®-plus-NACC-FTLD [19] scores of 0. Ten mutation carriers (MAPT n = 6, GRN n = 4) developed symptoms during follow-up (phenoconverters). Diagnoses were made in multidisciplinary consensus meetings, using information from the standardized clinical assessment (see below). Phenoconverters met the following criteria: [i] progressive deterioration of behaviour, language and/or motor functioning; [ii] functional decline, evidenced by multiple study visits with global CDR®-plus-NACC-FTLD [19] ≥ 0.5 without reversing back to 0; and [iii] cognitive decline, evidenced by ≥ 1.5 SD below age-, sex- and education-specific means in ≥ 1 domain on neuropsychological assessment. Eight phenoconverters had clinical features of bvFTD (MAPT n = 6, GRN n = 2), and two had non-fluent variant PPA (GRN n = 2). The presymptomatic mutation carriers that did not develop FTD symptoms are referred to as non-converters (n = 53).

Clinical assessment
Every 1-2 years, all participants underwent a standardized clinical assessment, consisting of a structured interview with the participant and a knowledgeable informant (incorporating the CDR®-plus-NACC-FTLD [19]), medical history taking, neurological examination, neuropsychological assessment, and brain MRI. The neuropsychological assessment consisted of cognitive screening tests (Mini-Mental State Examination, MMSE, and Frontal Assessment Battery, FAB), and tests within the major cognitive domains. Neuropsychiatric symptoms were assessed with the Beck's Depression Inventory (BDI) and the brief questionnaire form of the Neuropsychiatric Inventory (NPI-Q) (see [7,8] for full test battery).

Qualitative fluency measures
The semantic fluency task was part of the neuropsychological assessment at each study visit. In this task, participants were asked to verbally generate as many different animals as possible in 60 s [20]. The total score is the number of animals produced minus errors. Additionally, we calculated five qualitative fluency measures from the output: LF, AoA, number of clusters, cluster size and number of switches (Box 1, Appendix 1 for scoring guidelines).

Box 1. The 5 qualitative measures generated from the semantic fluency output
• Lexical Frequency (LF): a measure of how often a particular word occurs in daily language. Higher LF indicates worse performance (i.e. more use of high frequency words). Each generated word was paired with its log-transformed LF value from the SUBTLEX-NL database, so that it reflects the natural logarithmic value of how often a word occurs per one million Dutch words. The SUBTLEX-NL database was based on 44 million words from Dutch film and TV subtitles [21]. For this study, the mean LF per participant per study visit was used for subsequent analysis.
• Age of Acquisition (AoA): reflects the age at which a word is typically learned based on ratings of adults. Lower AoA indicates worse performance (i.e. loss of later acquired words). Each generated word was paired with its AoA from a database containing 30,000 Dutch words [22]. For this study, the mean AoA per participant per study visit was used for subsequent analysis.
• Number of clusters: constitutes the number of related multiword strings (e.g., pets, birds), with a minimum of at least two successive words within a semantic subcategory [13]. A lower number of clusters indicates worse performance.
• Cluster size: is the sum of all clustered words. The mean cluster size per participant per study visit was used for subsequent analysis (i.e. total cluster size divided by the number of clusters). Lower cluster size indicates worse performance.
• Number of switches: defined as the number of transitions between different clusters, between clustered and non-clustered words (i.e. transitions from one associative strategy to none at all), or between non-clustered and other non-clustered words [13]. A lower number of switches indicates worse performance.
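The cluster, switch and norm-based measures in Box 1 can be illustrated with a minimal Python sketch. It is not the scoring procedure of Appendix 1: the `SUBCATEGORY` lookup, the example words and the function names are hypothetical, each word is assumed to belong to exactly one subcategory, and repetitions or errors are assumed to have been removed beforehand.

```python
from itertools import groupby
from statistics import mean

# Hypothetical subcategory lookup (illustration only; real scoring follows
# the guidelines in Appendix 1 and may allow overlapping subcategories).
SUBCATEGORY = {
    "dog": "pets", "cat": "pets", "hamster": "pets",
    "lion": "african", "eagle": "birds", "sparrow": "birds", "cow": "farm",
}


def runs(words, subcategory):
    """Split the output into runs of successive words sharing a subcategory.

    Words missing from the lookup get a unique key, so they always stand alone.
    """
    keyed = [(subcategory.get(w, ("unknown", i)), w) for i, w in enumerate(words)]
    return [[w for _, w in grp] for _, grp in groupby(keyed, key=lambda kw: kw[0])]


def score(words, subcategory):
    """Return number of clusters, mean cluster size and number of switches."""
    all_runs = runs(words, subcategory)
    # A cluster needs at least two successive words from the same subcategory.
    clusters = [r for r in all_runs if len(r) >= 2]
    n_clusters = len(clusters)
    clustered_words = sum(len(c) for c in clusters)
    mean_size = clustered_words / n_clusters if n_clusters else 0.0
    # Every boundary between runs is a transition: cluster to cluster,
    # cluster to single word, or single word to single word.
    n_switches = max(len(all_runs) - 1, 0)
    return {"n_clusters": n_clusters,
            "mean_cluster_size": mean_size,
            "n_switches": n_switches}


def mean_norm(words, norms):
    """Mean LF or AoA over the words that have an entry in the norm table."""
    values = [norms[w] for w in words if w in norms]
    return mean(values) if values else None


example = ["dog", "cat", "hamster", "lion", "eagle", "sparrow", "cow"]
print(score(example, SUBCATEGORY))
```

For the example output, the runs are (dog, cat, hamster), (lion), (eagle, sparrow) and (cow), which gives two clusters, a mean cluster size of 2.5 and three switches. `mean_norm` would be called once with a table of log-transformed LF values and once with AoA ratings.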

Study design
Phenoconverters, non-converters and controls were compared at five time points: baseline, and follow-up after 2, 4, 5 and 6 years, for the purpose of this study restructured as (Fig. 1):
1. Four years before phenoconversion: data were available for six phenoconverters. The other four developed symptoms between baseline and the first follow-up visit, therefore no data 4 years prior to phenoconversion were available. The data were compared to the baseline data of non-converters and controls.
2. Two years before phenoconversion: data were available for all ten phenoconverters. The data were compared to the 2-year follow-up data of non-converters and controls.
3. Phenoconversion: data were available for all ten phenoconverters. The data were compared to the 4-year follow-up data of non-converters and controls.
4. One year after phenoconversion: data were available for four phenoconverters. The other six were clinically too impaired to undergo neuropsychological testing or passed away (n = 4), or converted recently, therefore no data 1 year after phenoconversion were available yet (n = 2). The data were compared to the 5-year follow-up data of non-converters and controls.
5. Two years after phenoconversion: data were available for four phenoconverters. The other six were clinically too impaired to undergo neuropsychological testing or passed away (n = 5), or converted recently therefore no data 1 year after phenoconversion were available yet (n = 1). The data were compared to the 6-year follow-up data of non-converters and controls.
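The restructuring above amounts to re-indexing each phenoconverter's visits by years relative to phenoconversion and pairing each restructured time point with a fixed follow-up visit of non-converters and controls. A minimal sketch, assuming visits are recorded as whole years since baseline; the names `TIME_POINTS` and `restructure` are hypothetical:

```python
# Restructured time point -> (years relative to phenoconversion,
#                             comparison visit of non-converters/controls in years)
TIME_POINTS = {1: (-4, 0), 2: (-2, 2), 3: (0, 4), 4: (1, 5), 5: (2, 6)}


def restructure(visit_years, conversion_year):
    """Map a phenoconverter's visits to restructured time points.

    visit_years: years since baseline at which the person was assessed.
    conversion_year: visit year at which phenoconversion was established.
    Returns {time_point: visit_year}; visits that match no time point are dropped.
    """
    relative_to_point = {rel: tp for tp, (rel, _) in TIME_POINTS.items()}
    mapped = {}
    for year in visit_years:
        time_point = relative_to_point.get(year - conversion_year)
        if time_point is not None:
            mapped[time_point] = year
    return mapped


# A carrier who converted at the first follow-up visit has no data
# for time point 1 (four years before phenoconversion).
print(restructure([0, 2, 3, 4], conversion_year=2))
```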

MRI acquisition and (pre)processing
On each study visit, we performed volumetric T1-weighted MRI scanning on a Philips 3 . First, the T1-weighted images were normalized to a template space and segmented into GM, white matter (WM) and cerebrospinal fluid (CSF), after which they were rigidly aligned. We calculated total intracranial volume (TIV) by adding GM, WM and CSF. Second, the segmentations were spatially normalized to a DARTEL template by applying the flow fields of all the individual scans. Images were smoothed using a 6 mm full width at half maximum (FWHM) isotropic Gaussian kernel. After every preprocessing step, images were visually inspected.

Statistical analysis
We performed statistical analyses using SPSS Statistics 25.0 (IBM Corp., Armonk, NY). Alpha was set at 0.05 across all comparisons (two-tailed). We compared continuous demographic data between groups using one-way ANOVAs for normally distributed data (with Bonferroni post-hoc tests), or Kruskal-Wallis tests for non-normally distributed data (with Mann-Whitney U post-hoc tests). We analysed between-group differences in sex and gene with Pearson χ² tests. Interrater reliability was explored using intraclass correlation analysis. We assessed the co-correlation of the five qualitative measures at each time point by means of principal component analysis using Varimax rotation. Only factors accounting for 3% or more of variance and Eigenvalues > 1 were retained. Factor loadings were only considered meaningful when r > 0.450, and any item that did not load sufficiently onto a factor was removed [23].
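The component-retention rules described above can be sketched outside SPSS as follows. This is an illustrative NumPy approximation, not the SPSS procedure itself: the function names are hypothetical, the analysis is assumed to run on the correlation matrix, and the rotation is a standard textbook Varimax without Kaiser normalization (SPSS applies it by default, so loadings may differ slightly).

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a (variables x components) loading matrix."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


def pca_varimax(data, min_eigenvalue=1.0, min_variance=0.03):
    """PCA on the correlation matrix, keeping components with an eigenvalue
    above min_eigenvalue that explain at least min_variance of the variance."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # proportion before rotation
    keep = (eigvals > min_eigenvalue) & (explained >= min_variance)
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return varimax(loadings), explained[keep]


def meaningful(loadings, threshold=0.450):
    """Boolean mask of loadings treated as meaningful (|r| > threshold)."""
    return np.abs(loadings) > threshold
```

Rows of `data` would be participants at one time point and columns the five qualitative measures; a row of the mask returned by `meaningful` that is entirely `False` would mark a measure that does not load sufficiently on any retained component.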
For ease of interpretation, we first converted raw fluency and relevant neuropsychological measures  We included the random intercept as this model presented with a lower AIC. For converters, we calculated deltas of the standardized values for the [i] qualitative fluency measures and [ii] relevant neuropsychological tests between restructured time-point 1 in the six phenoconverters that had data 4 years before phenoconversion, or time-point 2 in the four phenoconverters that only had data 2 years before phenoconversion, and time-point 3 (phenoconversion). We then explored the association between the delta fluency measures and delta neuropsychological tests (corrected for age, sex and education) using partial correlations. Change over time maps were generated by subtracting the GM maps calculated at time-point 3 (phenoconversion) from the maps calculated at time-point 1 or 2 in SPM12. We then explored the relationship between the delta qualitative fluency measures and delta GM maps by means of multiple regression models. Age, sex, TIV, and head coil were entered as covariates. We set the statistical threshold at p < 0.05, adjusted for multiple comparisons with familywise error (FWE) correction. Lastly, to investigate classification abilities of these delta z-scores, we performed binary logistic regression analyses. Assumptions were checked (non-linearity, dependence of errors, outliers, multicollinearity). The models were selected with a forward stepwise method according to the likelihood ratio test and applying the standard p-values for variable inclusion (0.05) and exclusion (0.10). Goodness of fit was evaluated with the HL χ² test, with Nagelkerke R² as measure of effect size. The analyses were adjusted for age, sex, education, and total number of words generated. All models were corrected for multiple comparisons (Bonferroni).
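The standardization, delta and partial-correlation steps can be sketched as below. The z-score follows the definition given in the Table 2 legend (individual score minus the control mean, divided by the control standard deviation per time point). Everything else is an assumption made for illustration: the function names are hypothetical, the sample standard deviation (ddof=1) is used, the delta is taken as the later minus the earlier time point, and the partial correlation is computed by residualizing both variables on the covariates, which is mathematically equivalent to a standard partial correlation but is not taken from the SPSS syntax used in the study.

```python
import numpy as np


def z_vs_controls(score, control_scores):
    """z-score of one score relative to controls at the same time point."""
    controls = np.asarray(control_scores, dtype=float)
    # Assumption: sample standard deviation (ddof=1).
    return (score - controls.mean()) / controls.std(ddof=1)


def delta(z_early, z_conversion):
    """Change in z-score from the earlier time point to phenoconversion.

    Assumption: later minus earlier, so negative values mean lower scores.
    """
    return z_conversion - z_early


def partial_corr(x, y, covariates):
    """Correlation between x and y after regressing out the covariates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(x)), np.asarray(covariates, dtype=float)])
    # Residuals of x and y after a least-squares fit on intercept + covariates.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(res_x @ res_y / np.sqrt((res_x @ res_x) * (res_y @ res_y)))
```

Here `covariates` would hold one column each for age, sex and education, and `x` and `y` would be the delta z-scores of a fluency measure and a neuropsychological test. With only ten phenoconverters and three covariates, few degrees of freedom remain, so coefficients from such an analysis are very unstable.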

Demographics and clinical data
Demographic and clinical data are shown in Table 1. There were no differences between phenoconverters, non-converters, and controls in age [F(2,117) = 0.212, p = 0.809], sex

Co-correlation between the qualitative measures
Appendix 2 shows the results of the principal component analysis on the five qualitative measures per time-point. Based on our criteria and visual inspection of the scree plot, we could extract two components. The first component explained between 39.5 and 49.7% of the total variance, and both LF and AoA had high loadings (r > 0.900). The second component explained between 29.0 and 43.3% of the total variance, and both the number of clusters and the number of switches, and at some time-points also the cluster size, had high loadings (r > 0.59).

Longitudinal qualitative fluency trajectories
The inflection points and longitudinal trajectories of the fluency measures are shown in Table 2. The individual trajectories in phenoconverters are displayed in Fig. 2. Phenoconverters had declining total scores from at least 4 years pre-phenoconversion (p < 0.001), while there was no decline in non-converters or controls (p > 0.05). As can be noted from Fig. 2, the qualitative measures fluctuated more than the total score. Phenoconverters had higher LF from 4 years pre-phenoconversion in comparison to controls (p = 0.016). The number of clusters (p < 0.001) and switches (p = 0.004) also started to decline at this point. AoA declined from phenoconversion onwards (p = 0.050). No longitudinal change was found for cluster size (p > 0.05). There was no change in qualitative fluency measures in non-converters or controls (p > 0.05). Different inflection points and longitudinal trajectories were found for GRN and MAPT phenoconverters. At least 4 years pre-phenoconversion, GRN phenoconverters started producing fewer words in comparison to controls (p = 0.005). Moreover, they started producing fewer but larger clusters (both p < 0.001), and used fewer switches (p = 0.004). GRN phenoconverters had larger cluster sizes than MAPT phenoconverters (p < 0.001). LF and AoA did not change in GRN phenoconverters compared to controls (p > 0.05). Starting at least 4 years pre-phenoconversion, MAPT phenoconverters produced fewer words than controls (p < 0.001). Moreover, LF started to increase (p = 0.007), while AoA (p = 0.034), the number of clusters (p = 0.009), and cluster size (p = 0.010) declined. The number of switches did not change (p > 0.05).

Associations with cognitive decline
Partial correlation coefficients between the fluency measures and relevant neuropsychological tests are shown in Table 3.
In phenoconverters, decline on the SAT verbal correlated with an increase in LF (p = 0.031) and a decline in AoA (p = 0.037). Worse performance on TMT-B (p = 0.031) and Stroop-III (p = 0.026) correlated with an increase in cluster size, while worse performance on Stroop-III correlated with a decline in the number of switches (p = 0.031). In GRN phenoconverters, decline on the SAT verbal correlated with a decline in AoA (p = 0.020). Worse performance on Stroop-III correlated with a decline in the number of clusters (p = 0.022), and an increase in cluster size (p = 0.014), while decline on TMT-B correlated with a decline in the number of switches (p = 0.024). In MAPT phenoconverters, decline on the SAT verbal correlated with an increase in LF (p = 0.015) and a decline in AoA (p = 0.014).

Associations with GM volume loss
The relationships between the qualitative fluency measures and GM volume loss are displayed in Fig. 3 and Appendix 3. In the total group of phenoconverters, worse performance on all five qualitative fluency measures was associated with GM volume loss in large overlapping areas spanning the frontal (e.g., middle frontal gyrus) and temporal lobes (e.g., middle and superior temporal gyrus), and the anterior insular and cingulate cortices (p FWE-corrected < 0.05). In GRN phenoconverters, an increase in LF and a decline in AoA were associated with GM volume loss of the cerebellum and predominantly temporal areas (e.g., medial temporal lobe, inferior temporal gyrus). A decline in the number of clusters, cluster size, and the number of switches correlated with GM volume loss of the cerebellum, insular cortex, and putamen (p FWE-corrected < 0.05). In MAPT phenoconverters, an increase in LF and a decline in AoA were associated with GM volume loss of the cerebellum and predominantly temporal areas (e.g., anterior temporal pole, inferior temporal gyrus). A decline in the number of clusters, cluster size, and the number of switches correlated with GM volume loss of the cerebellum and predominantly frontal areas (e.g., middle and superior frontal gyrus, frontal pole) (p FWE-corrected < 0.05).

Discussion
This study examined longitudinal changes in qualitative aspects of the semantic fluency task in a large cohort of FTD phenoconverters, presymptomatic mutation carriers, and non-carriers from GRN- and MAPT-FTD families. Phenoconverters showed a decline in the total score from at least 4 years pre-phenoconversion, with individually varying inflection points and longitudinal trajectories in qualitative fluency measures in GRN and MAPT. At least 4 years pre-phenoconversion, GRN phenoconverters started producing fewer but larger clusters, and switched less between clusters, which was correlated with executive dysfunction. A decline in switching was predictive of phenoconversion. At least 4 years pre-phenoconversion, MAPT phenoconverters demonstrated an increase in LF and a decline in AoA, which was correlated with semantic deficits. A decline in cluster size was predictive of phenoconversion. Increase in LF and decline in AoA were associated with GM volume loss of predominantly temporal areas, while decline in the number of clusters, cluster size, and switches correlated with GM volume loss of predominantly frontal areas.
The semantic fluency total score is strongly intertwined with the qualitative aspects of the task. For accurate and timely word retrieval, both AoA and LF [26], and clustering and switching components [27] are required. Coinciding with the results of our principal component analysis, in which we found an AoA-LF component and a clusters-switches component, previous studies found strong correlations between AoA and LF, and between the number of clusters and the number of switches. With respect to the relation between AoA and LF, correlations have been found to be high in natural languages, as early-acquired words tend to occur more frequently than late-acquired words [28]. The number of switches and the number of clusters are correlated, as by identifying the number of clusters, one can generate the number of switches (corresponds to the number of clusters minus one) [27].
Irrespective of the underlying FTD mutation, we showed a decline in the total score in mutation carriers from at least 4 years prior to phenoconversion. Decline in semantic fluency was found to be an early cognitive marker in other neurodegenerative diseases. For instance, in preclinical AD and Huntington's disease, as early as 12 years before the onset of dementia, decline in a measure of semantic memory was found [29,30]. In another study, it was amongst the most statistically sensitive cognitive measures of symptomatic conversion [31]. Studies into semantic fluency decline in presymptomatic FTD have shown somewhat contrasting results. One study demonstrated decline in semantic fluency in MAPT mutation carriers from 6 years before estimated symptom onset [7], whereas another only found decline at estimated symptom onset [6]. It should be noted that both studies used estimated years to symptom onset as a proxy for actual onset, which can be less reliable in familial FTD [6]. Utilizing a research design similar to that of our current study, a previous study found decline of semantic fluency from 4 years before symptom onset in MAPT converters, and decline of semantic fluency was found to be the best predictor for having an MAPT mutation [8]. Although semantic fluency did not decline in GRN converters, decline on phonemic fluency was found to be predictive of a GRN mutation, confirming the value of fluency tasks in presymptomatic FTD as they can distinguish the underlying genotype [8].

Fig. 3 Grey matter atrophy patterns associated with lower qualitative fluency performance. VBM analyses demonstrated grey matter volume loss to be associated with lower performance in lexical frequency (red), age of acquisition (green), clusters (blue), cluster size (yellow), and switches (copper) in the total group of phenoconverters (top), GRN phenoconverters (middle), and MAPT phenoconverters (bottom). We set the statistical threshold at p < 0.05 (FWE-corrected). Abbreviations: L left, GRN progranulin, MAPT microtubule-associated protein tau
Starting at least 4 years pre-phenoconversion, GRN mutation carriers produced fewer but larger clusters, and had fewer switches, than MAPT phenoconverters and controls. A likely explanation for the decline in their total score is that GRN phenoconverters deteriorate in cognitive flexibility [32], and thereby lose the ability to switch between semantic clusters in order to generate more words. Indeed, in our study the decline in the number of clusters and switches, and the increase in cluster size, correlated with decline in executive function. Executive dysfunction is known to be a distinctive cognitive feature in GRN mutations, demonstrating deficits in symptomatic mutation carriers [33], extending to the presymptomatic stage [7,8]. When converting to the symptomatic stage, GRN mutation carriers also show the most decline in executive function [34].
Starting at least 4 years pre-phenoconversion, MAPT mutation carriers produced words with a higher LF and a lower AoA, and had fewer clusters and smaller clusters than GRN phenoconverters and controls. These findings point towards deterioration of the semantic system as an explanation as to why qualitative fluency measures change in MAPT. First, semantic decline is most likely to affect words with a lower LF and a higher AoA first, as the categorical organization of the system retrieves the 'typical' exemplars faster and more accurately, and they are better represented and more interconnected to other concepts than those that enter the semantic system later in life [35]. The reliance of both processes on semantic processing is further supported by their correlation with the verbal SAT, which assesses verbal semantic deficits [36]. Clustering relies on lexical retrieval, vocabulary size and lexical access, and thus is mainly supported by the integrity of the semantic system [13].
We demonstrated that, irrespective of the underlying mutation, decline in the number of clusters, cluster size, and switches correlated with GM volume loss of predominantly frontal areas, while worse LF and AoA performance was associated with GM volume loss of predominantly temporal areas. These neuroanatomical correlates are in line with the predominant frontal involvement in GRN mutation carriers [6,34], and link the degradation of the fronto-insula network to less cognitive flexibility (and, as a consequence, early clustering-switching impairment) as the most likely underpinning of declining fluency performance in conversion to GRN-associated FTD. The finding that LF and AoA rely on temporal lobe functioning could explain why these qualitative features are changing early in MAPT mutation carriers, as temporal volume loss is considered the neuroimaging hallmark of MAPT [37,38], being present up to several decades before symptom onset [39]. Although PPA is not a frequent clinical phenotype, semantic impairments are well-described in MAPT-related FTD [40]. Our cohort includes three P301L and three G272V phenoconverters, which is too small to investigate differences between the two tau mutations. Nevertheless, with larger sample sizes it would be interesting to explore if there is clinical heterogeneity across the MAPT mutations [41], as the P301L mutation often presents with a language phenotype with semantic deficits [42]. Impairments in semantic fluency are found to be common as the result of cerebellar pathology, as this subcortical region plays a crucial role in motor performance and executive processes necessary for organizing and monitoring word output [43].
The key strength of our study is our longitudinal design, spanning up to 6 years of follow-up in a large single-centre sample of participants from GRN and MAPT FTD-families. This design allowed the investigation of mutation carriers as they were converting to the symptomatic stage, which provides us more accurate information about the underlying disease process than previous studies that used estimated years to onset as a proxy [40]. We chose multilevel linear modelling to handle potential missing data and unbalanced time-points that were the result of our ongoing prospective study. The small sample of phenoconverters, in combination with the large fluctuations in the data of the qualitative measures, is the largest drawback of the study, which has hampered our statistical power and interpretation of results. As the multilevel model assumes a linear relationship between genetic status and fluency performance, it is possible that we have missed non-linear effects. Ideally, the semantic fluency test should not have been used in determining phenoconversion; however, in our multidisciplinary approach we have used all available clinical information (e.g., MR brain imaging, anamnestic and heteroanamnestic information, questionnaires), so that symptom onset did not solely depend on the neuropsychological assessment. Although theoretically a cluster can consist of a single word, one cannot measure interword intervals for fewer than two words, so that a single-word "cluster" cannot be corroborated by the measurement of interword intervals. Following Ledoux et al. [11], we therefore defined clusters not as single words, but only as multiword strings (i.e. two or more consecutive words) whose relationship is defined by one of the scoring rules, but realize this could have penalized patients with a low total output. Lastly, the analyses on the presymptomatic mutation carriers were performed using the original baseline and follow-up visits, regardless of years from potential phenoconversion, therefore they might have lost some sensitivity to detect decline. Future directions include replication of our findings in larger multicentre cohorts, including C9orf72 mutation carriers. Moreover, using qualitative measures in discriminative event-based models could help us understand the dynamics of disease progression and how other biomarkers (e.g., NfL) fit into this [44]. Lastly, future studies could look into the effect of using time-bins next to the usual 60-s output, as most people start with readily available animals and produce less familiar exemplars as the task develops (which affects the qualitative measures); this approach was found to be particularly sensitive to mutation status in presymptomatic APOE-ε4 carriers at risk of developing AD [14].

Conclusion
Our pilot study shows that qualitative aspects of semantic fluency change in presymptomatic FTD, and shows different profiles and inflection points depending on the mutation involved. This could provide important insight into the mechanisms as to why the "traditional" total score is declining. Its brief and easy-to-apply nature makes the total score of the semantic fluency test a likely candidate cognitive biomarker for upcoming clinical trials for FTD, but more research with a larger sample of phenoconverters is needed to replicate our findings, and to explore the additional value of qualitative measures in identifying and tracking mutation carriers as they convert to the symptomatic stage.

Fig. 1 Subject sample and study design. The total sample (n = 118) was divided into mutation carriers (n = 63) and non-carriers (healthy controls; n = 55); the mutation carrier group was split into phenoconverters (n = 10) and non-converters (n = 53). The original data was restructured for phenoconverters, so that there were five time points: 4 years before phenoconversion, 2 years before phenoconversion, phenoconversion, 1 year post-phenoconversion and 2 years post-phenoconversion. Four years before phenoconversion data was available for only 6 phenoconverters, as the other four phenoconverters developed symptoms between baseline and the first follow-up visit, and therefore no data 4 years prior to phenoconversion were available. One and 2 years after phenoconversion data was available for

Table 1
Demographic and clinical data per subgroup. Values indicate mean ± SD or n (%). Abbreviations: MAPT microtubule-associated protein tau, GRN progranulin, C9orf72 chromosome 9 open reading frame 72, bvFTD behavioural variant frontotemporal dementia, nfvPPA non-fluent variant primary progressive aphasia, N/A not applicable, MMSE Mini-Mental State Examination, FAB Frontal Assessment Battery, BDI Beck's Depression Inventory, NPI-Q neuropsychiatric questionnaire, BNT Boston Naming Test, SAT Semantic Association Test, TMT Trail Making Test. *Dutch educational system categorized into levels from 1 = less than 6 years of primary education to 7 = academic schooling: Duits A KR. Schatten van het premorbide functioneren. In: Hendriks et al. [24]

Table 2
Longitudinal trajectories of the total score and qualitative fluency measures in phenoconverters and non-converters. Values indicate: z-scores (i.e. individual test score minus the mean of the healthy controls, divided by the standard deviation of the healthy controls per time point). Data from healthy controls are omitted from this table as their mean z-score is zero and their standard deviation is one by default. Significant values (p < 0.05) are expressed in bold

Table 3
Partial correlations between decline in fluency measures and relevant cognitive tests. Values indicate: partial correlation coefficients (corrected for age, sex and education level).

Appendix 2: Principal component analysis on the five qualitative fluency measures
LF lexical frequency, AoA Age of Acquisition