Clusters of anatomical disease-burden patterns in ALS: a data-driven approach confirms radiological subtypes

Amyotrophic lateral sclerosis (ALS) is associated with considerable clinical heterogeneity spanning from diverse disability profiles, differences in UMN/LMN involvement, divergent progression rates, to variability in frontotemporal dysfunction. A multitude of classification frameworks and staging systems have been proposed based on clinical and neuropsychological characteristics, but disease subtypes are seldom defined based on anatomical patterns of disease burden without a prior clinical stratification. A prospective research study was conducted with a uniform imaging protocol to ascertain disease subtypes based on preferential cerebral involvement. Fifteen brain regions were systematically evaluated in each participant based on a comprehensive panel of cortical, subcortical and white matter integrity metrics. Using min–max scaled composite regional integrity scores, a two-step cluster analysis was conducted. Two radiological clusters were identified; 35.5% of patients belonging to ‘Cluster 1’ and 64.5% of patients segregating to ‘Cluster 2’. Subjects in Cluster 1 exhibited marked frontotemporal change. Predictor ranking revealed the following hierarchy of anatomical regions in decreasing importance: superior lateral temporal, inferior frontal, superior frontal, parietal, limbic, mesial inferior temporal, peri-Sylvian, subcortical, long association fibres, commissural, occipital, ‘sensory’, ‘motor’, cerebellum, and brainstem. While the majority of imaging studies first stratify patients based on clinical criteria or genetic profiles to describe phenotype- and genotype-associated imaging signatures, a data-driven approach may identify distinct disease subtypes without a priori patient categorisation. Our study illustrates that large radiology datasets may be potentially utilised to uncover disease subtypes associated with unique genetic, clinical or prognostic profiles.


Introduction
Clinical heterogeneity in ALS is widely recognised. While the diagnosis of ALS requires a core set of clinical features, considerable differences exist in progression rates, disability profiles, survival, cognitive manifestations, and behavioural features [1][2][3][4]. Key dimensions of clinical heterogeneity include LMN versus UMN predominance, body region of symptom onset and cognitive profiles, but less characteristic symptoms, such as extrapyramidal, cerebellar, and sensory deficits, may also add to the diversity of clinical manifestations [5][6][7][8]. The practical upshot of clinical heterogeneity includes considerable differences in care needs, support services, caregiver burden and the resources needed for the multidisciplinary management of the condition. It is widely recognised that individualised supportive strategies are required for the optimal management of ALS, and it is also increasingly accepted that individualised pharmacotherapy may be needed instead of the traditional "one-drug-for-all" approach. The ramifications of disease heterogeneity extend beyond patient care and pose a considerable challenge in clinical trials, which are often hampered by small cohort sizes, stringent entry criteria and high drop-out rates [9]. In line with the concepts of precision medicine, and in recognition of the diversity of clinical trajectories in ALS, a multitude of classification schemes and staging systems have been introduced to categorise patients with similar disability, prognostic or cognitive profiles [10][11][12][13][14]. These staging systems are relatively easy to apply in the clinical setting, are useful in pharmacological trials, and have proved successful in reducing clinical diversity by allocating patients into specific disease categories. Clinical staging, however, requires the careful consideration of observed parameters and invariably relies on the interpretation of medical cues, reported symptoms and other potentially subjective factors.
An alternative to clinical staging is the exploration of quantitative biomarker data [15][16][17] to evaluate whether distinct subgroups exist, using a data-driven approach relying solely on quantitative, "measured" variables. While the majority of imaging studies use clinical categorisation first to then describe phenotype-, genotype- or stage-associated radiological profiles [18,19], an alternative is the cluster analysis of pooled imaging data and the subsequent analysis of cluster-associated clinical characteristics. Accordingly, the main objective of this study is the evaluation of a large unsegregated MR dataset with regard to radiological clusters of anatomical involvement without a priori patient categorisation. Our hypothesis is that disease subtypes may be readily identified using a data-driven approach without relying on accompanying clinical variables. A secondary objective of the study is the interrogation of cluster-associated demographic, clinical and genetic information once cluster membership has been established for each participant.

Participants
A total of 214 patients with amyotrophic lateral sclerosis (ALS) were included in a prospective, single-centre study. The study was approved by the institutional ethics board (Beaumont Hospital, Dublin, Ireland), and all participants provided informed consent. Exclusion criteria included prior cerebrovascular events, traumatic brain injury, neurosurgical procedures, as well as comorbid neoplastic, paraneoplastic or neuroinflammatory diagnoses. Participating ALS patients were diagnosed according to the El Escorial criteria. 161 patients were screened for GGGGCC hexanucleotide expansions in C9orf72. Methods for genetic screening have been previously reported [20]. GGGGCC repeat expansions in C9orf72 longer than 30 repeats were considered pathological.

White matter indices
Following quality control, eddy current corrections and skull removal were applied to DTI data before a tensor model was fitted to generate maps of fractional anisotropy (FA). The tract-based spatial statistics (TBSS) module of the FMRIB Software Library (FSL, v6.0) was used for the non-linear registration of DTI images, skeletonisation and the creation of a mean FA mask. The study-specific white matter skeleton was masked in MNI space by the anatomical labels of the following white matter regions: left and right anterior thalamic radiation, left and right posterior thalamic radiation, left and right cerebellar white matter skeleton, left and right corticospinal tract, forceps major, body of the corpus callosum, forceps minor, left and right inferior cerebellar peduncle, middle cerebellar peduncle, left and right superior cerebellar peduncle, left and right inferior longitudinal fasciculus, left and right uncinate fasciculus, left and right superior frontal lobe, left and right inferior frontal lobe, left and right temporal lobe, left and right occipital lobe, left and right parietal lobe, left and right cingulum, left and right inferior fronto-occipital fasciculus, left and right superior longitudinal fasciculus, left and right medial lemniscus, fornix, and brainstem. To generate spatial masks for the cerebellar peduncles, medial lemniscus and posterior thalamic radiation, the labels of the ICBM-DTI-81 white matter atlas [24,25] were used. To create masks for the cingulum, forceps major, forceps minor, body of corpus callosum, anterior thalamic radiation, uncinate, inferior longitudinal fasciculi, superior longitudinal fasciculi, inferior fronto-occipital fasciculi, and corticospinal tracts, the labels of the JHU white matter tractography atlas [26,27] were utilised. FMRIB's fornix template [28] was used to mask the study-specific white matter skeleton in MNI space.
Labels of the MNI probabilistic atlas [29,30] were used to generate white matter masks for the cerebellum, frontal, temporal, occipital, and parietal lobes. The frontal lobe was divided into inferior and superior sections at MNI coordinate z = 8. Label 8 of the Harvard-Oxford probability atlas [31] was used to create a brainstem mask.
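As an illustration of the FA metric referred to above, the following minimal sketch computes FA for a single voxel from the three eigenvalues of its diffusion tensor using the standard FA definition, and averages FA over the voxels selected by a binary region mask. It is not the FSL implementation used in the study; the function names and the flat-list representation of the skeleton and mask are assumptions made for illustration only.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Return FA for one voxel from the three diffusion tensor eigenvalues."""
    mean_diffusivity = (l1 + l2 + l3) / 3.0
    numerator = (
        (l1 - mean_diffusivity) ** 2
        + (l2 - mean_diffusivity) ** 2
        + (l3 - mean_diffusivity) ** 2
    )
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0:
        return 0.0  # no diffusion signal: define FA as 0
    return math.sqrt(1.5 * numerator / denominator)


def mean_fa_in_mask(fa_values, mask):
    """Average FA over voxels flagged by a binary mask (hypothetical flat lists)."""
    selected = [fa for fa, inside in zip(fa_values, mask) if inside]
    return sum(selected) / len(selected) if selected else float("nan")
```

Isotropic diffusion (three equal eigenvalues) yields an FA of 0, whereas diffusion restricted to a single axis yields an FA of 1.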
Based on the 15 regional integrity scores, a two-step cluster analysis was conducted using the Euclidean distance measure. The number of clusters was not fixed a priori, and the Bayesian Information Criterion (BIC) was used to determine the number of clusters. Based on the cluster membership of individual patients, cluster sizes were determined and silhouette analyses were run using the STATS CLUS SIL extension of SPSS. The hierarchy of input variables was calculated to rank predictor importance, i.e. to establish which brain regions best segregate the patients. Cluster membership was plotted in a scatter plot along the integrity gradient of the three most relevant ROIs to demonstrate case separation. In post hoc analyses, the clinical and genetic profiles of the clusters were contrasted.
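The scaling and cluster-quality steps can be sketched as follows. This is a minimal illustration in plain Python, not the SPSS two-step procedure or the STATS CLUS SIL extension used in the study: it shows min–max scaling of one regional score across patients and the mean silhouette coefficient with Euclidean distance for a given cluster assignment. The function names and the toy data in the usage note are hypothetical.

```python
import math


def min_max_scale(values):
    """Rescale one variable across patients to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant variable carries no information
    return [(v - lo) / (hi - lo) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two distinct labels."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [
            euclidean(p, q)
            for j, (q, other_lab) in enumerate(zip(points, labels))
            if other_lab == lab and j != i
        ]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels)
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

For example, `mean_silhouette([(0.0,), (0.1,), (0.9,), (1.0,)], [1, 1, 2, 2])` evaluates two well-separated toy clusters; values closer to 1 indicate tighter, better separated clusters.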
To illustrate the discrimination potential of the anatomical regions between the clusters, a scatter plot was generated based on the integrity of the three most relevant anatomical regions (Fig. 2).
Based on allocated cluster membership, the demographic, clinical and genetic profiles of the two clusters were evaluated; Cluster 1: n = 76 age: 61.9 ± 11.9, male: 54  (χ2 corr. = 3.99, p = 0.136). The two clusters differed in genetic profiles (χ2 corr. = 23.17, p < 0.0001). In Cluster 1, there were 16 hexanucleotide repeat carriers which is 23.5% of patients with genetic information available in the cluster (n = 68). In Cluster 2, there were only six hexanucleotide repeat carriers which is 6.5% of patients with genetic information available in the cluster (n = 93). Of the 22 hexanucleotide carriers included in the study, 72.7% (n = 16) clustered to Cluster 1, and only 27.3% (n = 6) to Cluster 2.
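The reported proportions can be checked with simple arithmetic. The sketch below uses only counts given in the text; the size of Cluster 2 (138) is not stated explicitly and is inferred here as 214 − 76, which is an assumption. Variable and function names are illustrative.

```python
# Counts taken from the text; Cluster 2 size is inferred as 214 - 76.
total_patients = 214
cluster_sizes = {"Cluster 1": 76, "Cluster 2": total_patients - 76}
carriers = {"Cluster 1": 16, "Cluster 2": 6}   # hexanucleotide repeat carriers
screened = {"Cluster 1": 68, "Cluster 2": 93}  # patients with genetic data


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of the whole cohort in each cluster.
cluster_share = {k: percent(v, total_patients) for k, v in cluster_sizes.items()}

# Carrier rate among screened patients within each cluster.
within_cluster = {k: percent(carriers[k], screened[k]) for k in carriers}

# Distribution of all carriers across the two clusters.
total_carriers = sum(carriers.values())
across_clusters = {k: percent(carriers[k], total_carriers) for k in carriers}
```

Under these counts, the within-cluster carrier rates and the distribution of carriers across clusters reproduce the percentages quoted above.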

Discussion
Our data confirm the radiological clustering of ALS into two relatively distinct subtypes. Contrary to previous staging or classification studies, we have not incorporated any complementary clinical, demographic or genetic information and uncovered two distinct subtypes based on the anatomical distribution of degenerative change alone in a large cohort of pooled ALS patients. The motivation behind our approach was to interpret solely objective, quantitative, spatially coded radiological data without applying any a priori stratification strategy. Whilst most studies first categorise patients based on clinical, genetic or phenotypic criteria to then describe phenotype- or genotype-associated imaging signatures, our intention was the opposite: to evaluate the natural segregation of patients based on pathological patterns and then assess clinical features associated with the clusters.
Using cerebral grey and white matter measures in 15 cerebral regions covering the entire brain, we have detected two distinct subgroups: a larger cluster (64.5%) of patients with moderate extra-motor disease burden and a smaller cluster (35.5%) with considerable frontotemporal pathology. One of the objectives of the study was to evaluate which brain regions best distinguish the subgroups; therefore, the evaluation of predictor importance is of particular interest. The marked involvement of superior lateral temporal and frontal regions in a subset of patients is consistent with previous reports, but the high predictor importance of parietal changes merits further discussion. ALS is not traditionally associated with preferential parietal atrophy [33]. Parietal changes have been sporadically described, mostly in association with advanced disease, but our study suggests that parietal indices may help to segregate patients into subgroups. Predictor importance ranking also revealed that the involvement of brain regions traditionally associated with ALS, such as the motor cortex, corticospinal tracts, commissural structures and brainstem, does not readily distinguish disease clusters, as the pathology of these regions represents a core, unifying feature of the disease [34,35]. The low predictor importance of the cerebellum is also of interest. Cerebellar changes in ALS have only recently been characterised in detail [36] and the severity of cerebellar changes is thought to be associated with specific genotypes and phenotypes [7,37]. While an ALS-ataxia continuum has been proposed by some, our data did not indicate the existence of a cluster of patients with marked cerebellar involvement without frontotemporal change.
The heterogeneity of limbic involvement is consistent with the vast body of neuropsychology and neuroimaging literature [38][39][40]. The importance of mesial inferior temporal structures in segregating ALS subtypes is consistent with the literature on medial temporal pathology in ALS and its contribution to cognitive deficits [18,41]. Peri-Sylvian features ranked relatively high in our study, despite the left-right averaging of integrity variables. Peri-Sylvian regions are seldom assessed specifically in ALS, as the focus of imaging studies in ALS-FTD is often on orbitofrontal, dorsolateral prefrontal and various temporal regions. Preferential insular and Broca's area degeneration has been previously described in association with C9orf72 [42], but is also often detected in whole-brain cortical thickness or morphometric analyses. Language deficits in ALS are also relatively well described [43,44], but rarely linked to focal degenerative change [45]. Subcortical integrity metrics ranked in the middle of the predictor hierarchy, which is somewhat unexpected given the role of subcortical structures in driving neuropsychological manifestations and the notion that hexanucleotide carriers may exhibit particularly marked subcortical degeneration [46][47][48][49]. Sensory areas ranked low in their importance for separating the clusters, despite recent reports of subtle or subclinical sensory deficits in ALS [50,51]. Our predictor analysis outcomes highlight the importance of systematically assessing each brain region in ALS instead of only pursuing the analysis of brain regions which are known to be affected based on post mortem data. Certain anatomical areas, such as the parietal and occipital lobes, may not be characteristic regions of degeneration, yet, as illustrated, may have a role in segregating specific ALS subtypes.
This observation is consistent with the emerging machine-learning literature of ALS [52,53] which suggests that feature importance analyses, especially in multi-class classification schemes, may identify brain regions which are not classically associated with ALS [54,55].
Our data indicate a relative discordance between clinical and radiological profiles. While subjects in Cluster 1 exhibited marked frontotemporal change radiologically and the proportion of patients with cognitive impairment was higher, the statistical comparison of clinical variables in the two clusters did not reach significance. Furthermore, the two clusters were also matched in motor disability as indicated by their ALSFRS-r profiles. The dissociation between disease burden and clinical performance is increasingly recognised [56] and a multitude of factors, such as compensatory processes, "motor reserve" and "cognitive reserve", may contribute [57][58][59]. The relative genetic segregation of subjects based on their imaging profiles is of particular interest; 72.7% of hexanucleotide carriers segregated to Cluster 1, and only 27.3% to Cluster 2. It is conceivable that in much larger datasets, imaging may have a role in uncovering anatomically unique subgroups which may carry a higher percentage of specific genetic variants or groups with distinct clinical features.
Our study is not without limitations. A silhouette coefficient of 0.572 can only be interpreted as a "reasonable" structure [32]. Our cluster analysis relied on cross-sectional data, and similarly to other studies [53,60], the clinical implications of cluster membership need to be characterised further with regard to potential prognostic and survival ramifications. Assessing whether radiological cluster membership is consistent longitudinally throughout the course of the disease would be of interest [61]. Furthermore, only very basic clinical variables, such as composite disability scores and cognitive screening outcomes, were appraised in the resulting anatomical clusters. The fine-grained assessment of specific clinical domains such as pyramidal, extrapyramidal, cerebellar, language, social cognition, and apathy scores may reveal significant inter-cluster differences [4,5][62][63][64]. To explore patient segregation into core pathological patterns, relatively large anatomical regions were defined and only structural integrity metrics were evaluated. The incorporation of spinal cord metrics [65,66] and functional network integrity indices [17,67] may have helped to identify additional clusters. Only symptomatic patients with an established diagnosis of ALS were included in this study. Given the considerable evidence of brain [44,68] and spinal cord [69] alterations long before symptom onset, the anatomical clustering of asymptomatic mutation carriers would be of particular interest. Finally, the inclusion of non-ALS MNDs, such as SBMA, SMA, PLS, PPS or PMA, in cluster analyses may be of interest to evaluate whether these subtypes segregate from ALS based on their radiological profiles [70][71][72][73][74][75][76]. Notwithstanding these limitations, our study demonstrates that pooled radiology data may be utilised to uncover disease subtypes which may be associated with unique genetic profiles.

Conclusions
Cluster analysis of imaging data reveals distinct subtypes in ALS without accompanying clinical information. The interrogation of biomarkers by data-driven approaches helps to explore the heterogeneity of neurodegenerative conditions without a priori patient stratification. With the increased availability of large harmonised datasets, similar analyses may expose unique disease subtypes with distinctive clinical, prognostic or genetic traits.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.