Amyotrophic lateral sclerosis stratification: unveiling patterns with virome, inflammation, and metabolism molecules

Amyotrophic lateral sclerosis (ALS) is an incurable and clinically heterogeneous condition primarily affecting motor neurons. The ongoing search for reliable biomarkers of disease status and progression has led to investigations that extend beyond motor neuron pathology to broader systemic factors such as metabolism, immunity, and the microbiome. Our study contributes to this effort by examining the potential role in ALS of microbiome-related components, including viral elements such as torque teno virus (TTV), and of various inflammatory factors. In serum samples from 100 ALS patients and 34 healthy controls (HC), we evaluated 14 cytokines, TTV DNA load, and 18 free fatty acids (FFA). These variables effectively differentiated ALS patients from healthy controls. In addition, we identified four patient clusters, each characterized by a distinct biological profile. Notably, cluster membership was not associated with site of onset, sex, progression rate, phenotype, or C9ORF72 expansion. We also found a gender-specific relationship between levels of 2-ethylhexanoic acid and patient survival. Besides adding to the growing evidence of altered peripheral immune responses in ALS, our exploratory study reveals a metabolic diversity that challenges conventional clinical classifications. If validated by further research, these findings could improve understanding of the disease and support more personalized patient care. Grouping patients by biological profile might help identify subgroups with different responses to treatment.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00415-024-12348-7.


Introduction
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease affecting motor neurons (MNs), marked by significant genetic and clinical heterogeneity. This diversity parallels that of the human microbiome, which influences brain physiological processes and alters host susceptibility to various disorders, including neurodegenerative diseases [68]. Specifically, the gut microbiota affects the central nervous system (CNS) through a bidirectional interaction, the gut-brain axis, influencing neuronal health via the production of neuroactive metabolites, such as short-chain fatty acids (SCFA), and toxins, and modulating the immune system [39], for instance by affecting T-cell activity and differentiation [13,54]. In addition, the viral component of the microbiome, known as the "virome," may play a significant role in maintaining immune health [8]. The human virome comprises a variety of commensal and pathogenic viruses that elicit a broad range of host immune responses. While persistent viral immunomodulation is associated with several inflammatory disorders, beneficial effects, such as protection against disease, have also been observed [51]. The most common component of the human virome is torque teno virus (TTV), a member of the family Anelloviridae, which causes persistent infections in humans [32,66]. TTV viremia, which is particularly high in immunosuppressed patients [19,21,26,57], has been closely associated with host immunity, suggesting its potential as a biomarker of immune function [40,55]. Given the considerable inter-individual variation in immunity and disease susceptibility linked to the composition and function of the human microbiome, our study aims to improve the understanding of ALS by characterizing patients through microbiome-, metabolism-, and immune-derived molecules. Our findings reveal distinct patient clusters based on biological variables and a gender-specific association with patient survival, providing new insights into ALS pathogenesis and potential therapeutic approaches.

Study population
We conducted a case-control cohort study at the ALS Center of Modena University Hospital, Italy, enrolling 100 patients newly diagnosed with ALS between January 2017 and January 2020, with follow-up extending to July 2022. The ALS Center of Modena coordinates the Register of ALS of the Emilia Romagna region (ERRALS) [23,44], which covers a population of 4.5 million inhabitants. Patients were diagnosed with possible, probable, or definite ALS according to the revised El Escorial criteria [7]. Thirty-four healthy individuals were also recruited from among the patients' healthy, genetically unrelated spouses. Both patients and healthy controls had to be aged 18-80 years, have a BMI of 18 or above, and be able to understand and provide informed consent. Exclusion criteria were dementia or any other condition compromising the ability to consent; known organic gastrointestinal disease (including, but not limited to, malignancy, inflammatory bowel disease, gastric ulcer, chronic diarrhea, and gastroesophageal reflux); celiac disease and/or documented food intolerances (e.g., lactose intolerance); autoimmune disorders; severe comorbidities (such as liver, heart, or kidney failure, or chronic infections such as HIV, tuberculosis (TBC), or hepatitis); a history of complicated gastrointestinal surgery; and acute infections at the time of sampling. The study was approved by the Ethical Committee of Modena (Comitato Etico Provinciale di Modena, file n. 15/17), and informed consent was obtained from all participants. Reporting of clinical data complies with the STROBE guidelines.

Clinical data
Clinical data were obtained from ERRALS, which has enrolled patients with motor neuron disease (MND) at the time of diagnosis and prospectively collected demographic and clinical information since 2009 [44]. For all participants, we extracted sex, age, site of symptom onset (bulbar, upper limb, lower limb, or respiratory), date of diagnosis, phenotype (classic, bulbar, upper motor neuron-predominant, flail arm, flail leg, or respiratory) [9,58], genotype, and the ALS Functional Rating Scale-Revised (ALSFRS-R) total score at diagnosis, at sampling, and at the last available observation [43]. All patients except six were screened at least for mutations in C9ORF72, SOD1, FUS, and TARDBP, as detailed elsewhere [45]. Participants were followed from diagnosis until death or the last observation, whichever occurred first. The dates of initiation of nutritional or respiratory support were also recorded. Survival was measured from disease onset to death or initiation of invasive ventilation. The rate of disease progression at diagnosis was calculated by subtracting the ALSFRS-R score at diagnosis from 48 (the score assumed at symptom onset) and dividing by the number of months from symptom onset to diagnosis [33]. Progression rates at sampling and at the last observation were calculated in the same way, as the monthly decline in ALSFRS-R score from symptom onset and from diagnosis. Progression was classified as "slow" if ≤ 0.4 points per month, "fast" if ≥ 1 point per month, and "intermediate" for rates in between.
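The progression-rate definition above can be sketched in Python as follows. The function names are hypothetical; the only inputs taken from the text are the score of 48 assumed at symptom onset and the slow/fast thresholds.

```python
def progression_rate(alsfrs_r: float, months_from_onset: float) -> float:
    """Monthly ALSFRS-R decline, assuming a score of 48 at symptom onset."""
    if months_from_onset <= 0:
        raise ValueError("months_from_onset must be positive")
    return (48.0 - alsfrs_r) / months_from_onset


def progression_class(rate: float) -> str:
    """Thresholds from the text: slow <= 0.4, fast >= 1.0, otherwise intermediate."""
    if rate <= 0.4:
        return "slow"
    if rate >= 1.0:
        return "fast"
    return "intermediate"


# Hypothetical patient: ALSFRS-R of 42 at diagnosis, 12 months after symptom onset
rate = progression_rate(42, 12)
print(rate, progression_class(rate))  # 0.5 intermediate
```

Because both thresholds are inclusive, a rate of exactly 0.4 is "slow" and a rate of exactly 1.0 is "fast".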

Sample collection and processing
Serum samples were obtained by venipuncture after an overnight fast, at the time of diagnosis or shortly thereafter, and processed according to standard procedures. After centrifugation for 10 min at 1300 × g, the supernatant was divided into aliquots and stored in polypropylene tubes at -80 °C in the Modena Neurobiobank until shipment to the University of Florence.

Quantification of TTV-DNA serum levels
Viral DNA was extracted from 200 μl of serum using the QIAamp DNA Mini Kit (QIAGEN, Chatsworth, CA) and used to determine the presence and load of TTV-DNA with a single-step universal TaqMan real-time PCR assay [41]. The assay uses primers (AMTS, 5′-GTG CCG IAG GTG AGT TTA-3′; AMTAS, 5′-AGC CCG GCC AGT CC-3′) and a probe (AMTPTU, 5′-TCA AGG GGC AAT TCG GGC T-3′) designed on a highly conserved segment of the untranslated region of the viral genome, and can therefore detect all the species into which TTV is classified. TTV loads were expressed as viral DNA copies per mL of serum, with a lower limit of detection of 10 copies/mL. The procedures used to quantitate copy numbers and to assess specificity, sensitivity, intra- and interassay precision, and reproducibility have been described previously [41]. All procedures needed to validate the amplification process and exclude carryover contamination were performed: serum handling, DNA extraction, PCR amplification, and electrophoresis were carried out in separate rooms; appropriate negative controls were included during DNA extraction and PCR amplification; and positive and negative controls (i.e., no-template and/or no-amplification controls) were run in each PCR.
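The quantitation details are deferred to [41]. As a generic illustration only, absolute quantification in real-time PCR commonly relies on a standard curve of threshold cycle (Ct) against log10 copies of a reference template, and the sketch below assumes that approach. The elution and template volumes, the function names, and the handling of values below the 10 copies/mL detection limit are hypothetical and are not taken from the assay protocol.

```python
import numpy as np


def fit_standard_curve(log10_copies, ct):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    slope, intercept = np.polyfit(log10_copies, ct, deg=1)
    efficiency = 10 ** (-1.0 / slope) - 1.0  # 1.0 corresponds to 100% efficiency
    return slope, intercept, efficiency


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies per reaction."""
    return 10 ** ((np.asarray(ct, dtype=float) - intercept) / slope)


def copies_per_ml(copies_per_reaction, template_ul, elution_ul, serum_ml, lod=10.0):
    """Scale per-reaction copies to copies/mL of serum.

    Assumes the whole serum input (serum_ml) is eluted in elution_ul and that
    template_ul of eluate enters each reaction (hypothetical volumes).
    Values below the detection limit are reported as 0 (hypothetical convention).
    """
    conc = np.asarray(copies_per_reaction, dtype=float) * (elution_ul / template_ul) / serum_ml
    return np.where(conc >= lod, conc, 0.0)


# Ideal ten-fold dilution series: slope = -1/log10(2), about -3.32 cycles per log
log10_std = np.arange(1, 7)
ct_std = 40.0 - log10_std / np.log10(2)
slope, intercept, eff = fit_standard_curve(log10_std, ct_std)
per_reaction = copies_from_ct(40.0 - 3 / np.log10(2), slope, intercept)  # about 1000 copies
print(copies_per_ml(per_reaction, template_ul=10, elution_ul=100, serum_ml=0.2))
```

A slope near -3.32 corresponds to a doubling of template per cycle; real standard curves deviate from this ideal, which is why the efficiency is reported alongside the fit.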

Statistical analysis
Categorical variables were reported as absolute frequencies and percentages and compared between ALS patients and healthy controls using the χ2 test for unpaired data. Continuous variables were reported as median and interquartile range (IQR, the difference between the 75th and 25th percentiles) and compared using the Mann-Whitney U test with Bonferroni correction for multiple comparisons in SPSS 27.0. P values below 0.05 were considered statistically significant.
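A minimal Python sketch of the univariate comparison described above, using SciPy's `mannwhitneyu` in place of SPSS. The dictionaries of per-feature values, the feature names, and the function names are hypothetical; the Bonferroni step simply multiplies each p-value by the number of features tested and caps it at 1.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def median_iqr(x):
    """Median and interquartile range (75th minus 25th percentile)."""
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return q50, q75 - q25


def compare_features(als, hc, alternative="two-sided"):
    """Mann-Whitney U test per feature with Bonferroni-adjusted p-values.

    als, hc: dicts mapping a feature name to a 1-D array of measurements.
    """
    n_tests = len(als)
    adjusted = {}
    for name in als:
        res = mannwhitneyu(als[name], hc[name], alternative=alternative)
        adjusted[name] = min(res.pvalue * n_tests, 1.0)
    return adjusted


# Synthetic values, not study data
als = {"feature_a": np.arange(10, 20), "feature_b": np.arange(10)}
hc = {"feature_a": np.arange(10), "feature_b": np.arange(10)}
print(median_iqr([1, 2, 3, 4, 5]))
adjusted_p = compare_features(als, hc)
print(adjusted_p)
```

Passing `alternative="greater"` or `"less"` gives one-sided versions of the same test.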
High-throughput data are often subject to batch effects. We used the ComBat method proposed by Johnson et al. [31] to remove known batch effects arising from experiments conducted under different conditions. ComBat, an empirical Bayes method, is robust to outliers even when batch sizes are small. Batch-effect adjustment was performed with the R package "sva" [37].
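To illustrate the location/scale model that ComBat builds on, the sketch below aligns each batch's per-feature mean and standard deviation with the pooled values. It is a deliberate simplification, not ComBat: it omits the empirical Bayes shrinkage of batch parameters and does not protect biological covariates (the sva implementation accepts a model matrix, `mod`, for that purpose). Without such protection, a group difference confounded with batch would be removed along with the batch effect. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def location_scale_adjust(X, batch):
    """Align each batch's per-feature mean and SD with the pooled values.

    Simplified ComBat-style adjustment: no empirical Bayes shrinkage and no
    protected biological covariates.
    X: (n_samples, n_features) array; batch: length-n_samples labels.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    grand_mean = X.mean(axis=0)
    batch_means = {b: X[batch == b].mean(axis=0) for b in levels}
    # Pooled within-batch SD from residuals after removing each batch mean
    resid = X - np.vstack([batch_means[b] for b in batch])
    pooled_sd = np.sqrt((resid ** 2).sum(axis=0) / (len(X) - len(levels)))
    adjusted = np.empty_like(X)
    for b in levels:
        idx = batch == b
        sd_b = X[idx].std(axis=0, ddof=1)
        adjusted[idx] = (X[idx] - batch_means[b]) / sd_b * pooled_sd + grand_mean
    return adjusted


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[20:] = X[20:] * 2.0 + 5.0  # second batch: shifted and rescaled
batch = np.repeat(["run1", "run2"], 20)
Xa = location_scale_adjust(X, batch)
print(Xa[:20].mean(axis=0), Xa[20:].mean(axis=0))
```

After adjustment, every batch has the same per-feature mean and standard deviation; each batch needs at least two samples with nonzero spread.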
We used permutational multivariate analysis of variance (PERMANOVA) to compare multivariate group centroids across groups. Because PERMANOVA does not rely on distributional assumptions about the data (e.g., normality), it is better suited than traditional methods to analyzing complex data. In addition, we used non-metric multidimensional scaling (NMDS) to reduce the multivariate data to a few relevant axes, facilitating the recognition and interpretation of nonlinear patterns and differences among groups. Analyses and plots were computed on Euclidean distances in R 4.2 with the packages vegan 2.6.2 and ggplot2 3.3.6.
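A one-way PERMANOVA on Euclidean distances can be sketched with NumPy as follows, using the pseudo-F statistic built from squared distances and a permutation p-value of the form (hits + 1)/(permutations + 1). This is an illustrative re-implementation, not the vegan code used in the study; the function names and synthetic data are hypothetical.

```python
import numpy as np


def pseudo_f(d2, labels):
    """One-way PERMANOVA pseudo-F from a matrix of squared distances."""
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        block = d2[np.ix_(idx, idx)]
        ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(X, labels, n_perm=999, seed=0):
    """Pseudo-F and permutation p-value on Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)  # squared Euclidean distances
    f_obs = pseudo_f(d2, labels)
    rng = np.random.default_rng(seed)
    hits = sum(pseudo_f(d2, rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)


# Synthetic example: two well-separated groups in four dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (15, 4)), rng.normal(3.0, 1.0, (15, 4))])
labels = np.array(["HC"] * 15 + ["ALS"] * 15)
f_stat, p_value = permanova(X, labels)
print(f_stat, p_value)
```

With Euclidean distances on a single variable, the pseudo-F reduces to the classical one-way ANOVA F statistic, which is a convenient sanity check.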

Cluster and survival analysis
A Gaussian Mixture Model (GMM) [4] was used to identify common biological profiles among the subjects. GMM is a model-based clustering approach in which each component of a finite mixture corresponds to a cluster, modeled with a multivariate Gaussian distribution. To focus the analysis on the most informative features, we retained only those features whose median absolute deviation (MAD) exceeded a threshold, on the rationale that features with a larger MAD have greater discriminatory power. All continuous variables were standardized. The method allocates similar subjects (in terms of biological features) to the same cluster and estimates the parameters of the multivariate Gaussian density associated with each cluster, yielding clusters that are homogeneous in their biological features. The number of clusters was selected by the Bayesian Information Criterion (BIC). GMM analysis was performed with the R package "mclust" [59].
For the survival analysis, we adopted a two-stage approach to estimate associations between survival time and biological features. In the first step, we applied a modified version of the sure independence screening (SIS) procedure [16]. SIS ranks features by their marginal correlation, in our case the correlation of a single biological feature with survival time. For each feature, we fitted an accelerated failure time (AFT) model [67] with that feature as the only predictor and compared it with an intercept-only model by a likelihood-ratio test; the features with the smallest p-values were carried forward to step 2. An AFT model is a parametric model that measures the impact of the predictors on survival time (rather than on the hazard, as in the Cox model): the explanatory variables accelerate or decelerate the time to event by a constant factor. In the second step, we used an AFT model to determine the joint effect of the selected biological features and clinical factors on survival, under the assumption that the logarithm of survival time depends linearly on the biological features. All statistical procedures for the cluster and survival analyses were performed in R.
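The screening step can be sketched in Python as below, assuming right-censored data and a Weibull AFT model fitted by maximum likelihood with SciPy. This is plain marginal screening with a likelihood-ratio test; the specific modification the authors made to SIS is not described and is not reproduced, the use of a Weibull error in step 1 is an assumption, and all function names and data are hypothetical (the study itself used R).

```python
import numpy as np
from scipy import optimize, stats


def weibull_aft_loglik(params, X, time, event):
    """Log-likelihood of log T = X @ beta + sigma * W, W standard minimum extreme value."""
    beta, sigma = params[:-1], np.exp(params[-1])
    z = (np.log(time) - X @ beta) / sigma
    ll_event = -np.log(sigma) - np.log(time) + z - np.exp(z)  # log density for events
    ll_censor = -np.exp(z)                                     # log survival for censored
    return np.sum(np.where(event == 1, ll_event, ll_censor))


def fit_weibull_aft(X, time, event):
    """Maximum-likelihood fit; the first column of X must be the intercept."""
    start = np.zeros(X.shape[1] + 1)  # last entry is log(sigma)
    start[0] = np.log(time).mean()
    res = optimize.minimize(lambda p: -weibull_aft_loglik(p, X, time, event),
                            start, method="BFGS")
    return res.x, -res.fun


def sis_screen(features, time, event, keep=3):
    """Rank features by the LRT p-value of a one-feature AFT model vs. intercept only."""
    n = len(time)
    _, ll_null = fit_weibull_aft(np.ones((n, 1)), time, event)
    pvals = np.empty(features.shape[1])
    for j in range(features.shape[1]):
        X1 = np.column_stack([np.ones(n), features[:, j]])
        _, ll_j = fit_weibull_aft(X1, time, event)
        pvals[j] = stats.chi2.sf(max(2.0 * (ll_j - ll_null), 0.0), df=1)
    return np.argsort(pvals)[:keep], pvals


# Synthetic example: only feature 0 affects survival
rng = np.random.default_rng(1)
n = 200
feats = rng.standard_normal((n, 5))
true_t = np.exp(3.0 + 0.8 * feats[:, 0] + 0.3 * np.log(rng.exponential(size=n)))
censor_t = rng.uniform(1.0, 150.0, size=n)
time = np.minimum(true_t, censor_t)
event = (true_t <= censor_t).astype(int)
top, pvals = sis_screen(feats, time, event)
print(top, pvals.round(4))
```

The logarithm of an Exp(1) variable follows the minimum extreme-value distribution, so the simulated times are exactly Weibull AFT; the screened features would then enter the joint model of step 2.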

Clinical and demographical characteristics of participants
We analyzed a cohort of 100 patients with newly diagnosed ALS and 34 sex-matched healthy controls, as depicted in the study diagram (Supplementary Fig. 1). Patient characteristics are detailed in Supplementary Table 1. The average age at sampling was 67 years (range 37-93) for patients and 70 years (range 51-81) for healthy controls (P = 0.146). Among the patients, 28 presented with bulbar symptoms, 69 with limb onset, and 3 with early respiratory impairment. Forty-six patients developed a classic phenotype, 25 a bulbar phenotype, 18 a flail arm/leg phenotype, 7 a pyramidal phenotype, and 3 a respiratory phenotype. Disease progression, as indicated by the monthly decline in ALSFRS-R score at diagnosis, was slow in 42 patients, fast in 31, and intermediate in 27. Genetic analysis revealed a C9ORF72 expansion in six patients and a FUS mutation in one; no other mutations were detected among the remaining genotyped patients, and six patients were not genetically tested. The average ALSFRS-R total score at sampling was 41.18 (SD 5.78). During follow-up, 73 of the 100 ALS patients died or underwent tracheostomy, with a mean survival time of 38.56 months (SD 31.15) from symptom onset.

Cytokine levels
Analysis revealed a distinct cytokine profile in ALS patients: 10 of the 14 cytokines tested were lower in ALS patients than in healthy controls, whereas IL-8 (CXCL8) was higher (Table 1).
No significant differences in cytokine expression were observed across patients with different disease onset, progression rates, phenotypes, or genotypes (including the presence of C9ORF72 expansion or FUS mutation) (Supplementary Table 2).

TTV-DNA status
ALS patients had a significantly higher serum TTV-DNA load than healthy controls (P < 0.0001): TTV-DNA was detected in 88 of 100 (88%) ALS serum samples, with a mean load of 2,370 copies/mL (range 180-15,930), whereas 24 of 34 (71%) control samples tested positive, with a mean load of 157 copies/mL (range 11-750).
TTV levels differed significantly among patients with different progression rates (P = 0.027), particularly between fast and slow progressors (Dunn's multiple comparison test, P < 0.05) (Fig. 1). However, TTV load did not differ significantly by type of disease onset, phenotype, or C9ORF72 expansion (data not shown).
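The text reports an overall P value across progression groups followed by Dunn's multiple comparison test but does not name the omnibus test; the sketch below assumes a Kruskal-Wallis test. Dunn's statistic is computed on pooled average ranks with a Bonferroni adjustment and no tie correction, a simplification of full implementations. The group names, function name, and synthetic loads are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def dunn_bonferroni(groups):
    """Dunn's pairwise z-tests on pooled ranks; Bonferroni-adjusted, no tie correction."""
    names = list(groups)
    samples = [np.asarray(groups[g], dtype=float) for g in names]
    ranks = stats.rankdata(np.concatenate(samples))  # average ranks over all groups
    n_total = len(ranks)
    mean_rank, size, start = {}, {}, 0
    for g, s in zip(names, samples):
        mean_rank[g] = ranks[start:start + len(s)].mean()
        size[g] = len(s)
        start += len(s)
    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        se = np.sqrt(n_total * (n_total + 1) / 12.0 * (1.0 / size[a] + 1.0 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        adjusted[(a, b)] = min(2.0 * stats.norm.sf(abs(z)) * len(pairs), 1.0)
    return adjusted


rng = np.random.default_rng(2)
loads = {  # synthetic TTV loads (copies/mL), not study data
    "slow": rng.lognormal(6.0, 1.0, 40),
    "intermediate": rng.lognormal(7.0, 1.0, 30),
    "fast": rng.lognormal(8.0, 1.0, 30),
}
h_stat, p_overall = stats.kruskal(*loads.values())
pairwise = dunn_bonferroni(loads)
print(p_overall, pairwise[("slow", "fast")])
```

Because both tests work on ranks, they are insensitive to the strong right skew typical of viral loads.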

Profile of the free fatty acids
Compared with healthy controls, patients with ALS had higher total FFA levels (P < 0.0001) and generally lower SCFA levels. Among the medium-chain (MCFA) and long-chain fatty acids (LCFA) evaluated, only 2-ethylhexanoic acid, octanoic acid, and octadecanoic acid did not differ significantly between ALS patients and healthy controls (Table 2). Similarly, among SCFA, only butyric acid levels did not differ significantly. FFA levels varied considerably within progression-rate, onset, and phenotype groups, without significant differences between groups. Patients with a C9ORF72 expansion showed a non-significant trend toward lower FFA levels (Supplementary Table 3).

Multivariate analysis with the entire dataset
The expression patterns of all biological variables examined are depicted in Fig. 2. To investigate differences between ALS patients and healthy controls, and among ALS subgroups defined by onset, phenotype, and C9ORF72 expansion, we performed PERMANOVA tests on Euclidean distances, with cytokines, fatty acids, and TTV-DNA values as the multivariate response. PERMANOVA, supported by the non-metric multidimensional scaling (NMDS) ordination, showed a significant difference between ALS patients and healthy controls (Pr(>F) = 0.041) (Fig. 2B). However, ALS patients could not be differentiated by their clinical characteristics (disease onset, phenotype, or C9ORF72 expansion).

Cluster analysis
To characterize the biological profiles of our study participants, we used a Gaussian Mixture Model (GMM) on eight features selected by their median absolute deviation (MAD): five fatty acids (valeric, 2-ethylhexanoic, benzoic, hexadecanoic, and octadecanoic acids), the cytokines MCP-1 and MIP-1α, and TTV load. The analysis partitioned the subjects into five clusters with homogeneous biological features, ranging in size from 5 to 46 subjects (Fig. 3A). However, we considered Cluster 2 a residual cluster: it comprised only five subjects (one healthy control and four ALS patients without mutations) and showed a large variance in the biological variables, especially benzoic acid and MIP-1α (Supplementary Fig. 2), indicating a lack of uniform biological patterns among its members.
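The feature filtering and BIC-based model selection described above can be sketched in Python with scikit-learn's `GaussianMixture`, as a stand-in for mclust. Two assumptions should be kept in mind. First, this sketch keeps a fixed number of top-MAD features rather than applying a threshold. Second, it fits only unconstrained ("full") covariances, roughly corresponding to mclust's VVV model, whereas mclust also compares constrained covariance structures. Note also that scikit-learn's BIC is minimized, while mclust reports BIC with the opposite sign and maximizes it. Function names and data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def top_mad_features(X, names, n_keep):
    """Return the n_keep columns with the largest median absolute deviation."""
    mad = np.median(np.abs(X - np.median(X, axis=0)), axis=0)
    order = np.argsort(mad)[::-1][:n_keep]
    return X[:, order], [names[i] for i in order]


def gmm_by_bic(X, max_k=6, seed=0):
    """Standardize, fit GMMs with 1..max_k components, keep the lowest-BIC model."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    models = [
        GaussianMixture(n_components=k, covariance_type="full", random_state=seed).fit(Z)
        for k in range(1, max_k + 1)
    ]
    best = min(models, key=lambda m: m.bic(Z))  # scikit-learn's BIC: lower is better
    return best, best.predict(Z)


# Synthetic data: two groups in f1-f3, plus a low-dispersion feature f4
rng = np.random.default_rng(0)
informative = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(6.0, 1.0, (100, 3))])
X = np.column_stack([informative, rng.normal(0.0, 0.1, 200)])
X_sel, selected_names = top_mad_features(X, ["f1", "f2", "f3", "f4"], n_keep=3)
best, labels = gmm_by_bic(X_sel, max_k=4)
print(selected_names, best.n_components)
```

Whether MAD is computed before or after standardization is not specified in the text; here it is computed on the scale at which X is supplied, so features measured in larger units would dominate unless rescaled first.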
Cluster 1 consisted solely of healthy controls (91.18% of all healthy controls) and included no ALS patients, indicating that the GMM distinguished healthy individuals from those with ALS.
Cluster membership was significantly associated with ALS diagnosis (χ2 test, P < 0.0001) (Fig. 3B,C). Although ALS patients were distributed across four clusters, cluster membership was not significantly associated with clinical factors such as disease progression rate (χ2 test, P = 0.6866), sex (χ2 test, P = 0.6797), site of onset (χ2 test, P = 0.8483), phenotype (χ2 test, P = 0.6721), or the presence of C9ORF72 expansion (χ2 test, P = 0.0971) (Supplementary Fig. 3). Thus, while the clusters separate ALS patients from healthy controls, they do not align with specific clinical characteristics of ALS.
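A minimal sketch of a χ2 test of independence between cluster membership and diagnosis, using SciPy's `chi2_contingency`. The counts are invented for illustration and are not the study's cluster sizes.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts for illustration only (rows: healthy controls, ALS; columns: clusters 1-5)
table = np.array([
    [20, 2, 1, 1, 0],
    [0, 3, 25, 30, 22],
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
print((expected < 5).sum(), "cells with expected count < 5")
```

Several expected counts fall below 5 in a table like this one, in which case the χ2 approximation is rough; a permutation or exact test is a common alternative.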
Members of Cluster 1, consisting of healthy controls, consistently showed lower levels of TTV, benzoic acid, and hexadecanoic acid, but higher levels of MIP-1α, than the overall population average (Fig. 3A, Supplementary Fig. 2). Among the ALS patients in Clusters 3, 4, and 5, we observed the following: (i) levels of MIP-1α, TTV, and, to a lesser degree, benzoic acid were similar across these clusters, all markedly different from Cluster 1; (ii) Cluster 3 stood out for lower levels of valeric and 2-ethylhexanoic acids; (iii) Cluster 4 showed higher levels of hexadecanoic and octadecanoic acids; and (iv) Cluster 5 was characterized by higher levels of 2-ethylhexanoic acid and lower levels of MCP-1 and, to a lesser extent, octadecanoic acid and MIP-1α.
To evaluate whether the cluster-based analysis differentiated healthy controls from ALS patients, we tested the differential expression of specific biological markers, focusing on features for which the clustering results (see Fig. 3) suggested precise one-sided hypotheses. Specifically, we tested the null hypothesis of no difference between healthy controls and ALS patients against the alternatives of higher benzoic acid, hexadecanoic acid, and TTV levels in ALS and lower valeric acid, MCP-1, and MIP-1α levels in ALS.
All tests were highly significant, with P values < 0.0001 (Supplementary Table 4).
These results are consistent with the univariate analyses reported in Sects. 1.2-1.5, which showed significant differences in the same features between healthy controls and ALS patients.

Survival analysis
We first fitted a survival regression model to investigate the effect of cluster membership on tracheostomy-free survival, which was log-transformed for analysis. This model revealed no statistically significant association between cluster membership and survival time, consistent with the observation that disease progression rates did not differ significantly across clusters (Supplementary Fig. 3). To evaluate associations between specific biological features and survival, we first applied sure independence screening (SIS; step 1) to identify the most informative features. The highest-ranking markers were valeric, 2-ethylhexanoic, and octadecanoic acids. With these acids, age, sex (coded as a binary variable, 1 = female), and an intercept as predictors, we fitted an accelerated failure time (AFT) model with a Weibull-distributed error term. To account for potential nonlinear relationships, the effect of 2-ethylhexanoic acid on survival was modeled with natural cubic splines with 3 degrees of freedom, and an interaction term between 2-ethylhexanoic acid and sex was included. The results are reported in Fig. 4. The AFT model fitted the data well (Fig. 4B) and identified associations of borderline to moderate significance: for the nonlinear effect of 2-ethylhexanoic acid on survival, p = 0.13, 0.03, and 0.07 for the three spline terms, and for its interaction with sex, p = 0.13, 0.08, and 0.04. In an AFT model, coefficients are usually straightforward to interpret: a one-unit increase in a covariate multiplies the failure time by the exponential of its coefficient. For example, a one-unit increase in valeric acid is associated with a 16% increase in survival time, holding other variables constant. Similarly, female patients are expected to have a 27% longer survival time than males, other factors being equal. However, the interaction and nonlinear terms in our model complicate the interpretation of these coefficients. We therefore represent the effect of 2-ethylhexanoic acid on log-transformed survival time graphically, separately for male and female patients (Fig. 4C).
Fig. 4 A Coefficients of the AFT model: "Std. Error" is the standard deviation of the sampling distribution of the coefficient estimates, indicating their precision; "z" is the z-statistic used to test the significance of each coefficient; "p-value" is the p-value associated with z, indicating the significance of each predictor. B Validity of the assumed Weibull distribution, assessed with residuals that account for censoring: a Kaplan-Meier estimate is computed from the fitted model residuals and compared with the assumed Weibull distribution, and a good fit indicates that the Weibull distribution is a suitable model for the survival times. The solid black line is the Kaplan-Meier estimator of the residuals, with dotted black lines showing the upper and lower 95% CI; the solid red line is the survival probability estimated with the fitted AFT model. C Effect of 2-ethylhexanoic acid on log survival time, accounting for its interaction with sex (left: male patients; right: female patients). Because the effect was modeled with nonlinear terms (cubic splines), it is not constant across expression levels. Because the biological features were standardized, a value of 0 for 2-ethylhexanoic acid denotes its average expression.
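The coefficient interpretation above can be written out explicitly. The notation below is ours and assumes the model structure described in the text: linear terms for valeric acid ($x_V$), octadecanoic acid ($x_O$), age, and sex ($s_i = 1$ for females), a three-term natural spline basis $B_1, B_2, B_3$ for 2-ethylhexanoic acid ($x_E$), and a spline-by-sex interaction with coefficients $\gamma_m$.

```latex
\log T_i = \beta_0 + \beta_V x_{V,i} + \beta_O x_{O,i} + \beta_{\mathrm{age}}\,\mathrm{age}_i
         + \beta_{\mathrm{sex}}\, s_i
         + \sum_{m=1}^{3} \left(\beta_m + \gamma_m s_i\right) B_m(x_{E,i})
         + \sigma \varepsilon_i

% Time ratio for a one-unit increase in a linearly entered covariate:
\mathrm{TR}_V = \frac{T(x_V + 1)}{T(x_V)} = e^{\beta_V},
\qquad e^{\beta_V} = 1.16 \;\Rightarrow\; \beta_V = \ln 1.16 \approx 0.148

% Female-to-male time ratio at a given level x_E of 2-ethylhexanoic acid:
\frac{T_{\mathrm{F}}}{T_{\mathrm{M}}}
  = \exp\!\Big(\beta_{\mathrm{sex}} + \sum_{m=1}^{3} \gamma_m B_m(x_E)\Big)
```

Under this model, $e^{\beta_{\mathrm{sex}}} = 1.27$ ($\beta_{\mathrm{sex}} = \ln 1.27 \approx 0.239$) gives the female-to-male time ratio only at values of $x_E$ where every $B_m(x_E) = 0$; elsewhere the $\gamma_m$ terms shift it, which is why the effect of 2-ethylhexanoic acid is best shown graphically by sex.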

Discussion
In our study, we analyzed a broad spectrum of blood-based biological markers to delve into the increasingly recognized role of microbiome, metabolism, and immunity in ALS pathogenesis [47].Our comprehensive assessment included 14 inflammatory cytokines, 18 free fatty acids (end products of human and microbial metabolism), and TTV viremia, a virome-related potential marker linked to immune system function [40].
Initial univariate analysis revealed significant differences in 30 biological markers between newly diagnosed ALS patients and controls. Specifically, we found alterations in TTV viremia, 11 of the 14 cytokines tested, and 16 of the 18 free fatty acids.
Most cytokines were lower in ALS patients than in healthy controls, with the exception of IL-8, which was elevated. Such reductions in cytokine levels and changes in immune cell profiles are consistent with previous findings in ALS [5,15,48,52] and with similar peripheral immune dysregulation reported in other neurodegenerative diseases such as Alzheimer's disease, suggesting potential commonalities in immune alteration mechanisms across neurological disorders [35].
The observed decrease in both proinflammatory and anti-inflammatory cytokines might result from peripheral immune cell anergy or from an auto-regulatory feedback aimed at mitigating neuroinflammation. Factors such as changes in blood-brain barrier (BBB) permeability, allowing the infiltration of peripheral immune cells into the central nervous system, could contribute to this immune dysregulation [35,53]. However, the literature reports conflicting evidence, often indicating increased levels of cytokines, particularly soluble factors [60], including IL-8 [15,25,30,36,50], a chemokine involved in microglial mobilization and activation that can exacerbate neuroinflammation and brain damage [14,20,49]. These discrepancies may stem from technical and methodological issues (e.g., small cohort size, selection bias, and the disease stage at which patients are studied [6]). Our previous studies, based on smaller cohorts, yielded results not entirely consistent with the current findings [52]. In the present study, we enrolled patients shortly after diagnosis and excluded those with a history of acute infections or severe comorbidities that could independently contribute to chronic inflammation. This careful patient selection and attention to potential confounders lend additional weight to our findings and underscore the complexity of immune dysregulation in ALS. Our study adds to the growing evidence of cytokine dysregulation in ALS and offers insight into the interplay of cytokines and immune cells in ALS pathophysiology.
The generally low levels of cytokines and chemokines, together with elevated TTV viremia in ALS patients, especially those with fast and intermediate progression, point to a compromised peripheral immune response in ALS. TTV load has been proposed as a marker of immune function in transplant patients, in whom higher levels indicate excessive immunosuppression and lower levels indicate insufficient immunosuppression [64]. In ALS, the increased TTV load may therefore signify an ineffective immune response to the ongoing neuroinflammation characteristic of the disease, especially in patients with rapid progression. Higher TTV viremia has also been found in patients with progressive multiple sclerosis compared with those with relapsing-remitting forms [42], suggesting a possible broader relevance in neurological diseases.
Given the microbiome alterations previously reported in ALS [52], a leaky gut barrier could allow the systemic spread of bacterial products and viruses. Viruses themselves may influence human health by shaping the microbiota or by interacting directly with the immune system [38], suggesting that a comprehensive analysis of the gut virome and bacteriome in ALS could be informative.
Furthermore, our investigation of FFA in ALS patients and healthy controls responds to emerging evidence linking alterations in the gut microbiome to changes in plasma lipid profiles in ALS, underscoring the potential of these lipids as disease biomarkers [29]. FFA, which regulate immune function and metabolism, can be categorized by carbon chain length (short, medium, long, or very long) and degree of saturation. They are involved in inflammatory processes, and their structural variations are linked to different biological functions, connecting lipid peroxidation and metabolism to the inflammation seen in disorders such as ALS and Alzheimer's disease [24,62]. Our findings support previous studies showing increased levels of total and very-long-chain fatty acids in blood samples of ALS patients [2,11,17,27]; in addition, we observed elevated MCFA levels, further highlighting the interplay between lipid metabolism and ALS pathogenesis. SCFA, the main microbiota-derived metabolites, known for their neuro-immunoendocrine regulatory and anti-inflammatory effects [22,61], were decreased, mirroring patterns observed in other neurological disorders [15,32,66].
This reinforces the potential of serum lipid levels as biomarkers and highlights the intricate relationships between metabolic changes, the microbiome, and immune responses in the context of ALS pathogenesis.
Our multivariate PERMANOVA analysis, including all evaluated variables, differentiated ALS patients from healthy controls. In addition, the Gaussian Mixture Model identified shared biological profiles among patients. Variables for this model were selected by their dispersion, measured as the median absolute deviation across the full range of biological features. This process highlighted eight biological elements (valeric, 2-ethylhexanoic, benzoic, hexadecanoic, and octadecanoic acids; the cytokines MCP-1 and MIP-1α; and TTV-DNA) that differentiated ALS patients from healthy controls. These eight variables also organized patients into clusters with distinct characteristics. The clusters, characterized by unique metabolic signatures, did not correspond to traditional clinical ALS categories such as onset type, phenotype, genotype, or disease progression rate. For instance, Cluster 3 showed low levels of valeric and 2-ethylhexanoic acids, Cluster 4 high levels of hexadecanoic and octadecanoic acids, and Cluster 5 elevated 2-ethylhexanoic acid but reduced MCP-1. This suggests that the clusters may reflect metabolic variation in ALS more accurately than traditional clinical metrics [24,34].
The distinction between these immune-metabolic clusters and classical clinical categories suggests that unique ALS biotypes could be identified, offering a more faithful representation of the disease's biological attributes. This shift from clinical toward biological classification could enrich our understanding of ALS, and recognizing these groups could help develop targeted therapeutic strategies for distinct subsets of ALS patients. In exploring the prognostic potential of specific biological variables, we uncovered an association between levels of 2-ethylhexanoic acid and survival in ALS patients, with a notable gender-based difference. This finding emphasizes sex differences in ALS [12,46] and adds a layer of complexity to the use of these biomarkers for predicting disease progression.
The observed "protective" effect of 2-ethylhexanoic acid in females has not, to our knowledge, been reported before and could be explained by an interaction with estrogens, which have been found to be neuroprotective in ALS [63,65]. Another possibility is a sex-specific genetic architecture [10]. Our study also has limitations that must be acknowledged. By focusing solely on blood-derived biomarkers, we could not directly explore the gut-brain axis or compare gut microbiota composition between ALS patients and healthy controls or across the identified clusters. Consequently, we could not provide direct evidence of the imbalanced gut microbiota composition in ALS previously reported in our own work [52].
Furthermore, while examining systemic immune dysregulation provides a broad perspective on the immune response in ALS, an ideal approach would also include analyses at the single-cell level. Future studies should integrate such analyses to complement our findings with finer-grained insight into the underlying cellular dynamics. The scope of our study was also limited by the absence of patients with other neurodegenerative diseases, a critical gap because these conditions are often part of the differential diagnosis of ALS. Despite our rigorous selection criteria, we could not eliminate all potential confounders, such as diet composition [18,52]. Nonetheless, our study identified distinct clusters within the ALS patient population, each characterized by a unique biological profile.
To further validate and extend our findings, future research should include a wider array of biomarkers, encompassing plasma, gut microbiome, and cellular immunity measures such as regulatory T cells, in larger and more diverse cohorts. In addition, our methodology was designed specifically for TTV detection and could not reveal other potentially relevant viruses. In conclusion, our study is a significant step toward a comprehensive analysis of serum cytokine profiles, FFA, and TTV in a substantial ALS patient cohort. We have highlighted peripheral immune deficiencies and classified ALS patients into distinct groups based on a combination of host-derived (cytokines) and microbiome-related variables (TTV and SCFA).
This novel approach holds promise for better identifying ALS subgroups that may exhibit differential responses to therapeutic interventions.

Fig. 2
A Heatmap visualization based on cytokines, free fatty acids, and TTV distribution. Rows: biological variables; columns: patients. Green: HC; yellow: ALS patients; pink: bulbar onset; gray: spinal onset; orange: respiratory onset. The color key indicates metabolite expression value (blue: lowest; red: highest). B The sample distances

Fig. 3
Cluster analysis. A Mean expression levels of biological markers within the five identified clusters, after normalization to zero mean and unit variance across the dataset. This normalization allows a straightforward comparison of marker expression among clusters, using the normalized average expression level (horizontal black line) as a reference point. The colored lines depict the distinct "biological profile" of each cluster. B and C Contingency table and cluster membership, with a graphical representation of the χ2 test. Panel B shows considerable differences in the proportion of patients and healthy subjects across clusters, with a strong prevalence of healthy subjects in Cluster 1. The mosaic plot represents the statistical association between cluster membership and ALS prognosis. The horizontal axis displays the dis-
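The workflow summarized in Fig. 3 (standardization, mixture-model clustering, per-cluster mean profiles, and a χ2 test of cluster membership) can be sketched with scikit-learn and SciPy. Choosing the number of components by BIC, the default full covariance type, the helper `cluster_profiles`, and the simulated data are assumptions; the text does not state how the number of clusters was selected. The Pearson residuals returned at the end are the quantities commonly used to shade mosaic plots.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_profiles(X, status, k_range=range(2, 7), seed=0):
    """Fit Gaussian mixtures, pick k by lowest BIC, and summarize clusters.

    X: (n, p) matrix of the selected biomarkers; status: labels such as "ALS"/"HC".
    Returns labels, per-cluster mean z-scores, the cluster-by-status table,
    the chi-square statistic and p-value, and Pearson residuals.
    """
    Z = StandardScaler().fit_transform(X)  # zero mean, unit variance per marker
    models = [GaussianMixture(n_components=k, random_state=seed).fit(Z) for k in k_range]
    best = min(models, key=lambda m: m.bic(Z))
    labels = best.predict(Z)

    clusters = np.unique(labels)
    # "Biological profile": mean standardized level of each marker per cluster.
    profiles = np.vstack([Z[labels == c].mean(axis=0) for c in clusters])

    status = np.asarray(status)
    groups = np.unique(status)
    table = np.array([[np.sum((labels == c) & (status == g)) for g in groups]
                      for c in clusters])
    chi2, p, dof, expected = chi2_contingency(table)
    residuals = (table - expected) / np.sqrt(expected)
    return labels, profiles, table, chi2, p, residuals


# Usage on simulated data: three groups of 40 samples with 8 markers each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(3, 8))
X = np.vstack([rng.normal(c, 1.0, size=(40, 8)) for c in centers])
status = np.array(["HC"] * 30 + ["ALS"] * 90)
labels, profiles, table, chi2, p, residuals = cluster_profiles(X, status)
```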

Fig. 4
A Summary of the fitted survival regression model. The table reports estimated values for each predictor variable in the model, including the intercept and the log of the scale parameter. "Std. Error" is the standard deviation of the sampling distribution of the coefficient estimates, indicating their precision; "z" is the z-statistic used to test the significance of each parameter; and "p-value" is the p-value associated with the z-statistic, indicating the significance of each predictor variable. B The validity of the assumed Weibull distribution for survival times can be assessed using residuals that account for censoring: the fitted model residuals are computed, a Kaplan-Meier estimate is obtained from them, and the estimated residual distribution is plotted against the assumed Weibull distribution to assess the fit. A good fit to
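The residual check described for Fig. 4B can be sketched as follows. Under a Weibull AFT model, the standardized residuals z = (log t - x'β)/σ follow the standard minimum extreme-value distribution, whose survival function is exp(-exp(z)); censored observations yield censored residuals, so their distribution is estimated with Kaplan-Meier. The helper names, the maximum-gap summary, and the simulated residuals are illustrative assumptions; the figure itself compares the curves graphically.

```python
import numpy as np


def kaplan_meier(values, event):
    """Kaplan-Meier survival estimate evaluated at each distinct event value."""
    values = np.asarray(values, float)
    event = np.asarray(event, bool)
    grid = np.unique(values[event])
    surv, s = [], 1.0
    for u in grid:
        at_risk = np.sum(values >= u)
        deaths = np.sum((values == u) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return grid, np.array(surv)


def weibull_residual_check(z, event):
    """Compare the KM curve of standardized AFT residuals with exp(-exp(z)),
    the survival function implied by a Weibull model. Returns the grid,
    both curves, and their maximum absolute gap."""
    grid, km = kaplan_meier(z, event)
    theoretical = np.exp(-np.exp(grid))
    return grid, km, theoretical, np.max(np.abs(km - theoretical))


# Usage on simulated, randomly censored residuals.
rng = np.random.default_rng(7)
n = 1000
cens = rng.uniform(-2, 3, size=n)
z_true = np.log(rng.exponential(size=n))   # residuals under a correct Weibull fit
_, _, _, gap_ok = weibull_residual_check(np.minimum(z_true, cens), z_true <= cens)
z_norm = rng.normal(size=n)                # residuals from a misspecified model
_, _, _, gap_bad = weibull_residual_check(np.minimum(z_norm, cens), z_norm <= cens)
```

A small gap (gap_ok) is what a well-specified model should produce, whereas residuals from a different distribution (gap_bad) drift visibly away from the extreme-value curve.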

Funding
Open access funding provided by Università degli Studi di Firenze within the CRUI-CARE Agreement. This work was supported by the Italian Ministry of Health [Grant No. RF-2016-02361616]; Matteo Pedone is supported by the European Union - NextGenerationEU and the UNIFI Young Independent Researchers Call - BayesMeCOS [Grant No. B008-P00634].

Table 1
Concentrations of cytokines in the serum of ALS patients and healthy controls. P-values were calculated with the Mann-Whitney test; * adjusted p-value < 0.0035

Fig. 1
Torque teno virus abundance (copies/ml) in A ALS patients compared to HC; B ALS patients with different progression rates. ***p-value < 0.001; *p-value < 0.05

Table 2
Concentrations of free fatty acids (µmol/l) in the serum of ALS patients and healthy controls. P-values were calculated with the Mann-Whitney test; * adjusted p-value < 0.0022
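Both tables report per-marker Mann-Whitney tests against an adjusted threshold. The Table 1 cutoff of 0.0035 is close to 0.05/14 ≈ 0.00357, which would be consistent with a Bonferroni correction across the 14 cytokines. However, the correction procedure is not stated here, so the Bonferroni step below is an assumption, as are the helper `compare_markers`, the marker names, and the simulated data (sized like the cohort: 100 ALS patients, 34 controls).

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_markers(als, hc, names, alpha=0.05):
    """Two-sided Mann-Whitney U test per marker with a Bonferroni threshold.

    als, hc: (n_patients, n_markers) and (n_controls, n_markers) arrays.
    Returns {name: (U, p, significant)} and the per-test threshold alpha / m.
    """
    als, hc = np.asarray(als, float), np.asarray(hc, float)
    threshold = alpha / len(names)
    results = {}
    for j, name in enumerate(names):
        res = mannwhitneyu(als[:, j], hc[:, j], alternative="two-sided")
        results[name] = (res.statistic, res.pvalue, res.pvalue < threshold)
    return results, threshold


# Usage on simulated data: 14 hypothetical cytokines, one with a true shift.
rng = np.random.default_rng(5)
names = [f"cytokine_{j}" for j in range(14)]
als = rng.normal(0.0, 1.0, size=(100, 14))
hc = rng.normal(0.0, 1.0, size=(34, 14))
als[:, 0] += 1.5
results, threshold = compare_markers(als, hc, names)
```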