Introduction

Alcohol-associated liver disease (ALD) and metabolic dysfunction-associated steatotic liver disease (MASLD) are medical conditions that show a high mortality rate worldwide and they account for the majority of liver diseases (LD)s1,2,3. Pathologically, LD is characterized by steatosis, inflammation, fibrosis, hepatocellular carcinoma, and regeneration of the liver parenchyma. The progression of LD is now known to be a dynamic process with significant potential for resolution4,5. Therefore, early diagnosis is an essential part of the assessment and management of patients with chronic LD. Current diagnostic strategies range from the use of biomarkers to image modalities6. Classical diagnostic methods, such as biopsy and imaging studies, have disadvantages, such as high cost, need for hospitalization, inability to diagnose general health state, and side effects (bleeding, infection, pain, or mortality). As such, there is strong demand for reliable diagnostic biomarkers that provide insight into disease in lieu of invasive liver biopsy. Future personal medicine requires a simple and non-invasive method that can simultaneously identify and predict various health conditions in LD7.

The human gut microbiome is an ecosystem that consists of various microorganisms including bacteria, fungi, and viruses8. The gut microbiota plays multiple roles in human health related to the immune system, inflammation, nutritional balance, homeostasis, circulation, and metabolism. To date, the distribution of intestinal microorganisms is affected by several factors, such as genetics, nutrition, the environment, exercise, pathobiology, and lifestyle9. When the balance between the human and gut microbiota is disorganized, the composition and function of the microbiota are changed, which is referred to as dysbiosis10. Gut microbial imbalance is associated with the occurrence of various diseases, and each disease shows characteristic findings of intestinal microorganisms11. Because gut microbiota reflect an individual’s health and predicts prognosis, it has revealed promising biomarker for various diseases12. In this respect, the stool microbiota analysis-based diagnostic strategy has the advantage of being able to diagnose chronic LD quickly and accurately at a relatively low cost.

The gut–liver axis refers to the close bidirectional connection between the intestine and the liver via the portal vein, biliary tract, and systemic circulation. There has been increasing interest in the role of the gut microbiome in LD as a potential diagnostic modality, therapy, or prognostic marker13,14. Gut microbiota dysregulation plays a key role in the pathogenesis of LD through the gut–liver axis. It is difficult to identify their characteristics and rules because individual differences and diversity exist. The intestinal microbiota, which reflects an individual's health status, has a close relationship with LDs. Recent studies have demonstrated that the composition of the gut microbiome differs according to the progression of LD15,16. Accordingly, LD can be diagnosed using the distribution of the gut microbiome profiles17.

It is necessary to apply diagnostic technology that considers various factors using artificial intelligence to achieve accurate and personalized medical treatment. Machine learning (ML), one branch of artificial intelligence, has been utilized for the diagnosis of various diseases, including malignancy, metabolic disease, and intestinal disease18. Recently, artificial intelligence has been used in the diagnosis of LD. Most ML studies analyzed genetic information or clinical data19,20. Few ML studies have utilized gut microbiota in the diagnosis of LD, and data covering various stages of LD are insufficient. Currently, the ML technique is not applied in the diagnosis of ALD. Considering that LD progresses sequentially and that the microbiota is closely related to LD, we hypothesized that a gut microbiota-based ML strategy could be used for the diagnosis of ALD and MASLD.

Results

Study population

A prospective cohort study was performed between April 2017 and March 2020 (ClinicalTrials.gov NCT04339725). This study involved patients with LD who had follow-up visits at the hepatology department of University Hospitals. A total of 1,051 people underwent stool analysis for metagenomics by sequencing the 16S rRNA gene (Fig. 1). The workflow of the whole study is summarized in Fig. 2. Detailed baseline characteristics are explained in Table S1.

Figure 1
figure 1

Study design.

Figure 2
figure 2

Machine learning strategy.

Microbial differences in alcohol-associated liver disease

The control group revealed higher Shannon index scores than the elevated liver enzyme (ELE), cirrhosis, and liver cancer groups (p < 0.01). No difference in Shannon’s index was found among ELE, cirrhosis, and liver cancer. This result is also shown in the inverse Simpson index and Plelou evenness (Fig. S1A). The Chao1 index of the control and ELE groups was higher than that of the alcoholic cirrhosis and liver cancer groups (p < 0.01). The ELE group showed a difference compared with the cirrhosis and liver cancer groups in the Chao1 index (p < 0.01) (Fig. 3A).

Figure 3
figure 3

(A) Differences in alcoholic liver disease group. (B) α diversity. β diversity. (C) Composition of phylum. (D) Taxonomical features with a LDA score > 2.0 were plotted with cladogram and (E) LEfSe bar graph for each group. (F) Heatmap for different genus and species. ELE elevated liver enzyme; HCC hepatocellular carcinoma.

In the β diversity analysis and composition of phyla, each group showed differences (Fig. 3B,C). β diversity ordination using the Aitchison distance: applying PCA to the centered log-ratio (CLR) transformed counts. By using unweighted UniFrac distance, PCoA analysis indicated the different community compositions at the OTU level between the control and the liver fibrosis groups with variances of PC1 1.6% and PC2 2.4%.

There was a difference in the centroid location between groups (Pr [> F] = 0.001) (Fig. S1B). LEfSe analysis was performed to identify the distinct bacterial species between the control and ALD groups with an LAD score > 2 (Table S2 and S4, Fig. 3D,E).

In the liver cancer group, the levels of Proteobacteria (phylum), Betaproteobacteria and Gammaproteobacteria (class), Enterobacteriales (order), Enterobacteriaceae, Streptococcaceae, and Paraprevotellaceae (family), Lactobacillus, Streptococcus, and Bacillus (genus) (p < 0.05), and Bifidum and Hamsteri (species) were found to be higher.

Regarding cirrhosis, Firmicutes, Coriobacteriia, Coriobacteriales, Aerococcaceae, Collinsella, Granulicatella, Anaerosinus, Clostridium, Balaenopterae, Paraplantarum, Bolteae, Glycerini, and spiroforme were found to be increased (p < 0.005). The abundances of Erysipelotrichim, Erysipelotrichales, Bacteroidaceae, Butyricimonas, Bacteroides, and equi were increased in the alcoholic ELE group (p < 0.0005). Deltaproteobacteria, Bacteroidales, Clostridiales, Burkholderiales, Odoribacteraceae, S24_7, Lachnospiraceae, Ruminococcaceae, Erysipelotrichaceae, Desulfovibrionaceae, Odoribacter, Paraprevotella, Alistipes, Clostridium, Roseburia, Faecalibacterium, Gemmiger, Finegoldii, Putredinis, Inulinivorans, Lactaris, Islandicum, and Formicilis were significantly increased in the control group (Fig. 3F). Circos representation of the most abundant bacterial genera between the normal, ELE, cirrhosis, and cancer groups in the ALD datasets. Bacterial genera with LDA scores above 2.0 for all bacterial sequences were plotted (Fig. S1C).

Microbial differences in metabolic dysfunction-associated steatotic liver disease

No differences in Shannon (p = 0.339) or inverse Simpson (p = 0.401) indices were found between the control and ELE groups (Fig. 4A and Fig. S2A). The Chao1 index of the α diversity in the control group was lower than that in the ELE group (p = 0.002). PCoA indicated the different community compositions at the OTU level between the control group and the liver fibrosis group, with variances of PC1 of 1.9% and PC2 of 1.7% (Fig. 4B and Fig. S2B).

Figure 4
figure 4

(A) Differences in metabolic dysfunction-associated steatotic liver disease group. (B) α diversity. β diversity. (C) Composition of phylum. (D) Taxonomical features with a LDA score > 2.0 were plotted with cladogram and (E) LEfSe bar graph for each group. (F) Heatmap for different genus and species. ELE elevated liver enzyme.

In the ELE group, significantly increased relative abundances of Actinobacteria, Firmicutes, and TM7 were found at the phylum level (p < 0.05). At the class level, higher abundances of Actinobacteria, Coriobacteriia, Bacilli, Erysipelotrichia, and TM7_3 were found (p < 0.05). At the species level, a higher abundance of Coprophilus, Indistinctus, Balaenopterae, Hellenica, Equi, Bolteae, Pullicaecorum, Variabile, Glycerini, Mobilis, Biforme, and Mitsuokai was found in the ELE group (p < 0.005).

In the control group, a lower abundance of Bacteroidia and Betaproteobacteria was found at the class level (p < 0.05). At the order level, Desulfovibrionales decreased significantly in the control group (p < 0.05). At the family level, a lower abundance of Bacteroidaceae was identified, and Bacteroides decreased significantly at the genus level (p < 0.05). At the species level, lower abundances of Hamsteri, Clostridium, and D168 were found in the control group (p < 0.05) (Fig. 4C–F). Circos representation of the most abundant bacterial genera between the normal and ELE groups in MASLD datasets. Bacterial genera with LDA scores above 2.0 for all bacterial sequences were plotted. (Table S3, S4 and Fig. S2C).

Diagnostic value of supervised machine learning models and external validation

Table 1 presents performance measures of the four different ML algorithms evaluated on each of the testing datasets of ALD and MASLD. Here, the 40 reduced feature dimensions are considered for both the ALD and MASLD groups. CNN in classification combined with PCA in feature reduction performed better than other models. They achieved AUC values ranging between 0.92 and 0.96 for ALD datasets and an AUC of ≈ 0.96 for an MASLD dataset. Tables present performance measures of the CNN model for four different numbers (20, 40, 60 and 80) of feature dimensions reduced by the PCA model for the ALD and MASLD datasets (Tables S4 and S5, respectively) as well as the architecture of the CNN model utilized in this study (Table S6).

Table 1 Performance measures of different machine learning algorithms.

With the ALD datasets, precision, recall, and accuracy were in the range of 0.89 to 0.96, 0.94 to 0.99, and 0.94 to 0.98, respectively, for the 20 reduced feature dimensions; in the range of 0.91 to 0.96, 0.95 to 0.99, and 0.95 to 0.97, respectively, for the 40 reduced feature dimensions; in the range of 0.88 to 0.96, 0.94 to 0.98, and 0.94 to 0.98, respectively, for the 60 reduced feature dimensions; and in the range of 0.88 to 0.97, 0.94 to 0.99, and 0.93 to 0.98, respectively, for the 80 reduced feature dimensions (Fig. 5A). With the MASLD datasets of the normal and ELE groups, precision, recall, and accuracy were 0.94, 0.93, and 0.92, respectively, for the 20 reduced feature dimensions; 0.95, 0.95, and 0.94, respectively, for the 40 reduced feature dimensions; 0.97, 0.95, and 0.95, respectively, for the 60 reduced feature dimensions; and 0.98, 0.96, and 0.96, respectively, for the 80 reduced feature dimensions (Fig. 5B). The CNN model trained with the 40 reduced feature dimensions had slightly higher sensitivity and specificity, showing good diagnostic classification power for predicting and identifying the disease groups.

Figure 5
figure 5

(A) Diagnostic values of machine learning strategy in liver disease. Alcoholic liver disease. (B) Metabolic dysfunction-associated steatotic liver disease. (C) External validation. ELE elevated liver enzyme; HCC hepatocellular carcinoma.

In the external validation, the AUC values for ALD and MASLD (vs control) were > 0.90 and 0.88, respectively, considering the higher values between the combinations CNN/PCA and CNN/RP (Fig. 5C).

Discussion

In LD, artificial intelligence has been applied for detecting fibrosis, differentiating liver mass, predicting the prognosis of chronic LD, and diagnosing MASLD21. In a previous report, the development and verification of ML and artificial intelligence using gut microbiome data for cancer treatment and diagnosis was evaluated22. Loomba et al. provided a random forest classifier mode that showed excellent diagnostic accuracy in detecting advanced fibrosis in MASLD (AUC 0.936)23. In our clinical data for ALD and MASLD, ML models of convolutional neural networks combined with principal component analysis with 40 taxonomic features achieved > 0.90 in the AUC scores on the 8 paired groups, showing the potential of using ML for predictive diagnosis. Compared with other studies utilizing the microbiome, our current data first link metagenomic features with ALD and MASLD, which leads to the discovery of a potential ML model for diagnosing ALD and MASLD.

In the external validation results using patient data from different regions, the AUC was over 0.90 for ALD and 0.88 for MASLD. These results show that our ML strategy can be useful for diagnosing ALD and MASLD.

As typical microbiota signatures affect the development and progression of human diseases, demonstration of the relationship between the gut microbiome and disease characteristics might be the cornerstone in future human health care24,25. With advances in microbiome-related technologies and personalized medicine, the vast amount of data and the complexity of the data limit statistical analysis and predictive potential. Considering the diversity of gut microbiota due to multiple factors, universally applicable microbiota-based metagenomic signatures are not known26. Recently, ML and artificial intelligence technology have been actively applied to the analysis of large-scale health care information and the utilization of microbiome-based big datasets, and application cases are being reported in various diseases27.

In the ALD, specific AUCs for the diagnosis of liver cirrhosis reached 0.97, indicating that alcoholic cirrhosis patients showed a typical dysbiosis pattern compared with the pattern of the control or ELE group. Anaerosinus, Glycerini, Coriobacteriia, Granulicatella, Balaenopterae, Collinsella, Collinsella, Coriobacteriales, Paraplantarum, Aerococcaceae, Spiroforme, and Bolteae were abundant genera in our study. In a previous report, the abundance of Bacteroides, Escherichia, Shigella, and Prevotella was closely related to portal hypertension in cirrhosis patients28. Based on previous research, Roseburia and Faecalibacterium prausnitzii were regarded as good functioning strains. In this study, the normal control group revealed enrichment of Roseburia and Faecalibacterium. Taken together, the results indicate that the ML method using the composition of the macrobiome can be an efficient modality in the diagnosis of LD, especially alcoholic cirrhosis.

The investigators demonstrated that ALD is associated with disruptions in the gut microbiome. Bacteroidetes abundance was decreased in the heavy drinking control29. In another study, alcoholic ELE patients showed decreases in Verrucomicrobia, Akkermansia, and Bacteroides30. In this study, α and β diversity decreased according to disease progression. The Proteobacteria composition was increased in the cirrhosis group. The abundances of Bacteroidaceae, Bacteroides, Equi, Butyricimonas, Erysipelotrichi, and Erysipelotrichales were elevated in the ELE group. In addition, each group revealed different compositions of microbiota in our results. In ALD, the composition of the microbiota shows specific findings for each disease.

In our results, the diagnostic AUC of microbiota-based ML for Metabolic dysfunction-associated steatohepatitis (MASH) was 0.93 ± 0.11. In other reports, the decision-tree algorithm of the Canadian dataset diagnosed MASLD with 76% accuracy and an AUROC of 0.7331. Another study revealed that the AUROC of MASH was 0.83 to 0.88 in a large US population32. Considering that our data showed a higher diagnosis rate than other data using artificial intelligence utilizing clinical data, ML analysis of the intestinal microbiota showed a higher diagnostic rate. As the occurrence of MASH in Asia has increased in recent years, our results are considered a useful tool for the diagnosis of MASH in the future.

In this study, the CNN model trained with the 40 reduced feature dimensions had slightly higher sensitivity and specificity, showing good diagnostic classification power for predicting and identifying the disease groups. The reason might be that the architecture of the CNN used in this study is more suitable for capturing patterns within individual OUT sequences or understanding sequential relationships than other models such as RF or MLP. In a previous study, this gradient-boosting machine algorithm provided the best prediction of liver cancer risk in patients with virus infection33. Considering that ALD and MASLD are major public health problem and that approximately 50% of cirrhosis cases are related to ALD and MASLD in Asia34, it is necessary to develop various diagnostic technologies, including big data or ML techniques, that go beyond statistics.

We used the 16S rRNA method rather than the shotgun method because it is an inexpensive and easy method for rapid diagnosis and easy clinical use. In terms of diagnostic accuracy, the 16S rRNA-based ML strategy showed a high score and showed the basics of personal medicine in real time. Recently, multiomics-based analysis has been applied for the diagnosis, treatment, and prediction of various diseases35. The ML method using images, pathological tissues, and clinical results will be used in various fields of LD in the future. In South Korea, the prevalence of MASLD-related cirrhosis and liver cancer is low. Therefore, ML analysis for MASLD-related cirrhosis and liver cancer was not performed in this study.

MASLD is defined by using cardiometabolic markers. However, all patients did not perform liver biopsy for the diagnosis of MASH. To reduce selection bias, we used ELE group. Since MASLD diagnosis can be easily used clinically without a biopsy, it is expected to increase accessibility to the use of AI in the future. In South Korea, there are few cases of advanced liver disease (liver cirrhosis or HCC) associated with MASLD. We did not enrolled MASLD-related cirrhosis and HCC. Regarding elevated alpha diversities in control group compared with ELE group in MASLD, there are cases where the distribution of microbiotas changes and diversity increases due to liver disease.

Conclusion

This study provides scientific evidence to support the excellent diagnostic accuracy of the microbiota for ALD and MASLD, suggesting holistic insight for further research. The CNN model trained with the 40 reduced feature dimensions had slightly higher sensitivity and specificity. This work developed an excellent microbiota-based ML strategy for the diagnosis of ALD and MASLD. Along with the development of personalized medicine, diagnostic technology using big data and genetic information will replace imaging or liver biopsy. The intestinal microbiota, which reflects an individual's health status, has a close relationship with LDs. Microbiota-based ML strategies can be used to diagnose ALD and MASLD to achieve personalized treatment and prevention of side effects. In the future, based on the microbiota-based ML strategy, we expect to develop a ML method for the treatment and follow-up.

Methods

Study design and participants

ALD was diagnosed on the results of alcohol history, liver biopsy, blood chemistry, or imaging study (ultrasound or computed tomography scan). The ALD group was subgrouped by control, ELE, cirrhosis, and liver cancer. Alcoholic ELE patients were defined if they had abnormal liver enzymes [aspartate aminotransferase (AST) ≥ 50 IU/L, AST/alanine aminotransferase (ALT) > 1.5, and AST and ALT < 400 IU/L] and excessive alcohol consumption (male > 60 g/day and female > 40 g/day) with last alcohol drink within 8 weeks of jaundice onset (bilirubin > 3 mg/dL). The control group was recruited from a medical check-up center. Patients meeting the following criteria were excluded: age > 70 years, hepatitis A, B, C, and E virus-related cirrhosis, HIV infection, Wilson’s disease, biliary obstruction, sepsis, drug-induced liver injury, autoimmune LD, history of high-dose steroid or antibiotics, presence of liver tumor or history of other cancer, or pregnancy.

MASLD was diagnosed on the definition as described in the consensus statement on new fatty liver disease nomenclature36,37. Cardiometabolic criteria including body mass index, fasting glucose, medication history, blood pressure, or cholesterol level were used for the diagnosis of MASLD. Normal control group did not have cardiometabolic criteria. Patients with elevated liver enzymes [AST or ALT ≥ 50 IU/L] or hepatitis on liver pathology were included in the ELE group. They did not drink excessive alcohol (male > 210 g/week and female > 140 g/week). Patients with autoimmune LD, alcohol use disorder, pancreatitis, hemochromatosis, viral LD, pregnancy, Wilson’s disease, drug-induced liver injury, and other cancers were excluded.

Cirrhosis was diagnosed based on the presence of complications (varix, ascites, and encephalopathy), blood tests, imaging findings, fibroscan, or pathological liver results. Liver cancer was diagnosed by two or more imaging tests, such as computer tomography, magnetic resonance imaging, angiography or contrast ultrasound. In addition, subjects taking drugs that affect the gut microbiota were excluded at enrollment. For the control group, we included healthy subjects who visited the center for a health check-up.

Baseline studies included family history, diet pattern, alcohol history, abdominal ultrasound and computed tomography scan, X-ray, electrocardiography, complete blood count, electrolytes, liver function test, viral markers, and Child‒Pugh score. Blood analysis was performed using standard methodologies. Serum biochemical parameters included AST, ALT, albumin, bilirubin, alkaline phosphatase (ALP), gamma glutamyl transpeptidase (GGT), blood urea nitrogen, creatinine, international normalized ratio, α-fetoprotein, carcinoembryonic antigen, prothrombin time, blood glucose, and total cholesterol. The levels of hepatitis A, B, and C and other virus markers were evaluated. Antinuclear antibody, antimitochondrial antibody, and antismooth muscle antibody tests were also performed.

This project followed the ethics of the 1975 Helsinki Declaration, as reflected by a prior approval by the Chuncheon Sacred Heart Hospital institutional review board for human research (2016–134). Informed consent was obtained from all participants. All authors had access to the study data and reviewed and approved the final manuscript.

Validation cohort and data collection

Machine learning with internal validation was performed in 464 subjects, and 210 subjects were divided into an external validation group. For external validation, patient data collected from hospitals in different regions were used.

Stool sample and processing

Sequencing was carried out according to the manufacturer’s instructions at Chunlab, Inc. (Seoul, Republic of Korea) with the Illumina MiSeq platform using reagent kit V3 in PE 250 bp mode. Microbiome taxonomic profiling was conducted with the EZBioCloud platform (ChunLab Inc., Republic of Korea) using the database version PKSSU4.0.

Human feces were stored at − 20 °C as soon as the patient received 2–3 g of feces using the kit (stool paper and stool box) and moved to − 80 °C within 1 day. Genomic DNA for metagenomic sequencing was extracted with a QIAamp stool kit (Qiagen, Hilden, Germany), and the library was prepared with a NEBNext Ultra II FS DNA Library Prep Kit for Illumina (New England BioLabs, Ipswish, MA, USA) according to the manufacturer’s directions. The quantification of libraries was checked using a Qubit dsDNA HS assay kit (Thermo Fisher Scientific, Waltham, MA, USA) and confirmed by quantitative polymerase chain reaction (qPCR) with a KAPA SYBR FAST qPCR Master Mix kit (Kapa Biosystems, Wilmington, MA, USA). The quality of the libraries was assessed on a Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA) using a DNA 12,000 chip. All libraries were sequenced on the NovaSeq 6000 platform (Illumina, USA) with paired-end 150 bp reads.

The analysis was performed following our previous reference38. In brief, DNA was extracted with a QIAamp stool kit, and amplification of the V3-V4 region of the bacterial 16S rRNA gene was conducted using barcoded fusion primers. The forward fusion primer contained the p5 adapter, i5 index, and gene-specific primer 341F (5′-AATGATACGGCGACCACCGAGATCTACAC-XXXXXXXX-TCGTCGGCAGCGTCAGATGTG TATAAGAGACAG-CCTACGGGNGGCWGCAG3′; underlining indicates the target region primer and X indicates the barcode region), and the reverse fusion primer contained the p7 adapter, i7 index, and gene-specific primer 805R (5′-CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTCTCGTGGGCTCGGAGATGTG TATAAGAGACAG-GACTACHVGGGTATCTAATCC-3′), which included sequencing adapters and dual-index barcodes of the Nextera XT kit (Illumina, San Diego, CA, USA). The amplification was performed in the C1000 touch thermal cycler PCR system (Bio-Rad Laboratories, Inc., Hercules, CA, USA) with the following conditions: initial denaturation of 3 min at 95 °C; followed by 25 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, extension at 72 °C for 30 s and final extension at 72 °C for 5 min. Each amplified PCR product was confirmed with 1% agarose gel electrophoresis and visualized on a Gel Doc XR + imaging system (Bio-Rad Laboratories, Inc., USA). The amplified products were purified and size-selected by Agencourt AMPure XP beads (Beckman Coulter, Chaska, MN, USA). The library was constructed with pooled PCR products, and the quality of the library was assessed on a Bioanalyzer 2100 (Agilent, USA) using a DNA 12,000 chip and quantified by qPCR with a KAPA SYBR FAST qPCR Master Mix kit (Kapa Biosystems, USA).

Sequence and statistical analysis

The sequencing data were processed using the Quantitative Insights Into Microbial Ecology (QIIME version 2). The low-quality sequence reads were removed following the criteria: (1) reads with a length of < 150 bp, (2) reads with an average Phred score of < 20, (3) reads containing ambiguous bases, and (4) reads containing mononucleotide repeats of > 8 bp. High-quality reads were clustered into 16S rRNA operational taxonomic units (OTUs) with ≥ 97% sequence homology39. The taxonomic classification of each OTU was performed with VSEARCH by comparing the representative sequence set against the SILVA reference database40.

After filtering samples with read counts greater than 500, the reads were grouped at the phylum level (using phyloseq), and the relative abundance was estimated at the phylum level by groups. There was a total of seven phyla. The taxa were sorted by abundance to improve the visualization and then plotted on box plots according to group and faceted by phylum using the raw counts. Many samples had a high number of Bacteroidetes, followed by Firmicutes and Proteobacteria. Most samples had low read counts for other phyla, with some outlying samples. To formally test for a difference in the phylum-level abundance, a multivariate test for differences in the overall composition between groups of samples was conducted by using the HMP package; herein, a Dirichlet-multinomial distribution is assumed for the data, and null hypothesis testing is conducted by testing for a difference in the location (mean distribution of each of the taxa) across groups accounting for the overdispersion in the count data.

Taxon abundance at the phylum, class, order, family, genus, and species levels was calculated and statistically compared among groups using the R stats package. Based on the tables generated in QIIME, alpha diversities, including Chao1, Simpson, and Shannon, were calculated. The significant differences in alpha diversity metrics were determined using the R package “Vegan”. To investigate the structural variation in microbial communities, beta diversity analysis was performed using UniFrac distance metrics 24 and was visualized via principal component analysis (PCA), principal coordinate analysis (PCOA), and nonmetric multidimensional scaling (NMDS). We checked the separation between other disease groups and normal group samples, suggesting some differences in the communities according to sample type. We tested whether the samples clustered beyond that expected by sampling variability using ADONIS.

The linear discriminant analysis (LDA) effect size (LEfSe) method based on the Kruskal‒Wallis test was conducted to identify significantly abundant taxa between different groups, where a linear discriminant analysis score > 2.0 was defined as the threshold for selecting the discriminative features. Quantitative data are expressed as the mean ± standard deviation (SD) unless otherwise stated. Comparisons were made utilizing analysis of variance (ANOVA), Kruskal‒Wallis test, general linear model analysis (repeated regression), paired sample t test, or independent-sample t test for continuous variables. The chi-square test or Fisher’s exact test was used for the comparison of groups. Data were analyzed with statistical software (SPSS, version 22.0, SPSS, Inc., Chicago, IL, USA), R stats package (www.r-project.org, Austria), and GraphPad Prism version 8.0 for Windows (GraphPad Software, San Diego, CA, USA). For all tests, p values < 0.05 were considered significant.

Machine learning modeling for OTU feature reduction and classification

The abundance of microorganisms at the genus level was used as a feature. The abundance was normalized by applying a centered log ratio (clr) transformation. Three different nonsupervised ML algorithms for feature reduction were trained with the features of the normalized OTUs on the platform: independent component analysis (ICA), principal component analysis (PCA), and random projection (RP), considering 20, 40, 60, and 80 reduced features. For each of these numbers on the platform, four different supervised ML algorithms for classification were trained using the reduced features: support vector machine (SVM), random forest (RF), multilevel perceptron (MLP), and convolutional neural network (CNN) (Table 2 and Table S4). To evaluate the performance concerning model architectures for predictive classification and diagnostics of LDs, SVM, RF, MP, and CNN algorithms were constructed using various numbers of features reduced from the three nonsupervised ML algorithms. Additionally, four different supervised ML algorithms for classification were also trained with features having LDA score greater than 2.0. Data were assigned into training (70%) and testing internal validation (30%) datasets. The training performance of the different ML models was evaluated using Stratified ShuffleSplit tenfold cross-validation41, and the process was repeated 10 times. Hyperparameter tuning was automatically executed by caret, testing 60 different values for each hyperparameter.

Table 2 The modeling and performance testing process of 52 combinations of ML algorithms, 13 feature reduction and 4 classification algorithms.

A total of 52 combinations of MLs (ML for feature reduction with an ML for classification) for each of four different numbers (20, 40, 60 and 80) of reduced features were performed. The process was optimized using 60 hyperparameter variations and evaluated by Stratified ShuffleSplit tenfold cross validation. In the testing internal validation phase, the prediction performance of each combination of ML models was assessed using parameters including the area under the receiver operating characteristic curve (AUC), accuracy, recall, precision, and F1 score. Box plot representations of the AUC, accuracy, recall, precision, and F1 score values were generated using the ggplot2 package in R. The entire process was repeated for each pair of subgroups (7 ALD and 1 MASLD) within each group.

The computing machine we used for timestamped runs is on Ubuntu 18.04 and is equipped with an Intel Core i9-9820X CPU (10 cores), 64 GB memory, and an NVIDIA GTX 1080 Ti GPU. The scikit-learn Python library was utilized for ICA, PCA, RP, SVM, and RF. Additionally, the Keras Python library was employed for MLP and CNN.