Introduction

Colorectal cancer (CRC) is the third most common cancer worldwide, with approximately 1.9 million new cases and 930,000 deaths reported globally in 2020. As its prevalence increases, deaths are predicted to rise to 1.6 million by 2040 [1]. Although South Asia reported a lower age-standardized incidence in 2020 (14.8/100,000 person-years for males and 18.4/100,000 person-years for females) than high-income countries, the prevalence is gradually increasing [1]. In Sri Lanka, the age-standardized incidence was 10.2/100,000 person-years in 2019 [2], and the prevalence is likely to increase due to changes in population demographics, diet and lifestyle, as seen in many countries in South Asia.

Risk factors for the development of CRC have been identified and include smoking, inflammatory bowel disease, diet, obesity, lack of exercise and alcohol consumption [3]. Dietary patterns such as consumption of red meat, a diet low in calcium and fiber and a diet low in milk have been shown to be risk factors for the development of CRC [3]. Obesity and hyperglycemia are also well-defined strong risk factors [4]. The increased prevalence of diabetes [5] and obesity, along with the change in diet, is likely to lead to a significant increase in CRC in regions such as South Asia, where a lower incidence is currently reported. While the mechanisms by which diet, obesity and diabetes contribute to the development of CRC are not known, these factors directly influence the composition and diversity of the gut microbiome [6, 7]. CRC is associated with gut microbial dysbiosis, with overabundance of bacteria such as Bacteroides fragilis, Escherichia coli, Enterococcus faecalis, Fusobacterium nucleatum and Streptococcus gallolyticus, with Bacteroides fragilis being the commonest [8, 9]. In fact, Bacteroides fragilis, along with species such as Fusobacterium nucleatum, Porphyromonas asaccharolytica, Parvimonas micra and Prevotella intermedia, was shown to be overabundant in patients with CRC across many geographical locations such as Europe, America and China [10]. Higher abundance of Fusobacterium species has also been reported in adenomas, which are premalignant lesions leading to CRC [9].

Diet is an important factor that shapes gut microbial diversity, with low intake of milk, fiber and whole grain products along with high intake of red meat being significantly associated with the risk of developing CRC [11]. Diet is known to play a significant role in the composition of the gut microbiome, with temporary changes occurring even within 24 h of consuming a different diet [12]. Prolonged dietary changes are known to induce prolonged and permanent changes in the gut microbiome [13]. The gut microbiome shows marked diversity based on geographical location, with significant differences in Western populations compared to African and South American populations [14]. In India, individuals from regions primarily consuming a plant-based diet were found to have an overabundance of Prevotella species compared to those consuming both animal and plant-based products, who had an abundance of the genera Bacteroides, Ruminococcus and Faecalibacterium [15]. Therefore, there are significant differences in the gut microbiome of the South Asian population compared to other populations, which may vary in those with CRC.

Although species such as Bacteroides fragilis, Fusobacterium nucleatum and Porphyromonas asaccharolytica have been shown to associate with CRC in the West and in Chinese communities, where red meat is readily consumed, red meat consumption is much lower in South Asian countries such as India and Sri Lanka. Therefore, the bacterial phyla and genera associated with CRC could differ. Furthermore, apart from a few studies in India, there are very few studies describing the composition of the gut microbiome in the South Asian population, and none from Sri Lanka. In this study, we evaluated the gut microbiome in healthy individuals, individuals with CRC, individuals with diabetes who did not have CRC or premalignant lesions (age and sex matched) and those with premalignant lesions. These data would be important for identifying suitable dietary interventions that could change the gut microbiome composition, as potential prevention strategies for the development of CRC.

Methods

Participant characteristics

We recruited 112 individuals who underwent colonoscopy at Colombo South Teaching Hospital, a tertiary care hospital in Sri Lanka, between January 2017 and April 2018, following informed written consent. Individuals who had taken antibiotics in the past 1 month were excluded from the study, as antibiotics could affect the composition of the gut microbiome. All clinical details regarding altered bowel habits, abdominal pain, loss of weight and loss of appetite, along with laboratory and radiological investigations such as full blood count, ultrasound scanning of the abdomen and CT scans, were recorded. Biopsies were obtained at the time of colonoscopy from the rectum, hepatic flexure, transverse colon, sigmoid colon and anal verge, as a part of the routine colonoscopy. The biopsies were then evaluated for the presence of CRC, and grading was carried out according to the TNM staging classification. The clinical characteristics of these individuals are shown in supplementary Table 1.

Ethics approval

Ethics approval was obtained from the Ethics Review Committee of the University of Sri Jayewardenepura (ERC 35/16). All patients who participated gave informed written consent.

Collection of stool and biopsy samples used for microbiome studies

Stool samples were collected 2 weeks following colonoscopy from the above cohort, as studies have shown that the gut microbiome recovers quickly after bowel preparation for colonoscopy and returns to normal in 14 days [16, 17]. Accordingly, stool samples were collected from 24 patients who were confirmed as having CRC, 10 who were found to have premalignant lesions, 50 healthy individuals (normal colonoscopy, without diabetes or any other illnesses) and 28 patients with diabetes mellitus (who had normal colonoscopy findings). The fasting blood sugar and lipid profiles were assessed in the 50 healthy individuals to exclude the presence of diabetes and hyperlipidemia. Biopsies from 18 patients who were confirmed to have CRC and biopsies from 18 healthy individuals were also included in the analysis of the microbiome.

DNA extraction and Metagenomics analysis

The biopsy samples and stool samples were transported to the laboratory within 24 h of collection. DNA was extracted from both stool and biopsy samples as soon as they were received in the laboratory. DNA from stool samples was extracted using the DNeasy PowerSoil kit (Qiagen, Hilden, Germany), whereas DNA from tissue samples was extracted using the DNeasy PowerLyzer Tissue and Cells Kit (Qiagen, Hilden, Germany), according to the manufacturer’s instructions. Extracted DNA was stored at -80 °C until sequencing was carried out. 16S metagenomic sequencing was carried out by Diversigen, USA. The V4 hypervariable region of the bacterial 16S rRNA marker gene (16Sv4) was PCR-amplified in duplicate with primers 515F-OH1 and 806R-OH2, which enabled us to characterize the gut microbiome up to the genus level. DNA libraries were prepared from the PCR products according to the Nextera XT DNA Library Preparation kit guide (Illumina, CA, USA). These were then pooled and sequenced on the Illumina MiSeq platform with a read length of 2 × 250 bp.

Bioinformatics analysis

Raw read pairs were de-multiplexed according to the unique molecular barcodes using the MiSeq (Illumina) inbuilt tools. The resulting FASTQ reads were processed with the USEARCH [18] suite, following the best practices of the USEARCH guidelines. Paired-end reads were merged via the fastq_mergepairs command with a minimum overlap of 50 bases and a maximum of 5 mismatches. The USEARCH quality filter was set to discard merged reads containing more than 5% mismatches. Singletons (unique sequences that are found only once) were discarded using a USEARCH command, as they can create many spurious OTUs during the downstream analysis. Chimeric sequences were also detected and removed with UCHIME (Edgar, 2016). Clustering of the merged sequences into operational taxonomic units (OTUs) was performed using the UPARSE-OTU algorithm [18]. The sequences were then binned at a similarity threshold of 97% and a list of representative OTU sequences was generated. Diversity and taxonomic analysis were done by mapping the representative OTUs against the SILVA 16S database, version 132 [19]. The OTU abundance table and the taxonomy table were created with USEARCH commands, while a 16S rRNA gene-based phylogenetic tree was generated from the representative FASTA sequences. Beta diversity matrices were generated using the phylogenetic tree and were plotted during the statistical analysis. OTU counts were normalized across the samples to 2965, which was the lowest number of reads obtained for any sample, to avoid potential bias caused by differing sequencing depths.

A total of 2,513,198 raw sequencing read pairs were obtained, with a median count of 17,375 (SD = 6202) read pairs per sample. After filtering low-quality reads, artifacts and singletons, 1,786,919 pairs (71.1%) were mapped against the 16S database. The highest mapped read count in a sample was 19,349 and the lowest was 2965 reads; hence all the samples were rarefied to 2965 reads.
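
The rarefaction step described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study (which relied on USEARCH and R); the function name `rarefy`, the fixed seed and the toy OTU table are hypothetical and serve only to show what subsampling every sample to the lowest read count means.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample an OTU count vector to exactly `depth` reads, without replacement."""
    if sum(counts) < depth:
        raise ValueError("sample has fewer reads than the target depth")
    # Expand the counts into one OTU label per read, then draw `depth` reads at random.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(counts)
    for otu in picked:
        rarefied[otu] += 1
    return rarefied

# Hypothetical toy OTU table (rows = samples, columns = OTUs); not study data.
otu_table = {
    "sample_A": [1200, 900, 600, 230, 35],     # 2965 reads in total
    "sample_B": [5000, 3000, 1500, 400, 100],  # 10000 reads in total
}

# The target depth is the lowest read count of any sample.
depth = min(sum(row) for row in otu_table.values())
rarefied_table = {name: rarefy(row, depth) for name, row in otu_table.items()}

for name, row in rarefied_table.items():
    print(name, row, "total =", sum(row))
```

After this step every sample carries the same number of reads, so richness and diversity estimates are not driven by sequencing depth. The sample that already sits at the minimum depth is returned unchanged.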

Statistical analysis

The resulting OTU table, taxonomy table and sample data tables were analyzed with RStudio/R version 4.0.3 [20] using the phyloseq package, version 3.12 [21], to generate abundance data for each hierarchical level. Alpha diversity in bacterial communities was calculated with the R vegan package, version 2.4 [22], and presented as the number of observed OTUs and the Shannon index. A non-parametric two-sample t-test was used to compare the alpha diversity metrics between the healthy and CRC samples. Weighted and unweighted UniFrac distances of beta diversity were plotted with principal coordinates analysis (PCoA). Analysis of similarity between gut tissue and stool samples was calculated with the Bray-Curtis dissimilarity model using the “anosim” function with 1,000 permutations, and species accumulation curves were generated with the “specaccum” function in the vegan package. All visualizations were done using the R package ggplot2.
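
For readers unfamiliar with these metrics, the following Python sketch shows how the number of observed OTUs, the Shannon index and the Bray-Curtis dissimilarity are defined. It is an illustration only, not the vegan code used in the study; the function names and the toy count vectors are hypothetical. The natural logarithm is used for the Shannon index, which is also the default in vegan.

```python
import math

def observed_otus(counts):
    """Richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0

# Hypothetical toy communities; not study data.
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

print(observed_otus(even_community), shannon_index(even_community))
print(observed_otus(skewed_community), shannon_index(skewed_community))
print(bray_curtis(even_community, skewed_community))
```

Both toy communities have the same richness, but the evenly distributed one has the higher Shannon index, which is why the two metrics are reported together.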

OTU counts were converted to relative abundance percentages for each sample using the R package funrar, version 1.4.1 [23]. Significant differences between the healthy and CRC groups for gut tissue samples were calculated at the phylum and genus levels using the Mann–Whitney U test. For stool samples of healthy individuals vs. those with CRC, healthy individuals vs. those with a pre-malignant lesion and healthy individuals vs. those with diabetes, the groups were tested at the phylum and genus levels using the Kruskal-Wallis test. P values were corrected using the Benjamini-Hochberg false-discovery rate (FDR) [24] and significance was assessed at 0.05. As an additional step, to compare the microbiome of the cohort of patients with diabetes with that of the groups classified as having pre-malignant lesions or CRC, the Kruskal-Wallis test with Benjamini-Hochberg FDR correction was used for analysis at the phylum and genus levels. This was followed by a post-hoc analysis using Dunn’s multiple comparisons test on the taxa identified as different, to determine which pair or pairs of groups differed among the subgroups.
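
The two numerical transformations in this paragraph, conversion to relative abundance and Benjamini-Hochberg adjustment, can be sketched in Python as follows. This is an illustrative sketch rather than the R code used in the study; the function names and the example p values are hypothetical, and the test statistics themselves (Mann–Whitney U, Kruskal-Wallis, Dunn) would come from a statistics library.

```python
def relative_abundance(counts):
    """Convert raw OTU counts for one sample to percentages of that sample's total."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for four taxa; not study results.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]

print(relative_abundance([50, 30, 20]))
print(adj_p, significant)
```

In this toy example three taxa have raw p values below 0.05, but only one remains below 0.05 after adjustment, which illustrates why the correction matters when many taxa are tested at once.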

Results

Overall comparison of the faecal and tissue microbiome in healthy individuals and patients with CRC

We assessed the gut microbiome in a total of 148 samples, of which 112 were stool samples and 36 tissue biopsy samples. The resulting OTU abundance data consisted of 818 bacterial genera belonging to 17 different phyla. Species accumulation curves (Fig. 1) of each group against sampling effort (sites) indicated the highest species discovery in healthy stool samples. However, all the curves appear to overlap, except the curve for healthy gut tissue samples that exhibited slightly less species discovery over sampling effort.

Fig. 1

Species accumulation curves of healthy and CRC patients for each sample type. Healthy stool samples demonstrated the highest OTU discovery, whereas healthy gut tissue samples showed the least OTU discovery

Microbial diversity between and within the sampling sites and their subgroups was measured using the Shannon and Simpson indexes. In general, tissue biopsies showed higher richness and evenness, suggesting higher bacterial complexity compared to faecal samples, although these differences were not significant. There were also no significant differences between the stool and tissue subgroups (Fig. 2). However, when beta diversity within stool and tissue samples was tested by mapping weighted UniFrac distances in a Principal Coordinates Analysis (PCoA), there was significant clustering in the microbial composition of the stool subgroups (R = 0.0386, p = 0.025, Fig. 3A). The analysis of similarities (ANOSIM) test also indicated significant differences in stool samples between the different patient subgroups (healthy, those with premalignant lesions and those with CRC) (R = 0.121, P < 0.001) (Fig. 3B).

Fig. 2

Comparison of the faecal and tissue microbiome (alpha diversity) in healthy individuals, patients with CRC, with premalignant lesions and with diabetes. Alpha diversity in bacterial communities was calculated and presented as the number of observed OTUs and the Shannon index in tissue samples (T) and stool samples (S), in those with CRC (C), healthy individuals (N), those with diabetes mellitus (M) and those with premalignant lesions (PM). P values were calculated with the Kruskal-Wallis test

Fig. 3

Comparison of the beta diversity of the faecal and tissue microbiome in healthy individuals, patients with CRC, with premalignant lesions and with diabetes. A) The Principal Coordinates Analysis (PCoA) using weighted UniFrac distances, indicating significant clustering between the stool subgroups of those with CRC (C), those with diabetes mellitus (M), healthy individuals (N) and those with premalignant lesions (PM) (p = 0.02). B) The analysis of similarities test (ANOSIM) for samples obtained from colonic tissue (T) and stools (S), in those with CRC (C), healthy individuals (N), those with diabetes mellitus (M) and those with premalignant lesions (PM), was carried out to assess the differences in microbial composition. The microbial composition of the gut tissue subgroups was consistent, while stool samples exhibited significant dissimilarity (p < 0.0001) between the subgroups

Compositional analysis of the stool and tissue samples of patients and healthy individuals

The overall analysis of the taxonomic composition in all subgroups (healthy individuals and those with CRC, premalignant lesions and diabetes) revealed that more than 97% of the sequences collected were classified into six dominant phyla: Firmicutes (37%), Bacteroidetes (31%), Proteobacteria (19%), Actinobacteria (7%), Fusobacteria (2%) and Verrucomicrobia (2%). However, the relative abundances of Fusobacteria (p < 0.001) and Proteobacteria (p < 0.005) were overall significantly higher in tissue samples compared to stool samples, whereas Firmicutes (p = 0.03) and Actinobacteria (p = 0.03) were significantly more abundant in stool samples. At the genus level, the overall relative abundances of Fusobacterium (p < 0.001), Acinetobacter (p < 0.001) and Escherichia-Shigella (p < 0.05) were significantly higher in gut tissue, while Romboutsia (p < 0.01) and Prevotella (p < 0.05) were significantly higher in stool samples.

The tissue biopsy samples of patients with CRC had a higher abundance of bacteria of the phyla Bacteroidetes, Fusobacteria and Verrucomicrobia compared to the tissue samples obtained from healthy individuals. In contrast, Proteobacteria, Firmicutes and Actinobacteria were more abundant in tissue samples of healthy individuals compared to those with CRC (Fig. 4). However, no significant differences in bacterial composition were detected at the phylum or genus level between these two sample groups. The genus Gemella was enriched in the tissue of patients with CRC, while Streptococcus, Escherichia and Shigella were less abundant compared to healthy controls.

Fig. 4

The relative abundance of Phyla of tissue samples of healthy individuals in comparison to patients with CRC. The relative abundance of different bacterial Phyla in tissue samples in healthy individuals (n = 18) and in patients with CRC (n = 18) was assessed. Phylum Firmicutes, Bacteroidetes, Proteobacteria and Fusobacteria were found to be highly abundant across the tissue samples

Differences in the composition of the microbiome of stool samples in patients with pre-malignant lesions, patients with CRC and healthy individuals

Bacteroidetes was the most abundant bacterial phylum in stool samples of patients with CRC, in contrast to the stool samples of healthy individuals and those with pre-malignant lesions. The overall abundance of Firmicutes was higher in stool samples of those with premalignant lesions compared to healthy individuals and patients with CRC (Fig. 5A). However, significant differences between the three subgroups (healthy, those with CRC or with premalignant lesions) were only seen in the relative abundances of the phyla Epsilonbacteraeota and Elusimicrobia (p = 0.03, Kruskal-Wallis test) (Fig. 5B). Post hoc pairwise testing revealed that stool samples of patients with pre-malignant lesions had significantly higher abundances (p < 0.01) of bacteria belonging to the phyla Epsilonbacteraeota and Elusimicrobia compared to healthy controls.

Fig. 5

The relative abundance of phyla in stool samples of patients with CRC, healthy individuals and in those with premalignant lesions. A) The differences in the relative abundance of different bacterial phyla between stool samples of healthy individuals (n = 50), those with CRC (n = 24), and in those with a pre-malignant lesion (n = 10) were compared. Phylum Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria and Fusobacteria were dominant across the stool subgroups. B) The differences in the abundance of the phyla Epsilonbacteraeota and Elusimicrobia were compared in stools between those with CRC (C), those with diabetes mellitus (M), healthy individuals (N) and those with premalignant lesions (PM). P-values were calculated using Dunn’s multiple comparisons test with Benjamini–Hochberg false discovery correction

There were significant differences in the relative abundance of bacteria belonging to five genera, Christensenellaceae, Enterobacteriaceae, Mollicutes, Romboutsia and Ruminococcaceae, in the stool samples of the three subgroups, namely healthy individuals, those with premalignant lesions and those with CRC (p < 0.05) (Fig. 6). Post-hoc tests further confirmed that stool samples of patients with pre-malignant lesions had significantly higher abundances of Christensenellaceae, Enterobacteriaceae, Mollicutes and Ruminococcaceae (p < 0.001) compared to patients with CRC and healthy individuals. The abundance of the genus Romboutsia was significantly higher (p < 0.01) in healthy stool samples compared to the stool samples of patients with CRC.

Fig. 6

Significantly different genera in stool subgroups. The relative abundance of bacterial genera was compared in stool samples (S) of healthy individuals (S-N), those with pre-malignant lesions (S-PM), patients with CRC (S-C) and patients with diabetes (S-M) using Dunn’s multiple comparisons test with Benjamini–Hochberg false discovery correction. Significantly different abundances of Christensenellaceae, Enterobacteriaceae, Mollicutes and Ruminococcaceae were observed in pre-malignant stool compared to the other subgroups. The genus Romboutsia had a significantly different abundance in healthy stool compared to the CRC and diabetes subgroups

Differences in the composition of the microbiome of stool samples in patients with diabetes mellitus and healthy individuals

As 22 (52%) of the patients with CRC also had diabetes, it was not clear whether the changes seen in the stool microbiome of those with CRC were related to the presence of diabetes or were specific to CRC. Therefore, we assessed the stool microbiome of 28 patients with diabetes to differentiate these observations.

Firmicutes, Bacteroidetes and Actinobacteria were found to be dominant in stool samples of individuals with diabetes (Fig. 7). Bacteria of the phylum Bacteroidetes were detected at a relatively high abundance in stool samples of patients with diabetes mellitus (38.12%) compared to healthy individuals (31.2%), while Firmicutes were less abundant in patients with diabetes (35.16%) compared to healthy individuals (42.69%). Bacteria of the phylum Proteobacteria were found in similar abundance in patients with diabetes (11.81%) and in healthy individuals (13.78%) (Fig. 7). At the genus level, Bacteroides was enriched in stool samples of patients with diabetes (15.1%) compared to healthy individuals (9.7%), but the difference was not significant. The genus Romboutsia was significantly depleted in patients with diabetes compared to healthy controls (p = 0.009) (Fig. 8). We did not observe any differences in the relative abundance of phyla or genera in the stool samples of patients with diabetes compared to patients with CRC.

Fig. 7

The relative abundance of phyla in stool samples of healthy individuals and of patients with diabetes. The differences in the relative abundance of different bacterial phyla between stool samples of healthy individuals (S-N) (n = 50) and of patients with diabetes (S-M) were compared. Firmicutes, Bacteroidetes and Actinobacteria were found to be dominant in stool samples of individuals with diabetes

Fig. 8

A heatmap showing the relative abundance of genera in tissue and stool samples. The type of sample and the subgroup are indicated with a color key. Dendrograms were produced with the UPGMA method based on Bray-Curtis distance

Discussion

In this first study from South Asia, we assessed the gut microbiome in stool and colonic tissue biopsies of patients with CRC and premalignant lesions, comparing them to the microbiome of healthy age and sex matched individuals and those with diabetes. Interestingly, there were significant differences between the microbiome of colonic tissue samples and that of stool samples in all subgroups, with Fusobacterium, Acinetobacter, Escherichia and Shigella being significantly more abundant in colonic tissue samples, and Romboutsia and Prevotella being the most abundant in stool samples. Although significant variations have previously been observed between stool and colonic biopsy samples, the reported abundance of bacteria has varied greatly [25, 26]. Bacteroidetes, Fusobacteria and Verrucomicrobia were overabundant in colonic tissue samples of patients with CRC compared to healthy individuals, while Firmicutes and Actinobacteria were the most abundant in tissue samples of healthy individuals. Studies in the US and Sweden have also shown that Actinobacteria and Firmicutes were most abundant in healthy colonic tissue samples, although they did not find a high abundance of Proteobacteria [25, 26]. Fusobacterium has been shown to be overabundant in adenomas [9], while many studies have reported a high abundance of Bacteroides species in those with CRC compared to other groups [8, 9].

Bacteria have been shown to induce tumorigenesis by various mechanisms [27]. Certain strains of Bacteroides fragilis produce a toxin that degrades E-cadherin and activates proliferative signaling pathways, and have been shown to induce tumors in mouse models [28]. Colibactin-producing strains of E. coli have been shown to induce certain mutations within the colonic epithelium [27, 29]. Fusobacterium nucleatum produces the Fusobacterium adhesin A (FadA) protein, which also changes the expression of E-cadherin, induces proliferative signaling pathways and has been shown to induce tumors in mouse models [27, 30]. Although we did find an overabundance of Fusobacterium, Acinetobacter, Escherichia and Shigella in colonic tissue compared to stool samples in patients with CRC, one of the main limitations of our study was that we could not identify the bacterial species. Therefore, it would be important to further identify the bacterial species in the colonic tissue microenvironment to understand their contribution to disease pathogenesis, especially in those with premalignant lesions. This would enable the discovery of biomarkers and of potential therapeutic targets during early disease.

Interestingly, bacteria belonging to the genera Romboutsia and Prevotella were found to be overabundant in stool samples compared to colonic tissue biopsies. Furthermore, the genus Romboutsia was significantly more abundant in stool samples of healthy individuals compared to those with CRC and diabetes. Romboutsia species have been shown to be less abundant in individuals with diabetes compared to healthy individuals [31, 32]. However, some studies have shown that Romboutsia species associate with the presence of non-alcoholic fatty liver disease (NAFLD) and are strongly associated with hepatocellular carcinoma (HCC), while being less abundant in those with diabetes [33, 34]. As many of these studies, including ours, report associations and patterns of the microbiome in disease and health, it would be important to explore whether these bacteria play a role in NAFLD and HCC and, if so, the possible mechanisms involved.

In our cohort, we found that bacteria of the genus Prevotella were among the most abundant bacteria in stool samples. Prevotella species have been shown to associate with plant-based diets that are high in fiber and low in fat, and are typically highly abundant in individuals consuming a non-Western diet [35, 36]. However, there was no difference in the abundance of Prevotella species in those with CRC compared to healthy individuals, those with premalignant lesions or those with diabetes. This is likely to be due to the high consumption of plant-based products in the typical Sri Lankan diet. It would be important to further characterize the different bacteria at the species level to understand their roles and to explore the possibility of dietary manipulation to enhance the abundance of favorable microbial species.

As shown in other studies, Bacteroides species were found to be overabundant in those with CRC compared to other groups [8, 9]. Although we could not characterize the Bacteroides species in this study, because only the V4 hypervariable region of the 16S rRNA gene was sequenced, one of our previous studies using quantitative real-time PCR showed that Bacteroides fragilis was significantly higher in patients with CRC compared to healthy individuals and those with diabetes [37]. The presence of enterotoxigenic Bacteroides fragilis has been shown to be a potential marker for the presence of CRC and to associate with poor prognosis [38, 39]. Bacteroides fragilis has been shown to induce tumorigenesis by multiple mechanisms, which include alterations in NF-κB signalling pathways, inducing DNA damage, increasing polyamine metabolism, inducing TH17 cellular responses and stimulating stem cell activity [39]. Fusobacterium, which has also been shown to associate with CRC, was found to be enriched especially in colonic tissue samples of those with CRC [10, 40, 41]. We found that those with premalignant lesions had a significantly higher abundance of Christensenellaceae, Enterobacteriaceae, Mollicutes and Ruminococcaceae in their stool samples compared to patients with CRC. Christensenellaceae and Ruminococcaceae were previously found to be enriched in patients with adenoma compared to healthy individuals, and have been shown to be potential biomarkers for early identification of progression to CRC [42].

Conclusions

In summary, we found that despite marked differences between the Sri Lankan diet and the typical Western diet, Bacteroides fragilis and Fusobacterium species were the most abundant in those with CRC, while bacteria of genus Christensenellaceae and Ruminococcaceae were found to be most abundant in those with premalignant lesions. The phyla Firmicutes, Bacteroidetes, Proteobacteria and Fusobacteria were found to be highly abundant across the tissue samples. Interestingly, Prevotella species were among the most abundant in many individuals, possibly due to the predominantly plant-based diet consumed by Sri Lankans. We believe these results pave the way for possible dietary interventions for the prevention of CRC in the South Asian population.