Introduction

Despite recent therapeutic advances, lung cancer remains the number one cause of cancer-related death worldwide (He et al. 2021). According to statistics Canada, more Canadians died from lung cancer in 2021 (approximately 21,000 Canadians) than from breast, colorectal, and pancreatic cancers combined. Within lung cancer, non-small cell lung cancer (NSCLC) represents the most common histology with about 85% of all lung cancer (Remark et al. 2015) and they are most often diagnosed in the advanced stage (He et al. 2021). Over the past decade, immunotherapy has demonstrated superior efficacy compared to chemotherapy and is now incorporated in the standard-of-care in NSCLC (Rocha and Arriola 2019). Recently, the gut microbiota has emerged as one of the key regulators of immune checkpoint inhibitors efficacy and modulation of the microbiota are currently being evaluated in the immuno-oncology arena (Gopalakrishnan et al. 2018; Li et al. 2019; Derosa et al. 2021). The study of the gut microbiota is limited by challenges with current sequencing methodologies, so we applied culturomics technique to isolate different species.

Gabonibacter massiliensis is currently the only validly published species of the genus Gabonibacter (https://lpsn.dsmz.de/genus/gabonibacter) and was first cultured from a human fecal sample of a healthy young Gabonese (Mourembou et al. 2016). It belongs to the family of Porphyromonadaceae within the Bacteroidota phylum (Krieg et al. 2010). The Genus Gabonibacter regroups Gram-negative anaerobic coccobacillus bacterium that exhibited neither catalase nor oxidase activities (Mourembou et al. 2016). During a study addressing the gut microbiota composition of NSCLC patients treated with anti-PD-1 immunotherapy, using the culturomics approach (Dubourg et al. 2014; Lagier et al. 2015, 2016), we recovered a previously unknown bacterial species designated strain KD22T. In this paper, using a polyphasic taxonomic approach combining analysis of phylogenic identification, phenotypic and biochemical specificities, and genomic features, we determined the taxonomic affiliation of strain KD22T isolated from a fecal sample of an NSCLC patient.

Materials and methods

Bacterial strain isolation and identification

Strain KD22T was isolated from a fecal sample collected from a patient with advanced NSCLC treated with anti-PD-1 immunotherapy, at the University of Montreal Healthcare Centre (CHUM; Montreal/Canada). The stool samples collected were stored at −80 ºC in November 2019 prior to initiating culturomics. The patient had not received any antibiotics treatment within the last 3 months before fecal collection. He gave a signed informed consent at the time of sampling and the study was approved by CHUM Research Ethics Committee 20.300. To isolate gut bacteria, 1 g of the fecal sample was injected into an anaerobic culture bottle (BACTEC Lytic/10 Anaerobic/F Culture Vials) enriched with 4 ml filter-sterilized rumen fluid and 5% sheep blood (Cedarlane Labs, Burlington, Canada), and then incubated at 37 ºC. After 7 days of incubation, 100 µl culture broth was sampled and plated on sheep blood-enriched Columbia agar (BioMérieux). The agar plates were incubated in an anaerobic chamber (5% H2, 5% CO2, and 90% N2) at 37 ºC for 48 h. Each emerging colony was purified and identified using MALDI-TOF mass spectrometry (MS) with a Microflex LT spectrometer (Bruker, Daltonics, Germany) that compared the spectra with those present in the library (Bruker database and CRCHUM database, constantly updated), as previously reported as previously described (Seng et al. 2009, 2013). When the score was < 1.7, no identification was considered reliable.

The MALDI-TOF MS identification of strain KD22T was not successful. Therefore, to achieve the identification and determination of phylogenetic affiliation of strain KD22T, its 16S rRNA gene was sequenced as previously described using fd1, and rp2 primers and a 3730xl DNA Analyzer from Applied Biosystems™ (Technelysium Pty. Ltd) (Routy et al. 2022). Obtained 16S rRNA gene sequence was assembled and corrected using ChromasPro software (http://technelysium.com.au/wp/chromaspro/). Phylogenetic neighbors of KD22T were identified using the BLASTn program (Altschul et al. 1997) and the nucleotide collection (nr/nt) of the NCBI database (Yoon et al. 2017), available at https://blast.ncbi.nlm.nih.gov/Blast.cgi. Based on the BLAST results, the 16S rRNA gene sequences of closest relatives validly published were extracted from the GenBank database and aligned using the CLUSTAL W tool (Thompson et al. 1994; Higgins et al. 1996) integrated into the MEGAX program (Kumar et al. 2018). Phylogenetic interferences were reconstructed using the neighbor-joining method (Saitou and Nei 1987) with the maximum composite likelihood model and bootstrap values of 100 replicates using MEGAX software (available at https://www.megasoftware.net/).

Morphologic and phenotypic characteristics

Strain KD22T cell morphology was assessed by transmission electron microscopic as previously reported (Routy et al. 2022). Gram stain was assessed using the standard protocol. The bacterium motility was investigated using a Leica DM 1000 photonic microscope (Leica Microsystems) at 100 X magnification. To test sporulation, a thermal shock at 80 °C for 20 min of strain KD22T was performed. The growth temperature range was determined by culturing strain KD22T on Columbia agar and incubated for 2 days at various temperatures (room, 28, 37, 42, and 56 °C) under different atmospheres (anaerobic, microaerophilic, and aerobic conditions). The pH range growth was also tested at pH 5, 6, 6.5, 7, 7.5, and 8.5. Tolerance of NaCl was determined for concentrations ranked between 0 and 100 g/l. Catalase and oxidase productions were also detected (BioMérieux). Enzymatic and biochemical properties of strain KD22T were determined in duplicate using the API® 20A, API® ZYM, and Rapid ID 32A identification systems (BioMérieux). Short-chain fatty acids were analyzed using both a gas chromatograph (Hewlett Packard) and Microbial Identification System software.

Genome sequencing and annotation

Genomic DNA was sequenced using MiSeq Illumina. Libraries were generated using the NxSeq® AmpFREE Low DNA Library Kit Library Preparation Kit (Lucigen) according to the manufacturer’s recommendations, with 700 ng of genomic DNA as starting material. Dual-indexed adaptors were purchased from IDT. Libraries were quantified using the Kapa Illumina GA with Revised Primers-SYBR Fast Universal kit (Kapa Biosystems). The average size fragment was determined using a LabChip GX II (PerkinElmer) instrument. The libraries were normalized and pooled, denatured in 0.05 N NaOH, and neutralized using HT1 buffer. The pool was loaded at 225 pm on an Illumina NovaSeq S4 lane using Xp protocol as per the manufacturer’s recommendations. The run was performed for 2X150 cycles (paired-end mode). A phiX library was used as a control and mixed with libraries at 1% level. Base calling was performed with RTA v3. Program bcl2fastq2 v2.20 was then used to demultiplex samples and generate fastq reads. Genome assembly was performed as previously reported (Routy et al. 2022). Then, the genome of strain KD22T was compared to those of type strains of phylogenetically related species.

The core genome among all genomes compared was built using CoreCruncher software as previously described (Harris et al. 2020) with Usearch Global v8.0 (Edgar, 2010) and the stringent option. This method conservatively seeks out orthologues within large sets of whole genomes with the added ability to filter out paralogues and xenologues. Orthologs were defined with > 70% protein sequence identity and > 80% sequence length conservation and all other parameters were set to default. The core genome was defined as the set of single copy orthologs found in at least 90% of the genomes and resulted to 108 genes. Protein sequences of each core gene were then aligned using Mafft v7.407 (Katoh and Standley 2013) with default parameters. Protein alignments were then reverse translated into their corresponding nucleotide sequences. Finally, the nucleotide alignments of all the core genes of each genome were concatenated into a single large alignment as previously described (Bobay and Ochman 2017). Maximum-likelihood phylogenomic trees were built from the concatenated alignment of the core genome using FastTree v2.1.11 with GTR model (Price et al. 2010). Branch supports were evaluated by generating 100 bootstraps replicates using the same parameters. The trees were visualized with FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). The genomic similarity among all compared genomes was evaluated by calculating two parameters: digital DNA–DNA hybridization (dDDH), average nucleotide identity (ANI), and average amino acid identity (AAI) values using Genome-to-Genome Distance Calculator (GGDC) (Auch et al. 2010) and the ANI and the AAI calculators (Goris et al. 2007), respectively.

Results

Strain identification and phylogenetic analysis

The strain KD22T could not be identified by MALDI-TOF MS, which indicated that this strain may be a new putative species (Fig. S1). Subsequently, an almost-complete 16S rRNA gene sequence of strain KD22T was determined (1520 bp, GenBank accession no. OP221267.1), and database searches using nucleotide BLAST revealed highest sequence similarity to that of 2 uncultured bacteria (97.66% to Porphyromonadaceae bacterium AIP925T, and 97.46% to Porphyromonadaceae sp. S190), to ‘Sanguibacteroides justesenii’ OUH 308042 T (97.72%) not validly published and to members of the genus Gabonibacter (97.58% to G. massiliensis GM7T and 96.20% to ‘Gabonibacter timonensis’ Marseille-P3388T). Phylogenetic analysis of KD22T and other related type strain, performed using the neighbor-joining method, confirmed that strain KD22T was phylogenetically closest to G. massiliensis GM7T detected from human feces (Fig. 1). This cluster was supported by a high bootstrap value of 100%. These results are consistent with those obtained with the maximum-likelihood (Fig. S2) methods. Based on the results of phylogenetic analysis, strain KD22T was considered to be a member of the genus Gabonibacter.

Fig. 1
figure 1

Phylogenetic tree, based on the 16S rRNA gene sequence of strain KD22T and closest related taxa, constructed using the neighbor-joining method with the maximum composite likelihood model. Branch supports were evaluated by generating 100 bootstraps replicates. 16S rRNA gene GenBank accession numbers are shown in parentheses. The scale bar represented 0.02 substitutions per nucleotide position

Phenotypical and biochemical characteristics

Cells of strain of KD22T were Gram-negative, non-spore-forming coccobacilli without catalase, oxidase, and urease activities. The bacterium measured up to 0.5 µm in diameter and 1.5 µm in length (Fig. S3). Bacteria occurred as single rods or in a short chain. Bacterial colonies were circular, white to pale cream with 0.6–1.7 mm of diameter after 48 h of incubation at 37 ºC in anaerobic conditions on sheep blood-enriched Columbia agar. Growth occurred between 28 and 37 ºC with an optimal at 37 ºC. No growth was observed in the microaerophilic and aerobic atmosphere or at 45 and 55 ºC. The strain grew at a pH ranging from 6 to 7.5, with optimal growth at pH 7.0 and a NaCl concentration of less than 5%. The biochemical and enzymatic characteristics of KD22T, obtained using API® ZYM, API® 20A, and Rapid ID 32A strips, are given in the species description section and the features that differentiate strain KD22T from its close neighbors are shown in Table 1. The analysis of the total cellular fatty acid composition demonstrated that saturated branch-chain iso-C15:0 was the major fatty acid (51.65%). The cellular fatty acid profile of strain KD22T and those of its closest related species are mentioned in Table 2. However, the differences noted in the cellular fatty acid profile between studies may be due to differences in the bacterial culture conditions and the extraction method used.

Table 1 Differential phenotypic features between strain KD22T and closest related species
Table 2 Cellular fatty acid composition (%) of strain KD22T compared to those of related species

Genome properties and comparison

After assembly, filtered reads of the draft genome of KD22T resulted in 14 scaffolds (composed of 14 contigs) with a total sequence length of 3,368,578 bp (Table S1, Fig. 2). The repartition of genes into the 25 general COG categories is represented in Table S2. The genome DNA G + C content of strain KD22T was 41.99 mol%. Of the 2,827 predicted genes, 2,746 were protein-coding genes and 59 were RNAs (one 5S rRNA, one 16S rRNA, one 23S rRNA, 53 tRNAs, and 3 ncRNAs genes). The genomic comparison of KD22T with its neighbors is presented in Fig. 3, Table 3 and Table S1. Briefly, the genome size, gene content, and DNA G + C content of KD22T (3.37 Mb, 2,827, and 41.99 mol%, respectively) are in the range reported for type strains of phylogenetically related species but very close to those of G. massiliensis (3.39 Mb, 2,880 and 42.10 mol% respectively; Table S1). Strain KD22T and G. massiliensis show high level of AAI values (96.03%, Table S3). Nevertheless, the dDDH values between species ranged from 18.70% with ‘S. justeseni’i and O. laneus to 84.40% between G. massiliensis and ‘S. justesenii’. Strain KD22T shared dDDH values from 18.50% with Butyricimonas virosa to 67.40% with G. massiliensis (Table 3). Furthermore, all these compared genomes had less than 97.50% ANI values (between G. massiliensis and ‘S. justesenii’). Strain KD22T shared ANI values ranging from 65.63% with Parabacteroides distasonis to 95.43% with G. massiliensis (Table 3).

Fig. 2
figure 2

Graphical circular map of the chromosome of strain KD22.T. From the outside to the center: genes on the forward strand colored by Clusters of Orthologous Groups of proteins (COG) categories (only genes assigned to COG), genes on the reverse strand colored by COG categories (only gene assigned to COG), RNA genes (tRNAs green and rRNAs red), GC content (dark), and GC skew (negative values purple and positive values olive green)

Fig. 3
figure 3

Core-genome phylogenetic tree between strain KD22T and other related species. Phylogenetic tree built with a concatenated alignment of 108 single copy genes present in all the analyzed genomes, using the maximum-likelihood method with GTR model. Percentage bootstrap support values above 99% are shown for each node. The bar is nucleotide substitutions per site

Table 3 Genomic comparison between Gabonibacter chumensis KD22T and closest related species using Pairwise comparisons, such as DDH (upper right side in blue) and ANI values (lower left side in read)

Conclusion

A polyphasic approach based on a combination of 16S rRNA gene sequence analysis, phenotypic features, chemotaxonomic properties, and genomic data demonstrated that strain KD22T belongs to the genus Gabonibacter. The AAI, 96.03%, is high between G. chumensis and G. massiliensis slightly above the 96% limit for differentiating two species. Note that it is appropriate to use the average amino acid identity (AAI) for more distant populations, because the resolution is progressively lost at the nucleotide level (Rodriguez-R and Konstantinidis 2014). However, its differing phenotypic and biochemical characteristics sustained by 16S rRNA gene sequence similarity, dDDH, and ANI values of 97.58%, 67.40%, and 95.43%, respectively, distinguished it from its closest relative species as well as other validly published members of the genus Gabonibacter. The above findings, 16S rRNA gene similarity < 98.65% (Kim et al. 2014; Yarza et al. 2014), dDDH < 70% (Auch et al. 2010), and ANI < 96% (Meier-Kolthoff et al. 2013; Chun et al. 2018), which are used for species delineation indicate that strain KD22T represents a novel species within genus Gabonibacter. The name Gabonibacter chumensis sp. nov. is proposed for this new isolate. In addition, dDDH and ANI values between G. massiliensis GM7T and ‘S. justesenii’ OUH 308042 T (84.40% and 97.50%, respectively) higher than the limit set for species demarcation suggest that these two strains belong to the same species.

Taxonomic and nomenclature proposal

Description of Gabonibacter chumensis sp. nov.

Gabonibacter chumensis (chum.en’sis N.L. masc. adj. chumensis, referring to the acronym CHUM, meaning “Centre Hospitalier de l’Université de Montréal”, the clinical lab where the type strain was first isolated).

Cells are strictly anaerobic, Gram-negative, not mobile, and non-spore-forming coccobacilli with up to 0.5 µm in diameter and 1.5 µm in length. They occur as single rods or in a short chain. After 48 h anaerobically at 37 ºC on sheep blood-enriched Columbia agar, bacterial colonies are circular, white to pale cream with 0.6–1.7 mm of diameter. Growth occurs between 28 and 37 ºC with an optimum at 37 ºC. NaCl concentration and pH range allowing growth are 0–5% and 6–7, respectively. Catalase, oxidase, and urease are not produced. Indole is detected, and nitrate is not reduced to nitrites. Aesculin and gelatin are not hydrolyzed. Acid is produced from D-glucose, D-mannitol, salicin, D-mannose, D-sorbitol, and D-trehalose, but not from lactose, D-saccharose, D-maltose, D-xylose, L-arabinose, glycerol, D-cellobiose, D-melezitose, D-raffinose, and L-rhamnose. Using API® strips (ZYM and Rapid 32A), the strain exhibits alkaline phosphatase, acid pyroglutamic arylamidase, alanine arylamidase, α-chymotrypsin, esterase, esterase lipase, glutamyl acid glutamic, leucyl glycine arylamidase, and naphthol-AS-BI-phosphohydrolase activities. On the other hand, galactosidase (α and β), glucosidase (α and β), β-glucuronidase, N-acetyl-β-glucosaminidase, α-arabinosidase, α-manisidase, and α-fucosidase are negative. The most abundant fatty acids are iso-C15:0 and C16:0 3-OH.

The type strain, KD22T (= CSUR Q8104T = DSM 115208), was isolated from the feces of a patient suffering from lung cell small cancer. Its draft genome measures 3,368,578 bp and exhibits a DNA G + C of 41.99 mol%. The GenBank/EMBL accession numbers of 16S rRNA gene and genome sequences are OP221267.1 and JANSKB000000000.1, respectively.