RETRACTED ARTICLE: Gemella massiliensis sp. nov., a new bacterium isolated from the human sputum

Thanks to its ability to isolate previously uncultured bacterial species, culturomics has dynamized the study of the human microbiota. A new bacterial species, Gemella massiliensis Marseille-P3249T, was isolated from a sputum sample of a healthy French man. Strain Marseille-P3249T is a facultatively anaerobic, catalase-negative, Gram-positive coccus that is unable to sporulate. The major fatty acids were C16:0 (34%), C18:1n9 (28%), C18:0 (15%) and C18:2n6 (13%). Its 16S rRNA sequence exhibits a 98.3% sequence similarity with Gemella bergeri strain 617-93T, its phylogenetically closest species with standing in nomenclature. Its digital DNA–DNA hybridization (dDDH) and OrthoANI values with G. bergeri were only 59.7 ± 5.6% and 94.8%, respectively. These values are lower than the thresholds for species delineation (> 70% and > 95%, respectively). This strain grows optimally at 37 °C and its genome is 1.80 Mbp long with a 30.5 mol% G + C content. Based on these results, we propose the creation of the new species Gemella massiliensis sp. nov., strain Marseille-P3249T (= CSUR P3249 = DSMZ 103940).
Supplementary Information: The online version contains supplementary material available at 10.1007/s00203-021-02493-2.


Introduction
The genus Gemella was first described by Berger in 1961 (Berger). Members of this genus are usually Gram-positive, coccoid, facultatively anaerobic, and catalase-negative (Collins 2006). The first described species of this genus, such as Gemella morbillorum and Gemella haemolysans, are commensals of human mucous membranes but are sometimes responsible for human infections (Kilpper-Bälz and Schleifer 1988).
Gemella bergeri and Gemella sanguinis were recovered from human clinical specimens (Collins et al. 1998a, b), whereas Gemella palaticanis was isolated from a dog (Collins et al. 1999). Although the pathogenicity of members of this genus is not yet proven, it seems likely that they are also residents of the mucous membranes.
During a project on the human microbiota, we studied sputum samples by culturomics as previously described (Lagier et al. 2018), which allowed us to isolate a new bacterial strain belonging to the phylum Firmicutes. Herein, we report a taxonogenomic description (Fournier et al. 2015) of Gemella massiliensis sp. nov., which was previously announced by our research group (Fonkou et al. 2018).

Growth conditions
A bacterial strain was isolated by culturomics from a sputum sample of a healthy French man as part of a study exploring the human microbiome. The study was approved by the ethics committee of the Institut Federatif de Recherche IFR48 under the number 09-022, and the patient gave his formal agreement by signing the informed consent. The optimal growth conditions of strain Marseille-P3249 were evaluated using various culture conditions. Culture assays were done at 28, 37, 45 and 55 °C under anaerobic (GENbag anaer, bioMérieux), microaerophilic (GENbag Microaer, bioMérieux) and aerobic conditions. Tolerance to acidity and halotolerance were evaluated independently with growth assays at pH 6, 6.5, 7 and 8.5 and at NaCl concentrations of 0, 5, 10, 50, 75 and 100 g/L, respectively.
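The culture conditions listed above can be laid out as a small assay matrix. The sketch below is illustrative only: it assumes that every temperature was tested under every atmosphere (the text does not state this explicitly), and all variable names are hypothetical.

```python
from itertools import product

# Values taken from the growth-condition description.
temperatures_c = [28, 37, 45, 55]
atmospheres = ["anaerobic", "microaerophilic", "aerobic"]
ph_values = [6, 6.5, 7, 8.5]
nacl_g_per_l = [0, 5, 10, 50, 75, 100]

# Assumption: a full factorial design of temperature x atmosphere.
temperature_atmosphere_assays = list(product(temperatures_c, atmospheres))

# pH and NaCl tolerance were evaluated independently,
# so they are kept as separate one-factor series.
assay_counts = {
    "temperature_x_atmosphere": len(temperature_atmosphere_assays),
    "ph": len(ph_values),
    "nacl": len(nacl_g_per_l),
}

for name, count in assay_counts.items():
    print(f"{name}: {count} conditions")
```

Under this assumption the temperature and atmosphere series comprise 12 conditions, with 4 pH and 6 NaCl conditions tested separately.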

Morphological, biochemical and antibiotic susceptibility analysis
The main biochemical features of strain Marseille-P3249T were tested using API strips (ZYM, 50CH and 20A; bioMérieux, France). Motility and Gram stain were checked using a DM1000 photonic microscope (Leica Microsystems, Nanterre, France). Additionally, sporulation was evaluated after exposing a bacterial suspension to a 20 min heat shock at 80 °C. Cell morphology images were obtained using a scanning electron microscope (SEM) (TM4000 Plus, Hitachi High-Technologies Corp., Tokyo, Japan).
Cellular fatty acid methyl ester (FAME) analyses were performed by GC/MS with 10 mg of bacterial biomass per tube. GC/MS and FAME analyses were performed as previously reported (Elsawi et al. 2017).

DNA extraction and genome sequencing
Genomic DNA (gDNA) was extracted from strain Marseille-P3249 as previously described (Elsawi et al. 2017), at a concentration of 82.1 ng/µL. gDNA was sequenced using the MiSeq technology (Illumina Inc, San Diego, CA, USA) with the mate-pair strategy; the sample was barcoded and run with 11 additional projects using the Nextera Mate-Pair sample prep kit (Illumina) as formerly described (Elsawi et al. 2017). The DNA fragment size ranged from 1.5 kb up to 11 kb with an optimal size of 6.29 kb. No size selection was done and 177.24 ng of tagmented fragments were circularized. The circularized DNA was sheared mechanically into smaller fragments with an optimal size of 1393 bp on the Covaris device S2 in T6 tubes (Covaris, Woburn, MA, USA). Using a high sensitivity bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA), the library profile was visualized with a final concentration of 15.59 nmol/L. The libraries were normalized at 2 nM, pooled with other samples, and finally diluted to 15 pM. Automated cluster generation and sequencing were performed in a single 2 × 251-bp run. A total of 9.5 Gb of information was obtained from a cluster density of 1050 K/mm², with 92.5% of clusters passing quality control filters (18,644,000 passing-filter paired reads). Within this run, the index representation for strain Marseille-P3249T was determined to be 4.67%. The 870,362 paired reads were trimmed, assembled, annotated and analyzed.
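The sequencing figures above can be cross-checked with simple arithmetic. The following sketch recomputes the index representation and the dilution factors from the reported values, and derives a rough upper bound on raw sequencing depth. The depth estimate is an assumption of this sketch, not a value reported in the article: it supposes that every read is full length and that none is lost to trimming, and it uses the genome length reported in the genome characteristics section.

```python
# Figures reported in the text.
passing_filter_pairs = 18_644_000   # passing-filter paired reads in the run
strain_pairs = 870_362              # paired reads for strain Marseille-P3249T
read_length_bp = 251                # 2 x 251-bp run
genome_size_bp = 1_804_813          # genome length reported in the article

# Share of the run attributed to the strain's index.
index_share_pct = 100 * strain_pairs / passing_filter_pairs

# Library dilution steps: 15.59 nmol/L -> 2 nM -> 15 pM.
library_nm = 15.59
normalized_nm = 2.0
loading_pm = 15.0
normalization_fold = library_nm / normalized_nm
loading_fold = normalized_nm * 1000 / loading_pm  # nM converted to pM

# Assumption: untrimmed, full-length reads; this is an upper bound only.
raw_depth = strain_pairs * 2 * read_length_bp / genome_size_bp

print(f"index representation: {index_share_pct:.2f}%")
print(f"normalization dilution: {normalization_fold:.2f}-fold")
print(f"loading dilution: {loading_fold:.1f}-fold")
print(f"raw depth (upper bound): {raw_depth:.0f}x")
```

The recomputed index share rounds to the 4.67% stated in the text, which confirms that the read count and the percentage are mutually consistent.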

Phylogenetic analysis
For phylogenetic analyses, 16S rRNA gene sequences of closely related species were recovered from the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/). Muscle was used for sequence alignment and phylogenetic inferences were generated using the approximately-maximum-likelihood method within the FastTree software (Edgar 2004; Price et al. 2009). In addition, a phylogenetic tree based on housekeeping genes (groES, groEL, recA, gyrA, and rpoB) was produced using the iTOL online software (https://itol.embl.de/). Genes were extracted from annotated genomic sequences and then concatenated for each strain.
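The per-strain concatenation of housekeeping genes can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the fixed gene order, and the toy sequences are all hypothetical, and real input would come from the annotated genomes.

```python
# Gene order used for every strain, so that concatenated positions are comparable.
GENE_ORDER = ["groES", "groEL", "recA", "gyrA", "rpoB"]


def concatenate_genes(genes_by_strain, gene_order=GENE_ORDER):
    """Join the housekeeping genes of each strain in a fixed order."""
    concatenated = {}
    for strain, genes in genes_by_strain.items():
        missing = [gene for gene in gene_order if gene not in genes]
        if missing:
            raise ValueError(f"{strain} lacks: {', '.join(missing)}")
        concatenated[strain] = "".join(genes[gene] for gene in gene_order)
    return concatenated


# Hypothetical toy sequences, far shorter than real genes.
toy_genes = {
    "strain_A": {"groES": "ATG", "groEL": "GCC", "recA": "TTA",
                 "gyrA": "CGA", "rpoB": "TAA"},
    "strain_B": {"groES": "ATG", "groEL": "GCT", "recA": "TTG",
                 "gyrA": "CGA", "rpoB": "TGA"},
}

supermatrix = concatenate_genes(toy_genes)
for strain, sequence in supermatrix.items():
    print(strain, sequence)
```

Using one fixed gene order for all strains, and failing loudly when a gene is absent, keeps the concatenated sequences comparable across strains.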

Strain identification
MALDI-TOF MS failed to identify strain Marseille-P3249T. Therefore, 16S rRNA gene sequencing was performed and, using a BLAST comparison against the NCBI nucleotide database, strain Marseille-P3249T exhibited a 98.3% sequence similarity with Gemella bergeri strain 617-93T, the phylogenetically closest species with standing in nomenclature (Fig. 1) (Collins et al. 1998a). Thus, according to Kim et al., this strain may be classified as a new bacterial species within the genus Gemella, as it exhibits more than 1.35% sequence divergence from its phylogenetically closest species with a validly published name (Kim et al. 2014). Furthermore, the MLSA tree built from concatenated genes shows that G. massiliensis strain Marseille-P3249T is positioned among the Gemella species but is clearly distinct from them, on a single branch (Fig. 2).
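The species-delineation reasoning can be expressed as a short check against the cutoffs named in this article: 1.35% 16S rRNA divergence, 70% dDDH and 95% OrthoANI. The sketch below uses the values reported for strain Marseille-P3249T versus G. bergeri; the function and constant names are hypothetical, and the dDDH confidence interval (± 5.6%) is not modeled.

```python
# Cutoffs as stated in the article.
DIVERGENCE_16S_CUTOFF_PCT = 1.35  # 16S rRNA divergence (Kim et al. 2014)
DDDH_CUTOFF_PCT = 70.0            # digital DNA-DNA hybridization
ORTHOANI_CUTOFF_PCT = 95.0        # OrthoANI


def supports_new_species(similarity_16s_pct, dddh_pct, orthoani_pct):
    """Return, per criterion, whether the value supports a distinct species."""
    divergence_16s_pct = 100.0 - similarity_16s_pct
    return {
        "16S divergence": divergence_16s_pct > DIVERGENCE_16S_CUTOFF_PCT,
        "dDDH": dddh_pct < DDDH_CUTOFF_PCT,
        "OrthoANI": orthoani_pct < ORTHOANI_CUTOFF_PCT,
    }


# Values reported for strain Marseille-P3249T versus G. bergeri 617-93T.
result = supports_new_species(
    similarity_16s_pct=98.3, dddh_pct=59.7, orthoani_pct=94.8
)
for criterion, passed in result.items():
    print(f"{criterion}: {'supports' if passed else 'does not support'} a new species")
```

With the reported values, all three criteria fall on the side of a distinct species, although the OrthoANI value (94.8%) lies close to its cutoff.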

General characteristics of strain Marseille-P3249
Cells from strain Marseille-P3249T were Gram-positive cocci. Colonies grew optimally at 37 °C in aerobic conditions, at pH values between 6 and 8.5 and NaCl concentrations below 50 g/L, and measured from 0.5 to 1.2 mm in diameter after 24 h of incubation. Cells were non-motile and non-spore-forming, with a mean diameter of 0.78 µm. They metabolized D-fructose, amygdalin, and L-sorbose, and possessed enzymes such as esterase, leucine arylamidase, and naphthol-AS-BI-phosphohydrolase. Biochemical criteria of strain Marseille-P3249T are compared with those of closely related species with standing in nomenclature (Table 1).

Genome characteristics of strain Marseille-P3249
The genome was 1,804,813 bp long with a 30.5 mol% G + C content (Fig. 3). It is composed of 7 scaffolds (composed of 8 contigs). Of the 1727 predicted genes, 1677 were protein-coding genes and 50 were RNAs (5 genes were 5S rRNA, 2 genes were 16S rRNA, 2 genes were 23S rRNA, and 41 genes were tRNA genes). A total of 1276 genes (76.09%) were assigned a putative function (by COGs or by NR BLAST). Twenty-six genes were classified as ORFans (1.55%). The remaining genes were annotated as hypothetical proteins (304 genes (18.13%)). The distribution of genes into COG functional categories is detailed in supplementary Table S1.
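The reported percentages appear to be expressed relative to the 1677 protein-coding genes; this is an inference from the arithmetic rather than something the text states. A brief sketch, with hypothetical variable names, that recomputes them from the reported counts:

```python
# Counts reported in the text.
predicted_genes = 1727
protein_coding = 1677
rna_genes = {"5S rRNA": 5, "16S rRNA": 2, "23S rRNA": 2, "tRNA": 41}
categories = {
    "putative function": 1276,
    "ORFans": 26,
    "hypothetical proteins": 304,
}


def percentage(count, total):
    """Percentage of total, rounded to two decimals as in the text."""
    return round(100 * count / total, 2)


# Assumption: the denominator is the number of protein-coding genes.
percentages = {
    name: percentage(count, protein_coding) for name, count in categories.items()
}
rna_total = sum(rna_genes.values())

print(percentages)
print("RNA genes:", rna_total)
print("protein-coding + RNA:", protein_coding + rna_total)
```

The recomputed values match the 76.09%, 1.55% and 18.13% given in the text, and the RNA genes sum to the stated 50.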

Genome comparison
The draft genome sequence of strain Marseille-P3249T was larger than those of Gemella cuniculi DSM 15828T, Gemella sanguinis ATCC 700632T and Gemella haemolysans ATCC 10379T, but smaller than those of Gemella asaccharolytica WAL 1945JT, Gemella bergeri 617-93T and Gemella morbillorum NCTC11323T (Table 3).
Additionally, the G + C content of strain Marseille-P3249T is lower than those of G. asaccharolytica WAL 1945JT, G. cuniculi DSM 15828T, G. sanguinis ATCC 700632T and G. bergeri 617-93T, but higher than those of G. morbillorum NCTC11323T and G. haemolysans ATCC 10379T. In the same way, the gene content of strain Marseille-P3249T was compared with that of the closely related Gemella species.
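The G + C content used in these comparisons is the percentage of G and C bases among all unambiguous bases of a genome. The sketch below shows one way to compute it over a set of scaffolds; the function name and the toy sequences are hypothetical, and excluding ambiguous bases such as N is a choice made in this sketch.

```python
def gc_content_pct(sequences):
    """G + C percentage pooled over one or more sequences (e.g. scaffolds)."""
    gc = 0
    at = 0
    for sequence in sequences:
        upper = sequence.upper()
        gc += upper.count("G") + upper.count("C")
        at += upper.count("A") + upper.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    # Ambiguous bases (e.g. N) are excluded from the denominator.
    return 100 * gc / (gc + at)


# Hypothetical toy scaffolds, not real genome data.
toy_scaffolds = ["ATGCATATTA", "aattnnGC"]
print(f"G + C content: {gc_content_pct(toy_scaffolds):.1f}%")
```

Pooling the counts over all scaffolds, rather than averaging per-scaffold percentages, weights each scaffold by its length.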
The genome is 1.80 Mbp with 30.5 mol% G + C content.
The type strain Marseille-P3249T (= CSURP3249 = DSM103940) was isolated from the sputum sample of a healthy French man.
The 16S rRNA and whole-genome sequences of G. massiliensis sp. nov. were deposited in EMBL-EBI under accession numbers LT628479 and FQLS00000000, respectively.

Fig. 3 Graphical circular map of the chromosome. From outside to the center: genes on the forward strand colored by COG categories (only genes assigned to COG), genes on the reverse strand colored by COG categories (only genes assigned to COG), RNA genes (tRNAs green, rRNAs red), GC content and GC skew

Fig. 1 16S rRNA gene sequence phylogenetic analysis highlighting the position of strain Marseille-P3249 relative to other species. This tree has already been published but was remade with slight changes (Fonkou et al. 2018). Sequence alignment and phylogenetic inferences were obtained using the maximum likelihood method within MEGA 7 software. The scale bar represents a 2% sequence divergence using 1000 replicates. GenBank accession numbers are indicated in parentheses

Fa facultative anaerobic; Na data not available; V variable; W weakly positive

Table 3
Genome information of the species involved in the genomic comparative analyses
Species	Size (Mb)	GC (%)	Gene content