Genome sequence and description of Haloferax massiliense sp. nov., a new halophilic archaeon isolated from the human gut

By applying the culturomics concept and using culture conditions with a high salt concentration, we isolated the first known halophilic archaeon colonizing the human gut. Here we describe its phenotypic and biochemical characterization as well as its genome annotation. Strain Arc-HrT (= CSUR P0974 = CECT 9307) was mesophilic and grew optimally at 37 °C and pH 7. It was also extremely halophilic, with optimal growth observed at 15% NaCl. Cells were gram-negative cocci, strictly aerobic, non-motile and non-spore-forming, and exhibited catalase and oxidase activities. The 4,015,175 bp long genome exhibits a G + C content of 65.36% and contains 3911 protein-coding and 64 predicted RNA genes. The PCR-amplified 16S rRNA gene of strain Arc-HrT showed 99.2% sequence similarity with Haloferax prahovense, the phylogenetically closest validly published species in the Haloferax genus. The DDH values were 50.70 ± 5.2% with H. prahovense, 53.70 ± 2.69% with H. volcanii, 50.90 ± 2.64% with H. alexandrinus, 52.90 ± 2.67% with H. gibbonsii and 54.30 ± 2.70% with H. lucentense. The data presented here confirm strain Arc-HrT as a unique species, and we consequently propose its classification as the representative of a novel species belonging to the genus Haloferax, Haloferax massiliense sp. nov.


Introduction
The human intestinal microbiota is a complex ecosystem harboring a wide diversity of organisms, including bacteria (Lagier et al. 2012), archaea (Khelaifia et al. 2013), and unicellular eukaryotes (Nam et al. 2008). The culturomics concept, recently introduced in our laboratory to study prokaryotic diversity in the human gut (Lagier et al. 2012), allowed the isolation of a wide diversity of halophilic bacteria, including several new species. Among the diverse culture conditions and media used in culturomics to isolate new prokaryotes, some specifically targeted extremophilic organisms. In particular, culture media containing a high salt concentration are mainly used to select halophilic bacteria and archaea. Currently, the taxonomic affiliation of a new prokaryote is determined from the 16S rDNA sequence, the G + C content and DNA-DNA hybridization (DDH). This approach is limited because of the very low cutoff between species and genera (Welker and Moore 2011). In some cases, 16S rRNA gene sequence comparison has been shown to discriminate poorly between species belonging to the same genus (Stackebrandt and Ebers 2006). Recently, we proposed a polyphasic approach based on phenotypic and biochemical characterization, MALDI-TOF MS spectra and whole-genome sequencing and annotation to better define and classify new taxa (Ramasamy et al. 2014).
While using culturomics techniques to isolate halophilic prokaryotes colonizing the human gut, we isolated strain Arc-HrT from a stool specimen of a 22-year-old obese Amazonian female patient. This strain presented characteristics that enabled its classification as a new species of the Haloferax genus. The Haloferax genus was first described by Torreblanca et al. (1986) and currently includes 12 species with validly published names. Members of the Haloferax genus are mainly extremely halophilic archaea that inhabit hypersaline environments such as the Dead Sea and the Great Salt Lake. They are classified in the family Haloferacaceae within the Euryarchaeota phylum and the various species constitute 57 recognized genera (Arahal et al. 2017).
In this study, we present a classification and a set of characteristics for Haloferax massiliense sp. nov., strain Arc-HrT (= CSUR P0974 = CECT 9307), together with its complete genome sequencing and annotation.

Ethics and sample collection
The stool specimens were collected from a 22-year-old obese Amazonian female patient after defecation in sterile plastic containers, sampled and stored at -80 °C until use. Informed and signed consent was obtained from the patient. The study and the assent procedure were approved by the Ethics Committees of the IHU Méditerranée Infection (Faculty of Medicine, Marseille, France), under agreement number 09-022. The salt concentration of the stool specimen was measured with a digital refractometer (Fisher Scientific, Illkirch, France) and the pH was measured using a pH meter.

Isolation of the strain
Strain Arc-HrT was isolated in December 2013 by aerobic culture of the stool specimen in a home-made culture medium consisting of Columbia broth (Sigma-Aldrich, Saint-Quentin Fallavier, France) modified by the addition of (per liter): MgCl2·6H2O, 15 g; MgSO4·7H2O, 20 g; KCl, 4 g; CaCl2·2H2O, 2 g; NaBr, 0.5 g; NaHCO3, 0.5 g; glucose, 2 g; and NaCl, 150 g. The pH was adjusted to 7.5 with 10 M NaOH before autoclaving. Approximately 1 g of stool specimen was inoculated into 100 mL of this liquid medium in a flask incubated aerobically at 37 °C with stirring at 150 rpm. Subcultures were performed after 10, 15, 20 and 30 days of incubation. Serial dilutions from 10^-1 to 10^-10 were then prepared in the home-made liquid culture medium and plated onto agar plates consisting of the liquid medium described above supplemented with 1.5% agar.
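The recipe above is given per liter, whereas the enrichment was carried out in 100 mL. The following minimal Python sketch scales the supplements to a working volume, converts g/L to % (w/v), and lists the ten-fold dilution factors. The function and variable names are illustrative assumptions and are not part of the original protocol.

```python
# Supplements added to Columbia broth, in grams per liter (values from the text).
SUPPLEMENTS_G_PER_L = {
    "MgCl2.6H2O": 15.0,
    "MgSO4.7H2O": 20.0,
    "KCl": 4.0,
    "CaCl2.2H2O": 2.0,
    "NaBr": 0.5,
    "NaHCO3": 0.5,
    "glucose": 2.0,
    "NaCl": 150.0,
}


def scale_recipe(volume_ml, recipe=SUPPLEMENTS_G_PER_L):
    """Return the mass (g) of each supplement needed for volume_ml of medium."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    return {name: grams * volume_ml / 1000.0 for name, grams in recipe.items()}


def percent_w_v(grams_per_liter):
    """Convert g/L to % (w/v), i.e. grams per 100 mL."""
    return grams_per_liter / 10.0


def dilution_series(first_exponent=-1, last_exponent=-10):
    """Ten-fold dilution factors from 10^first_exponent down to 10^last_exponent."""
    return [10.0 ** e for e in range(first_exponent, last_exponent - 1, -1)]


if __name__ == "__main__":
    for name, grams in scale_recipe(100).items():
        print(f"{name}: {grams:g} g per 100 mL")
    print(f"NaCl: {percent_w_v(SUPPLEMENTS_G_PER_L['NaCl']):g}% (w/v)")
    print(f"{len(dilution_series())} dilution steps")
```

With these numbers, 150 g/L of NaCl corresponds to the 15% (w/v) salinity reported as optimal for the strain.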

Strain identification by MALDI-TOF MS and 16S rRNA gene sequencing
MALDI-TOF MS protein analysis was carried out as previously described (Seng et al. 2013). The resulting 12 spectra of strain Arc-HrT were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of halophilic and methanogenic archaea, including the spectra from Haloferax alexandrinus, Methanobrevibacter smithii, Methanobrevibacter oralis, Methanobrevibacter arboriphilus, and Methanomassilicoccus massiliensis. The 16S rRNA gene amplification by PCR and sequencing were performed as previously described (Lepp et al. 2004). The phylogenetic tree was reconstructed according to the method described by Elsawi et al. (2017).

Growth conditions
The optimum growth temperature of strain Arc-HrT was tested on the solid medium by inoculating 10^5 CFU/mL of an exponentially growing culture and incubating aerobically at 28, 37, 45, and 55 °C. The growth atmosphere was tested under aerobic conditions, in the presence of 5% CO2, and under microaerophilic and anaerobic atmospheres created using GENbag microaer and GENbag anaer (bioMérieux, Marcy l'Etoile, France), respectively. The optimum NaCl concentration required for growth was tested on solid media at 0, 1, 5, 7.5, 10, 15, 20, 25 and 30% NaCl. The optimum pH was determined by growth testing at pH 5, 6, 7, 7.5, 8 and 9.

Biochemical, sporulation and motility assays
To characterize the biochemical properties of strain Arc-HrT, we used the commercially available API ZYM, API 20 NE and API 50 CH strips (bioMérieux), supplemented with 15% NaCl (w/v) and 30 g/L of MgSO4. The sporulation test was performed by heat shock at 80 °C for 20 min followed by subculturing on the solid medium. The motility of strain Arc-HrT was assessed by observing a fresh culture under a DM1000 light microscope (Leica Microsystems, Nanterre, France) with a 100X oil-immersion objective lens. The colony surface was observed on the agar culture medium after 3 days of incubation under aerobic conditions at 37 °C.

Microscopy and gram test
Cells were fixed with 2.5% glutaraldehyde in 0.1 M cacodylate buffer for at least 1 h at 4 °C. A drop of cell suspension was deposited for approximately 5 min on glow-discharged formvar carbon film on 400 mesh nickel grids (FCF400-Ni, EMS). The grids were dried on blotting paper and cells were negatively stained for 10 s with 1% ammonium molybdate solution in filtered water at room temperature. Electron micrographs were acquired with a Morgagni 268D (Philips) transmission electron microscope operated at 80 keV. The gram stain was performed using the color GRAM 2 kit (bioMérieux) and observed using a DM1000 light microscope (Leica Microsystems).

Analysis of fatty acid methyl ester and membrane polar lipids
Polar lipids were extracted and identified by one-dimensional TLC as described by Cui and Zhang (2014). Cellular fatty acid methyl ester (FAME) analysis was performed by GC/MS. Three samples were prepared with approximately 80 mg of cell biomass per tube harvested from several culture plates. Fatty acid methyl esters were prepared as described by Sasser (2006). GC/MS analyses were carried out as previously described. Briefly, fatty acid methyl esters were separated using an Elite 5-MS column and monitored by mass spectrometry (Clarus 500-SQ 8 S, Perkin Elmer, Courtaboeuf, France). Spectral database searches were performed using MS Search 2.0 operated with the Standard Reference Database 1A (NIST, Gaithersburg, USA) and the FAMEs mass spectral database (Wiley, Chichester, UK).

DNA extraction and genome sequencing
After scraping 5 Petri dishes into 1 mL of TE buffer, the genomic DNA (gDNA) of strain Arc-HrT was extracted from 200 µL of the cell suspension after a classical lysis treatment with lysozyme at a final concentration of 40 mg/mL for 2 h at 37 °C, followed by incubation for 1 h at 37 °C in 1% SDS (final concentration) and 30 µL RNAse. Proteinase K treatment was performed at 37 °C. After three phenol extractions and alcohol precipitation, the sample was eluted in a minimal volume of 50 µL of EB buffer. DNA was quantified by a Qubit assay with the high sensitivity kit (Life Technologies, Carlsbad, CA, USA) at 14 ng/µL. gDNA was sequenced on the MiSeq platform (Illumina Inc, San Diego, CA, USA) with the mate pair strategy as previously described. A total of 10.6 Gb of information was obtained from a cluster density of 1326 K/mm², with 99.1% of clusters passing quality control filters (20,978,044 pass filter clusters). Within this run, the index representation for strain Arc-HrT was determined to be 6.22%. The 1,303,974 paired reads were filtered according to read quality, trimmed and then assembled.
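The index representation reported above can be checked with simple arithmetic. The sketch below assumes that the index representation is the share of pass-filter clusters assigned to the strain's index, and that each such cluster yields one read pair. This interpretation is an assumption, not something the text states.

```python
# Run statistics quoted in the text.
PASS_FILTER_CLUSTERS = 20_978_044
STRAIN_PAIRED_READS = 1_303_974


def index_representation(sample_reads, total_clusters):
    """Percentage of pass-filter clusters attributed to one indexed sample.

    Assumption: one pass-filter cluster corresponds to one read pair.
    """
    if total_clusters <= 0:
        raise ValueError("total_clusters must be positive")
    return 100.0 * sample_reads / total_clusters


if __name__ == "__main__":
    share = index_representation(STRAIN_PAIRED_READS, PASS_FILTER_CLUSTERS)
    print(f"Index representation: {share:.2f}%")
```

Under that assumption, the ratio of 1,303,974 read pairs to 20,978,044 clusters is consistent with the reported 6.22%.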

Genome assembly
Illumina reads were trimmed using Trimmomatic (Lohse et al. 2012), then assembled with the SPAdes software (Nurk et al. 2013; Bankevich et al. 2012). The contigs obtained were combined using SSPACE (Boetzer et al. 2011) and Opera (Gao et al. 2011), with the help of GapFiller (Boetzer and Pirovano 2012) to reduce the set. Manual refinements using CLC Genomics v7 software (CLC bio, Aarhus, Denmark) and home-made Python tools improved the genome. The final draft genome of strain Arc-HrT consisted of 8 contigs.

Genome annotation and comparison
Non-coding genes and miscellaneous features were predicted using RNAmmer (Lagesen et al. 2007), ARAGORN (Laslett and Canback 2004), Rfam (Griffiths-Jones et al. 2003), PFAM (Punta et al. 2012), and Infernal (Nawrocki et al. 2009). Coding DNA sequences (CDSs) were predicted using Prodigal (Hyatt et al. 2010) and functional annotation was achieved using BLAST+ (Camacho et al. 2009) and HMMER3 (Eddy 2011) against the UniProtKB database (The UniProt Consortium 2011). A brief genomic comparison was also made between strain Arc-HrT (CSTE00000000), Haloferax alexandrinus strain Arc-Hr (CCDK00000000), Haloferax gibbonsii strain ARA6 (CP011947), Haloferax lucentense strain DSM 14919 (AOLH00000000), Haloferax volcanii strain DS2 (CP001956) and Haloferax prahovense strain DSM 18310 (AOLG00000000). To estimate the mean level of nucleotide sequence similarity at the genome level between strain Arc-HrT and the four closest species with an available genome, we used the Average Genomic Identity of Orthologous gene Sequences (AGIOS), computed with an in-house pipeline. Briefly, this pipeline uses the Proteinortho software (Lechner et al. 2011) (with the following parameters: e-value 1e-5, 30% identity, 50% coverage and algebraic connectivity of 50%) to detect orthologous proteins between genomes compared pairwise, retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity between orthologous ORFs using the Needleman-Wunsch global alignment algorithm (Ramasamy et al. 2014). The genome of strain Arc-HrT was locally aligned pairwise using the BLAT algorithm (Kent 2002; Auch et al. 2010) against each of the selected genomes cited above, and DNA-DNA hybridization (DDH) values were estimated using the genome-to-genome sequence comparison (Auch et al. 2010).
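The last step of the AGIOS calculation described above, the mean nucleotide identity over globally aligned orthologous gene pairs, can be sketched in a few lines of Python. This is a minimal illustration, not the in-house pipeline: the scoring scheme (match +1, mismatch -1, gap -1), the definition of identity as matches over alignment columns, and the function names are all assumptions, and ortholog detection with Proteinortho is taken as already done.

```python
from statistics import mean


def needleman_wunsch_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity over a Needleman-Wunsch global alignment of a and b.

    Identity is defined here as matching columns / total alignment columns.
    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal alignment, counting columns and matches.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += a[i - 1] == b[j - 1]
                i, j, columns = i - 1, j - 1, columns + 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def agios(ortholog_pairs):
    """Mean percent identity over (gene_a, gene_b) nucleotide sequence pairs."""
    if not ortholog_pairs:
        raise ValueError("at least one orthologous pair is required")
    return mean(needleman_wunsch_identity(a, b) for a, b in ortholog_pairs)


if __name__ == "__main__":
    # Toy sequences for illustration only; real input would be orthologous ORFs.
    pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
    print(f"AGIOS = {agios(pairs):.1f}%")
```

The quadratic-time alignment is adequate for gene-length sequences; a production pipeline would more likely call an optimized aligner.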

Strain identification and phylogenetic analysis
Using MALDI-TOF MS identification, no significant score allowing a correct identification was obtained for strain Arc-HrT against our database (the Bruker database, which is continually supplemented with URMITE data), suggesting that our isolate did not belong to any known species. Consequently, the spectra from strain Arc-HrT were added to our database (http://www.mediterranee-infection.com/article.php?laref=256&titre=urms) (Fig. 1). The PCR-amplified 16S rRNA gene of strain Arc-HrT (HG964472) exhibited 99.2% sequence similarity with Haloferax prahovense JCM 13924 (NR113446), the phylogenetically closest validly published species with standing in nomenclature (Fig. 2). As 16S rRNA gene sequence comparison has been shown to discriminate poorly between Haloferax species, we sequenced the complete genome of strain Arc-HrT and performed a digital DNA-DNA hybridization (dDDH) with four of the closest Haloferax species (see the section on genome comparison). These data confirmed strain Arc-HrT as a unique species. Finally, the gel view showed the protein spectral differences with other members of the genus Haloferax (Fig. 3).
Fig. 1 Reference mass spectrum from Haloferax massiliense strain Arc-HrT. Spectra from 12 individual colonies were compared and a reference spectrum was generated
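The species-level conclusion drawn from the dDDH comparison can be illustrated with the point estimates reported in the abstract. The sketch below compares them with the 70% DDH cutoff conventionally used for prokaryotic species delineation. That cutoff is an assumption introduced here, since the text does not state which threshold was applied, and the reported confidence intervals are ignored for simplicity.

```python
# Conventional DDH species cutoff (assumption: not stated in the text).
SPECIES_THRESHOLD = 70.0

# dDDH point estimates (%) between strain Arc-HrT and related species,
# as reported in the abstract.
DDDH_ESTIMATES = {
    "H. prahovense": 50.70,
    "H. volcanii": 53.70,
    "H. alexandrinus": 50.90,
    "H. gibbonsii": 52.90,
    "H. lucentense": 54.30,
}


def below_threshold(estimates, threshold=SPECIES_THRESHOLD):
    """Map each reference species to True if its dDDH value is below the cutoff."""
    return {name: value < threshold for name, value in estimates.items()}


def is_distinct_species(estimates, threshold=SPECIES_THRESHOLD):
    """True when every pairwise dDDH value falls below the species cutoff."""
    return all(below_threshold(estimates, threshold).values())


if __name__ == "__main__":
    for name, value in DDDH_ESTIMATES.items():
        print(f"{name}: {value:.2f}%")
    print("Distinct species:", is_distinct_species(DDDH_ESTIMATES))
```

All five values lie well below the assumed cutoff, in line with the proposal of a new species.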

Phenotypic and biochemical characteristics
The salt concentration of the stool specimen measured by digital refractometer was about 2.5% and the pH was 7.2. Colonies of strain Arc-HrT were circular, red, shiny and smooth, with a diameter of 0.5-1 mm. Cells were gram-negative, non-motile and non-spore-forming. Cells were very pleomorphic (irregular cocci, short and long rods, triangles and ovals) and had a diameter between 1 and 4 µm (Fig. 4). Strain Arc-HrT was mesophilic and grew at temperatures ranging from 25 to 45 °C, with an optimum at 37 °C. NaCl was required for growth and the strain grew at salinities ranging from 10 to 25% NaCl, with an optimum at 15%; cells underwent lysis below 100 g/L NaCl. The optimum pH for growth was 7 (range pH 6.5 to 8). The strain was strictly aerobic and grew in the presence of 5% CO2; no growth was observed under microaerophilic or anaerobic conditions using alternative electron acceptors such as nitrate or DMSO, or  Table 1.

Genome sequencing information and annotation
The genome of strain Arc-HrT was sequenced as part of a culturomics study aiming at isolating all prokaryotic species colonizing the human gut, and because of its phylogenetic affiliation to the Haloferax genus. Strain Arc-HrT represents the 13th genome sequenced in the Haloferax genus. The draft genome of strain Arc-HrT contains 4,015,175 bp with a G + C content of 65.36% and consists of 8 contigs without gaps (Fig. 5). The genome was shown to encode at least 64 predicted RNAs, including 3 rRNAs, 57 tRNAs and 4 miscellaneous RNAs, and 3911 protein-coding genes. Among these genes, 490 (13%) were found to be putative proteins and 291 (8%) were assigned as hypothetical proteins. Moreover, 2335 genes matched at least one sequence in the Clusters of Orthologous Groups (COGs) database (Tatusov et al. 1997, 2000) with BLASTP default parameters. Table 3 shows the detailed project information and its association with MIGS version 2.0 compliance. The properties and the statistics of the genome are summarized in Table 4. The distribution of genes into COGs functional categories is presented in Table 5.
Fig. 4 Gel view comparing Haloferax massiliense strain Arc-HrT to other species within the genus Haloferax. The gel view displays the raw spectra of loaded spectrum files arranged in a pseudo-gel like look. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray scale scheme code. The color bar and the right y-axis indicate the relation between the color of a peak and the peak intensity, in arbitrary units. Displayed species are indicated on the left
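Summary statistics such as the G + C content and the RNA gene tally reported above are simple to recompute from an assembly. The sketch below shows one way to do so in Python. The helper names are hypothetical, the toy contigs are placeholders rather than real sequence data, and ambiguous bases are excluded from the denominator as a design choice made here.

```python
def gc_percent(contigs):
    """G + C content (%) across all contigs, counting only A, C, G and T."""
    gc = total = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += sum(s.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


# RNA gene counts quoted in the text.
RNA_GENES = {"rRNA": 3, "tRNA": 57, "miscellaneous RNA": 4}


def total_rna_genes(counts=RNA_GENES):
    """Sum of the predicted RNA genes by category."""
    return sum(counts.values())


if __name__ == "__main__":
    toy_contigs = ["GGCCGCGA", "ATGCGCGC"]  # placeholder sequences
    print(f"G + C content: {gc_percent(toy_contigs):.2f}%")
    print(f"Predicted RNA genes: {total_rna_genes()}")
```

The three RNA categories sum to the 64 predicted RNA genes given in the text.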
The distribution of genes into COG categories was identical (Fig. 6).
(Field et al. 2008)
a Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from http://www.geneontology.org/GO.evidence.shtml of the Gene Ontology project (Ashburner et al. 2000). If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements
(Enache et al. 2007), Haloferax volcanii (Torreblanca et al. 1986), Haloferax denitrificans (Tindall et al. 1989); 4, Haloferax mediterranei (Torreblanca et al. 1986), Haloferax gibbonsii, Haloferax alexandrinus (Asker and Ohta 2002) and Haloferax lucentense (Gutierrez et al. 2002

Discussion
Here, we describe the genome sequence and most of the biochemical characteristics of the first isolate of Haloferax massiliense sp. nov., an extremely halophilic archaeon isolated from the human gut. Halophilic organisms are generally known to colonize hypersaline environments where the salt concentration is close to saturation, such as salt lakes and salt marshes (Oren 1994). Here, using a culture medium containing a high salt concentration, we successfully isolated strain Arc-HrT, which belongs to the Haloferax genus within the Haloferacaceae family. This strain represents the first halophilic archaeon isolated from the human gut. Recently, DNA sequences belonging to halophilic archaea that are frequently present or abundant in extreme environments, including some members of the Halobacteriaceae family, were detected by PCR in the human gastrointestinal tract (Oxley et al. 2010). Bacterial halophilism has become a subject of considerable interest for microbiologists and molecular biologists during the past 20 years because of the development of these organisms on salty foods (Fukushima et al. 2007). Indeed, these organisms have also been detected in refined salt as well as in food products in which large quantities of salt are used for preservation, such as salted fish, pork ham, sausages and fish sauces (Tanasupawat et al. 2009; Kim et al. 2010). Additionally, the restriction of these organisms to extreme environments has recently been challenged after their detection in habitats with relatively low salinity, suggesting an ability to adapt and survive in more moderate environments (Purdy et al. 2004). This work does not intend to demonstrate a medical or biotechnological interest of strain Arc-HrT; its only aim is to expand knowledge about the human microbiota by isolating all the prokaryotes that colonize the human digestive tract.

Conclusion
Based on the characteristics reported here and the phylogenetic affiliation of strain Arc-HrT, we propose the creation of Haloferax massiliense sp. nov. as a new species belonging to the Haloferax genus, with strain Arc-HrT as its type strain. Haloferax massiliense sp. nov. (= CSUR P0974 = CECT 9307), described here, was isolated from the human gut as part of a culturomics study aiming at expanding the repertoire of microorganisms colonizing the human gut.
Fig. 6 Distribution of functional classes of predicted genes according to cluster of orthologous groups of proteins from Haloferax massiliense strain Arc-HrT
Table 6 Number of orthologous proteins shared between genomes (upper right), average percentage similarity of nucleotides corresponding to orthologous protein shared between genomes (lower left) and number of proteins per genome (bold)
Haloferax massiliense strain Arc-HrT is a strictly aerobic, gram-negative, non-motile and non-spore-forming archaeon. Cells are very pleomorphic (irregular cocci, short and long rods, triangles and ovals) with a diameter between 1 and 4 µm. Optimal growth is observed at 37 °C, pH 7 and 15% NaCl. Colonies are red, smooth and shiny and measure 0.5-1 mm. Strain Arc-HrT exhibits catalase and oxidase activities.