Background

The Helicobacter genus comprises gram-negative bacteria with characteristic spiral shapes [1]. The type species, Helicobacter pylori, was the first cultivated species of the genus. H. pylori was first isolated from in the stomachs of patients with gastritis and peptic ulceration in 1984 [2]. Currently, over 30 species have been isolated, including many candidate or unclassified species [3]. Except for H. pylori, most of the other Helicobacter have been isolated and identified from wild animals, including rodents, cats, dogs, rabbits, chickens, sheep, cattle, swine, cheetahs, ferrets, dolphins, whales, and non-human primates [4,5,6,7,8]. In addition, certain Helicobacters were isolated from laboratory animals [7, 9,10,11]. Convincing evidence has identified H. pylori as a causative bacterium of gastric adenocarcinoma, mucosa-associated lymphoid tissue lymphoma, and peptic ulcer in humans [12]. However, studies of the association between diseases and non-H. pylori species are lacking. There are only a few reports about non-H. pylori infection in humans, mostly in immunocompromised patients [13,14,15,16]. Thus, the pathogenicity and genetic characteristics of non-H. pylori remain to be explored.

In 2015, we reported the isolation of Helicobacter himalayensis sp. nov. strain 80(YS1)T from Marmota himalayana in the Qinghai-Tibet Plateau, China. This was the first report that a new species of helicobacter could be isolated and cultured directly in the gastrointestinal mucosa of a Marmota himalayana. M. himalayana live in the Qinghai-Tibet Plateau, a relatively isolated environment with an average altitude over 4000 m, which induces great differences in animal and plant biodiversity compared with plain regions. H. himalayensis, a new species of Helicobacter isolated from this plateau, may have unique evolutionary, pathogenic, and drug resistance characteristics because of its unique natural environment and host.

With development of next-generation sequencing technologies, many whole-genome sequences for non-H. pylori species have been generated in the past five years [17,18,19,20]. The deposition of this genomic data in the National Center for Biotechnology Information (NCBI) database has encouraged research seeking a deeper understanding of Helicobacter species at the genomic level. Therefore, in the present study we aimed to sequence the whole genome of H. himalayensis, and to explore the molecular evolution and potential pathogenicity of H. himalayensis using genome-wide comparative analyses based on the existing genomes of Helicobacter species.

Results

General features of H. himalayensis genome

The nucleotide sequence of the genome of H. himalayensis strain 80(YS1)T was deposited in the NCBI Databases with accession No. CP014991. The final genome assembly of H. himalayensis 80(YS1)T contained 103 contigs with a draft genome size of 1,829,936 bp and a genomic GC content of 39.89% (Table 1). The general annotation was performed by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). It predicted 1769 genes in total, among which 1664 belongs to predicted proteins, 1290 (77.52%) could be assigned to a function with a high level of confidence, and 374 (22.48%) were assigned as hypothetical proteins. The genome contained 39 tRNAs, a pair of 16S rRNAs, a pair of 23S rRNAs, a pair of 5S rRNAs, 3 ncRNAs and one phage intergrase, however, no plasmids or insertion sequence (IS) elements were found. Moreover, a genomic island predicted by a online tool IslandViewer 4, was named HhiG1 in the genome. The genomic island contained multiple genes encoding four restriction endonuclease subunit S, a restriction endonuclease subunit R, a type I restriction-modification system subunit M, a type II toxin-antitoxin system RelE/ParE family toxin, a type II toxin-antitoxin system RelB/DinJ family antitoxin, and site-specific integrase, as well as some hypothetical proteins (See Supplementary Table 1, Additional File 1). HhiG1 comprised 52,677 bp at position 1,738,356 to 1,791,033 in genome .

Table 1 General features of the H. himalayensis, H. cinaedi, H. hepaticus and H. pylori

Phylogenetic relationship and genomic collinearity features of H. himalayensis with other Helicobacter species

Seventeen whole-genome sequences of Helicobacter species, including H. himalayensis, and two whole-genome sequences of Campylobacter jejuni and Acetobacter pasteurianus as outgroups, were selected to construct the phylogenetic tree. H. himalayensis was in a single clade that was adjacent to H. cinaedi, H. bilis and H. hepaticus. These three Helicobacter spp. were in a node that was far away from H. pylori (Fig. 1a). Moreover, the phylogenetic relationship between H. himalayensis and the selected sixteen Helicobacter species, based on the core-genome, were also analyzed (Fig. 1b). The results were consistent with the phylogenetic map based on the Helicobacter genome sequences, which indicated that H. himalayensis is evolutionarily close to H. hepaticus and H. cinaedi but not to H. pylori. The general genomic features of H. himalayensis, H. cinaedi, H. hepaticus, and H. pylori are presented for comparison in Table 1. H. himalayensis has a larger genome than that of H. hepaticus (1,799,146 bp) and smaller genome than that of H. cinaedi (2,240,130 bp) [21]. The genomes of all three Helicobacter spp. were larger than that of H. pylori. All three Helicobacter spp. have phages or phage like elements in their genome, but H. pylori has none. By contrast, H. pylori has IS elements (IS605 and IS606), whereas H. himalayensis, H. cinaedi, and H. hepaticus do not have.

Fig. 1
figure 1

Phylogenetic tree produced by the comparison of the genome sequences of selected bacteria using MEGA X. a Phylogenetic tree based on the pan-genomes of the seventeen Helicobacter species and two outgroups. Campylobacter jejuni NCTC 11168 and Acetobacter pasteurianus IFO 3283–32 were used as outgroups. b Phylogenetic tree based on the core-genomes of the seventeen Helicobacter species without outgroup. The scale indicates the divergence (substitutions/site). The labeled numbers represent the posterior mean of rates on the corresponding branches

Figure 2 shows a comparison of the gene sequences of H. himalayensis with those of H. cinaedi, H. hepaticus, and H. pylori at the whole-genome scale. The results revealed a very high degree of genome collinearity between H. himalayensis and H. hepaticus (Fig. 2a), as well as between H. himalayensis and H. cinaedi (Fig. 2b). However, it also revealed a low degree of genome collinearity between H. himalayensis and H. pylori (Fig. 2c).

Fig. 2
figure 2

Map of genome collinearity between H. himalayensis with H. hepaticus, H. cinaedi, and H. pylori. a H. hepaticus. b H. cinaedi. c H. pylori. A red line indicates that the gene sequence is transcribed in the forward direction, and a blue line indicates that the gene is transcribed in the reverse orientation. The homologous sequences was identified as > 100 bp in length and at least 70% homology

Function analysis of H. himalayensis

Of the 1664 predicted proteins, 1184 were clearly assigned to a functional classification with evidence and 202 were poorly functionally characterized using the Clusters of Orthologous Groups (COG) database (Fig. 3). Moreover, there were 105 proteins without any annotations in the COG database and 173 were not in the database. The numbers of genes identified through functional classification in the COG database for “information storage and processing”, “metabolism”, and “cellular processes and signaling” were 284, 546, and 426 respectively. Compared with H. cinaedi, H. hepaticus, H. bilis, and H. pylori, H. himalayensis has more genes for functions in the “cell wall/membrane/envelope biogenesis” and “coenzyme transport and metabolism” sub-branches, but fewer genes for functions in the “cell cycle control/ cell division/ chromosome partitioning”, “intracellular trafficking/ secretion/ vesicular transport”, “lipid transport and metabolism” and “carbohydrate transport and metabolism” sub-branches (Table 2). The circular genome atlas of H. himalayensis integrating different kinds of information is shown in Fig. 4, that including the protein-coding genes, tRNA and rRNA genes, GC content and GC skew information.

Fig. 3
figure 3

COG functional classification of genes belonging to H. himalayensis

Table 2 Proportion of genes in COG function clssification for helicobacters
Fig. 4
figure 4

Circular genome atlas of H. himalayensis

In addition, Forty-two genes were predicted to match virulence factors genes in the VFDB from Helicobacter genus, and 83.3% (35/42) of them were associated with flagella motility and bacterial invasion (See Supplementary Table 2, Additional File 2). Other virulence factors included cytolethal distending toxin (cdtA, cdtB, cdtC), lipopolysaccharide Lewis antigens (futA), neutrophil activating protein (napA), autoinducer-2 production protein (luxS), and catalase (katA). The H. himalayensis genome does not contain a urease and the Cag pathogenic island, like H. pylori. However, the genome contained other predicted virulence factors associated with migration, invasion, colonization, and carcinogenesis, which might be pathogenic to the host.

Discussion

In the present study, the whole genome sequence data of H. himalayensis, a new species of Helicobacter, were obtained. The H. himalayensis genome was compared with those of other Helicobacter species deposited in the NCBI database. The results of the comparative genomic analyses not only confirmed phylogenetic position of H. himalayensis in relation to other Helicobacter species, but also predicted its potential pathogenicity by matching the virulence factor genes in VFDB.

Several smaller stretches of anomalous GC content were identified using the GC mol% content analysis around the genome (Fig. 4). In pathogenic bacteria, this type of anomalous GC content is usually associated with genome islands [22]. For example, the H. pylori cag pathogenicity island is intimately associated with increased potential to cause more severe pathology, including cancer and gastric ulcers [23]. There were five fragments with anomalous GC content in the H. himalayensis genome, and the predicted genome island HhiG1 was covered by one of the five fragments, which suggested that HhiG1 might be related with the pathogenicity of H. himalayensis.

H. himalayensis is phylogenetically close to H. cinaedi and H. hepaticus, based on the whole genome sequence, which is consistent with the previously reported phylogenetic relationship based on the analysis of housekeeping gene analysis [24]. Moreover, the predicted genes of H. himalayensis had higher identity to those of H. cinaedi and H. hepaticus than to those of H. pylori, which confirmed the previous result of evolutionary analyses from another perspective. H. cinaedi was first isolated from a rectal swab from a homosexual men, while H. hepaticus was first isolated from the livers of mice. Both of them are unable to colonize the gastric mucosa, because they do not express a urease to overcome the acidic environment in the stomach. These helicobacters with the term “enterohepatic helicobacters (EHH)” colonize the intestinal mucosa or the liver, and are associated with chronic liver or intestinal inflammation [25]. Recently, many cases of EHH infection in humans were reported, mainly in immunocompromised patients [3]. H. himalayensis is closely related to H. cinaedi and H. hepaticus; therefore, H. himalayensis is predicted to have similar pathogenic effects.

Based on the genomic functional analysis of H. himalayensis, we found that in addition to the 35 virulence factor genes related to flagellar motility, H. himalayensis also has genes encoding CDT, lipopolysaccharide Lewis antigens, neutrophil activating protein, autoinducer-2 production protein, and catalase. These virulence factor genes from the VFDB have been identified in H. hepaticus, H. pylori, or other bacteria in experiments [26]. CDT is a typical virulence factor, having DNase I activity that induces DNA double-strand breaks, apoptosis, and G2/M cell cycle arrest in cultured mammalian cells. In vivo, CDT is also associated with carcinogenesis [27, 28]. The Lewis antigen (encoded by futA) suppresses the immune response to the bacteria permits it to adhere to the gastric mucosa, or has a role in certain aspects of autoimmunity [29]. Neutrophil activating protein (encoded by napA) could promote the production of reactive oxygen radicals in H. pylori or the adhesion of human neutrophils to endothelial cells [30, 31]. Autoinducer-2, encoded by luxS, is produced and detected by a wide variety of bacteria, and is presumed to facilitate interspecies communications [32]. katA is a gene coding catalase, which protects against reactive oxygen species damage [33]. It is hypothesized that these orthologous genes in the H. himalayensis genome have similar effects and may also confer potential pathogenicity on the host. Till now, no animal experiment has been done with H. himalayensis infection, but the result of histopathological examination for the gastrointestinal mucosa of M. himalayana where H. himalayensis isolated showed obvious lesions, while the gastrointestinal mucosa of M. himalayana without H. himalayensis isolated had no lesions (See Supplementary Figure 1, Additional File 3). Therefore, whether H. himalayensis containing these virulence factor genes is a pathogen to host, should be confirmed by further experimental studies.

Conclusions

The present study described the characteristic of the H. himalayensis genome, analyzed its phylogenetic relationship with other helicobacters, and highlighted the potential pathogenicity of 80(YS1)T to the host. Further research into the pathogenesis of this potentially pathogenic bacterium is necessary and might help to manage diseases caused by H. himalayensis.

Methods

Bacterial strain

H. himalayensis sp. nov strain 80(YS1)T (identical to CGMCC1.12864 and DSM28742) was used in this study. It was cultured as previously described, with some modifications [24]. Briefly, the stored strain was revived and cultured on brain heart infusion agar plates containing 8% (v/v) horse blood, 2.5 μg/mL trimethoprim, 10 μg/mL vancomycin, and 1.25 U/mL polymyxin B, under a hypoxic atmosphere (5% CO2, 5% H2, 0.5% O2, and 90% N2) at 35 °C for 3–5 days.

Genome DNA extraction and whole genome sequencing

A Wizard® Genomic DNA Purification kit (Promega, Madison, WI, USA) was used to isolate the genomic DNA from collected cells of strain 80(YS1)T, according to the manufacturer’s instructions. Sequencing and library construction were carried out using the Hiseq2000 90 bp paired end sequencing platform (Illumina, San Diego, CA, USA. In total, 273,256 polymerase reads (242 bp/reads) were generated with 33.1× theoretical coverage. After filtering out all low quality reads, 266,891 reads (97.7%) were assembled into 103 contigs using 454/Roche Newbler (Roche Molecular Systems Inc., Branchburg, NJ, USA). All the contigs were sorted using the genomes of H. pylori 26695 and H. cinaedi ATCC BAA847, and verified by sequencing using an ABI BigDye Terminator V3.1 Kit and ABI 3730 sequencer (Applied Biosystems, Foster City, CA, USA). Conventional Sanger sequencing of polymerase chain reaction (PCR) fragments, based on brute force PCR, was used to for gap filling among the contigs and scaffolds to complete the genome.

Comparative genomics analysis

The assembled genome of strain 80(YS1)T was compared with the available Helicobacter spp. genomes deposited in the NCBI Database. Sixteen Helicobacter spp. and two outgroup stains (Campylobacter jejuni and Acetobacter pasteurianus) were selected, and the whole genomes were download from NCBI Database. For pan-genome analysis, SNP sequence from the selected genomes were generated using KSNP3 (version 3.0) [34], and analysed using MEGA X [35] to construct the phylogenetic trees by Neighbor-Joining algorithm with 1000 bootstraps. For core-genome analysis, genes present in each selected Helicobacter genome were classified as core. The core-genome compared files were generated using roary (version 3.12.0) with a BLAST cutoff of 70% identity of gene (−i 70). MEGA X was used to construct the phylogenetic trees by Neighbor-Joining algorithm with 1000 bootstraps based on the core-genome. Collinearity analysis of H. hemalayensis with other three helicobacter species were performed using BLASTn. To identify homologous sequences (> 100 bp in length and at least 70% homology), 80(YS1)T genomic contigs were aligned with those of three other Helicobacter species, using BLASTn as implemented in NCBI BLAST+ v. 2.2.8 [36]. Progressive Mauve, with match seed weight = 15, minimum Locally Collinear Block weight = 45, minimum island size = 50, maximum backbone gap size = 50, and a minimum backbone size = 50 [37] was used to compare the genomes of the Helicobacter spp. using multiple sequence alignment. CIRCOS v. 0.64 [38] was used to construct a visual representation of the links between homologous regions of H. himalayensis sp. nov strain 80(YS1)T and those of three other Helicobacter species.

Function analysis of H. himalayensis

The identified H. himalayensis proteins were assigned and annotated to the database of COG by eggnog-mapper (version 2) [39]. COG classification was conducted by using DIAMOND v0.8.37.99 (−evalue 1e-5) [40]. The results of the COG classification were visualized using R (version 3.6.2). The genomic island of H. himalayensis was predicted by an online tool IslandViewer 4 [41] (http://www.pathogenomics.sfu.ca/islandviewer/upload/). The virulence factors of H. himalayensis were searched in the virulence factors database (VFDB) using the integrated and automatic pipeline, Vfanalyzer [42] (http://www.mgc.ac.cn/cgi-bin/VFs/v5/main.cgi?JobID=May_28-1427107863).