Introduction

Archaeoglobus fulgidus strain 7324 was recovered from hot oil-field water originating from a deep oil well in the North Sea [1]. It shares many features with the A. fulgidus type strain VC16T, e.g. dissimilatory sulfate reduction, utilization of lactate and pyruvate as carbon sources, irregular coccoid to disc-shaped cells, and blue-green fluorescence under the UV microscope due to the presence of coenzyme F420. Strain VC16T was isolated from a shallow marine hydrothermal system at Vulcano island, Italy [2]. The complete genome sequence of strain VC16T was reported in 1997 as the third archaeal genome to be fully sequenced [3], and A. fulgidus has since served as a prototype for studies of archaeal and hyperthermophilic sulfate reduction [4]. Here we summarize the features of A. fulgidus strain 7324, describe its complete genome sequencing and annotation, and compare its genome with those of the A. fulgidus type strain and other Archaeoglobus spp.

Organism information

Classification and features

The genus Archaeoglobus comprises five species with validly published names: A. fulgidus [2], Archaeoglobus profundus [5], Archaeoglobus veneficus [6], Archaeoglobus infectus [7] and Archaeoglobus sulfaticallidus [8], as well as one candidate species termed “Archaeoglobus lithotrophicus” [9]. All are hyperthermophilic sulfate reducers capable of heterotrophic growth or chemolithoautotrophic growth on H2 and CO2. The Archaeoglobus ‘clade’ also encompasses a few non-sulfate-reducing anaerobic hyperthermophiles: Geoglobus acetivorans [10] and “Geoglobus ahangari” [11, 12], which are both Fe(III) reducers, and Ferroglobus placidus, which can use ferrous iron, H2 and sulfide as electron donors with nitrate as electron acceptor [13]. Fig. 1 shows the phylogenetic affiliation of all current members of the Archaeoglobaceae family, including strain 7324. All Archaeoglobus species form small, irregularly shaped cells. A scanning electron micrograph of A. fulgidus strain 7324 is shown in Fig. 2, revealing a cell shape similar to that originally determined by transmission electron microscopy [1]. Strain 7324 has not previously been phylogenetically characterized by 16S rRNA gene sequencing, but wet-lab genomic DNA-DNA hybridization with A. fulgidus strain Z, which, like the type strain, was recovered from Vulcano island [14], revealed a genome hybridization value of 100% [1]. This close relationship has now been confirmed by digital DNA-DNA hybridization (DDH) [15] between strains VC16T and 7324, with a GLM-based DDH estimate of 93.9%. All three A. fulgidus strains share common physiological characteristics, such as growth from 60 °C to above 84 °C, use of sulfate and thiosulfate as electron acceptors, optimal growth with lactate or pyruvate as carbon sources, and production of trace amounts of methane. Although the optimal growth temperature of strain 7324 was initially determined to be 76 °C, we have routinely cultivated it at 80 °C. In contrast to the other isolates, strain 7324 lyses rapidly after the stationary phase [1]. The main features of the organism are listed in Table 1.

Fig. 1

16S rRNA-based phylogenetic tree showing the position of the Archaeoglobus fulgidus strains within the Archaeoglobaceae family, using Methanocaldococcus jannaschii as outgroup. Only species with validly published names are included. The sequences were aligned using Clustal X [46] and the tree was inferred using the neighbor-joining algorithm in MEGA 6.06 [47]. The bar indicates the number of substitutions per site. Bootstrap values ≥60% are indicated at nodes and are based on 100 replicates. Sequence accession numbers are indicated in brackets. A. fulgidus strain 7324 is in bold

Fig. 2

Scanning electron micrographs of cells of A. fulgidus strain 7324. Bars equal 1 μm (upper panel) or 0.1 μm (lower panel)

Table 1 Classification and general features of Archaeoglobus fulgidus strain 7324 (DSM 8774)

Genome sequencing information

Genome project history

A. fulgidus strain 7324 was chosen for whole-genome sequencing because it was isolated from a deep, hot oil reservoir, whereas the closely related type strain, VC16T, was isolated from a shallow marine hot vent. A genome comparison might therefore reveal particular adaptations of strain 7324 to the deep biosphere. The genome project information is given in the Genomes OnLine Database (Gp0102124). The genome sequence is deposited in GenBank (CP006577.1). A summary of the project information is shown in Table 2.

Table 2 Project information

Growth conditions and genomic DNA preparation

A. fulgidus strain 7324 was obtained from our own collection at the University of Bergen. It was cultivated in anaerobic medium containing lactate and sulfate as described previously [1]. The incubation temperature was 80 °C. Genomic DNA was isolated using a modification of the cetyl trimethylammonium bromide method as described [16].

Genome sequencing and assembly

The genome was sequenced using a combination of the Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [17]. The initial assembly of the 454 raw data suggested that the sequenced sample was contaminated. In a BLAST search, all contigs (>500 nt in length) could be assigned either to A. fulgidus or to Thermococcus litoralis, an archaeon that shares the same habitat [18]. To overcome this issue, two additional BLAST searches including all contigs longer than 500 nt were performed, one against the previously sequenced genome of A. fulgidus VC16T (NCBI/GenBank: AE000782) and one against all available genomic sequences of Thermococcus species in GenBank (Dec. 2010). Only contigs sharing more than 90% sequence identity with A. fulgidus VC16T and having no hits in the Thermococcus BLAST database were kept. A total of 84 Newbler contigs could be assigned to A. fulgidus. Illumina raw reads were assembled into 223 contigs. Both draft assemblies were merged in a hybrid approach using the phred/phrap/consed pipeline [19]. After manual curation, a total of 27 ordered gaps were closed by bridging PCRs at LGC Genomics (Berlin). The final consensus sequence represents a single circular chromosomal element (103× coverage).
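The contig selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the `BlastHit` record, the function name `select_target_contigs` and the toy data are hypothetical, and the text does not state whether additional cutoffs (e.g. alignment length or e-value) were applied, so none are modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """One BLAST hit, reduced to the two fields this filter needs (hypothetical layout)."""
    contig_id: str
    percent_identity: float


def select_target_contigs(contig_lengths, target_hits, contaminant_hits,
                          min_length=500, min_identity=90.0):
    """Return the sorted IDs of contigs kept as target-organism sequence.

    A contig is kept when it is longer than min_length, has at least one hit
    to the target reference with identity above min_identity, and has no hit
    at all in the contaminant database.
    """
    contaminated = {hit.contig_id for hit in contaminant_hits}
    supported = {hit.contig_id for hit in target_hits
                 if hit.percent_identity > min_identity}
    return sorted(
        contig_id
        for contig_id, length in contig_lengths.items()
        if length > min_length
        and contig_id in supported
        and contig_id not in contaminated
    )


# Toy data, invented for illustration only.
lengths = {"c1": 12000, "c2": 800, "c3": 450, "c4": 3000}
target = [BlastHit("c1", 98.7), BlastHit("c2", 95.0),
          BlastHit("c3", 99.0), BlastHit("c4", 85.0)]
contaminant = [BlastHit("c2", 97.2)]
kept = select_target_contigs(lengths, target, contaminant)
print(kept)
```

In the toy data, c2 is dropped because it also hits the contaminant database, c3 because it is too short, and c4 because its identity to the target reference is too low. A strict filter of this kind can also discard genuine target contigs that happen to share conserved genes with the contaminant, which is one reason gap closure by PCR may still be needed afterwards.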

Genome annotation

Coding genes were predicted by GeneMark [20] as part of the genome annotation pipeline in the Integrated Microbial Genomes Expert Review (IMG-ER) system [21]. The tRNAs were identified by tRNAscan-SE 1.23 [22], while ribosomal RNA genes were predicted using RNAmmer [23]. Other non-coding RNA genes were predicted using Infernal [24]. CRISPR elements were identified by the program CRT [25]. Manual functional annotation was performed within the IMG platform [21] and the Artemis Genome Browser [26].

Genome properties

The genome of A. fulgidus strain 7324 comprises one circular chromosome with a total size of 2,316,287 bp, which is 137,887 bp larger than that of A. fulgidus VC16T DSM 4304 [3]. The mole percent G + C is 48.08, which is slightly higher than the 47% estimated previously by thermal denaturation [1] and slightly lower than that of the type strain DSM 4304 (48.6%); in any case, the value lies within the 1% threshold relative to the species’ type strain VC16T sensu Meier-Kolthoff et al. [27]. No plasmids were detected. The strain 7324 genome is the largest among the genome-sequenced Archaeoglobus species, the smallest being the A. profundus genome with a total size of 1.56 Mbp [28]. Out of the total 2615 genes annotated in the 7324 genome, 2558 were identified as protein coding genes and 56 as RNA genes (Table 3). Only 67.29% of the genes could be assigned to COG functional categories, as listed in Table 4. Five CRISPR repeat regions were identified, compared with only three in strain VC16T (AE000782). There is only one rRNA operon (Fig. 3). As for VC16T, there is no apparent GC skew in the genome, which could indicate the presence of multiple DNA replication origins and explain previous difficulties in precisely mapping the replication origin(s) in this species using a marker rescue analysis approach [29, 30].
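For readers unfamiliar with the two sequence statistics mentioned above, the following Python sketch computes the G + C percentage and a windowed GC skew. It assumes the commonly used definition skew = (G − C)/(G + C) per window; the text does not state which tool or window size produced the skew track in Fig. 3, and the function names and toy sequence here are hypothetical.

```python
from itertools import accumulate


def gc_percent(seq):
    """Mole percent G + C, counting only unambiguous A, C, G and T bases."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    acgt = g + c + seq.count("A") + seq.count("T")
    return 100.0 * (g + c) / acgt if acgt else 0.0


def gc_skew_windows(seq, window, step=None):
    """Return (start, skew) pairs, with skew = (G - C) / (G + C) per window.

    Windows without any G or C get a skew of 0.0; a trailing partial
    window is ignored.
    """
    seq = seq.upper()
    step = step or window
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew(skews):
    """Running sum of the per-window skew values."""
    return list(accumulate(value for _, value in skews))


# Toy sequence, invented for illustration only.
toy = "GGGCAAAACCCG"
windows = gc_skew_windows(toy, window=4)
print(gc_percent(toy), windows, cumulative_skew(windows))
```

In genomes replicated from a single origin, the cumulative skew often shows one clear minimum and one clear maximum, roughly marking origin and terminus. A flat or irregular curve, as described for both A. fulgidus strains, does not allow such an assignment.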

Table 3 Genome statistics
Table 4 Number of genes associated with general COG functional categories
Fig. 3

Circular map of the A. fulgidus strain 7324 genome. From outside to the center: Genes on forward strand (colored by COG categories), genes on the reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew. The non-coding RNA gene, hgcC, is indicated, as well as the large insertions (denoted InsI and InsII), as compared to A. fulgidus VC16T

Insights from the genome sequence

Genes encoding central metabolic pathways such as dissimilatory sulfate reduction, lactate oxidation, a complete TCA cycle and the acetyl-CoA pathway were identified in strain 7324, confirming the previous physiological characterization and the similarity with strain VC16T [1]. A genome alignment between strains VC16T and 7324 revealed a large degree of genomic similarity and sequence synteny, interrupted mostly by two large additional regions of about 64 and 109 kbp (InsI and InsII, respectively) in strain 7324 as compared to VC16T (Fig. 4). Both regions are flanked by a disrupted tRNA gene, which implies that InsI and InsII represent genomic insertions. They also have a considerably lower G + C content (42%) than the genome average (Fig. 3), suggesting that they were acquired by recombination with an AT-rich organism. This is further supported by the identification in InsII of a non-coding high-GC RNA gene belonging to the hgcC family of ncRNA (RFAM v12 accession code RF00062), which is typically found in AT-rich hyperthermophiles (Figs. 3 and 5b). This ncRNA family was originally identified in the genomes of Pyrococcus furiosus and Methanocaldococcus jannaschii [31], but its function is still unresolved.
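The compositional argument above (inserted regions standing out by a lower G + C content than the genome average) can be illustrated with a simple window scan. The sketch below is a hypothetical illustration, not the method used in this study: the function names, the non-overlapping window scheme and the `margin` parameter are assumptions.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def low_gc_regions(seq, window, margin):
    """Return merged (start, end) spans whose G + C fraction is unusually low.

    The sequence is cut into non-overlapping windows; a window is flagged
    when its G + C fraction is more than `margin` below the fraction of the
    whole sequence. Adjacent flagged windows are merged into one span, and a
    trailing partial window is ignored.
    """
    baseline = gc_fraction(seq)
    regions = []
    for start in range(0, len(seq) - window + 1, window):
        end = start + window
        if gc_fraction(seq[start:end]) < baseline - margin:
            if regions and regions[-1][1] == start:
                # Extend the previous span instead of starting a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy sequence, invented for illustration: a GC-rich backbone with an AT-rich insert.
toy = "GC" * 10 + "AT" * 4 + "GC" * 10
print(low_gc_regions(toy, window=4, margin=0.3))
```

On real genomes, window size and margin would have to be tuned, and a low G + C content alone is only suggestive; the disrupted flanking tRNA genes reported above provide the independent evidence for insertion.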

Fig. 4

MAUVE (version 20150226) [48] alignment of the chromosomes of A. fulgidus strains VC16T (upper) and 7324 (lower). The large insertions in the 7324 chromosome are indicated as regions InsI and InsII. Approximate positions of the rRNA and dsr (dissimilatory sulfite reductase) genes are indicated by arrows

Fig. 5

Genetic structure of the large insertions in strain 7324: (a) InsI; (b) InsII. Genes are colored according to COG color codes

InsII also contains a gene encoding an extra homologue of the Orc1/Cdc6 family of replication initiation control proteins, in addition to the two other cdc6 homologues present in both the 7324 and VC16T genomes. The closest homologue identified by a BLAST search is from A. veneficus (62% amino acid sequence identity). The majority of the other genes are hypothetical or have a general function prediction only. InsI carries two CRISPR repeat regions and 14 genes encoding CRISPR-associated proteins, including a Cas6 homologue (Fig. 5a). The rest of this insert mostly contains hypothetical genes.

A Venn diagram shows that A. fulgidus strains VC16T and 7324 share a large number of genes (2292) (Fig. 6a), reflecting the high degree of genome similarity. The 263 genes unique to strain 7324 include about 200 hypothetical genes or genes of uncharacterized function, most of which belong to the large insertions, as well as the CRISPR-associated genes of InsI. The Venn diagram including all five genome-sequenced strains revealed an Archaeoglobus core genome of 1001 genes (Fig. 6b), most of which encode energy-yielding, biosynthetic and regulatory functions. About 200 of the core genes belong to the hypothetical/uncharacterized category in the eggNOG database [32]. This fraction is considerably lower than the 32% of unassigned genes in the entire strain 7324 genome, but underscores that a large part of the central gene functions in this genus still remains to be disclosed. About 2900 genes belong to the Archaeoglobus pan-genome, being unique to one of the genomes or shared by 2 to 4 of the species.
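The set logic behind such Venn diagrams can be sketched as follows. This is a minimal illustration assuming that each genome has already been reduced to a set of orthologous-group identifiers; the function names, group IDs and strain labels are hypothetical, and the ortholog clustering itself (performed here with the tools cited for Fig. 6) is not reproduced.

```python
def core_and_pan(genomes):
    """Return (core, pan) for a dict mapping genome name -> set of group IDs.

    core: groups present in every genome; pan: groups present in any genome.
    """
    groups = list(genomes.values())
    core = set.intersection(*groups)
    pan = set.union(*groups)
    return core, pan


def unique_groups(genomes, name):
    """Groups found in genome `name` and in no other genome."""
    others = set().union(*(g for n, g in genomes.items() if n != name))
    return genomes[name] - others


# Toy data, invented for illustration only.
genomes = {
    "strainA": {"og1", "og2", "og3", "og4"},
    "strainB": {"og1", "og2", "og5"},
    "strainC": {"og1", "og3", "og6", "og7"},
}
core, pan = core_and_pan(genomes)
accessory = pan - core  # unique to one genome or shared by some, but not all
print(sorted(core), len(pan), len(accessory))
print({name: sorted(unique_groups(genomes, name)) for name in genomes})
```

In this sketch, `accessory` corresponds to the groups described above as unique to one genome or shared by only some of them. The counts obtained in practice depend strongly on how orthologous groups are defined, so numbers from different clustering tools are not directly comparable.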

Fig. 6

Venn diagrams showing the distribution of orthologous and unique genes for (a) A. fulgidus strains DSM 4304 and DSM 8774, and (b) all the completely genome-sequenced Archaeoglobus representatives: A. fulgidus DSM 4304, A. fulgidus DSM 8774, Archaeoglobus profundus DSM 5631, Archaeoglobus veneficus DSM 11195 and Archaeoglobus sulfaticallidus DSM 19444. The diagrams were prepared using ‘jvenn’ [49] as implemented in the EzBioCloud’s Comparative Genomics Database [50]

Archaeoglobus fulgidus strain 7324 has been reported to degrade starch [33], and several enzymes involved in starch degradation have been purified from cells grown on starch. These enzymes include cyclodextrin glucanotransferase, cyclodextrinase, maltodextrin phosphorylase, phosphoglucomutase, ADP-dependent glucokinase, ADP-dependent phosphofructokinase and pyruvate kinase [34,35,36,37]. The enzymes were biochemically characterized and their N-terminal sequences (with the exception of cyclodextrinase) were determined. They showed the highest sequence identity to proteins from Thermococcales, e.g. from Thermococcus litoralis. However, no genes encoding sugar-degrading enzymes were identified in the A. fulgidus genome presented here. Rather, various analyses indicated that the A. fulgidus strain 7324 culture deposited as DSM 8774 was contaminated with a Thermococcus strain. The genome of this Thermococcus strain has also been sequenced, and all genes encoding the sugar-degrading enzymes originally attributed to A. fulgidus strain 7324 were found in it. Likewise, a glutamate dehydrogenase previously purified and characterized from a DSM 8774 culture grown in the presence of small amounts of yeast extract (0.3 g/L) [38, 39] also appears to be encoded by the Thermococcus strain genome. The purity of the original A. fulgidus 7324 isolate was not assessed by 16S rRNA gene sequencing prior to deposition at DSMZ [1], and it is not known whether the Thermococcus contamination was present in the original culture or was introduced at a later stage. The genome analysis of this Thermococcus strain, which appears to represent a novel Thermococcus species, will be published separately.

Conclusions

The complete genome of A. fulgidus strain 7324, recovered from hot water produced from an oil well in the North Sea, was sequenced and annotated. After that of the A. fulgidus type strain, VC16T, isolated from a shallow hot vent in the Mediterranean, this is the second A. fulgidus genome to be characterized. The two strains share 93.5% genome sequence similarity and differ mostly by two large insertions of 64 and 109 kbp in strain 7324 that seem to have originated from an AT-rich archaeon. The insertions carry two additional CRISPR elements, an extra cdc6 gene, a variety of mobile genetic elements and a large number of hypothetical and unassigned genes. Based on comparison with four other Archaeoglobus spp. genomes, the Archaeoglobus core genome was estimated at 1001 genes. No particular traits indicating adaptation to the petroleum reservoir subsurface environment could be identified.