Introduction

Biogeography plays an extremely important role in the speciation of plants and animals. Allopatric speciation occurs when populations of a plant or animal species are geographically separated from one another over a long period of time (Staley 2004). For example, islands such as the Hawaiian Islands in the Pacific Ocean are separated from other land masses by thousands of kilometers. Species that reached these islands have diverged over millions of years to form novel species, primarily through genetic drift and selection.

Although geography plays a major role in plant and animal speciation, its role in bacterial and archaeal speciation is poorly understood. Indeed, at this time there is only one reported species of prokaryotic organism, the thermophilic, acidophilic archaeon, “Sulfolobus islandicus”, in which biogeography has been shown to play a role in speciation (Whitaker et al. 2003).

It is not yet known whether sufficient divergence has occurred among the strains from different locations to justify separate species for the geographic varieties that have been reported. To determine whether these are separate species of prokaryotes, it is necessary to carry out DNA–DNA hybridization (DDH) among the strains. Genomes are now available for ten strains of “S. islandicus” that have been isolated from four separate hot spring locations (Iceland; Yellowstone National Park, WY and Lassen Park, CA in North America; and Kamchatka in Russia). In this paper we performed DDH using in silico (computational) procedures (Auch et al. 2010a, b; DSMZ 2013) to determine whether the strains are sufficiently divergent to warrant separate species status. According to the most widely accepted bacterial species definition (Brenner et al. 2005; Staley 2006; Wayne 1987), a DDH value of more than 70 % between two strains is required for them to qualify as members of the same species.

Unfortunately, “S. islandicus” is not a validly named species of the Archaea because its name has not been validly published according to the International Code of Nomenclature of Bacteria (Lapage et al. 1992; Wayne 1987). As a result, no type strain exists for this “species”. Nonetheless, it is perhaps the most thoroughly studied prokaryotic organism from the standpoint of geographical distribution, and several genomes of this species have been sequenced and annotated (Reno et al. 2009).

This investigation provides information on the inter-relatedness among the genomes of this organism.

Materials and methods

Genomic sequences of the ten “S. islandicus” strains were downloaded from the NCBI FTP site (ftp://ncbi.nih.gov/genomes/Bacteria). Their accession numbers are NC_012588, NC_012589, NC_012622, NC_012623, NC_012632, NC_012726, NC_013769, NC_017275, NC_017276, and NC_021058. To infer phylogenetic relationships among these strains we used the whole-genome based and alignment-free CVTree method (Qi et al. 2004; Xu and Hao 2009). This method has high resolution at the strain level (Hao 2011) and does not require the identification of homologous proteins. The CVTrees were constructed for all available prokaryotic genomes using peptide lengths from K = 3 to K = 7. Because the most reliable trees are obtained at K = 5 and 6 (Li et al. 2010), we only show the K = 6 data in this report.
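As an illustration only, the composition-vector idea behind the CVTree method can be sketched in a few lines of Python. This is a simplified sketch and not the CVTree software used in this study: the function names are hypothetical, only K-peptides actually observed in a proteome are scored, and the background subtraction and cosine-based distance are assumed to follow the general scheme described by Qi et al. (2004).

```python
from collections import Counter
from math import sqrt


def kmer_freqs(proteins, k):
    """Relative frequencies of overlapping k-peptides over all proteins."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def composition_vector(proteins, k=6):
    """Relative deviation of each observed K-peptide frequency from the
    frequency predicted by its (K-1)- and (K-2)-peptide components."""
    if k < 3:
        raise ValueError("k must be at least 3")
    pk = kmer_freqs(proteins, k)
    pk1 = kmer_freqs(proteins, k - 1)
    pk2 = kmer_freqs(proteins, k - 2)
    vec = {}
    for w, p in pk.items():
        # Prediction from the two overlapping (K-1)-peptides and the
        # (K-2)-peptide they share in the middle.
        p0 = pk1[w[:-1]] * pk1[w[1:]] / pk2[w[1:-1]]
        vec[w] = (p - p0) / p0
    return vec


def cv_distance(u, v):
    """Distance (1 - C) / 2, where C is the cosine correlation of two
    composition vectors stored as sparse dictionaries."""
    dot = sum(a * v[w] for w, a in u.items() if w in v)
    norm = sqrt(sum(a * a for a in u.values()) * sum(b * b for b in v.values()))
    if norm == 0:
        raise ValueError("cannot compare an all-zero composition vector")
    return (1 - dot / norm) / 2
```

Under these assumptions, calling `composition_vector` on the list of predicted protein sequences of each genome and then `cv_distance` on every pair of vectors would give a pairwise distance matrix from which a tree could be built with any distance-based method.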

For in silico DNA–DNA hybridization the sequences were submitted to the Genome-to-Genome Distance Calculator (GGDC) at DSMZ (Auch et al. 2010a, b; DSMZ 2013). The program GGDC 2.0 was used and the most stringent distance function was chosen for the DDH values listed in Table 1. These values have been shown to have high correlation with the 16S rRNA distance and experimentally derived DDH values (Meier-Kolthoff et al. 2013). Another whole-genome-derived parameter, Average Nucleotide Identity (ANI), has been proposed as an alternative to experimental DDH values (Goris et al. 2007). We used the JSpecies software (Richter and Roselló-Móra 2009) to calculate ANI for the ten “S. islandicus” genomes.

Table 1 DDH values (in %) between “S. islandicus” strain pairs

Results and discussion

Rachel Whitaker’s lab has studied the biogeographical distribution of seven “S. islandicus” strains isolated from two major continental locations, Eurasia (Iceland and Kamchatka, Russia) and North America (Yellowstone National Park, WY and Lassen National Park, CA) (Whitaker et al. 2003; Reno et al. 2009). Multilocus sequence analysis (MLSA) and whole-genome analyses revealed a clear branching pattern of the phylogenetic tree that follows the geographical separation of the strains. Subsequently, the genomes of three more “S. islandicus” strains were sequenced and analyzed (Guo et al. 2011; Jaubert et al. 2013). In whole-genome based CVTrees (Xu and Hao 2009) the phylogeny of these ten strains (Fig. 1) is the same as that shown in Fig. 2 of Reno et al. (2009).

Fig. 1 The whole genome CVTree of ten strains of “S. islandicus” based on 152 Archaea + 2286 Bacteria + 8 Eukarya at K = 6. The geographical origin of strains is given in parentheses at the end of each entry

Fig. 2 Tree constructed using distances calculated from the DDH values. Note that the 0.02 bar at the lower-left corner cannot be simply interpreted as “number of substitutions”, but it may indirectly reflect the evolutionary time span

Therefore, the geographical pattern of the distribution of the strains is confirmed. However, the question remains: are these strains members of the same species? To assess this, DNA–DNA hybridization was calculated based on genomic sequences. Since genomes of these ten strains are available, it is unnecessary to conduct experimental determinations because in silico DDH tests are now available (Auch et al. 2010a, b). Furthermore, a public GGDC web site is provided for this analysis by DSMZ (2013).

The pairwise DDH percentages for all ten strains of “S. islandicus”, obtained by in silico DDH using the GGDC web server, are shown in Table 1. These data indicate that the range of DDH values among the strains within a particular location supports their being members of the same species. For example, DDH values among the strains from Icelandic hot springs range from 88.9 to 92.9 %. Strains from Russia show values from 90.8 to 95.4 %. Strains from North America range from 82.4 to 89.0 %. Since all of these values are substantially above the 70 % threshold defined by Wayne (1987), it can be concluded that the strains sequenced from each location are all members of the same species.

The differences between locations are more marked, as one would expect based on the geographical separation among the groups of strains. The DDH values of the Icelandic strains compared to those from Kamchatka range from 76.6 to 83.3 %, whereas their DDH values in comparison with the North American strains range from 75.5 to 81.8 %. The DDH values of the strains from Kamchatka compared to those from North America range from 75.8 to 85.1 %.

Therefore, using the currently accepted definition for prokaryote species, all of these strains, regardless of location, are members of the same species, “S. islandicus”. However, it is also clear that each geographic group comprises a separate variety or geovar (Staley and Gosink 1999). Indeed, some of the lowest DDH values found between strains from the different locations (75.5–76.6 %) are close to the threshold value of 70 %.
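The threshold comparison behind this conclusion can be written out as a short Python sketch. The DDH ranges are copied from the text above; the names `REPORTED_RANGES`, `same_species`, and `margin_above_threshold` are hypothetical and introduced only for illustration.

```python
SPECIES_THRESHOLD = 70.0  # % DDH species threshold (Wayne 1987), as cited in the text

# (lowest, highest) DDH values in %, as reported in the Results text.
REPORTED_RANGES = {
    "within Iceland": (88.9, 92.9),
    "within Russia": (90.8, 95.4),
    "within North America": (82.4, 89.0),
    "Iceland vs Kamchatka": (76.6, 83.3),
    "Iceland vs North America": (75.5, 81.8),
    "Kamchatka vs North America": (75.8, 85.1),
}


def same_species(ddh, threshold=SPECIES_THRESHOLD):
    """True if a DDH value (in %) exceeds the species threshold."""
    return ddh > threshold


def margin_above_threshold(ranges, threshold=SPECIES_THRESHOLD):
    """Percentage points by which the lowest DDH value of each comparison
    exceeds the threshold (negative if it falls below)."""
    return {name: round(low - threshold, 1) for name, (low, _high) in ranges.items()}


if __name__ == "__main__":
    for name, margin in margin_above_threshold(REPORTED_RANGES).items():
        low = REPORTED_RANGES[name][0]
        print(f"{name}: lowest DDH {low} %, same species: {same_species(low)}, margin {margin}")
```

Because the lowest value of every comparison is tested, a single `False` would indicate at least one strain pair falling below the species threshold; with the reported ranges, no comparison does.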

Using the DDH values, a tree can be constructed that shows the relatedness among the ten strains (Fig. 2). Here we define the “distance” between two strains using the following distance formula:

$${\rm Dis} = (100 - {\rm DDH})/100. \tag{1}$$
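A minimal Python sketch of Eq. 1 is given below for illustration. The helper names `ddh_to_distance` and `distance_matrix` are hypothetical, and the tree-building step itself is not shown because the method used for Fig. 2 is not specified here; the example values are the highest and lowest DDH percentages quoted in the text and the 70 % species threshold.

```python
def ddh_to_distance(ddh):
    """Eq. 1: convert a DDH percentage into a distance between 0 and 1."""
    if not 0.0 <= ddh <= 100.0:
        raise ValueError("DDH must be a percentage between 0 and 100")
    return (100.0 - ddh) / 100.0


def distance_matrix(labels, pairwise_ddh):
    """Symmetric distance matrix from pairwise DDH values.

    `pairwise_ddh` maps frozenset({label_a, label_b}) to a DDH percentage;
    the diagonal (a strain compared with itself) is set to 0.
    """
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ddh = pairwise_ddh[frozenset((labels[i], labels[j]))]
            matrix[i][j] = matrix[j][i] = ddh_to_distance(ddh)
    return matrix


if __name__ == "__main__":
    # Highest and lowest DDH values quoted in the text, plus the threshold.
    for ddh in (95.4, 75.5, 70.0):
        print(f"DDH {ddh:.1f} % -> Dis {ddh_to_distance(ddh):.3f}")
```

On this scale the 70 % species threshold corresponds to Dis = 0.30, so every strain pair reported here lies below that distance. A matrix built by `distance_matrix` from the full set of Table 1 values could then be passed to any distance-based tree-building program.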

The topology of the tree derived from DDH values matches that obtained in the whole genome CVTree (Fig. 1). It is notable from these trees and Table 1 that strains from the USA are more closely related than those from Iceland and Russia.

The ANI values calculated using the JSpecies software are given in Table 2. These values agree well with the DDH values in Table 1 but show slightly less divergence.

Table 2 ANI values (in %) between “S. islandicus” strain pairs

Clearly our results support the view that biogeography plays a role in the speciation of “S. islandicus”. Further, an argument could be made that the strains at each of the locations should be defined as separate species because the current definition of a bacterial species is in flux and may be challenged (Krichevsky 2011; Richter and Roselló-Móra 2009; Staley 2006; Ward 1998).

At this time, in order to describe these strains as separate species, it is necessary to fulfill the criteria of the International Code of Nomenclature of Bacteria (Lapage et al. 1992; De Vos and Trüper 2000; Wayne 1987). Ideally, at least one significant phenotypic feature would need to be found that is unique for each proposed new species. One of the hypothetical questions this paper raises is: Would the geographic location of a strain suffice as an acceptable property for the description of a species? The source of a strain is already a primary feature used in the description of bacteria and archaea. However, until the discovery of the endemic biogeographical clustering of “S. islandicus” strains, it had not played a major role in the description of any species. Clearly, more evidence would be needed before geographic location could serve as a primary property for designating a phylogenetically related cluster from one area as a species separate from a cluster in another area.

Also, to describe these geographic clusters as separate species, a ‘type strain’ from each area would need to be deposited in at least two different internationally accepted culture collections. The authors therefore recommend that those who work with “S. islandicus” follow the International Code of Nomenclature of Bacteria and provide cultures to culture collections so that this species can be validly named and appropriate strains designated, deposited, and made available for others to study.