Vascular plants have adapted to virtually all terrestrial environments, no matter how benign or stressful. Extremophiles are the plants operating in the most challenging environments [1], such as those dominated by the extreme cold in Antarctica [2], wide temperature swings and extreme drought in deserts [3], or salinity in combination with a broad range of other stresses. This last group, the halophytes, are the best documented [4]; the Kew Gardens database [5] recognizes over 1,500 species. Table 1 summarizes some examples of extremophile transcriptomes and genomes that have been published in recent years, at increasing levels of complexity as new sequencing technologies have become available. Six of these plants and their ecological contexts, not all familiar to most plant biologists, are illustrated in Figure 1.

Table 1 Recent studies on extremophile genomes and transcriptomes
Figure 1

Some examples of extremophiles providing genome- or transcriptome-level data relevant to abiotic stress adaptation. These species are representative of those listed in Table 1. (a) The shores of Lake Tuz in central Anatolia (Turkey) were the original collection site for Thellungiella parvula (b). Note the extensive salt flat where an ephemeral lake would be in a rainy season. (c) Mesembryanthemum crystallinum, known as the common ice plant for the salt crystals excreted from bladder cells on the leaves and stems. (d) Salicornia europaea (a relative of S. brachiata, Table 1) shown in the mud flats at Bull Island, Dublin, Ireland. (e) Heritiera littoralis, one of 27 species in North Queensland, Australia, shown growing along a creek with salinity varying between fresh water and ocean water. (f) Rhizophora mangle, shown as an ocean-fringing forest in the background and as substrate-stabilizing pioneers in the foreground.

Because of their diverse life forms and life history strategies and in some cases their experimental tractability, halophytes have attracted more attention than the other groups at the molecular level. These include shrubs and forbs (such as Salicornia spp. (Table 1, Figure 1d), Chenopodium spp., Atriplex spp.), grasses (such as Festuca rubra (Table 1), Spartina spp., Aeluropus spp., and two adapted to saline sodic deserts, Leptochloa fusca and Leymus chinensis), trees (several mangroves, especially Avicennia and members of the Rhizophoraceae), and desert succulents (especially Mesembryanthemum crystallinum, Table 1, Figure 1c). Perhaps most importantly, from the standpoint of comparative genomics, the halophytes also include highly salt-tolerant close relatives of Arabidopsis thaliana.

Extremophiles are not simply outliers, plants with little to offer to the mainstream defined by poorly stress-adapted model plants. They occupy one end of a continuum of plant abilities to withstand stress. In all extreme environments, multiple stresses arise concurrently. For example, saline environments are often poor in essential nutrients (especially N and P), but replete to the point of toxicity in others (for example Mg, sulfate or micronutrients). They may experience seasonal swings between flooding and drought-related salt pans (for example, as shown in Figure 1b), and their daily and seasonal temperature ranges may be very broad. Increasingly over the past century, they may also be natural or agricultural ecosystems degraded by overgrazing or inappropriate irrigation management. Understanding plants endemic to these environments provides us with the opportunity to understand the successful and unsuccessful adjustments that less tolerant plants make when faced with lesser stresses [1, 6].

Plant environmental responses are coordinated through crosstalk among multiple signaling and stress-response networks, and one of the major goals of modern plant biology is to understand these. For example, dehydration response elements, redox controls and the downstream processes they regulate are central to drought and cold responses [7]. In addition, abscisic acid mediates a broad range of environmental responses [8]. But networks are often, if not always, more complicated than can be revealed by analysis of genes 'known to be involved' in particular responses; using Gaussian graphical methods, for example, Ma et al. [9] visualized response networks to salt involved in signaling and adaptation - including a large number of unknown and uncharacterized genes. Clearly, 450 million years of land plant evolution has generated biological complexity that cannot be represented by the sequence of a single species, such as A. thaliana, or even a single representative of each major clade. By scrutinizing the few plant genomes that are available, however, the plant biology community is beginning to identify characters of developmental, physiological, and environmental integrative quality that can be deduced and refined into hypotheses for further scrutiny.

Next-generation sequencing (NGS) technologies (especially Roche 454 and Illumina-Solexa) brought with them the promise of high-quality, high-volume, low-cost genomes and transcriptomes. In fact, they are meeting this expectation. Using the resulting datasets, it is now possible to address the evolutionary mechanisms leading to adaptation to extreme environments. The recently sequenced genome of Thellungiella parvula [10] exemplifies such efforts, providing resources for high-resolution genome-wide comparison with its non-extremophile relative, A. thaliana.

Here, we look at three notable evolutionary features reflected in the genomes that may contribute to adaptations to abiotic stress. These are gene duplication; lineage-specific, largely functionally uncharacterized genes; and epigenomic modifications effected by abiotic stress.

Genomic resources: the harvest of cheap deep sequencing

Clearly, the search for genetic mechanisms for environmental adaptation was never on hold pending the invention of NGS. Differences in individual genes unquestionably have a big role in adaptation to stress. In some cases, they have been inferred from the primary sequences of well-characterized genes, such as the 37-amino-acid stretch in L-myo-inositol-1-phosphate synthase, which distinguishes the salt-tolerant wild rice (Porteresia coarctata) from domesticated rice (Oryza sativa) [11], or the single-amino-acid variation in AtHKT1;1 (which encodes the high-affinity K+ transporter 1) that distinguishes coastal from inland clines of Arabidopsis [12]. In other cases, they have been implicated by the constitutively higher expression - in the absence of stress - of genes that are induced by stress in Arabidopsis, as in the resurrection plant Craterostigma plantagineum [13], the salt-tolerant poplar Populus euphratica [14], or the Arabidopsis relatives T. parvula and Thellungiella salsuginea (formerly T. halophila) [15-17].

But genomes are far more than collections of protein coding sequences. To extend the search for 'genetic mechanisms' beyond this level of primary DNA or cDNA sequences, high-quality genomic resources are a paramount necessity. Especially critical are the genomes of closely related species, or even genotypes, that have adapted to different climates and habitats (that is, that have different lifestyles). Such genomes are beginning to appear, although few are from proper extremophiles. The strawberry, apple, and peach genomes in the Rosaceae, for example, have begun to reveal how artificial selection for fruit quality has shaped these genomes [18]. Differences reflecting natural selection should also be discernible, for a start, from resources such as those summarized in Table 1.

However, given the long history of Arabidopsis as a model system, the new genomes most immediately useful for comparative studies at this point are likely to be those closely related to it. One of these is the genome of Arabidopsis lyrata [19], a potential comparative model for drought tolerance [20], and T. parvula (Figure 1a,b) will be perhaps even more useful for elucidating a broad range of environmental adaptations [10]. This species and the congeneric T. salsuginea are endemic to regions that experience temperature extremes, poor, degraded, and toxic soils, and especially very high salinities [6, 21]. The T. parvula genome is of particular interest because chromosomal assemblies that approach the coverage of A. thaliana are available. Moreover, because the Thellungiella species share many of the characteristics that led to the acceptance of Arabidopsis as a model (size, growth habit, seed amount, mutants, and transformation ability), they have been recognized as excellent candidates for comparative genomics studies [15, 22].

Data prospecting and data mining - finding the gems in the genome

Given the evolutionary continuum of genome-level adaptations to abiotic stress, the signatures of the critical adaptive mechanisms must be archived in the genomes of extremophiles. These are the gems in the genome; the challenge is to find and understand them. Comparisons of known genes and transfers between species - the mainstay approach before cheap deep sequencing - can now be supplemented with more extensive genome prospecting, and thereafter with large scale data mining. In this section, we consider three issues as they apply to the problem: what has been explored so far, what has been found, and what is needed to move forward.

First, comparing gene expression at the broad level reflected in Gene Ontology (GO) profiles, stress-tolerant and -sensitive species show different patterns [10]. Salt-tolerant extremophiles, on the one hand, seem to have a bias towards ion transporters in the gene function GO category that is not found in glycophytic species such as Arabidopsis. This bias is evident, for example, both in T. parvula and T. salsuginea [23, 24] and in the unrelated salt marsh halophyte Limonium sinense [25]. Arabidopsis, on the other hand, has invested in an arsenal of pathogen-responsive and developmentally related genes. It is reasonable to suppose - although future research could prove otherwise - that transporters would be critical to salt stress tolerance, and that developmental flexibility and pathogen protection would be important for a winter annual in a high resource environment.
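
One way such a GO-category bias could be checked is with a one-sided Fisher exact test on annotation counts. The following is a minimal sketch under stated assumptions, not the procedure used in the cited studies: the function name and the gene counts are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import comb


def enrichment_p_value(hits_a, total_a, hits_b, total_b):
    """One-sided Fisher exact test (hypergeometric upper tail).

    Returns the probability of observing at least `hits_a` genes of a
    given GO category among the `total_a` genes of species A, if the
    category were distributed identically in species A and species B.
    """
    pool_hits = hits_a + hits_b        # category members in both species
    pool_total = total_a + total_b     # all annotated genes in both species
    upper = min(pool_hits, total_a)
    tail = sum(
        comb(pool_hits, k) * comb(pool_total - pool_hits, total_a - k)
        for k in range(hits_a, upper + 1)
    )
    return tail / comb(pool_total, total_a)


# Hypothetical counts: 30 of 200 sampled genes annotated as ion
# transporters in a halophyte versus 10 of 200 in a glycophyte.
p = enrichment_p_value(30, 200, 10, 200)
print(f"one-sided p-value: {p:.2e}")
```

A small p-value would indicate only that the category is over-represented in the annotation of species A; as the text notes, it would not by itself show that the transporters are responsible for tolerance.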

Whole-transcriptome analyses of two mangrove species, Heritiera littoralis (Malvaceae; Figure 1e) and Rhizophora mangle (Rhizophoraceae; Figure 1f), showed a similar high representation of transport-related genes. Interestingly, despite these species having different life histories and physiological strategies in their adaptation to tropical intertidal habitats, their transcriptomes showed strikingly similar allocations in GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional categories, suggesting convergent evolution as 'mangroves' [26].

Going beyond transcriptomes, at the genome level, where are the gems, that is, what are targets currently considered most promising as being part of integrative mechanisms that lead to stress adaptation? At this point, there are few genomes complete enough to allow detailed comparisons, essentially only T. parvula and A. thaliana. In these two, although the gene spaces show extensive overall colinearity, there are also major translocations of gene-rich regions and extensive changes in intergenic sequences [10, 15]. Beyond this, there are three promising, potentially adaptive linkages to explore. These involve gene duplication, lineage-specific sequences, and epigenetic regulation. We look at these briefly below, with particular reference to their contributions as reflected in the newly released genome of T. parvula and the testable predictions that follow.

Stress adaptation by gene duplication

A striking feature of all plant genomes is gene enrichment due to duplication events. Suggested by Haldane in 1932 [27] and later popularized by Ohno [28], gene duplication as an evolutionary mechanism that adds new biological function is a well-established idea. Both the duplication rate and the proportion of retained duplicates seem to be greater in plants than in the other domains of life [29]. With respect to individual genes, the result is termed copy number variation (CNV). From resequencing the genomes of 80 individual Arabidopsis ecotypes, it seems that natural selection has led to CNVs covering 2.2 Mb of the reference genome [30]. CNVs can also arise in a short time. For example, they appeared in Arabidopsis in several generations under the selection pressure of a continuous stress in the laboratory [31]. These were distributed with a 42%:58% ratio between those initiated by transposable elements (TEs) and those involving tandem duplications.

Practically all angiosperms have polyploidy somewhere in their history, either current or long past. The initially increased gene dosage following duplication is often assumed to be beneficial for survival in new habitats, at least in the short term [32]. But although there are certainly polyploid species known for their extreme adaptations to abiotic stresses, an equal fraction are adapted to less harsh conditions, and there are also diploid extremophiles (including Thellungiella spp.). Thus, there is little overall evidence that polyploidy itself is a major evolutionary driving force leading to extremophiles.

In most plants, including T. parvula, genomes enriched by polyploidy have subsequently experienced extensive gene losses [33]. Their modern genomes reflect this. On the other hand, the copy numbers of other genes have increased as a result of segmental or tandem duplication events and duplication-translocation events. Individual copies of duplicated genes have, in many cases, also assumed new functionality resulting from mutation (neo-functionalization), or become specialized by acquisition of new promoters or regulatory elements (sub-functionalization). One such example is found in allopolyploid cotton (Gossypium hirsutum), in which reciprocal silencing of alcohol dehydrogenase homologs led to their expression in different tissues under distinct abiotic stresses [34].

An example of changes in transcript expression and neo-functionalization is provided by homologs encoding HKT1, a plasma membrane Na+/K+ transporter considered to be a genetic determinant of salt tolerance [12, 35]. HKT1 exists as tandem duplicated copies in both Thellungiella species [10, 17]. One copy encodes new protein functionality and also has an expression pattern different from that of the Arabidopsis counterpart [17]. This copy, called TsHKT1;2 in T. salsuginea, is induced under salt stress and leads to continued uptake of potassium ions. By contrast, TsHKT1;1 in Thellungiella behaves like the single-copy AtHKT1. Because this protein transports sodium ions under salt stress [36], it exacerbates stress unless its expression is downregulated [37].

In T. parvula and in A. thaliana, a major source of CNV has been tandem duplication [10]. The extant populations of unique tandem duplicates reflect the fact that both copies originated since the species diverged about 11 million years ago [38] and that selective gene loss has occurred in each taxon in response to environmental selective pressures. Either through gene duplication or expression strength differences, a large number of other seemingly stress-relevant genes that have not been recognized in Arabidopsis show the hallmarks of CNV in Thellungiella, including a variety of ion transporters and membrane-located proton ATPases [10]. Such a difference might be expected, as Thellungiella shares only 40% of salt-induced regulation of transcript expression with A. thaliana [39].

Tandem duplications seem to have a more important role in shaping genomes for stress adaptations than polyploidy, segmental transposition-duplications, or ectopic duplication and translocation [40]; recombination and tandem duplication events may both become accelerated by environmental challenges [29]. As the result of unequal crossing-over during recombination, tandem duplications vary in their 'genetic neighborhoods', with copies receiving different regulatory motifs that can lead to drastic changes in expression [40]. A comparative study on plant genomes ranging from Arabidopsis to Physcomitrella showed genes associated with defense, transport functions, or abiotic stress responses enriched in tandem duplicates, whereas duplicates due to other mechanisms included genes enriched in other intracellular regulatory roles [41].
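
To make the notion of a tandem duplicate concrete, the sketch below groups neighboring genes of the same family into tandem arrays. It is an illustration under stated assumptions rather than the method of any cited study: the gene identifiers, the family labels, and the `max_gap` parameter (the number of unrelated genes allowed between copies) are all hypothetical, and real pipelines typically derive families from sequence similarity searches first.

```python
def tandem_arrays(genes, max_gap=1):
    """Group same-family neighbors along one chromosome into tandem arrays.

    genes: list of (gene_id, family) tuples in chromosomal order;
           family is None for genes without a family assignment.
    max_gap: maximum number of intervening genes allowed between two
             members of the same array.
    Returns a list of arrays (lists of gene_ids) with at least two members.
    """
    last_seen = {}  # family -> (index of most recent member, its array)
    arrays = []
    for idx, (gene_id, family) in enumerate(genes):
        if family is None:
            continue
        prev = last_seen.get(family)
        if prev is not None and idx - prev[0] - 1 <= max_gap:
            prev[1].append(gene_id)          # extend the existing array
            last_seen[family] = (idx, prev[1])
        else:
            array = [gene_id]                # start a new candidate array
            arrays.append(array)
            last_seen[family] = (idx, array)
    return [a for a in arrays if len(a) > 1]


# Toy chromosome segment with made-up identifiers.
segment = [
    ("g1", "famA"), ("g2", "famB"), ("g3", "famB"),
    ("g4", None), ("g5", "famB"), ("g6", "famA"),
]
print(tandem_arrays(segment, max_gap=1))
print(tandem_arrays(segment, max_gap=0))
```

The choice of `max_gap` changes what counts as 'tandem', so copy counts from different studies are comparable only when the same definition is used.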

The A. thaliana and T. parvula genomes have approximately 10% of their total genes in tandem duplicates [10], and these duplicates are clearly implicated in the species' dramatically different stress tolerance strategies. This is exemplified by the amplification of NHX8 homologs (Figure 2a), known to encode a putative Li+ transporter in A. thaliana [42]. The duplication leads to a constitutively higher expression in T. parvula than in A. thaliana, which might be responsible for the apparently enhanced tolerance of T. parvula to high Li+ in its natural habitat in central Anatolia [43].

Figure 2

Gain of stress-related gene copies through duplication in Thellungiella parvula. Genomic regions containing homologs of NHX8, encoding a plasma membrane Li+ transporter [42], and AVP1, encoding a vacuolar proton transporter [79], were compared between T. parvula (Tp) and Arabidopsis thaliana (At). Shown are five colinear genes adjacent to NHX8 and AVP1 in the two species. Red arrows indicate duplications. (a) NHX8 is duplicated in tandem into three copies in T. parvula. (b) AVP1 homologs are duplicated and translocated from T. parvula chromosome 5 to chromosome 1. The colinear genomic region in A. thaliana chromosome 1 contains rolling-circle (RC)/helitron transposable elements in the place of an AVP1 homolog (dashed lines), suggesting a possible involvement of transposable elements in the translocation in an ancestor of the two species. The naming of T. parvula genes is according to version 2 of the genome [80].

Gene duplication may also result from single gene/segmental transposition-duplication or ectopic duplication/translocation [44] in such a way that any syntenic evidence for its ancestral origin is lost. Comparisons of T. parvula and A. thaliana genomes indicate multiple translocation-duplication events involving stress-related genes, exemplified by the duplications of orthologs of CBL10, encoding a calcium sensor [10], and AVP1, encoding a vacuolar proton transporter (Figure 2b) in T. parvula. The details of the relationship between this mechanism and stress-adaptive evolution deserve further exploration.

From these initial observations, there are a number of important questions for future studies. For example, how do duplications arise and become stabilized in targeted regions of the genome? Can stress increase the rate of their generation? How rapidly can new regulatory sequences evolve to become operational and do they evolve along with duplicated genes or independently? How rapidly can neo-functionalization occur and how is it balanced by gene loss? And how is tandem duplication called into play to adjust expression levels?

Stress adaptation through lineage-specific sequences

In any single genome, the suite of genes shaped by stress during adaptation should reflect, above all, the nature of the stresses. In turn, physiological and developmental changes will mirror genomic changes. Thus, both the suite of altered genes and their regulatory sequences can be expected to demonstrate lineage specificity.

Lineage-specific or taxonomically restricted genes (TRGs) are protein-coding genes that do not share sequence similarity outside the lineage. For that reason, they are also sometimes referred to as 'orphan genes' [45], or 'unknown'. Indeed, with each new EST collection or genome, the number of new unknowns (or 'unknown unknowns') proliferates. Regardless of the taxon, and in all the examples included in Table 1, 10 to 20% of the genes in eukaryote genomes or transcriptomes are TRGs [46]. In the Brassicaceae, family-specific TRGs are enriched for genes responsive to abiotic stresses [47]. It should be noted here that 'stress-responsive' or 'stress-related' are not labels indicating that the functions of the genes are then known. They simply mean that expression is induced by stress. In Arabidopsis, but not in T. parvula, the expansion is pronounced in pathogen-responsive genes; in T. parvula, but not in Arabidopsis, the expansion is pronounced in abiotic stress-related genes [10]. Across the spectrum of plant stress tolerance, pools of rapidly evolving TRGs may function as a reservoir of adaptive potential to challenging environments.

In Arabidopsis, 3.4% of all genes share sequence similarity only within the Brassicaceae, and another 5% lack similarity with any sequences deposited in public databases [48]. Because the Arabidopsis genome is the most fully annotated, it can be expected that the more evolutionarily distant from Arabidopsis a species is, the larger will be the number of TRGs, especially if the species is highly adapted to an environment in which Arabidopsis cannot survive. In the T. parvula genome, 11% of the annotated non-transposon putative protein-coding genes show no sequence similarity with A. thaliana genes. About two-thirds of those also lack similarity with any known plant sequence [10]. In Lobularia maritima (sweet alyssum), a salt-tolerant coastal relative of Arabidopsis [49], 35% of the salt-induced transcriptome is 'unknown', as are half of the salt-stress-induced transcripts from a facultative halophyte, Festuca rubra ssp. litoralis [50] and nearly 55% of the contigs in two mangrove transcriptomes (R. mangle and H. littoralis) [26].

Regulatory elements in the untranslated regions and promoters also show lineage specificity. For example, a detailed comparison of the upstream regulatory region of SOS1, a gene critical for salt tolerance in both Arabidopsis and Thellungiella [51], showed conserved repeat sequences and secondary structures in Thellungiella spp. and other halophytes that are absent in Arabidopsis. These differences in regions that are not transcribed are correlated with differences in expression observed for SOS1 in Thellungiella [15, 16].

TEs seem to have a key role in generating TRGs [31], because novel chimeric genes originate when active retrotransposons recruit new exons from flanking sequences [52]. About 10% of the Arabidopsis TRGs showed degenerate sequence conservation with transposable elements, a proportion double that among non-TRGs [47]. In the T. parvula genome, TRGs are enriched in pericentromeric TE-rich regions, suggesting roles of transposons in their evolution [10].

Without sequence similarities on which to base annotation, 'orphan genes' usually lack assignable functions [10, 26]. Clearly, this is a major obstacle to elucidating the genetic basis for any characteristic, not just for understanding stress tolerance, and overcoming this is an important target. Again, there are associated questions to be addressed. For example, why do duplications, especially those associated with TEs, seem to be clustered in centromeric regions? And how do lineage-specific, taxonomically restricted, or 'orphan' genes fit in the overall picture of functioning organisms? With regard to this last question, network analysis has already proved to be a good starting place. As has already been demonstrated in Arabidopsis transcriptional network models, the correlated expression of TRGs and genes with assigned functions in response to stresses provides, even without definitive annotations, useful linkages for visualizing co-expression patterns and identifying 'hub' genes that have core roles in regulating pathways [53, 54]. Although still limited for extremophiles, RNA-sequencing experiments performed under both transient and chronic stress conditions should, before long, contribute the expression data needed for extending similar networks to non-model - or new model - species.
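
The idea of linking unannotated genes to annotated ones through correlated expression can be sketched as follows. This is a deliberately simplified illustration using thresholded Pearson correlation, not the network models of the cited studies; the gene names, expression values, and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_degrees(profiles, threshold=0.9):
    """Count, for each gene, how many other genes it is co-expressed with.

    profiles: dict mapping gene name -> list of expression values across
              the same ordered set of conditions.
    An edge is drawn when |r| >= threshold; high-degree genes are
    candidate 'hub' genes.
    """
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


# Made-up expression values over four stress time points.
profiles = {
    "annotated_transporter": [1.0, 2.0, 3.0, 4.0],
    "orphan_1": [2.0, 4.0, 6.0, 8.0],
    "orphan_2": [4.0, 3.0, 2.0, 1.0],
    "unrelated": [1.0, 3.0, 1.0, 3.0],
}
print(coexpression_degrees(profiles))
```

In such a sketch an orphan gene gains a tentative functional context from the annotated genes it connects to, which is the kind of linkage the paragraph above describes; correlation alone does not establish function.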

Epigenetic modifications and non-coding RNAs

Beyond adaptations embedded in the basic nucleotide sequence of a genome, epigenetic controls have key roles in ensuring plant survival and reproduction under suboptimal growth conditions [55, 56]. Selective hypermethylation on salt stress adaptation in the extremophile Crassulacean acid metabolism (CAM) plant Mesembryanthemum crystallinum, for example, indicates both specific and global epigenetic restructuring in plant abiotic stress response regulation [57].

Methylation, alone or in combination with small interfering RNA degradation pathways, can also regulate transposon activity [58]. Although most TEs are inactive at any time, the proportion that is active is highly dynamic and stress responsive [59, 60]. TE copies can vary significantly within single species (for example, maize haplotypes [58]), or between closely related species; in T. parvula and T. salsuginea, TEs make up about 7.4% [10] and up to 50% (Q Xie, personal communication) of the genome, respectively.

The potential influence of retrotransposon-rich gene neighborhoods undoubtedly varies in ways yet to be fully appreciated. It may, for example, be represented in the HKT1 locus in T. parvula [10], as it is for Arabidopsis TIP1;2, the aquaporin whose high basal expression has been caused by TEs in the promoter region [61].

Plant microRNAs (miRNAs) also act epigenetically, through target mRNA cleavage or translational inhibition, and their effects are further compounded by feedback regulation. The majority are lineage specific or species specific. Even conserved miRNAs, however, have species-specific functions, as demonstrated by comparisons of Arabidopsis and poplar [62]. Only 80% of known miRNAs identified in the T. parvula genome share sequence similarity with A. thaliana miRNAs. Another 10% are found in Brassicaceae species, but not in A. thaliana [10].

An in silico comparison of the target sequences of miRNAs in the mRNAs of mangroves and Arabidopsis showed that both the conservation of miRNA targets in stress-responsive genes and their placements within those genes are lineage specific. They may also be similarly represented in unrelated species showing similar ecological affinities [23].

Both methylation and miRNA-based epigenetic regulation are fields of intense activity at present and, from the standpoint of stress adaptation, how miRNA targeting comes about and varies between species is an important question. Another is how the functions of miRNAs and protein-coding genes are regulated and coordinated. Can epigenetic signatures due to stress adaptation be trans-generational, and if so, for how many generations? The concept of trans-generational epigenetic stress signatures has support from some studies. For example, when Arabidopsis parent populations were exposed to abiotic stresses that increased global methylation, their progeny were more stress tolerant [63]. Similarly, in rice, parents with hypermethylation of particular loci in response to low-nutrient stress produced progeny with increased tolerance [64]. In dandelion (Taraxacum officinale), exposure to stress resulted in heritable markers, again implying epigenetic heritability for stress adaptation [65]. In Arabidopsis mutants impaired for small interfering RNA biogenesis, increased copy numbers of the ONSEN retrotransposon element were induced by heat stress. ONSEN insertion, in turn, rendered adjacent genes heat inducible. Unlike in wild-type plants, these numbers failed to decay over a period of 20 to 30 days. Because transposition was particularly active during flower development and before gametogenesis, the effect was trans-generational [60].

Concluding remarks

To know that the phenomena we have presented here operate is not sufficient. By themselves, sequences provide only the raw materials for addressing more important questions. On the one hand, they set the stage for exploring how genomes have evolved in plants with different adaptations to environmental conditions. On the other, and more fundamentally, expanding genomic resources bring the opportunity to explore mechanisms of genome evolution themselves.

The recently completed genome sequence of T. parvula [10] and the soon to be available genome of T. salsuginea [66] are critical resources, enabling high-resolution genome-wide comparisons between extremophiles and their non-extremophile crucifer relatives. Along with a dozen other transcriptomes of extremophile plants and numerous genomes from non-extremophiles, they have supported the ideas, first, that there is a basal set of genes shared between all plants, and second, that a subset of these has experienced selective modification and amplification of a sort required for adaptation to and success in changing or stressful environments. With sequencing technologies evolving rapidly, a 'third generation' of instruments will undoubtedly have an even greater transforming effect.

As output increases in amount and quality and cost comes down, it seems clear that the genome sequence of any plant species deemed important, and eventually multiple ecotypes of each, can, as needed, become available. The value and importance of this cannot be overstated in a world where the population is rising much faster than total agricultural production and land degradation is rapidly reducing the area useable for crops. Extremophiles provide not only a model for what is possible, but for the traits that may be necessary for crops in the future.