Hydrobiologia, Volume 783, Issue 1, pp 191–208

Genomic tools for new insights to variation, adaptation, and evolution in the salmonid fishes: a perspective for charr

Open Access
CHARR II Opinion Paper

DOI: 10.1007/s10750-015-2614-5

Cite this article as:
Elmer, K.R. Hydrobiologia (2016) 783: 191. doi:10.1007/s10750-015-2614-5

Abstract

The past few years have seen an absolute revolution in genomic technologies and their potential applications to ecology and evolutionary biology research. Such advances open up a range of opportunities for research on non-model organisms and individuals drawn from wild populations. This has resulted in exciting new research seeking to identify the genetic polymorphisms important in adaptation and speciation and how they are organised within the genome. Building on this, there is great interest in the extent to which similar evolutionary patterns are found across multiple populations, particularly whether consistent genetic mechanisms are associated with recurrent phenotypes. A powerful context for disentangling these mechanisms is to focus on highly diverse radiations, where phenotypes vary in and across environments. Therefore, the high diversity found within and among species of salmonid fishes such as charr (Salvelinus) makes for an ideal ‘non’-model for genomic research. This paper outlines some of the current approaches available in ecological genomics and highlights some recent advances in salmonid research. It also suggests avenues for the sort of predictions that can be derived from ecological genomics, with the aim of understanding the genetics behind the fantastic diversity of salmonid fishes.

Keywords

Ecological genomics; Transcriptomics; Speciation; Genetic mapping; Salmonid fishes; Charr

Recent advances in the field of molecular biology have exciting implications for research on the ecology and evolution of natural populations. Particularly, high throughput ‘next-generation’ sequencing (NGS) (also known as ‘second-generation’, or ‘massively parallel’ sequencing) can generate huge amounts of genomic or transcriptomic data on almost any organism. NGS is dramatically decreasing in cost and the associated tools and pipelines are within reach of even modest research groups. This is therefore becoming an invaluable tool for understanding the origins and maintenance of biodiversity. These exciting new approaches can address long-standing questions in evolutionary biology, such as: What is the genetic basis of adaptations? How do closely related species differ? Why are some lineages more diverse than others?

The challenge for ‘omics’ of non-model organisms now shifts away from raw data generation to focusing on informative evolutionary, ecological, and environmental contexts in order to most efficiently and effectively address the question at hand. For questions on adaptive divergence and ecological speciation, salmonid fishes in general and charr (genus Salvelinus) in particular are exceptional models because of their high diversity. Across the Holarctic there are multiple and rapidly evolving divergent phenotypes (known variously as ecomorphs, morphotypes, or trophic morphs) of charr that differ in traits such as size, shape, diet, spawning time, and life history (Klemetsen, 2010; Muir et al., 2015). This high diversity is particularly informative in the context of parallel evolution of similar ecologically relevant morphologies across independent sites; such replication increases power to distinguish signal from spurious noise (Schluter, 2000; Elmer & Meyer, 2011).

The phenotypic and genetic variation within species is the putty from which new diversity, local adaptation, specialisation, and extent of variation can arise. Population genomic and association mapping approaches have made it possible to detect selection and unravel the genetic basis of variable phenotypic traits in the complexity of natural environments. ‘Population genomics’ refers to the study of genetic variation at high resolution within individuals (from hundreds to thousands or even millions of loci distributed across the genome), focusing on individuals within and across populations (Luikart et al., 2003). Population genomics can be seen as a step change from population genetics because it involves genome-wide effects rather than locus-specific effects that are disassociated from the overall level of genome organisation. Because of this higher resolution in the number of markers and, in principle, the understanding of their organisation, patterns can be inferred even under very shallow divergence. For understanding and disentangling evolutionary processes, population genomics is powerful because it is possible to identify the genomic regions that are responsive to selection as well as seek the causative genetic variation underlying adaptive divergences in natural populations (Luikart et al., 2003; Storz, 2005; Butlin, 2010). Further, loci under selection can also be differentiated from neutral regions, which can then be used for estimating divergence time, population splits, bottlenecks, and other demographic processes. This population genomic perspective is one not just on individual loci, but on their organisation in the genome and their influence on phenotypic traits (loosely defined as ‘genomic architecture’). While ‘population genetics’ tends to focus on estimators that summarise variation into a single metric, ‘population genomic’ approaches focus instead on where in the genome the differences between individuals and populations lie.
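The contrast between a single summary metric and a per-locus view can be illustrated with a minimal sketch. It assumes two equally weighted populations and biallelic SNPs with known allele frequencies; the frequencies below are invented for illustration and do not come from any study. Wright's FST is computed per locus as (HT − HS)/HT, and the function and variable names are hypothetical.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic locus, from the allele frequencies
    of two equally weighted populations: (HT - HS) / HT."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical allele frequencies at five loci: (population A, population B)
freqs = [(0.50, 0.52), (0.30, 0.28), (0.10, 0.90), (0.60, 0.61), (0.45, 0.40)]

per_locus = [fst_biallelic(a, b) for a, b in freqs]

# 'Population genetic' view: one number (here a simple unweighted mean)
genome_wide = sum(per_locus) / len(per_locus)

# 'Population genomic' view: which locus carries the differentiation?
outlier = max(range(len(per_locus)), key=per_locus.__getitem__)
print(f"mean FST = {genome_wide:.3f}; most divergent locus index = {outlier}")
```

In this toy example almost all of the differentiation sits at one locus, which the genome-wide mean alone would not reveal. Real analyses would use estimators that correct for sample size and would order loci by genomic position.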

When studying rapidly diverging and highly variable species such as charr, one aim is to identify if there are distinctive genomic organisations that might facilitate rapid adaptation and divergence (Nosil, 2012; Seehausen et al., 2014). For example, for causative genetic variants, it is thought that if de novo mutations have very large effects and increase fitness, selection acting directly upon them can overcome the influence of gene flow and facilitate divergence (Barton, 2010; Yeaman, 2013; Flaxman et al., 2014). Alternatively, tight complexes of loci or functional supergenes (e.g. through genetic linkage, proximity, or chromosomal rearrangement such as inversions) could allow rapid (Flaxman et al., 2014), or even immediate, segregation of phenotypic traits under selection (Schwander et al., 2014). Divergence despite gene flow (e.g. in sympatry) is hypothesised to have a distinctive signature across the genome, with much of the genome having low divergence and some regions of the genome being very different between diverging populations (for review see Feder et al., 2012; Via, 2012). Discerning and disentangling such patterns in wild populations, to identify whether empirical data support theoretical predictions, is a key goal of population genomics (Butlin, 2010; Elmer & Meyer, 2011; Rice et al., 2011; Rogers et al., 2013).

The aim of this paper is to highlight some of the exciting genomic approaches for studying ecology and evolution. While genomics can be used to address myriad questions in ecology and evolution, from systematics to functional genetics (Landry & Aubin-Horth, 2014; Seehausen et al., 2014), in this paper, I focus particularly on the genomics of how species differ in ecologically relevant phenotypes within and across environments, and the genetic basis of adaptive phenotypes (Fig. 1). First, I explain some key sequencing and genotyping tools in ecological genomics. Then I outline some of the fascinating current research in the field, focusing especially on results from NGS ecological genomics on wild populations of charr and other Salmoninae (salmon, charr, and trout in the genera Salmo, Salvelinus, and Oncorhynchus). Then I highlight some key research questions on charr that have been identified by the community, and suggest how some of these recent methods can be applied to outstanding questions about this highly diverse lineage. I close with a perspective on how future research in ecological genomics can help inform predictions and conservation efforts for postglacial salmonids.
Fig. 1

A simplified conceptual workflow for ecological genomics. Biological sampling should include individuals from populations of interest (here shown as fish in lakes, but could alternatively include captive populations), as well as some surrounding populations as genetic context or outgroups (dashed lines). See Box 1 for some sampling recommendations. These individuals are then genotyped or sequenced using any of a variety of different methods outlined here, including genotyping by NGS, resequencing, sequence capture, or arrays. This results in raw sequence data reflecting genetic polymorphisms. Depending on sequencing method, these data are demultiplexed by individual, stacked into loci, and organised into a panel of sequence or SNP variants (here, Dataset). Then a range of genetic and genomic analyses can be conducted, including (but not limited to) genetic mapping, phylogenetics, detecting loci under selection, identifying genomic regions under selection, or quantifying gene flow and demographics. These are conducted in the conceptual framework of the ecological variability of interest, which was targeted in the biological sampling (for example, the divergence between different environments, trophic morphologies, or life history traits)

Tools for ecological genomics

As a discipline, ‘genomics’ broadly analyses the function and structure of genomes. A major aim is to identify where the genetic variation is located in the genome, such as in what chromosomal location or linkage group, whether it is in a coding or a non-coding region, and what genes might lie nearby. In the context of non-model organisms, genomics can also simply mean examining many markers—on the order of thousands—but without inferring their location. Genomics is frequently, and maybe even inherently, comparative: comparing individuals within species and comparing among species (Hawkins et al., 2010; Sarropoulou & Fernandes, 2011).

While the revolutionising advances possible from genomic technologies have been heralded for ecology and evolution for some time (Feder & Mitchell-Olds, 2003; Cossins & Crawford, 2005; Travers et al., 2007), in reality the potential was still difficult to tap for the average ecology or evolution researcher. Only recently have NGS technologies opened the field for genomics on non-model organisms. There have been two major advances relevant here. First, and inherent to NGS, no prior genetic information is now needed in order to sequence or genotype. This differs from most earlier approaches of the ecological genomics toolkit such as microsatellite loci genotyping, candidate gene sequencing, or microarray or quantitative PCR for gene expression analysis; in those cases one needs some prior information on the sequence in order to develop targeted primers to amplify the DNA of interest. Second, while in the early days of NGS the costs still placed it out of the reach of many labs, costs are now decreasing dramatically (some ddRADseq costs using different platforms are outlined in Recknagel et al., 2015). For example, in 2009, 320 million reads of paired-end sequence data from Illumina GAIIx cost approximately £12,000, while in 2015, an equivalent single run on Illumina NextSeq giving 440 million paired-end reads costs approximately £1700 in consumables (excluding library preparation), and advances in Illumina HiSeq X Ten predict 90 Gb of sequence data for ~£1000 in 2016 (exemplar costs from Univ. Konstanz GeCKo, Glasgow Polyomics, and Illumina). Therefore, combined with the availability of new approaches to reduce genome representation, genomic projects are now feasible even on quite tight budgets (Davey et al., 2011; Sboner et al., 2011; McCormack et al., 2013; Recknagel et al., 2015) and for the first time are less than or on par with the cost of approaches like microsatellite genotyping on ABI.
This opens up great possibilities for ecological and evolutionary researchers of salmonids in the wild (Box 1). One of the challenges for maximising the high amount of information available in genomics is linking those data with informative and biologically relevant reference genomes.
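The scale of the cost decrease quoted above can be made explicit with a short worked calculation. This is only a rough sketch using the two exemplar figures from the text; it ignores inflation, read length, and library preparation costs, and the dictionary keys are labels chosen here for illustration.

```python
# Exemplar sequencing costs quoted in the text (consumables only for 2015)
runs = {
    "2009 GAIIx":   {"cost_gbp": 12_000, "million_reads": 320},
    "2015 NextSeq": {"cost_gbp": 1_700,  "million_reads": 440},
}

# Cost per million paired-end reads for each run
per_million = {name: r["cost_gbp"] / r["million_reads"] for name, r in runs.items()}

# Fold decrease in per-read cost between the two exemplar runs
fold = per_million["2009 GAIIx"] / per_million["2015 NextSeq"]

for name, cost in per_million.items():
    print(f"{name}: ~£{cost:.2f} per million reads")
print(f"approximate fold decrease: {fold:.1f}")
```

On these figures the cost per million reads falls from £37.50 to under £4, i.e. roughly a tenfold decrease in six years.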
Box 1

Suggested tissue sampling procedures for ecological genomics

Following some very simple collection procedures can ensure that samples have the potential to be used for ecological genomics methods for years to come. In the case of salmonid fish research, just 25 mg of muscle tissue preserved in pure ethanol is sufficient for good quality DNA extraction and generating high-quality libraries. For example, current ddRADseq methods suggest 1 μg of DNA at a concentration of 24 ng/μl (e.g. Recknagel et al., 2015), though lower DNA quantities are possible under high multiplex conditions. The ratio of tissue volume to ethanol volume should not generally exceed around 30%, and the ethanol should be changed upon return to the lab before storing the sample in a fridge or freezer. Adipose, fin, or muscle tissues are suitable and ideally the tissue should be harvested freshly. Freezing the entire fish at −20°C and later sampling for genetics after thawing tends to result in poor quality DNA; this technique should be avoided. RNALater is an alternative and stable fixative, which has the advantage of preserving RNA (e.g. for transcriptomics) but the downside of being quite costly if purchased commercially (homemade inexpensive alternatives are available).

Sufficient sample sizes should be sought; exact numbers will depend on budget, context, and research question, but usually should aim for at least 20 or 30 individuals per population (Fig. 1). Following some simple planning guidelines and sampling as many specimens as possible can secure a great breadth of potential research projects with minimal additional effort in the field. This can hopefully provide incredible return on investment, bringing the hard-earned ecological research through to ecological genomic applications.

Reference genomes

An annotated reference genome is a digital assembly of the nucleotides that make up an organism’s complete DNA sequence, usually drawn from a single representative exemplar, and organised into a database with information on the relevant structures such as chromosomes and genes therein. Any assembled DNA sequence can in principle act as a ‘reference’ and for this reason, the level of refinement in contiguous nucleotides (or contig; maximal length being a chromosome) and gene annotation across those contigs reflects the quality, usually with each iterative draft representing a refinement (see Ekblom & Wolf, 2014). Reference genomes provide critical resources for orienting, organising, and annotating the sequence reads and genetic variation inferred from population genomics.

Salmonid research is benefiting greatly from new reference information, for example with the recent publication of the O. mykiss genome (Berthelot et al., 2014) and advances in the on-going Atlantic salmon genome (Davidson et al., 2010; International Cooperation to Sequence the Atlantic Salmon Genome, 2014). Genome information from both these species is available for free download and use by the community: for salmon from http://www.icisb.org/atlantic-salmon-genome-sequence/ and for rainbow trout from https://www.genoscope.cns.fr/trout/. A valuable general genomic and transcriptomic resource for salmonids and comparative genomics is available at SalmonDB http://genomicasalmones.dim.uchile.cl (Di Génova et al., 2011). As all these resources grow taxonomically and in their annotation of existing information, they provide an excellent resource for maximising ecological genomics of salmonids.

Generating new, de novo reference genomes is achievable but non-trivial (Ekblom & Wolf, 2014). A reference genome requires not only sufficient average sequence coverage of the genome, but should also bridge complex regions, be oriented and annotated with linkage maps, and be informed by transcriptomes (Genome 10K Community of Scientists, 2009; Wong et al., 2012; Ekblom & Wolf, 2014). Despite the considerable effort, reference genomes provide critical advances for genome research. I expect we will see increasing individual and collaborative efforts to develop those important resources, which no longer require large consortia to complete.

Whole genome resequencing

Sequencing entire genomes and comparing across individuals sets the highest bar for genomics. To accomplish that, ideally one first needs a reference genome sequence against which future genomes sequenced at moderate coverage with short reads can be mapped (so-called ‘resequencing’). At present, the feasibility of whole genome resequencing for ecological genomics depends somewhat on genome size and complexity, as well as budget. In ecological genomics of fishes more generally, stickleback fishes (genome size 675 Mb) are now often whole genome resequenced (Jones et al., 2012; Terekhanova et al., 2014) as are some cichlids (genome size ~1 Gb) either with few individuals at high coverage (e.g. Brawand et al., 2014) or with individuals pooled and overall lower coverage focusing on fixed differences (e.g. Elmer et al., 2014). However, the very complex and large genomes of salmonids (~3 Gb) have made whole genome sequence analyses difficult and not yet well established. For this reason, the advances in ecological genomics of salmonids are currently being driven by new methodologies to sequence a reduced representation of the genome using NGS.

Genotyping with NGS

Because genomes are large and complex, especially so in salmonid fishes, reducing the amount of genome that is sequenced to a representative and unbiased part has practical and analytical benefits. This can be done using physical or enzymatic methods that cut the genome into shorter pieces, only a portion of which are then sequenced (Fig. 2). An extremely efficient and increasingly popular approach is to sequence a reduced representation of the genome to identify and genotype single-nucleotide polymorphisms (SNPs) (Davey et al., 2011). There are a number of genome reduction techniques, including Restriction Site Associated DNA sequencing (RADseq) (Baird et al., 2008), double-digest RADseq (ddRADseq) (Peterson et al., 2012), or Genotyping-by-Sequencing (GBS) (Elshire et al., 2011), as well as other derivations (see Puritz et al., 2014 for a comparative assessment). Genotyping with NGS can be used for identifying genetic polymorphisms and, because the read also contains the sequence around the SNP, the reads can be mapped to reference genomes, if available. Further, the same methodology can be used for population genomics and, when some pedigree information can be calculated or is known, genetic mapping (Davey & Blaxter, 2011). There are a number of excellent reviews and special issues on genotyping with NGS, methodological and analytical considerations, and its application to ecological and evolutionary research questions (e.g. Davey et al., 2011, 2013; Narum et al., 2013; Puritz et al., 2014), so these topics will not be covered in detail here.
Fig. 2

An example of the genome reduction process for genotyping with NGS. Here is a typical genotyping protocol for double-digest restriction site-associated DNA sequencing (based on Peterson et al., 2012). 1 From each individual sample, DNA is extracted. The DNA is cut by restriction enzymes (here, Cut Site 1 and Cut Site 2) so that the entire genome is reduced to smaller fragments. Adapters are ligated to the cut DNA, one adapter type to Cut Site 1 and another adapter type to Cut Site 2. One or both adapters carry unique indexes (also called barcodes or MIDs) so that individuals can later be distinguished after sequencing. DNA extraction, fragmentation, and ligation are done in parallel across many individuals, which are then pooled into a single library. 2 The pooled sample of DNA is size selected to retain only fragments of a precise size range (here, 130–200 bp in length), for example from an automated gel extraction. The remainder of the DNA is discarded. 3 The library is enriched through PCR (polymerase chain reaction) for those fragments that contain Adapter 1 and Adapter 2. 4 The library is then sequenced using a next-generation platform. Modified from Recknagel et al. (2015)

The different genotyping by NGS protocols all have strengths and limitations (reviewed in Davey et al., 2011; Puritz et al., 2014). Briefly, for example, GBS is designed to skim the genome at high numbers of loci and therefore often at low coverage, and tends to be used when inference of individual-level polymorphism is less important (e.g. in genetic mapping of recombinant inbred lines) (Elshire et al., 2011). RADseq uses one restriction enzyme and then fragments the DNA mechanically so that fragments are of random lengths. This random and informative sequence at the other end of the read from the enzyme cut site is an advantage of RADseq. Also, because of the variable length, there is the possibility to assemble longer de novo contigs (Puritz et al., 2014). ddRADseq instead uses a combination of restriction enzymes and fragment size selection to be highly customizable in terms of numbers of loci and units of sequencing effort (Peterson et al., 2012) (Fig. 2). For individual and population-level research on non-model organisms with relatively large genomes, ddRADseq is emerging as a popular approach.
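The double-digest and size-selection logic of ddRADseq can be sketched as a simple in silico digest. This is a toy illustration and not the protocol of Peterson et al. (2012): the enzyme pair (EcoRI, recognition site GAATTC, and MspI, recognition site CCGG, both cutting after the first base) is an assumption chosen for the example, the function names are hypothetical, and adapter lengths and overhangs are ignored.

```python
import re

def cut_positions(seq, site, offset):
    """Positions at which an enzyme cuts `seq`; `offset` is the cut
    position within the recognition site (overlapping sites allowed)."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]

def ddrad_fragments(seq, site1, site2, min_len, max_len, offset1=1, offset2=1):
    """Return (start, end) of fragments that are flanked by one cut from
    each enzyme and whose length falls within [min_len, max_len]."""
    cuts = sorted(
        [(p, 1) for p in cut_positions(seq, site1, offset1)]
        + [(p, 2) for p in cut_positions(seq, site2, offset2)]
    )
    kept = []
    for (start, enzyme_a), (end, enzyme_b) in zip(cuts, cuts[1:]):
        # Only fragments with two different cut ends receive both adapters
        if enzyme_a != enzyme_b and min_len <= end - start <= max_len:
            kept.append((start, end))
    return kept

# Toy 'genome' with two EcoRI sites and two MspI sites
genome = "GAATTC" + "A" * 150 + "CCGG" + "A" * 50 + "CCGG" + "A" * 300 + "GAATTC"
fragments = ddrad_fragments(genome, "GAATTC", "CCGG", min_len=130, max_len=200)
print(fragments)
```

Of the three fragments between adjacent cut sites in this toy sequence, only one has different enzymes at its two ends and falls in the 130–200 bp window. Changing the enzyme pair or the size window changes the number of loci retained, which is the sense in which the method is customizable.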

Another tool for SNP discovery and genotyping is sequencing only the expressed portion of the genome; that is the messenger RNA. RNAseq of messenger RNA (mRNA) for population genomics has the further benefit of a direct phenotypic link because it represents the protein-coding portion of the genome, meaning it has the potential to be functional and a target of selection (De Wit et al., 2015). For RNAseq, no restriction enzymes are needed, because RNA transcripts are generally short in length. While having an array of benefits, there are distinct challenges to genotyping with RNAseq such as choice of tissue, influence of alternative splice variants, and that samples must be preserved appropriately (e.g. in a −80°C freezer or buffer such as RNALater solution) for RNA to be harvested (De Wit et al., 2015).

In all cases, most library preparations can be accomplished using the standard equipment available in a molecular biology lab (Davey et al., 2011; Peterson et al., 2012). Alternatively, some commercial service providers now offer GBS or RAD library preparation and sequencing and all offer RNAseq. The most common platform for genotyping by NGS is currently Illumina (e.g. MiSeq, HiSeq or NextSeq platforms, all of which can use the same adapter set) because of the low cost, high throughput, and large market share. Methods for genotyping with sequencing on other platforms such as Ion Torrent have also been developed (e.g. Mascher et al., 2013; Recknagel et al., 2015). Salmonids have been genotyped with a number of these different protocols (discussed in detail below).

Reviews and protocols outlining how genotypes are inferred from NGS sequences in detail can be found elsewhere (Davey et al., 2011; Etter et al., 2011; Catchen et al., 2013; Recknagel et al., 2015) and therefore will not be covered in detail here. Briefly, the read is sequenced from the restriction enzyme cut site, either in one direction (single-end sequencing) or from two directions (paired end sequencing). Each individual (or set of pooled individuals) has a unique identifier sequence (barcode or MID) at the start of its sequence, which is how the data are later separated by individuals (demultiplexed) for analysis. Sequencing is usually done from 10- to 100-fold average coverage of the number of loci estimated to be in the library, though this will vary depending on project aims and budget (Sims et al., 2014). Currently, Stacks (Catchen et al., 2013) is a popular software for identifying and analysing SNPs for genotyping with NGS. In that process, the raw sequence data are demultiplexed and filtered to remove low-quality reads. Data for each individual are then grouped into loci, which represent sequencing coverage of homologous locations in the genome, and SNP genotypes are inferred for each individual (Catchen et al., 2013). These data can be used for addressing a range of genomic research questions.
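The demultiplexing step and the relationship between coverage and sequencing effort described above can be sketched as follows. This is a toy illustration, not the Stacks implementation: it assumes exact-match inline barcodes at the start of each read, and the barcodes, sample names, reads, and project dimensions are all invented for the example.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Assign each read to an individual by an exact-match barcode at the
    start of the read, trimming the barcode from the assigned read."""
    by_individual = defaultdict(list)
    for read in reads:
        for barcode, individual in barcodes.items():
            if read.startswith(barcode):
                by_individual[individual].append(read[len(barcode):])
                break
        else:
            # No barcode matched: set the read aside rather than guess
            by_individual["unassigned"].append(read)
    return dict(by_individual)

def reads_needed(n_loci, n_individuals, coverage):
    """Approximate number of reads for a target average coverage per locus
    per individual, assuming reads are spread evenly (they rarely are)."""
    return n_loci * n_individuals * coverage

# Hypothetical barcodes and reads
barcodes = {"ACGT": "fish_01", "TGCA": "fish_02"}
reads = ["ACGTAATTCAAA", "TGCAAATTCCCC", "ACGTAATTCGGG", "GGGGAATTCTTT"]
by_individual = demultiplex(reads, barcodes)

# Hypothetical project: 50,000 loci, 96 individuals, 20-fold coverage
total_reads = reads_needed(n_loci=50_000, n_individuals=96, coverage=20)
print(by_individual)
print(f"{total_reads:,} reads required")
```

Production tools additionally tolerate sequencing errors in barcodes and filter on base quality, and uneven representation of individuals and loci means that real projects usually budget more reads than this simple product suggests.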

All of these genotyping with NGS protocols generate far more data than are used. Most loci (from 75 to 90%, depending on the level of genetic variability in the experimental samples; see Gonen et al., 2014; Recknagel et al., 2015) are discarded because they are invariant; the chance of finding a SNP in any given read is more or less equal to background mutation rate and diversity in the sample. Data are also discarded because a proportion of loci have incomplete coverage across individuals or populations, probably because of library preparation effects and/or insufficient sequencing coverage. The role of missing data in biasing the outcome of analyses from these datasets is currently not well understood (Arnold et al., 2013; Huang & Knowles, 2014). Further, genomic genotyping with NGS techniques will rarely capture the functional targets; reads often cover <1% of the genome. Instead, genotyping by NGS is a tool to reflect processes such as the pattern and extent of genome divergence and population patterns and, when markers are ordered by mapping to a reference genome or linkage map, the genomic regions under divergence can be identified (Fig. 1).
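The two main reasons for discarding loci described above, invariance and incomplete coverage across individuals, can be sketched as a simple filter. This is a minimal illustration with invented genotypes; the data layout, the function name, and the 80% call-rate threshold are assumptions for the example rather than recommended settings.

```python
def filter_loci(genotypes, min_call_rate=0.8):
    """Keep loci that are variable and genotyped in at least `min_call_rate`
    of individuals. `genotypes` maps locus name -> list of diploid genotype
    strings (e.g. "AG"), with None marking a missing call."""
    kept = {}
    for locus, calls in genotypes.items():
        called = [g for g in calls if g is not None]
        if len(called) / len(calls) < min_call_rate:
            continue  # incomplete coverage across individuals
        alleles = set("".join(called))
        if len(alleles) < 2:
            continue  # invariant locus: no SNP to genotype
        kept[locus] = calls
    return kept

# Hypothetical genotypes for five individuals at four loci
genotypes = {
    "locus_1": ["AA", "AA", "AA", "AA", "AA"],   # invariant
    "locus_2": ["AA", "AG", "GG", "AG", "AA"],   # variable, complete
    "locus_3": ["CC", "CT", None, None, "TT"],   # variable, 60% call rate
    "locus_4": ["GG", "GT", "GG", None, "GG"],   # variable, 80% call rate
}
kept = filter_loci(genotypes)
print(sorted(kept))
```

Because the effect of missing data on downstream analyses is not well understood, it can be informative to repeat an analysis under several call-rate thresholds and compare the results.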

The evolutionary divergence between species has implications for the number of shared markers that will be found (Recknagel et al., 2015) and for extrapolating population genomics to reference genomes even of closely related species. For example, our preliminary analyses found that only 28.2% of Scottish Sv. alpinus ddRADseq reads map to the Sm. salar genome and only 25.8% map to the O. mykiss genome (240,494 genomic ddRADseq reads, three mismatches to reference allowed) (Jacobs & Elmer, unpubl.), which seems relatively low given the ca. 22–28 MY evolutionary divergence between genera (Crête-Lafrenière et al., 2012). Similarly, other researchers found that transcriptome reads from O. nerka mapped to Sm. salar and O. mykiss with intermediate success (Everett et al., 2011). These studies emphasise how valuable species-specific reference genomes are for advancing ecological genomics.

Targeting regions: SNP arrays and sequence capture

Information-free methods like genotyping by NGS are increasingly efficient and cost effective, yet there may be many instances or reasons why one might prefer to generate targeted and consistently reproducible resources to infer SNPs. Therefore, resources like SNP arrays have their strengths for simplicity, reproducing the same and known panel of markers in all experiments, and very low per genotype cost after initial set up. For example, if research requires a reduced set of key SNPs of interest to be replicated across a very high number of samples, one might generate a SNP array, primers for targeted genotyping (or sequencing), or sequence capture and enrichment followed by high coverage resequencing (Ekblom & Galindo, 2011). Using genome-wide marker discovery to identify those loci of interest can be an effective way of doing this, either from genotyping, genome resequencing, transcriptome sequencing, or a combination of approaches.

Such resources can be used to address important fundamental, genetic, and applied research questions for salmonids (e.g. Koop et al., 2008) and they also provide resources for ecological genomics. Because of the economic importance, conservation and natural heritage value, and the large genome size of salmonids, investing in SNP arrays of various technologies has been a popular approach. For example, Houston and colleagues developed an informative panel of SNPs for cutthroat trout (O. clarkii) with a particular focus on distinguishing various cutthroat subspecies from each other and to assess if those native populations admix with stocked rainbow trout (O. mykiss) (Houston et al., 2012). The authors used RADseq to scan for SNPs genome-wide and reduced the panel to a smaller set of markers. Through a process of filtering they then established 125 SNPs that could distinguish subspecies and species reliably for genotyping on the Fluidigm array (Houston et al., 2012). Gomez-Uchida and colleagues also had the aim of developing a SNP panel for Chinook salmon (O. tshawytscha) as a resource for population genomics, inferring selection, or defining conservation units (Gomez-Uchida et al., 2014). They chose to focus on the coding and therefore putatively functional portion of the genome by sequencing transcriptomes for SNP discovery. Sauvage and colleagues conducted a similar approach for brook charr (Sv. fontinalis), first screening for SNPs with RNAseq and then developing a robust panel of 280 SNPs for genotyping on the Sequenom MassARRAY platform (Sauvage et al., 2012). They then combined the set of SNPs with microsatellites and used it for QTL analysis of reproductive traits relevant for hatchery aquaculture.

The most ambitious recent resource development is the ~130 K SNP Affymetrix array developed for Atlantic salmon (Sm. salar) (Houston et al., 2014). Polymorphisms were identified by combining RADseq, reduced-representation sequencing, and RNA sequencing for a comprehensive coverage of the coding and non-coding portions of the genome. This had a focus on wild European and aquaculture populations, but the array is primarily a tool for aquaculture research on the genetic architecture of quantitative traits relevant in these economically important species (Houston et al., 2014). With a focus on geographic variation of wild populations rather than aquaculture, Bourret and colleagues developed a panel of 6176 informative and validated SNPs for Atlantic salmon from expressed and genomic sequence (Bourret et al., 2013). They also identified high levels of differentiation between populations differing in life history, being anadromous or freshwater resident (Bourret et al., 2013). The panel effectively distinguished spatially differentiated populations, as well as clinal variation suggestive of genetic incompatibilities between distinct lineages (Bourret et al., 2013).

An important lesson from all of these resource developments is that there is a staggering attrition from ‘first pass’ SNPs identified by sequencing to those that are validated for high quality, function technically in the array, and are orthologous, Mendelian, and reproducible. In the case of the salmon array, more than 400,000 SNPs were discovered by sequencing, of which 132,033 were established on the array (Houston et al., 2014). In the cutthroat and rainbow trout, 43,558 SNPs were found at first pass, which was reduced to 125 SNPs of interest (Houston et al., 2012), and in brook charr 4841 first pass SNPs were identified and filtered down to 270 SNPs of interest (Sauvage et al., 2012). Therefore initial costs can be considerable, both in sequencing and in developing arrays or primer combinations. When many further individuals are planned for genotyping, this cost is offset by a low genotyping cost per sample once the resource is developed.
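The attrition described above can be expressed as the proportion of first-pass SNPs retained, using the counts exactly as quoted in the preceding paragraph. This is simple arithmetic only; the salmon first-pass count is given in the text as "more than 400,000", so its retained proportion is an upper bound, and the dictionary labels are chosen here for illustration.

```python
# (first-pass SNPs, SNPs retained), as quoted in the paragraph above
panels = {
    "Atlantic salmon array":   (400_000, 132_033),  # first pass is a lower bound
    "cutthroat/rainbow trout": (43_558, 125),
    "brook charr":             (4_841, 270),
}

# Proportion of first-pass SNPs that survive validation and filtering
retained = {name: kept / first for name, (first, kept) in panels.items()}

for name, proportion in retained.items():
    print(f"{name}: {proportion:.2%} of first-pass SNPs retained")
```

The retained proportions span two orders of magnitude, from about a third at most for the salmon array to well under 1% for the trout panel, reflecting the different aims of a dense general-purpose array and a small diagnostic panel.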

Genomic organisation

To answer the big questions in ecological genomics about how genetic variants underlie adaptive phenotypes, SNP data are most informative when the genomic location is known. Therefore the combined approach of genetic linkage mapping and population genomics is especially attractive in non-model organisms that lack a sequenced and annotated genome (Hohenlohe et al., 2010; Bradic et al., 2013). Developing a linkage map is a powerful first step in population genomics in the absence of a reference genome, because (a) it allows one to set up protocols and pipelines on a situation of limited and likely known genetic diversity (because it is a single family or few families of known pedigree, such as in laboratory crosses), (b) it develops a key resource for comparison across species, and (c) it is a critical resource for later mapping and localising the SNPs from population genomics when a complete genome is not available.

Genetic maps from NGS have been developed quite extensively for salmonids, for example with RAD sequencing in O. nerka (Everett et al., 2012), O. mykiss (Hecht et al., 2012), and Atlantic salmon (Sm. salar) (Gonen et al., 2014). The Atlantic salmon map gained additional power from integrating with genome sequence from the on-going salmon genome project (Gonen et al., 2014). Data can also be used to draw comparisons across species. For example, Kodama and colleagues developed a linkage map for coho salmon (O. kisutch) and compared it to previously published Chinook salmon, rainbow trout, and Atlantic salmon maps for genetic analysis of chromosomal evolution across the groups (Kodama et al., 2014).

As these ecological genomics tools become more cost effective (e.g. reduced cost of NGS) and bioinformatics tools become more user friendly with workflow implementations (e.g. Galaxy analysis server, Blankenberg et al., 2010), the barriers to applying genomics to any organism become fewer. Ecological genomics research on salmonids has led the way in moving some of these resources and tools to addressing exciting ecological and evolutionary questions.

Genomics for evolution and adaptation in salmonids

As these tools develop for salmonids—genetic linkage maps, SNP panels, reference genomes, and population genomics databases—they are contributing importantly to advances in understanding the genetics of adaptive phenotypes. Populations differ so dramatically and there is such high local adaptation in salmonids (Fraser et al., 2011) that the genetic basis of adaptation is a major open question and the target of considerable research. Here, I touch on some of the key areas of research effort using NGS tools: migratory versus resident life history tactics, spawning timing and location, and concerns about the loss of genetic integrity of native populations due to introgression with aquaculture stocks.

Migration

Salmonids have a fascinating migratory behaviour associated with dramatic physiological changes and renowned site fidelity. The switch to anadromy seems to be partly genetic and partly triggered by environmental conditions, and smoltification involves a suite of changes, including osmoregulatory changes for survival in salt water, revised foraging behaviour, and altered developmental rate (Aas-Hansen et al., 2005; Jonsson & Jonsson, 2009; Dodson et al., 2013). This process is fundamentally similar across species in Oncorhynchus, Salmo and Salvelinus (Dodson et al., 2013).

Migration is a particularly appealing phenotype to study with genomics of wild populations because it is difficult to analyse in laboratory conditions and may not express until rather late in development. It is also very important from a conservation perspective because it involves how organisms interact and can manage their changing environments including dams, habitat degradation, and pollution (Aas-Hansen et al., 2005; Jonsson & Jonsson, 2009; Dodson et al., 2013). Therefore, probably the most intensive area of ecological genomics research on salmonids to date has focused on the genetic basis of this trait. In particular, a number of researchers have used paired designs comparing freshwater-anadromous populations to seek the loci that differ between habitats and identify if the genomic divergence is consistent (parallel) across replicates. Such parallelism might be expected if phenotypes were responding to selection in similar ways or if the same genetic loci underlie the migration traits in different lineages.
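
The logic of such a paired design can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the method of any study cited here: the allele frequencies and SNP names are hypothetical, differentiation is measured with a simple two-population F_ST computed from allele frequencies, and outliers are flagged with an arbitrary fixed threshold, whereas real genome scans derive significance from the empirical or simulated distribution of differentiation.

```python
def fst_two_pops(p1, p2):
    """F_ST at one biallelic locus from allele frequencies in two equally
    weighted populations: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population value
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def outlier_loci(freqs_a, freqs_b, threshold):
    """Loci whose F_ST between the two populations meets the threshold."""
    return {
        locus
        for locus in freqs_a
        if fst_two_pops(freqs_a[locus], freqs_b[locus]) >= threshold
    }


# Hypothetical allele frequencies for three anadromous/freshwater pairs.
pairs = [
    ({"snp1": 0.90, "snp2": 0.50, "snp3": 0.20, "snp4": 0.60},
     {"snp1": 0.10, "snp2": 0.50, "snp3": 0.80, "snp4": 0.50}),
    ({"snp1": 0.85, "snp2": 0.90, "snp3": 0.50, "snp4": 0.40},
     {"snp1": 0.15, "snp2": 0.10, "snp3": 0.50, "snp4": 0.45}),
    ({"snp1": 0.80, "snp2": 0.50, "snp3": 0.50, "snp4": 0.90},
     {"snp1": 0.20, "snp2": 0.55, "snp3": 0.45, "snp4": 0.10}),
]

THRESHOLD = 0.3  # arbitrary cut-off, for illustration only

outlier_sets = [outlier_loci(anad, fresh, THRESHOLD) for anad, fresh in pairs]
shared_in_all = set.intersection(*outlier_sets)   # candidates for parallel divergence
seen_in_any = set.union(*outlier_sets)            # includes pair-specific outliers

print("outliers per pair:", [sorted(s) for s in outlier_sets])
print("shared across all pairs:", sorted(shared_in_all))
print("pair-specific or partially shared:", sorted(seen_in_any - shared_in_all))
```

In this toy data set one locus is an outlier in every pair while the others are pair specific, which is the mixed pattern of partial parallelism that the question is designed to detect.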

Because of the established genomic resources for Atlantic salmon, it is an excellent candidate for seeking the genetic basis of this complex trait. In a recent study, Perrier and colleagues sought to identify the genomic patterns associated with migratory phenotypes. They examined 2336 genetically mapped SNPs among three pairs of North American anadromous and freshwater Atlantic salmon (Sm. salar) populations (Perrier et al., 2013). Overall the patterns reflected the microevolutionary processes unfolding in the smaller and isolated freshwater populations: across the genome, freshwater populations had lower genetic diversity and higher interpopulation genetic differentiation compared to the patterns among anadromous populations (Perrier et al., 2013). No evidence of individuals migrating from anadromous to freshwater populations was found, but there was some evidence for a handful of migrants out of freshwater populations (Perrier et al., 2013). Genome scans found incomplete parallelism across population pairs, with little evidence that the same genomic regions were responding to selection in the same way across replicate freshwater-anadromous populations (Perrier et al., 2013).

This echoes the patterns found in research on migratory and resident O. mykiss. With a similar paired design, Hecht and colleagues sought the genetic loci associated with propensity to migrate through a genome-wide association approach based on SNPs of known location from species-specific linkage maps (Hecht et al., 2012; Miller et al., 2012). Genome-wide association analysis suggested different genomic regions underlying different phenotypic components of the migratory phenotype (Hecht et al., 2012, 2013). Annotation of genome regions linked to significant SNPs indicated they were likely in regions associated with physiological processes important in migration (Hecht et al., 2013; Hale et al., 2013). The study identified a number of new loci associated with migratory traits and corroborated loci that had been identified in earlier QTL studies. Numerous and non-parallel outlier loci tend to be found among wild populations, with some regions shared across population replicates and others being population specific. In a similar study, Limborg and colleagues found that the pattern of outliers differed between the two pairs of anadromous-resident populations, and that there was considerable interpopulation divergence reflecting geographic isolation and local adaptation (Limborg et al., 2012).

Despite complex genome-wide signatures of divergence between resident and anadromous populations, a genomic region on chromosome Omy5 has been repeatedly associated with the migratory traits (references in Miller et al., 2012; Pearse et al., 2014). Focusing on SNPs in that region relative to background revealed significant genetic differentiation, with a haplotype associated with anadromous phenotypes absent or rare in isolated resident populations, suggesting strong selection and a functional genetic role for a region of Omy5 (Pearse et al., 2014). The associated loci are in strong linkage disequilibrium, and the authors suggest that a chromosomal inversion or other genomic rearrangement may be limiting recombination (Pearse et al., 2014). Such genomic blocks are one way in which divergence and adaptation can occur very quickly and overcome otherwise homogenising gene flow (Jones et al., 2012; Yeaman, 2013). The recently completed genome project of O. mykiss (Berthelot et al., 2014) promises to help resolve the genetic basis of this complex and fascinating phenotype and to clarify how often the functional bases are in fact parallel.

Spawning time and location

Other phenotypes of particular interest in salmonids are spawning time and location, which are potentially both a mechanism facilitating, and an outcome of, local adaptation. Shifts in spawning time and location that are maintained over time have been implicated as an important driver of sympatric diversification in salmonids (Fraser et al., 2011; Dodson et al., 2013).

For example, pink salmon (O. gorbuscha) in the Pacific Northwest have two genetically distinct lineages that spawn in the same locations but in alternate years. Using three population pairs of alternate-year lineages, Seeb and colleagues assessed the extent of parallelism in genomic patterns between the two lineages using 8036 SNPs from RADseq data (Seeb et al., 2014). Background patterns of differentiation between populations within lineages differed somewhat, but in both lineages there was a consistent effect of site and latitude (Seeb et al., 2014). Interestingly, 15 SNPs were divergent in a parallel manner between different spawning year lineages, suggesting they represent genomic regions responding to selection in a parallel way (Seeb et al., 2014). While those SNPs were unmapped, presumably future work can aim to identify the genomic architecture associated with the parallel signals.

Developmental rate and timing

Salmonids are under strong selection for developmental rate because of their complex life history. This includes their emergence as fry from the gravel, which should match a time suitable for foraging, and the importance of suitable timing and strategies or decisions for migration (Miller et al., 2012; Dodson et al., 2013).

Studying rainbow trout from different geographic regions and with different developmental rates, Miller and colleagues identified a conserved haplotype that was found to be associated with rapid developmental rate of the young in lines of O. mykiss when compared to a slower developing line (Miller et al., 2012). Given the similar pattern and the genetic divergence between lineages, the authors suggested a repeated utilisation of a conserved haplotype from standing genetic variation to make locally adapted and differentiated phenotypes (Miller et al., 2012).

Salmon differ in the timing and duration of their sea migration and this is an important trait for conservation of wild stocks. Johnston and colleagues used an array of 6000 SNPs to study genomic patterns associated with differences in sea age (Johnston et al., 2014). The markers were localised on reference genomes, making this an extremely powerful approach for trait mapping in individuals from natural populations. The authors detected genomic regions significantly associated with differences in sea age and these were distributed across several regions of the genome (Johnston et al., 2014). Thus, across species, the consistency of the genetic mechanisms for migration remains to be identified.

Introgression

Salmonid fishes rank among the species most severely affected by introgressive hybridization as a result of a long tradition of stocking natural waters with hatchery-reared conspecifics. Such admixture can have a range of deleterious effects, including a breakdown of locally adapted genetic variation (Hindar et al., 1991; Fraser et al., 2011; Fraser, 2013). This is an area in which genotyping by NGS can be especially useful because the markers are highly sensitive and can distinguish closely related groups, can be comparable and informative across species, and do not require a new panel of markers for each species (Allendorf et al., 2010).

Rainbow trout (O. mykiss) is one of the world’s most widely introduced species and it can interbreed with native species such as cutthroat trout (O. clarkii), resulting in major concerns for genetic integrity of endemic populations (references in Houston et al., 2012). RADseq has proved to be a powerful approach for resolving the extent of admixture, with 16,788 putatively diagnostic SNPs recently identified that can distinguish cutthroat trout from rainbow trout (Hand et al., 2015), which advanced previous efforts along these lines (e.g. Amish et al., 2012; Hohenlohe et al., 2013). Of those, 10,267 SNPs could be mapped to anchored chromosomes in the recently published rainbow trout genome and therefore used to infer genomic location of the variant (Hand et al., 2015). This demonstrates the very high resolution achievable with genotyping by NGS and the range of opportunities and applications available for conservation genomics.
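
How diagnostic SNPs translate into an estimate of admixture can be illustrated with a simple hybrid index. The sketch below is an assumption-laden illustration rather than the analysis used in the studies cited: it treats a SNP as diagnostic only if it is fixed for alternative alleles in the two parental species, codes each genotype as the number of rainbow-trout alleles (0, 1, or 2, with None for missing calls), and uses hypothetical function names and toy genotypes. The only values taken from the text are the two SNP counts.

```python
def is_diagnostic(freq_rainbow, freq_cutthroat):
    """True if the allele is fixed in one parental species and absent in the other."""
    return {freq_rainbow, freq_cutthroat} == {0.0, 1.0}


def hybrid_index(genotypes):
    """Proportion of rainbow-trout alleles across diagnostic loci.

    genotypes: per-locus counts of the rainbow-trout allele (0, 1, or 2);
    None marks a missing genotype call and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no usable loci for this individual
    return sum(called) / (2 * len(called))


# Share of the putatively diagnostic SNPs with a known genomic location,
# from the counts quoted in the text.
mapped_percent = 100.0 * 10_267 / 16_788
print(f"{mapped_percent:.1f}% of diagnostic SNPs mapped to anchored chromosomes")

# Toy individuals (hypothetical genotypes at four diagnostic loci).
print(hybrid_index([0, 0, 0, 0]))     # pure cutthroat trout
print(hybrid_index([1, 1, 1, 1]))     # first-generation hybrid
print(hybrid_index([2, 1, None, 0]))  # admixed individual with one missing call
```

With thousands of mapped diagnostic loci, the same calculation can be made per chromosome region rather than per genome, which is what allows introgression to be localised.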

Combining quantitative trait information with high-resolution SNPs is a powerful way to test not only the extent of introgression from hatchery to native genomes, but potentially its effect on phenotypic traits in natural context. Based on the panel of SNPs established for brook charr (Sv. fontinalis) (Sauvage et al., 2012) and informed by QTL analysis of reproductive traits including some impacts of hybridisation (Bougas et al., 2013), Lamaze et al. (2012) found signals of admixture between stocked and native populations. This included evidence that stocking results in genetic homogenisation among geographically distinct populations (Lamaze et al., 2012). There was also an indication that the rate and structure of introgression was not neutral, with some regions inhibited and others exceeding background; these included genes and QTLs associated with reproduction, growth, and behaviour in salmonids (Lamaze et al., 2012).

Ecological genomics and the ‘charr problem’

The genomic approaches described above are also highly relevant to questions specific to Salvelinus species. Where charr are extraordinary compared to the salmonid species discussed above is in their rapid, widespread, and frequent diversification into different ecomorphs within postglacial lakes (known as the ‘charr problem’) (Klemetsen, 2010). These sympatric divergences provide a potentially rich research avenue for the application of ecological genomics approaches.

Of all salmonids, the lake-dwelling charr Sv. alpinus and Sv. namaycush are particularly renowned for their exceptional degree of phenotypic variability and rapid diversification (Klemetsen, 2013; Muir et al., 2015). This manifests as replicate divergences into subpopulations between the benthic (both littoral and profundal) and pelagic (open water) habitats, resulting in a bimodal or multimodal distribution of phenotypes within lakes. Observed trophic partitioning includes littoral and profundal benthivore morphs (large, small, and/or dwarf), planktivorous morphs, and even piscivorous morphs in sympatry (reviewed in Klemetsen, 2010; Muir et al., 2015). These morphotypes differ in a range of ecologically relevant traits such as body size, head shape, parasite load, immunogenetics, growth rate, spawning timing and/or behaviour, and lipid content of the muscle (Skúlason et al., 1996; Jonsson & Jonsson, 2001; Adams & Huntingford, 2004; Goetz et al., 2010, 2014; Gudbrandsson et al., 2015). Many of these traits are in fact archetypal of intralacustrine differentiation in other freshwater fishes, with similar components found from stickleback fishes (Schluter, 1993; Willacker et al., 2010) and coregonids [lake (Bernatchez et al., 2010) or European whitefish (Østbye et al., 2006)] in the northern hemisphere, to cichlid fishes in the Neotropics (Elmer et al., 2014) and Africa (Hulsey et al., 2013). However, among the northern fishes, charr are the species in which multiple sympatric morphs are found most abundantly (Klemetsen, 2010).

Thus, the charr provide a wealth of potential models to address multiple and important ecological and evolutionary questions through their adaptive divergence across these replicate complexes. Such questions, pervasive in the literature, have been: Are the sympatric ecomorphs reproductively isolated? Was the divergence originally sympatric or allopatric? Determining sympatric speciation is important because the process by which diversification is expected to occur is different compared to, for example, that arising from multiple invasions. Specifically, it is predicted that sympatric divergences will involve a shift from generalist to multiple specialists, as a response to disruptive selection. Emerging populations must overcome the homogenising effect of gene flow and, as discussed above, this may be facilitated by genomic co-localisation of the loci for the relevant ecological trait(s) and assortative mating (Nosil, 2012).

It should be possible to track reproductive isolation between sympatric phenotypically distinct groups by determining levels of genetic differentiation reflected in neutral markers. To this end, many studies have identified genetic differentiation between sympatric ecomorphs of charr with microsatellite and mtDNA markers, in Arctic charr (e.g. Wilson et al., 2004; Adams et al., 2007a; Gomez-Uchida et al., 2008; Corrigan et al., 2011; Kapralova et al., 2011; Garduño-Paz et al., 2012; May-McNally et al., 2014; Gordeeva et al., 2015) and in lake charr (e.g. Guinand et al., 2012; Harris et al., 2015). Morphs of Arctic charr also differ significantly in functional loci such as immunological genes (Kapralova et al., 2013; Conejeros et al., 2014). Therefore, it is clear that ecomorphs are, at least at times, reproductively isolated and young species in the early processes of divergence.

Are these divergences sympatric? The current state of knowledge suggests there might not be a consistent pattern. In Arctic charr, the weight of evidence at some locations points to dual invasions, with historical postglacial invasions of multiple lineages that had diverged in allopatry in un-glaciated areas (e.g. Lochs Tay, Maree and Stack in Scotland; Wilson et al., 2004; Adams et al., 2008; Garduño-Paz et al., 2012). In other locations, the most parsimonious explanation is that the divergence is in fact sympatric, for example in a number of the Transbaikalian lakes of Russia (Alekseyev et al., 2014; Gordeeva et al., 2015), Loch Awe in Scotland (Garduño-Paz et al., 2012), Lower Tazimina Lake in Alaska (May-McNally et al., 2014), and some ecomorphs in Thingvallavatn in Iceland (Kapralova et al., 2011). Even within geographic regions, many lakes show a signal of sympatric divergence yet some others nearby do not. Thorough geographic sampling of putative source populations is critical for reconstructing the colonisation history of any given focal population.

The genetics of functionally important traits in these repeatedly arising ecologically relevant phenotypes have not been identified within or across charr species; this is an exciting area for genomic advances. Genomic approaches offer improvement over previous neutral marker-based techniques, such as microsatellite loci, in that they are higher resolution and can in principle identify the location, extent, and number of genomic regions associated with diverging phenotypes. From an ecological genomics perspective, an important element of these sympatric divergences is that they occur in parallel and are found globally. Looking for common patterns across multiple systems will allow us to distinguish true patterns from the background noise that is inherent in samples drawn from the natural environment. Therefore, this study system has the potential to tackle a fundamental and unresolved question about the extent to which there are many different genomic routes to similar phenotypic ends (Elmer & Meyer, 2011). Thus, an important question that charr as a model species will allow us to address in the future is: Do ecomorphs diverge in the same way across locations?

This aim can be achieved with a combined route of population genomic analyses across many different populations, comparative genomics between those populations, and information on the genomic organisation underlying those divergences (e.g. Fig. 1). Some of the methods and study approaches described above for other salmonid species illustrate just a few of the many ways it is now possible to tackle ‘the charr problem’. Given the different approaches, study populations, and research priorities of charr researchers around the world, the coming decade promises to be an era of major new discovery in the ecological genomics of charr. Given the high rate of sympatric divergences (or divergences with gene flow) and parallel phenotypic evolution, we can hope that the findings from charr will help resolve some of biology’s central questions about the speed and predictability of evolution.

Ecological genomics for predictions

The genomic variation within species is the substrate upon which new species arise, with which existing populations respond to environmental change, and by which individuals counter myriad other challenges (Stillman & Armstrong, 2015). Therefore, ecological genomics can inform about evolutionary and ecological dynamics and processes, uncovering important mechanisms for how biodiversity—in its array of forms—emerges and changes. These analyses are important tools to compare contemporary versus retrospective demographic processes, population variability, and genetic regions associated with local adaptation or speciation (Stillman & Armstrong, 2015). Importantly, by identifying the genetic basis underlying phenotypes in natural context, we can study and therefore ultimately aim to predict evolutionary paths under different environmental scenarios (Violle et al., 2014).

As genomic data can be collected more readily, the greatest gains in mining the genomics of adaptive traits will come from environment and phenotype matching, as well as increasing levels of biological replication (populations and individuals) (Elmer & Meyer, 2011; Hendry, 2013; Roesti et al., 2014). Future research will also benefit from direct and indirect functional validation, for example, the comparisons possible with the growing available genomic resources for salmonids (Pavey et al., 2012; Primmer et al., 2013).

The extent to which population-specific patterns reflect local adaptation versus stochastic patterns or confounds of population genetic structuring is unclear and an on-going problem for ecological genomics to untangle (Roesti et al., 2014). Such confounds can generate false positives for loci associated with adaptive phenotypes, for example, if genetic incompatibilities between evolutionary lineages mimic signals of response to selection (Bourret et al., 2013). These false positives lead to incorrect conclusions about the genetic bases of the populations or phenotypes being studied. This is a central argument for leveraging the framework of parallel evolution (Bernatchez et al., 2010; Elmer & Meyer, 2011). Replicate phenotypes in the parallel evolution framework are so-called ‘natural evolutionary experiments’ (Doughty, 1996), used in comparative approaches as a way to tackle the challenges of evolutionary time scales and environmental stochasticity. This parallelism is why postglacial salmonids, especially charr with their extensive diversification in sympatry and in allopatry, are ideal model organisms for ecological genomics and inferring the genetic origins of extant diversity.

Salmonid fishes such as charr, trout, and salmon are of extremely high natural heritage value and play a major role in the food security and economic health of many northern countries (Fraser et al., 2011). These are regions of the globe at risk due to climate changes such as global warming, with effects already being felt by salmonids; e.g. trophic mismatches in great Arctic charr (Sv. umbla) (Jonsson & Setzer, 2015), declines of Arctic charr due to rising lake temperatures (Winfield et al., 2010), and in some regions extensive habitat modification by humans (e.g. spread of invasive species, pollution, and modification of waterways by dams) that is impacting salmonid population health (Adams et al., 2007b; Brodersen & Seehausen, 2014). Yet making accurate predictions for the evolutionary capacity of salmonids to respond to these challenges is difficult because of the dearth of information on the quantitative genetic potential of wild populations (Carlson & Seamons, 2008; Brodersen & Seehausen, 2014). It is therefore timely that we can draw upon the exciting new suite of tools available for ecological genomics of wild salmonid populations. This new era will allow us to recreate population histories with high-resolution neutral demographics, to infer how genomes respond to selection and thereby home in on functional bases of salmonid adaptive phenotypes, and even to cast our eyes forward to try to make predictions about future adaptation of these diverse populations.

Acknowledgements

I thank the organisers for the invitation to the 8th International Charr symposium. I thank my research lab colleagues A. Jacobs, M. Carruthers, H. Recknagel, J. McCaw, J. Burgon, M. Chen, and D. Stern for comments on a draft of the manuscript, A. Jacobs and H. Recknagel for major contributions to figures, and C. Adams for relevant collaborations and comments on the manuscript. My apologies to any researchers whose relevant work I may have overlooked. I wish to dedicate this paper to WB Scott, an eminent and inspiring ichthyologist from Canada’s Royal Ontario Museum, who passed away last year after a long career of important contributions to northern freshwater fish biology.

Copyright information

© The Author(s) 2016

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, Scotland, UK
