The importance of salmonid fishes: from evolution to sustainable food production

Salmonids have a combined scientific, societal and economic importance that is unique among fish (reviewed in [1]). They are naturally distributed in freshwater and marine habitats throughout the Northern hemisphere and have been introduced to South America, Australia, Africa and the Middle East. They perform key ecological functions, e.g. [2], but many populations are declining, and extensive effort is being directed towards their conservation and management, especially with respect to anthropogenically driven change, e.g. [3]. Salmonids include at least 70 species (but are sometimes classified as >200), possessing a rich diversity of adaptations and life-history strategies [4]. The great phenotypic diversity amongst salmonids provides an excellent study system to understand adaptive divergence and ecological speciation [4, 5] and was potentially facilitated by a whole genome duplication (WGD) in their common ancestor ~95 Mya [6, 7]. Salmonid aquaculture and capture fisheries (mainly of Atlantic salmon Salmo salar L. and Oncorhynchus spp.) play an important role in the economic and/or food security of several nations, accounting for 7.2% of all traded fish by weight and 16.6% by value [8].

Rationale for the FAASG initiative

The FAASG initiative follows the recent publication of the genomes of Atlantic salmon [9] and rainbow trout (Oncorhynchus mykiss) [10], which have proved invaluable to salmonid researchers (section Genome-led science in salmonids: progress, challenges and unresolved questions) and establish a solid foundation for generating reference genome sequences for other salmonid species (Fig. 1). The next step for salmonid research is to annotate genome function, considering species and populations of major scientific interest (sections The FAASG framework, Data and assays). This will lay foundations to understand how genotypes are translated to phenotypes via different layers of regulation of gene and protein expression. Covering a broad diversity of research in salmonid biology will aid this action and is best achieved by involving the widest possible research community (section Operational structure, funding and research community engagement). FAASG will follow principles established by the ‘Functional Annotation of Animal Genomes’ (FAANG) consortium (section Rationale for linking with FAANG) [11], a similar international consortium initiative aimed at producing comprehensive maps of functional elements in terrestrial livestock genomes. This will include use of standardized approaches for functional annotation, including bioinformatics protocols and pipelines exploiting knowledge from other species and through an array of experimental assays (Table 1, section Data and assays). However, the FAASG framework (section The FAASG framework, Fig. 1) will also exploit unique features of salmonid biology, including recent WGD and extensive phenotypic variation at both macro- and micro-evolutionary timescales, to generate broad mechanistic insights into genome evolution and adaptation.

Fig. 1

The comparative-evolutionary framework of FAASG. Shown are the initial target species for functional annotation (see Table 1) and their evolutionary relationships (time-calibrated tree after [7]). The selected species come from all three salmonid subfamilies. The position of the salmonid-specific WGD is highlighted (after [7, 9, 10]), along with Latin names of genera. Additional salmonid species that are future potential targets for functional annotation are not shown. Two lineages where anadromous life-history is thought to have evolved independently are labelled ‘A’ (after [47]). The status of genomics resources is shown to the right of the tree: squares and circles indicate genome and transcriptome assemblies, respectively (dark grey = resource either published or close to being published; light grey = resource under active development; ‘Ch’ = chromosome-anchored genome assembly).

Table 1 Levels of genome-wide functional annotation within the FAASG framework

Genome-led science in salmonids: progress, challenges and unresolved questions

Notable progress in understanding of salmonid biology has stemmed from sequencing two salmonid genomes, as well as that of northern pike Esox lucius [12], a sister lineage that did not undergo the salmonid-specific WGD (Fig. 1). Genome-wide analyses have offered key insights into the remodelling and divergence of duplicated genome content and functions during the post-WGD rediploidization process [9, 10]. Population genomics has been revolutionized by genotyping-by-sequencing, whole genome re-sequencing and high-density SNP arrays [13,14,15], used for example to discover SNPs near the vgll3 gene that explain 40% of the variation in sea-age at maturity [16, 17], genomic variation explaining the timing of migration [18] and adaptive population differentiation in immune function [19]. Population genomics is now routinely applied in salmonids without a genome sequence, by exploiting conserved synteny with rainbow trout or Atlantic salmon, e.g. [20,21,22,23]. Genome-wide approaches have also been applied to improve the accuracy of selection for key production traits (e.g. disease resistance) in breeding programs, either through genomic selection [24,25,26] or by characterization of major effect loci, e.g. [27, 28]. Further, the salmonid and pike genomes have been used to progress understanding of salmonid phylogeny and species diversification [7] and facilitate characterization of the molecular basis and post-WGD evolution of several physiological systems, including smoltification [29], growth [30], immunity [19, 31, 32] and olfaction [33]. Finally, the recent demonstration of successful genome editing in salmonids for gene knockout [34,35,36,37] opens the door for validation of candidate functional genomic elements and causative polymorphisms. Genome editing also has potential to address certain challenges in aquaculture, by creating new alleles and introducing them to farmed populations, and by expediting the selection of existing beneficial alleles [38].

Nonetheless, salmonid research and its applications have only just begun to exploit the possibilities of genome-led science. Undoubtedly, a number of unresolved questions and important challenges can be addressed through the FAASG initiative (Table 2).

Table 2 The role of functional genome annotation in addressing key challenges for salmonid research and its application. Below we list selected key questions, highlight their importance, and then briefly describe (in italics) how the FAASG initiative will help address them

Traits of crosscutting relevance: from aquaculture to evolution (and beyond)

Several traits of importance to aquaculture show extensive natural variation among salmonid species and populations, including disease resistance, growth rate, the control of sex determination and maturation, and the physiological transition from fresh to saltwater. These traits have crosscutting relevance to multiple scientific fields, both fundamental and applied, and the dissection of their functional genomic architecture under the FAASG initiative will help address challenges faced by the aquaculture sector, along with long-standing research questions. Accordingly, the outcomes of FAASG will facilitate selection of aquaculture strains with improved disease resistance and higher product quality that reach market earlier [39,40,41], while explaining the evolutionary role of trait variation in wild populations [16, 42, 43] and informing management actions influencing population resilience, conservation, and re-introduction [23, 44,45,46]. Comparing the outcomes of artificial vs. natural selection on functional pathways under different conditions will also help dissect the genetic architecture of traits. For example, different populations will often share genetic variation influencing a trait, but aquaculture and wild conditions impose divergent selective pressures, leading to unique yet complementary opportunities to understand natural selection and domestication.

Rationale for linking with FAANG

The FAANG consortium aims to produce comprehensive maps of the functional elements in the genomes of domesticated animal species [11], building on the ENCODE project [47]. Underpinning principles of both consortia include use of robust, standardized experimental protocols based on defined tissues or cell types. These principles apply to both ‘wet lab’ experiments and bioinformatic analyses of data, ensuring a comprehensive and reliable resource that is available to a wide research community. The FAASG initiative will link to FAANG, adhere to these principles, and utilise and build on the FAANG protocols and pipelines to avoid redundancy. FAANG is focussed on livestock species with high-quality reference genomes (chicken, pig, cattle and sheep), but with scope for inclusion of other species. The initial focus of FAASG will be the key farmed salmonids (Atlantic salmon and rainbow trout), but will expand to a broader range of lineages of interest to conservation, management and evolution (Fig. 1). In doing so, the initiative will harness wider diversity within a comparative context (section The FAASG framework) to understand the evolution of functional genome elements following species radiation and WGD. FAASG will provide a FAANG-type model for other species and lineages with recently developed genome assemblies, the number of which is rapidly increasing. This includes other species of importance for global aquaculture and food security, for example tilapia, carp, catfish and shellfish species. There will also be great scope for cross-talk between FAASG and research communities for model fish species where functional annotation is advanced, including zebrafish Danio rerio (https://zfin.org/). All data generated via FAASG-linked projects will be made publicly available in a timely manner, in keeping with the principles of FAANG. More specifically, the consortium is committed to releasing all data in an open access manner, rapidly and before publication, in adherence with the standards defined in the FAASG Data Sharing Statement (https://www.faasg.org/data-sharing-principle/), which incorporates both the Toronto Statement on pre-publication data sharing and the Fort Lauderdale principles on the release of data and materials prior to publication.

The FAASG framework

The initial approach of FAASG will exploit a rich phylogenetic framework, documenting functionally important sequence variation and data derived from a core set of experimental assays (section Data and assays) across nine salmonid species and the northern pike (Fig. 1), under experimental conditions representative of the traits listed in section Traits of crosscutting relevance: from aquaculture to evolution (and beyond). Salmonid species were selected on the basis that genome sequencing projects are underway within the research community. Together, they represent six of the nine true genera and all three subfamilies, namely Salmoninae (Salmo, Oncorhynchus, Salvelinus and Hucho), Thymallinae (Thymallus) and Coregoninae (Coregonus) (Fig. 1). This phylogenetic context traverses the diversification of salmonid lineages and evolutionary origins of anadromy, a life-history strategy that is thought to have evolved at least twice independently [48] (Fig. 1) and potentially facilitated species diversification [7]. While the initially planned FAASG framework will hence enable high-resolution evolutionary reconstructions, additional taxa may be added as the salmonid research community progresses, potentially from the remaining genera (i.e. Prosopium within Coregoninae, Parahucho and Brachymystax within Salmoninae). FAASG will also address micro-evolutionary variation by contrasting wild populations that evolved divergent phenotypes over thousands of years and aquaculture vs. wild strains separated by a small number of generations (Fig. 1). The combination of experimental assays and evolutionary analyses done across the salmonid phylogeny (section Data and assays) will be applied to assess ‘genome function’, thereby addressing a potential shortcoming of the original interpretations of the ENCODE data [49].

Data and assays

The assays being considered for FAASG are described in Table 1 (also, see Additional file 1: Table S1). Annotating distinct classes of sequence variation will reveal the genome-wide evolution of orthologous protein-coding genes, along with the large number of retained functional gene duplicates (>50% of those created) from WGD [9, 10]. Comparison of chromosome-anchored genome assemblies will provide insights into chromosomal re-arrangements accompanying rediploidization (e.g. [9]) and their potential impact on lineage-specific evolution. Population-level sequence variation will inform the role of functional elements in recent phenotypic divergence and adaptation (Table 1). The inclusion of northern pike (Fig. 1) will enable the ancestral (non-duplicated) state of sequence variation to be inferred, including the direction of divergence between duplicated genes. Comparisons to more distantly related fish with well-annotated genomes, including zebrafish [50], three-spined stickleback Gasterosteus aculeatus [51], spotted gar Lepisosteus oculatus [52], European seabass Dicentrarchus labrax [53], and Asian seabass Lates calcarifer [54], will allow salmonid-specific changes to be contextualized in the broader framework of teleost evolution, especially with respect to an earlier WGD event that occurred in the teleost ancestor ~320–350 Mya (e.g. [55]).
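As an illustration of how an unduplicated outgroup can polarize divergence between duplicated genes, the following minimal sketch assigns each difference between two aligned salmonid duplicates to the copy that departs from the pike ortholog. It is a toy parsimony count, not a FAASG pipeline: the function name, the gap handling and the example sequences are all assumptions made for illustration, and a real analysis would use codon-aware alignments and model-based ancestral reconstruction.

```python
# Toy sketch (hypothetical helper, not part of any FAASG or FAANG pipeline).
# Inputs are three pre-aligned nucleotide strings of equal length.

GAP_CHARS = {"-", "N"}


def polarize_divergence(outgroup, copy_a, copy_b):
    """Count sites where the two duplicates differ, assigned by outgroup state."""
    if not (len(outgroup) == len(copy_a) == len(copy_b)):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"derived_in_a": 0, "derived_in_b": 0, "ambiguous": 0}
    for out, a, b in zip(outgroup.upper(), copy_a.upper(), copy_b.upper()):
        # Skip sites with gaps or unknown bases in any sequence.
        if GAP_CHARS & {out, a, b}:
            continue
        # Sites where the duplicates agree carry no information on
        # divergence between the copies, so they are skipped.
        if a == b:
            continue
        if a == out:
            # Copy A retains the outgroup state, so copy B changed.
            counts["derived_in_b"] += 1
        elif b == out:
            # Copy B retains the outgroup state, so copy A changed.
            counts["derived_in_a"] += 1
        else:
            # All three differ; simple parsimony cannot assign the change.
            counts["ambiguous"] += 1
    return counts
```

For example, calling `polarize_divergence("ATGCATGC", "ATGCATGA", "TTGGATGC")` attributes one change to the first copy and two to the second, suggesting asymmetric divergence in this made-up case.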

Transcriptome and proteome phenotypes will be characterized for a panel of tissues and developmental stages, sampled from both sexes under common-garden conditions using standardized sampling and analytical protocols (e.g. RNA extraction, quality control (i.e. integrity and purity), library preparation, choice of sequencing platform, and bioinformatic analyses) that distinguish divergence in expression of duplicated loci [9, 10]. Discerning the regulation and evolution of transcript complexity (e.g. non-coding, miRNome and splice variants) will necessitate stranded approaches [56] and may be facilitated by capture of full-length transcripts through single molecule real-time sequencing [57]. Standardized proteome expression profiling will also be performed after experimental separation of different cellular fractions.

FAASG will implement genome-wide experimental assays being used or considered under FAANG [11] (Table 1, Additional file 1: Table S1), potentially including: 1) methylation at nucleotide-level resolution (several approaches available, e.g. [58, 59]), 2) chromatin accessibility and architecture (via ATAC-Seq [60], DNase I footprinting [61], or ChIP-seq approaches), 3) histone modifications (using ChIP-seq approaches [62, 63]), 4) genome conformation (via Hi-C [64]) and 5) transcription factor binding occupancy (via ChIP-seq approaches [65]). It is noted that the lack of salmonid-specific reagents and antibodies presents an initial barrier to implementation of these protocols. Indeed, some have yet to be employed in salmonids and thus significant effort in methodological development will be required (Additional file 1: Table S1). However, several studies have laid the groundwork for such efforts, and no technical limitations are expected given that these approaches rely on generic techniques and conserved features of molecular biology. Initial experiments in Atlantic salmon and rainbow trout will be conducted in the context of regulation across tissues and developmental stages. Assays incorporating different lineages, populations, and physiological manipulations will follow within the wider proposed comparative-phylogenetic framework. Targeted genome editing can subsequently be used to infer causality of sequence variants and functional genomic elements.

When planning experiments, the FAASG consortium will implement a number of measures to reduce the need for experimentation with animals. These include giving due consideration to alternatives to in vivo experimentation such as cell culture, use of power analyses to determine appropriate sample sizes, exploiting already published RNA-seq and microarray datasets relevant to traits of commercial or evolutionary interest, and running the various FAASG assays across the same individuals within a study, as far as practicable. The latter will also increase power for linking variation across different levels of genome functional annotation.
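To make the role of power analysis concrete, the sketch below estimates the number of animals needed per group for a simple two-group comparison of a continuous trait. It uses the standard normal approximation for a two-sided test, n = 2((z_{1-α/2} + z_{power}) / d)^2, where d is the standardized effect size. The function name and default values are assumptions chosen for illustration; the consortium does not prescribe this formula, and real designs (e.g. family structure, repeated measures, multiple testing across loci) would require more specialised calculations.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group comparison.

    effect_size is the difference in group means divided by the common
    standard deviation (Cohen's d). Uses the normal approximation, which
    slightly underestimates the size needed for a t-test at small n.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie between 0 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_power = std_normal.inv_cdf(power)          # quantile for desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)
```

Under these assumptions, detecting a medium effect (d = 0.5) at 80% power requires 63 fish per group, whereas a large effect (d = 1.0) requires 16, which illustrates how prior estimates of effect size from published datasets can reduce animal use.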

Importance of standardized phenotypic data

Informative genome functional annotation will necessitate standardized measurement and recording of both ecologically and production-relevant traits (section Traits of crosscutting relevance: from aquaculture to evolution (and beyond)) and for the effects of plasticity [66] to be controlled. Comparisons of the genetic architectures for complex phenotypes are confounded not only by the environment in which traits are measured, but also by how those traits are quantified. We view common-garden experiments, performed under agreed standardized conditions and treatments, as central to the collection of high-quality phenotype data. Salmonids are well-suited for common-garden experiments as they have external fertilization, high fecundity and high survival rates in captivity. In addition, facilities are widely available to raise large numbers of fish under a range of controlled experimental contexts. Such features also facilitate robust and powerful studies to dissect the quantitative genetic basis of complex traits, although seasonal spawning may present logistical challenges for experimental planning. The standardized recording of both ecologically and production-relevant phenotypes, and the cataloguing of functional and phenotypic responses (e.g. within the Gene Ontology framework), are also high priorities. Standardized phenotypic assays will also help interpret the molecular basis of phenotypic variation observed in the numerous wild populations for which long-term data series exist, e.g. [67].

Operational structure, funding and research community engagement

The initial governance of FAASG is via a Secretariat that supports a steering group incorporating chairs of four working groups and facilitates interactions with key industry and funder representatives. The working groups are generally similar in nature to those in FAANG, and consist of (i) animals, samples and assays, (ii) metadata and data sharing, (iii) bioinformatics and data analysis, and (iv) phenotyping. Details of the FAASG governance structure and working groups can be found at https://www.faasg.org/faasg-working-groups/. As the FAASG initiative requires major engagement and buy-in from researchers, industry and national funding bodies to be able to deliver the ambitious, high-level goals outlined above, members will seek opportunities to link existing or future projects to FAASG, in addition to capitalising on funding calls specifically aimed at reference genome annotation. The initiative will promote inclusiveness among all stakeholders and draw in expertise in aquaculture, bioinformatics/biostatistics, genetics, molecular biology, functional genomics, physiology, ecology and conservation, ensuring quality at all levels. For example, the second FAASG workshop in January 2017 in San Diego had 55 participants from 10 countries, including representatives of several funding bodies. The FAASG website (https://www.faasg.org/) will report progress, including experimental and computational protocols, publications and datasets, along with contact information for interested researchers or funders who are invited to register on the same site. In addition, the initiative is being advertised at several scientific conferences to promote wider awareness.