Background

The field of genomics is expanding rapidly with full genome and transcriptome sequencing of many model and non-model species. Annotating these genomes continues to pose a challenge [1]. Due to sequence conservation of functional genes and the rapidly growing molecular knowledge of model organisms, basic local alignment search tools (e.g. BLAST) facilitate the initial annotation of non-model genomes [2]. However, the ecological context of genes largely remains a mystery; nearly all gene annotation is based on studies of few model organisms in laboratory environments [3–6]. Thus, genes that function primarily in natural settings remain unannotated, and other genes with known function in laboratory organisms have no ecological context.

Studies of fishes are leading the way in providing an ecological context to genomes [e.g., lake whitefish [Coregonus clupeaformis [7, 8]], Atlantic salmon [Salmo salar [9–11]], killifish [Fundulus heteroclitus [12–14]], threespine stickleback [Gasterosteus aculeatus [15, 16]] and sockeye salmon [Oncorhynchus nerka [17–19]]]. These studies have employed three basic methods to relate gene transcription to ecological systems [20]. First is in situ gene expression analysis [e.g. [7, 19]]. Sampling occurs in the ecological context of interest in nature; fish capture and RNA preservation occur in the field. This method measures both genetic and environmental effects on the transcriptome, and it is often not possible to assign gene expression differences to either source. A replicated study design that reports parallel expression differences between two systems reduces the influence of population-specific, and perhaps environment-specific, gene expression [e.g. [7]]. In general, this method is applicable to many species, including large or long-lived species where laboratory rearing or genetic crosses are not practical. Second, one can remove the natural environmental effect and test only for genetic effects on gene expression in common garden experiments. This strategy compares transcriptomes of genetically distinct ecotypes in controlled conditions [e.g. [21]] and is generally applied to species that can be reared artificially. Reaction norms may be tested by experimentally manipulating conditions. Third, one can perform expression quantitative trait loci (eQTL) analyses by crossing genetically distinct ecotypes in a laboratory setting and mapping gene expression phenotypes to linkage groups [e.g. [22, 23]]. This method requires artificial rearing and is only practical for species with short generation times; however, it is also the only one of the three methods able to determine the overall genomic architecture of gene expression [20]. Unfortunately, the latter two methods are removed from natural environmental variation, and therefore may miss heritable expression that requires certain environmental conditions to manifest.

Juvenile sockeye salmon exhibit a life-history dichotomy in their freshwater rearing environments; lake-type populations rear in lakes for one to two years before travelling to the ocean to feed whereas riverine populations rear in river habitats for up to two years [in the riverine subset "sea-type", individuals go to sea before the first winter, whereas "river-type" spend at least one winter in the riverine habitat; [24, 25]]. Foraging, water current, and predation differ between habitats [26, 27]. Body shape differs between these life history types in association with the environment. In southwest Alaska, riverine sockeye exhibit a deep robust body whereas lake-type sockeye are more fusiform [27]. This may be the result of both predation regime and a foraging strategy favoring burst swimming in riverine and continuous swimming in lake-type habitats. Similar morphological and behavioral differences are apparent within and among different species of Pacific salmon [28–31].

A set of recent studies characterized the transcriptome in ecotypes of another salmonid, the lake whitefish, employing both the in situ and common garden approaches in dwarf and normal ecotypes [7, 21]. The primary ecological trade-off is between increased growth and fecundity in the normal ecotype and increased energetic expenditure in the dwarf ecotype [7, 32, 33]. The dwarf ecotype exhibits continuous swimming for feeding and is subjected to high predation compared to the normal ecotype [7, 33]. Therefore, both continuous swimming during plankton foraging and burst swimming during predator avoidance are likely favored in the dwarf ecotype, resulting in an energy expenditure for metabolism at the expense of growth [32, 34]. This trade-off results in great differences in growth rate, age at maturity, body shape, and maximum lifespan.

Phenotypically, lake whitefish ecotypes have drastically different sizes at the same age [34]. Although the size distributions of the sockeye populations in this study overlap at the juvenile life stage, the riverine sockeye are longer and have a more robust body shape [27]. With less extreme morphological differences in ecotypes of sockeye salmon juveniles, we expect the molecular trade-offs to be different from the lake whitefish studies. We expect genes differentially expressed to reflect the differing emphasis on continuous swimming for lake-type and burst swimming for riverine.

In this study, we compare the body morphology and in situ transcriptome of two sockeye salmon populations in the same drainage that exhibit these divergent life histories. Differences in foraging strategy and predation may have led to genetic differences between these populations [35]. We expect the transcriptome to reflect the functional molecular trade-offs driven by the ecological differences in these life histories. A greater understanding of the molecular mechanisms that relate to functional ecology will enhance our understanding of the phenotypic diversity of this species, as well as place specific gene annotations into an ecological context.

Results

Morphology

Albert Johnson Creek (AJC) juvenile sockeye were differently shaped compared to the Surprise Lake (SL) population, indicated in the significant population term of our model (df = 19/348; F = 6.10; P < 0.001). Generally, SL juveniles were streamlined compared with the robust shape of AJC juveniles (Figure 1). The interaction term centroid size × population was not significant and therefore not included in the final model.

Figure 1

Landmarks used and deformation grids of Albert Johnson Creek (AJC) and Surprise Lake (SL) fish. The middle panel depicts the locations of the twelve landmarks used in the geometric morphometric comparison of body shape. The top and bottom panels depict the details of the shape differences between these populations from ecologically different habitats: AJC (top) and SL (bottom).

Microarray expression profiles

The microarray analysis indicates 141 transcripts with significant differential expression (average fold change (FC) ≥ 1.5) between individuals from the riverine and lake populations (t-test with Benjamini-Hochberg FDR multiple test correction [MTC]; P ≤ 0.05). Of these, 81 were over-expressed in AJC compared with SL (Table 1) while 60 were over-expressed in SL compared with AJC (Table 2). The fold differences were modest; most were below two-fold. In AJC, the genes with the highest over-expression with corrected P values were: type II keratin E1 (Genbank:CB510619; FC 2.1; P = 0.033), kinesin-like protein (Genbank:CB491150; FC 2.1; P = 0.033), and CCAAT/enhancer-binding protein (Genbank:CA050914; FC 1.9; P = 0.044). In SL the genes with the highest over-expression were: structural maintenance of chromosomes protein 1B (Genbank:CB488712; FC 2.9; P = 0.024), CD81 antigen (Genbank:CA039936; FC 2.76; P = 0.033), collagen alpha-2(I) chain precursor (Genbank:CB515159; FC 2.36; P = 0.028), ferritin, heavy subunit (Genbank:CB505886; FC 2.33; P = 0.044), and troponin I, slow skeletal muscle (Genbank:CB509964; FC 2.29; P = 0.039). Overall, the cGRASP annotation file matched our checking of EST sequences in Megablast and tBLASTx quite well, but 6 of 99 genes from the differentially expressed lists with distinct descriptions (not containing words like "unknown" or "predicted") were found to have different descriptions (Additional File 1, Table S1), many of which have been submitted to NCBI databases recently.
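The selection rule described above (a Benjamini-Hochberg corrected P ≤ 0.05 combined with FC ≥ 1.5) can be illustrated with a minimal sketch. The function names, the record layout, and all numeric values below are hypothetical and are not data from this study; the sketch also assumes that fold change is expressed as a ratio ≥ 1 in the direction of over-expression.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_transcripts(records, fc_cutoff=1.5, alpha=0.05):
    """Keep transcripts that pass both the FDR and the fold-change thresholds."""
    adjusted = benjamini_hochberg([r["p"] for r in records])
    return [
        r["id"]
        for r, q in zip(records, adjusted)
        if q <= alpha and r["fold_change"] >= fc_cutoff
    ]


# Hypothetical example values (not taken from the microarray data).
records = [
    {"id": "feature_A", "p": 0.001, "fold_change": 2.1},
    {"id": "feature_B", "p": 0.010, "fold_change": 1.2},
    {"id": "feature_C", "p": 0.020, "fold_change": 1.9},
    {"id": "feature_D", "p": 0.600, "fold_change": 3.0},
]
selected = select_transcripts(records)
```

In this toy input, feature_B is excluded by the fold-change cutoff and feature_D by the corrected P value, which mirrors how the two criteria act jointly.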

Table 1 Genes significantly over-expressed in Albert Johnson Creek (AJC) sockeye salmon muscle compared to Surprise Lake (SL)
Table 2 Genes significantly over-expressed in Surprise Lake (SL) sockeye salmon muscle compared to Albert Johnson Creek.

In order to expand differential expression lists to facilitate functional analysis, the stringent multiple test correction was removed during significance testing, and transcripts that showed any expression difference were included in this analysis (P ≤ 0.05). As a result, 1026 genes were found significantly differentially expressed in muscle. Of these, 498 genes were expressed at higher (or over-expressed) levels in AJC compared with SL. Of these over-expressed transcripts, 230 and 240 were annotated with biological process and molecular function Gene Ontology (GO) terms, respectively. In all cases we used the GO Slim dataset. In biological process, biosynthesis (GO:9058; P = 0.009) and behavior (GO:7610; P = 0.019) were the only GO categories significantly enriched (Table 3). In the molecular function ontology, the only enriched category was structural molecule activity (GO:5198; 43 genes, P < 0.001).

Table 3 Gene Ontology (GO) enrichment results

We found 528 genes significantly expressed at higher levels in SL muscle compared to AJC (P < 0.05, no MTC). Of this list, 267 and 313 features were annotated with biological process and molecular function GO terms, respectively. In this analysis, metabolism (GO:8152) was the only biological process category significantly enriched (P = 0.019), containing 192 genes. Six categories were significantly enriched in the molecular function ontology (Table 3).
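The statistical test used for GO enrichment is not stated in this section. As an illustration only, a one-sided hypergeometric test is a common way to ask whether a GO category is over-represented in a gene list relative to a background. In the sketch below, the list size (240 annotated genes) and category count (43 genes) echo the structural molecule activity result, but the background size and the number of background genes in the category are hypothetical placeholders.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background
    K: background genes carrying the GO term
    n: annotated genes in the differentially expressed list
    k: listed genes carrying the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# k and n follow the counts reported above; N and K are assumed values.
p_example = hypergeom_enrichment_p(k=43, n=240, K=500, N=10000)
```

With these assumed background values the expected count in the list would be 12 genes, so observing 43 yields a very small P value; the real result depends on the actual background annotation of the array.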

Reverse-Transcription Quantitative Polymerase Chain Reaction (RT-qPCR)

Because 40S ribosomal and 5-aminolevulinate synthase (Genbank:CB493907 and CA058136) were identified as the most stable normalizer candidates through the geNORM algorithm, these transcripts were used to generate relative expression ratios of genes of interest (GOI). The four GOIs were 72 kDa type IV collagenase precursor (Genbank:CB510651), troponin I, slow skeletal muscle (Genbank:CB510901), single-stranded DNA-binding protein, mitochondrial precursor (Genbank:CA062007), and malate dehydrogenase (Genbank:CA044864). Two of the four investigated genes, 72 kDa type IV collagenase precursor and troponin I, slow skeletal muscle, were differentially expressed; both were significantly over-expressed in SL juveniles (Figure 2). 72 kDa type IV collagenase precursor was highly significant (FC > 2; p = 0.00013). Troponin I, slow skeletal muscle displayed a high level of variation among biological replicates, as shown by the large 95% confidence intervals for this GOI (Figure 2).
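The normalization described above can be sketched as dividing the quantity of a GOI by the geometric mean of the two normalizer quantities from the same individual. This is a minimal illustration under that assumption; the function name and the input quantities are hypothetical, and steps such as efficiency correction are not shown.

```python
from statistics import geometric_mean


def normalized_quantity(goi_quantity, normalizer_quantities):
    """Express a GOI quantity relative to the geometric mean of the normalizers."""
    return goi_quantity / geometric_mean(normalizer_quantities)


# Hypothetical relative quantities for one individual (not study data).
goi = 8.0
normalizers = [2.0, 8.0]  # two reference transcripts
ratio = normalized_quantity(goi, normalizers)
```

The geometric mean is used rather than the arithmetic mean so that a normalizer with a much larger absolute quantity does not dominate the reference value.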

Figure 2

RT-qPCR transcript profiles displaying mean normalized quantities in the lake-type and riverine ecotypes. "72 kDa" is 72 kDa type IV collagenase precursor (Genbank:CB510651), "TropS" is troponin I, slow skeletal muscle (Genbank:CB510901), and "SinMit" is single-stranded DNA-binding protein, mitochondrial precursor (Genbank:CA062007). Expression is relative to the geometric mean of expression levels of normalizers 40S ribosomal and 5-aminolevulinate synthase (Genbank:CB493907 and CA058136). Malate dehydrogenase (Genbank:CA044864) is included as an example of a feature that was not significantly differentially regulated with the microarray analysis. Significance was determined by a one-tailed Mann-Whitney U test; * denotes p ≤ 0.05, ** denotes p ≤ 0.001.
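For readers unfamiliar with the one-tailed Mann-Whitney U test named in the caption, the following is a minimal sketch of an exact version computed by enumerating all possible group labelings. It is an illustration rather than the software used in this study, and the normalized quantities are hypothetical.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count (x, y) pairs with x > y, scoring ties as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def one_tailed_exact_p(x, y):
    """Exact P(U >= observed) under the null, by enumerating all relabelings."""
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    count = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if u_statistic(group_x, group_y) >= observed:
            count += 1
    return count / total


# Hypothetical normalized quantities; group names are placeholders.
lake = [5.1, 6.3, 7.0, 8.2]
river = [1.0, 2.2, 3.1, 4.0]
u_obs = u_statistic(lake, river)
p_one_tailed = one_tailed_exact_p(lake, river)
```

Enumeration is practical for small groups such as those used here (nine and ten individuals give 92,378 labelings), but larger samples would normally rely on a normal approximation.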

Discussion

We have characterized molecular phenotypes in muscle tissue that relate to morphology, life history, and ecology in sockeye salmon. We also discovered differentially expressed genes and enriched functional categories associated with differing morphology and life history types of sockeye salmon in two habitats. This work represents the first characterization of a molecular phenotype in muscle or any other tissue of juvenile sockeye between these common habitat types. Because these populations are in relatively pristine habitats, these ecologically based gene expression differences provide a reference for published and future studies of sockeye salmon in habitats more impacted by human activities [17, 36, 37].

Riverine sockeye have a deep, robust body compared with the lake-type life history [27]. We find this pattern between the AJC and SL populations (Figure 1). In parallel, some patterns in expression profiles in the present study reflect these phenotypes. For example, in AJC, ten ribosomal proteins were over-expressed compared with SL and one of these (Genbank:CA045500) was among the highest over-expressed in AJC (Table 1). In comparison, we did not identify any ribosomal proteins over-expressed in SL compared with AJC (Table 2). All of these features on the array map to different contigs with the cGRASP Expressed Sequence Tag (EST) clustering online tool and therefore are likely to represent different genes. Many ribosomal proteins stabilize the structure composed mostly of ribosomal RNA [38]. Thus, the differential expression of these ribosomal proteins may indicate more protein synthesis in the muscle tissue of AJC sockeye. In addition, five genes associated with cell division, DNA replication and a growth hormone gene were over-expressed in AJC (Table 1 and Additional File 1, Table S1). These patterns are consistent with faster growth and more muscle mass associated with the deeper body morphology in AJC sockeye [27]. The GO category of biosynthesis (GO:9058) is defined as "The chemical reactions and pathways resulting in the formation of substances; typically the energy-requiring part of metabolism in which simpler substances are transformed into more complex ones" (http://amigo.geneontology.org). This was an enriched category in AJC of the GO Slim biological process ontology which further underscores that the expression profile in AJC corresponds to increased biomolecule production.

Creatine kinase (Genbank:CB503498) was over-expressed in AJC compared with SL. This gene is potentially important in both aerobic respiration in the pathway of oxidative phosphorylation, as well as anaerobic metabolism in glycolysis [39]. However, in both processes this gene regulates the amount of available cellular ATP, so it facilitates fluctuating energy demands [39]. This may be important to the riverine "wait and burst" feeding style of AJC, which may involve more variable levels of feeding activity.

In the SL gene enrichment analysis, the sole significant generic GO Slim biological process term is metabolism (GO:8152). Also, one of the significant GO terms in the molecular function ontology is electron transport, indicating aerobic respiration. Many of the individual genes in the over-expressed list relate to energy metabolism, mitochondria, and muscle contraction regulation. This is compatible with increased metabolism, especially for continuous swimming. Several of these genes may be particularly important for the continuous swimming strategy of lake-type sockeye. Troponin I, slow skeletal muscle (present twice in the over-expressed list; Genbank:CB509964 and CB510901) is a gene that regulates muscle contraction, and the "slow" label of the annotation indicates that this transcript is specific to slow twitch or aerobic muscle fiber [40]. We confirmed with the cGRASP EST clustering database that these features map to different contigs and therefore likely represent two different genes. The latter of these two genes was also found to be significantly over-expressed in our RT-qPCR analysis (Figure 2). These findings could be the result of either increased red muscle fibers present, increased recruitment of red muscle fiber, or both. Additionally, 72 kDa type IV collagenase precursor (Genbank:CB510651) was over-expressed in SL juveniles and is implicated in blood vascular remodelling [41]. This gene was also found to be over-expressed in our qPCR analysis. Such vascular remodelling may lay the infrastructure for increased aerobic needs. Another SL over-expressed gene, selenoprotein K (Genbank:CA054647), is associated with the response to oxidative stress [42], which may occur with increased aerobic activity. We did not separate red and white muscle tissue in our experiment. Many fish species have the muscle fiber types distinctly separated, and ecotypes may differ in their composition of red and white muscle sections [43]. Pacific salmon, however, have red muscle fibers mixed in with the main white muscle mass [44]. We collected all of the main locomotion muscle tissue from each individual as we also wanted to capture gene expression differences due to different muscle fiber composition.

Other significant GO terms have a less clear functional relationship with the ecology of these populations. The translation factor activity (GO:8135 and GO:45182) terms represented in SL are composed of translation initiation factor genes. This is in contrast to over-expression of ribosomal proteins in AJC, including 10 in the significantly over-expressed genes (Table 1) and 37 of the 52 genes in the biosynthesis (GO:9058) term (Table 3). It is also unclear why behavior (GO:7610) is a significant GO term enriched in AJC (Table 3). There are likely important behavioral differences between these populations, but the ten genes contained within this GO term appear to have many divergent functions, and the behavioural annotations are mostly related to mice.

RT-qPCR results were concordant with the microarray results in three out of four cases. Additionally, in all four cases, the average expression level was in the same direction for the RT-qPCR and microarray assays (Additional File 2, Table S2). Our sample size was smaller with the RT-qPCR (SL: n = 9 and AJC: n = 10), and this may have resulted in a reduction of power compared with the microarray assay and in the lack of significance agreement in one of the four comparisons.

An unanticipated discovery was the increased expression of immune function genes in AJC, including two features annotated with MHC II function (Genbank:CB492871 and CA048654; Table 1). This may indicate differing immunity challenges in the river and lake rearing habitats of this study. This finding is a good example of indirect hypothesis generation that can come from using such large data-set producing tools. As microarrays facilitate the screening of a large number of genes they may uncover unexpected traits that are difficult to measure, even if not identified as potential traits of interest during experimental design [20].

We detected differential regulation of select regulatory genes over-expressed in AJC, including two transcription factors. Pro-B-cell leukemia transcription factor 2 (Genbank:CB499801) and CCAAT/enhancer-binding protein delta (Genbank:CA050914) regulate transcription [45]. The latter has the second highest over-expression fold change in AJC. In SL, one gene annotated as "unknown" in the cGRASP annotation file was identified through the re-BLAST methods as "far upstream binding element protein 3" (Entrez Gene ID 100194998). These regulatory genes could have cascading effects in gene expression [46], and their roles in these ecotypes should be investigated further.

Whether due to recent sequence submissions [47], or through the challenges of assembling large EST datasets in individuals with recent genome duplications [48, 49], a few annotations of differentially expressed genes varied from the originally released 16K annotation file [50]. The genes with new annotation can be viewed in Additional File 1, Table S1. One example that our individual BLAST efforts identified was 60S ribosomal protein L14 (BT060370.1). Also, we identified another gene as the antifreeze protein, type 2 ice-structuring protein (Entrez Gene ID: 100195780). This gene has obvious ecological implications for the colonization of new lakes and may have been especially important in post-glacial lakes. These two different BLAST hits were very recently annotated, on 25 August 2010 [47]. These few differences between the cGRASP annotation file and current BLAST hits underscore both the computational complexity of assembling genomes and the constantly changing knowledge of gene function.

Both lists of differentially expressed transcripts contain many unknown function annotations, and although we cannot assign any molecular function to these genes based on this study, we do now have ecological context for these genes. Furthermore, as more genes are annotated, we may gain more insight on the role of these unknowns in the ecotype variances, as was the case with "far upstream binding element protein 3" as described above.

Our results yield both similarities and differences when compared to the gene expression work on lake whitefish [7, 21]. The morphological and expected ecological differences in juvenile sockeye salmon are not nearly as extreme as those observed in lake whitefish, which are drastically different in growth rate and age of maturity. However, like the present study, the fold change differences between ecotypes in the lake whitefish work with both microarrays and qPCR are modest, suggesting this may be the norm for ecological transcriptomic differences in natural populations. Unlike lake whitefish, sockeye salmon are anadromous, and our study populations move to the ocean after freshwater rearing, where feeding environments and access to them may be similar [51]. Therefore, differences at the juvenile rearing stage may be limited, because this is only one part of a complex life history, and the life history types may developmentally converge for the ocean feeding stage.

In lake whitefish, parvalbumin beta (Genbank:AF538283) was the only gene involved with muscle contraction regulation that was consistently over-expressed in the dwarf ecotype. We did not find evidence of over-expression of this gene in SL, but another gene involved with muscle contraction, the slow twitch isoform of troponin I, was significantly over-expressed in SL. It is expected that feeding strategy promotes continuous swimming in dwarf lake whitefish [34]. In addition, dwarf lake whitefish are under high predation compared with normal whitefish, an ecological attribute responsible for increased burst swimming. This should favor both aerobic and anaerobic metabolism in the same ecotype, resulting in selection favoring overall increased metabolism and muscle contraction [34]. In the present study, high predation and a burst swimming feeding strategy are expected only in riverine AJC, whereas a continuous swimming strategy and low predation should occur in lake-type SL. These differing scenarios of selection may result in less-pronounced partitioning of swimming energetics in sockeye salmon compared with lake whitefish.

In lake whitefish, many of the differentially expressed genes in nature retained differential expression when individuals were raised in a common environment [21]. Also, gene mis-expression in lake whitefish dwarf × normal backcross is associated with reduced egg survival [52]. It is difficult to distinguish cause from effect in these situations, as the mis-expression in underdeveloped eggs may be the result of the underdeveloped phenotype and the cause may be in an unmeasured earlier stage of development [53]. In summary, in lake whitefish, gene expression traits have a genetic component and can affect traits important to reproduction.

Other fish species also manifest the benthic/limnetic ecotypes, including threespine stickleback [54], Dolly Varden (Salvelinus malma) [55] and Arctic charr (S. alpinus) [56]. Despite many behavioral, morphological, and genetic studies, relatively few of these important ecological model species have been investigated at the transcriptomic level. Elmer et al. [57] found non-synonymous divergence in ESTs related to biosynthesis, metabolism and development in South American crater lake cichlids (Amphilophus astorquii and A. zaliosus). Other studies of fish transcriptomics have focused on spawning survivorship [36] and salt/freshwater transitions [37, 58].

Our study has limitations in that we only present a single tissue type in a single point in time for these populations. Also, the morphological sampling and the gene expression sampling took place in different years, though we expect that the morphological differences are temporally stable, at least in the time scale between the two sampling periods. The morphological and gene expression differences between these populations may be due to phenotypic plasticity, adaptive or non-adaptive genetic processes, or a combination of all three [12, 59]. Like many phenotypic traits, gene expression is an integration of both environmental and genetic components [20, 60]. Phenotypic plasticity itself may have a genetic component and may be adaptive, especially in species with range expansion and contraction, where colonization of new habitats occurs often [61, 62]. Even gene expression differences that are purely plastic are important to further our understanding of ecology and colonization, and may facilitate adaptation in other non-plastic traits [63].

Conclusions

We have developed the first dataset characterizing gene expression differences between two populations of sockeye salmon representing lake-type and riverine life histories. Although this represents a first step in considering the ecological transcriptomic differences of juvenile sockeye, we have already identified clear patterns relating to the divergent ecological phenotypes of these populations. In riverine sockeye muscle tissue, genes of higher expression were primarily associated with growth whereas in the lake-type sockeye, metabolism was the theme. Since these populations reside in a pristine part of the sockeye range, this study may serve as a reference location for future studies of populations that are more impacted by human activities.

Methods

Study site

Aniakchak National Monument and Preserve (ANMP) in southwest Alaska provides a unique system to study these sockeye life history strategies (Figure 3). The ANMP has undergone several recent geologic events. A massive volcanic eruption 3,650 years before present (b.p.) formed a large caldera (Aniakchak Caldera) that filled with water, creating a lake [64, 65]. Approximately 1,800 b.p. [66] the caldera wall collapsed, resulting in a large flood and the formation of the Aniakchak River, which connects the remainder of the caldera lake (Surprise Lake; elevation 321 m) with the Pacific Ocean through "The Gates", a chasm opened through the caldera wall by the flood [67]. A large fluvial plain was established when the passing flood dropped sediment as it exited the caldera. Several smaller eruptions have occurred, including well-documented events approximately 500 and 80 b.p. [64]. Sometime after the 500 b.p. eruption, lake-type sockeye salmon colonized Surprise Lake (SL) and used the lake for juvenile rearing [68]. A riverine sockeye population also rears in Albert Johnson Creek (AJC), the largest tributary of Aniakchak River [35]. Albert Johnson Creek is a low gradient stream that meets Aniakchak River at the base of the volcano in the large fluvial plain that resulted from the caldera-draining flood 1,800 b.p. Thus, current populations representing each of lake-type (SL) and riverine (AJC) life history types coexist in the same drainage.

Figure 3

The two sampling locations of this study showing Albert Johnson Creek (AJC) and Surprise Lake (SL) in Aniakchak Caldera.

Morphology methods

In order to make a morphological comparison between the two populations in this study, we reanalyzed a subset of the morphological dataset from Pavey et al. 2010 of 360 age 0 (meaning previous to first winter) juvenile sockeye with only the two current study populations [Table 1 in [27]]. Twelve landmarks were digitized on each image using TpsDig (Figure 1, middle panel). All methods were identical except here we only compared the SL and AJC populations. All uses of animals in this study were approved by the animal care and use committee of either Simon Fraser University or University of Alaska Anchorage.
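The centroid size term referred to in the morphology results is conventionally defined as the square root of the summed squared distances of all landmarks from their centroid. The sketch below illustrates that definition only; the coordinates are hypothetical, and the full shape analysis of the reanalyzed dataset [27] is not reproduced here.

```python
from math import sqrt


def centroid(landmarks):
    """Mean x and y coordinate of a list of (x, y) landmarks."""
    n = len(landmarks)
    return (
        sum(x for x, _ in landmarks) / n,
        sum(y for _, y in landmarks) / n,
    )


def centroid_size(landmarks):
    """Square root of the summed squared distances of landmarks to the centroid."""
    cx, cy = centroid(landmarks)
    return sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in landmarks))


# Hypothetical four-landmark configuration (a 2 x 2 square), not fish data.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
size = centroid_size(square)
```

The same functions apply unchanged to a twelve-landmark configuration such as the one digitized for each fish.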

Gene expression methods

Juvenile sockeye salmon were sampled on August 8th 2007. The time of sampling for Albert Johnson Creek was 1535 h to 1703 h and Surprise Lake was 1832 h to 2057 h. The entire sampling effort took place within 5.5 hours including transportation from Albert Johnson Creek to Surprise Lake by a Cessna 185 airplane. A beach seine was used to capture fish and a strict sampling protocol including sampling time was enforced to reduce fish-to-fish sampling bias. Fish of similar lengths were sampled from each site. Mean fork length was 45.9 mm (n = 17; SD = 3.5 mm) for AJC and 45.0 mm (n = 13; SD = 5.6 mm) for SL. One fish from each seine haul was placed in a lethal solution of MS-222 (100 mg/l). An incision was made in the body cavity with a scalpel and the entire fish was placed in RNAlater™ (Ambion). The maximum time between netting a fish to RNA preservation was five minutes. The samples were kept cool in the field and transported, then frozen, and stored at -20°C to -80°C until RNA extraction.

RNA preparation

The samples were thawed and blotted with a Kimwipe®. All of the primary locomotion muscle tissue including red and white muscle tissue was removed from each fish. Total RNA was extracted with a modified protocol of the Invitrogen TRIzol® Plus RNA purification kit using PureLink™ Micro-to-Midi™ columns. Disruption and homogenization were achieved with a MixerMill MM301 (Retsch). The manufacturer's protocol was followed for each extraction with the exception of using 150 μl of chloroform and 150 μl of low pH phenol to ensure dissociation of proteins and isolation of RNA. The quality of all RNA samples was verified on a 1% agarose gel. All samples were quantified with a Spectrophotometer ND-1000 (NanoDrop).

cDNA and aRNA synthesis and labeling

cDNA was synthesized with Invitrogen SuperScript™ III Indirect cDNA labeling system kits per manufacturer's instructions. In brief, 10 μg of total RNA from a single individual was combined with a master mix including reverse-transcriptase and oligo(dT)20 primers. This reaction was incubated for three hours at 46°C to synthesize single-stranded cDNA. The samples were cleaned with the S.N.A.P.™ column purification procedure (Invitrogen).

A reference pool was prepared with representative total RNA samples of juvenile sockeye muscle and liver, and adult sockeye brain, muscle and liver. Multiple tissues were used to ensure hybridization of the reference channel to all spots that may have been hybridized by the sample channel cDNA, so that each spot could be quantified as a ratio. The total RNA was amplified using an Amino Allyl MessageAmp™ II aRNA amplification Kit (Ambion AM1753) as per manufacturer's instructions. Briefly, RNA from several individuals from both populations of a single tissue type was combined. Then, single-stranded cDNA was synthesized from the RNA, whereupon the second strand was synthesized with DNA polymerase. This product was purified through columns, and then amino allyl-modified aRNA was transcribed from the cDNA. The aRNA from divergent tissue types was combined in equal amounts at this point, and this common reference pool was labeled with Cy3 to be compared with a single individual in the experiment labeled with Cy5.

Individual samples and reference material were coupled with mono-reactive CyDye™ packs (GE Healthcare). In short, the common reference pool of aRNA was coupled with Cy3 and the individual sockeye muscle tissue cDNA with Cy5 dyes for one hour at 4°C. The samples were then purified to remove all uncoupled dye using S.N.A.P.™ columns as per manufacturer's instructions (Invitrogen). The dye-coupled sample and reference were stored at 4°C in the dark until hybridization.

Microarray hybridization

We used the cGRASP 16K cDNA microarray to compare the transcriptomes of these populations with divergent life histories. This microarray consists of 16,006 elements chosen from 300,000 Atlantic salmon and rainbow trout cDNA libraries [50]. The libraries were derived from a variety of tissue types at different developmental stages and conditions. Element sequences were chosen for minimum overlap, sequence quality, and other criteria [50].

We followed an established hybridization protocol for the 16K cDNA array [50]. In brief, 250 ng of reference aRNA and 500 ng of sample cDNA were combined in a single tube and kept dark. The mixture was concentrated with a speed vacuum and brought up to 23 μl with RNase-free water (Gibco). Hybridization buffer #3 (Ambion) was heated to 65°C for one hour with occasional mixing. The heated buffer and LNAdt blocker (Genesphere) were then added to the combined sample, as per manufacturer's instructions. We used the Tecan HS 4800 Pro (Tecan), an automated hybridization machine, to hybridize sample cDNA to the arrays. Before sample injection, the programmed Tecan washed the arrays with several solutions: first 1 × SSC; then 0.1 × SSC, 0.014% SDS; then 5 × SSC, 0.01% SDS, and 0.2% BSA. Samples were heated to 80°C for 5 minutes and then kept at 65°C until injected onto the pre-washed arrays in the Tecan HS 4800 Pro, as per manufacturer's instructions. Microarrays were hybridized for 16 hours; the full hybridization protocol can be viewed in Additional File 3, Table S3.

Post-hybridization, arrays were rinsed in the Tecan modules with increasingly stringent SSC and SDS solutions, starting with 2 × SSC, 0.014% SDS for four washes at incrementally decreasing temperature, then one final wash of 0.2 × SSC at 23°C. Finally, slides were dried with 37 psi nitrogen gas and kept dark until scanned. Current protocols for cGRASP microarrays are available at: http://web.uvic.ca/grasp/microarray/protocols/tecan_hybridization_protocol.pdf

Scanning and quantifying

All microarrays were scanned immediately after hybridization was complete using a ScanArray Express (Perkin-Elmer). The microarray images were quantified manually with ImaGene 5.6.1 (BioDiscovery). Spots with unusual morphology, offset, or other poor quality parameters were flagged as marginal and excluded from downstream analyses.

Array normalization and statistical analysis

We performed all analyses in GeneSpring GX 7.3 (Agilent). The arrays were normalized as per typical two-color experiments by performing an array-wide intensity-dependent Lowess normalization, followed by a per gene normalization, which normalized each spot to the median value. The average base/proportionate value was calculated to be an intensity of 72, so we filtered data to retain only the 14,652 entities with average raw signal expression values greater than 72 in at least one of the populations. This became our base expression data for analysis. Our GeneSpring analysis was performed in two ways. First, the dataset was filtered to retain only the genes where the average differential expression was ≥ 1.5 fold. This list was used in a t-test without the assumption of equal variances (P ≤ 0.05) with a Benjamini and Hochberg False Discovery Rate multiple test correction (MTC) [69]. The spot IDs from the cGRASP 16K annotation file (current annotation files available at: http://web.uvic.ca/grasp/microarray/array.html; [48]) were used to associate ESTs on the array with gene descriptions. To confirm current annotation of the differential gene list, either a Megablast search was performed on the associated EST sequences or tBLASTx was used on the associated EST or contiguous sequence (contig). The default parameters were used for these database queries. All normalized expression values as well as raw data were deposited in the NCBI Gene Expression Omnibus database (GEO Accession: GSE31214).
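As an illustration of the fold-change filter and the Benjamini and Hochberg correction described above, the following is a minimal Python sketch. It is not the GeneSpring implementation; the function names `passes_fold_filter` and `benjamini_hochberg` are hypothetical, and the inputs are assumed to be group mean expression values and unadjusted t-test P values, respectively.

```python
def passes_fold_filter(mean_a, mean_b, threshold=1.5):
    """True when the ratio of group means is at least `threshold` in either direction.

    Hypothetical helper: assumes strictly positive mean expression values.
    """
    ratio = mean_a / mean_b
    return ratio >= threshold or ratio <= 1.0 / threshold


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this sketch, a gene would be retained when `passes_fold_filter` is true and its adjusted P value is at most 0.05.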

Gene Ontology analysis

To account for all genes potentially differentially expressed, not only those with high fold changes or those that passed the stringent multiple test correction, a less stringently filtered list was generated from the base gene list for the Gene Ontology (GO) analysis. Genes differentially expressed by any amount that passed a t-test (P ≤ 0.05) without a multiple test correction were included. We then performed GO enrichment analysis on this list of over-expressed genes using the GO browser in GeneSpring. We report the GO Generic Slim categories, for both biological process and molecular function, that were represented at a significantly higher proportion in the over-expressed list than on the array at large (P ≤ 0.05).
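Enrichment of a GO category in a gene list relative to the whole array is commonly assessed with a one-sided hypergeometric test. The sketch below assumes that test; the exact statistic used by the GeneSpring GO browser is not stated here, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    k: genes in the over-expressed list annotated with the GO category
    n: size of the over-expressed list
    K: genes on the array annotated with the GO category
    N: total annotated genes on the array
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total
```

A category would be reported as over-represented in this sketch when the returned probability is at most 0.05.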

Reverse-transcription Quantitative Polymerase Chain Reaction (RT-qPCR)

Total RNA samples used in microarray analysis were used for RT-qPCR. Single-stranded cDNA was synthesized from 4 μg total RNA using SuperScript® III First-Strand Synthesis System for RT-PCR (Invitrogen), as per manufacturer's instructions. Briefly, total RNA was incubated for 50 min at 50°C with 5 μM oligo(dT)20 primers. Each sample was then diluted 10-fold to prepare for qPCR. Four genes of interest (GOI) were selected for potential ecological relevance. Amplicons were designed within 500 base pairs of the 3' end of the coding sequence for each GOI in regions conserved between Atlantic salmon and rainbow trout (O. mykiss), and checked for sequence specificity by BLAST.

Primer efficiency was tested by a standard curve of experimental sample cDNA synthesized as described above. The standard curve was generated from an initial 10-fold diluted sample, which was then used as the starting point for a two-fold, six-point serial dilution series. qPCR amplification was performed with SYBR GreenER™ qPCR SuperMix Universal master mix, as per manufacturer's instructions (Invitrogen), in 20 μL reaction volumes containing 400 μM primers on an Mx3000P™ thermal cycler (Agilent) with the following thermal regime: 95°C for 7 min (1 cycle); 95°C for 30 s, 60°C for 1 min, 72°C for 30 s (40 cycles); followed by a melt curve of 95°C for 30 s reading at every 0.5°C increment. Amplification of a single, correct product was confirmed by agarose gel electrophoresis, melt curve analysis, and amplicon sequencing. Primer sequences, correlation with dilution series (R2), and efficiency values are presented in Table 4.
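Amplification efficiency is conventionally derived from the slope of the standard curve (Ct regressed on log10 of relative template quantity) as E = 10^(-1/slope) - 1, so that perfect doubling per cycle gives E = 1. The sketch below assumes that convention and ordinary least-squares regression; the function name is hypothetical and the software used for the reported values is not implied.

```python
from math import log10


def standard_curve(rel_quantities, ct_values):
    """Fit Ct against log10(relative quantity); return (slope, efficiency, r_squared)."""
    x = [log10(q) for q in rel_quantities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in ct_values)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    # Efficiency of 1.0 corresponds to exact doubling of product each cycle.
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency, r_squared
```

For a two-fold, six-point series, `rel_quantities` would be `[1, 0.5, 0.25, 0.125, 0.0625, 0.03125]` relative to the starting dilution.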

Table 4 Gene descriptions, primer sequences, efficiency, and R2 for each gene quantified with RT-qPCR.

For each GOI, biological replicates were run in quadruplicate on one plate, with 9 and 10 biological replicates for the lake-type and riverine ecotypes, respectively. Clear outlier technical replicates (> 0.2 Ct from the other replicates) were removed from analysis. If a biological replicate had two technical replicates indicating one Ct value and the other two indicating a different Ct value, none were removed, as the correct pair could not be discerned. The replicate variability was within 0.5 Ct for 110 of 114 sample-target combinations. No no-template control (NTC) showed the melt temperature of the GOI amplicon, and 5 of 6 investigated genes had more than 7 Ct between the average NTC Ct and the most dilute unknown sample (troponin I, slow skeletal muscle was only 3.8 Ct from the average NTC primer dimer; SABiosciences). Additionally, all GOI samples fell within the standard curve dilution series, with the exception of one malate dehydrogenase sample, which was more dilute than the most dilute point of the dilution series, and troponin I, slow skeletal muscle, which had 3 samples more concentrated than the dilution series (all within 1.5 Ct of the most concentrated point) and 8 samples less concentrated than the dilution series (all within 2 Ct of the least concentrated point).
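The technical-replicate rule can be read in more than one way; the sketch below encodes one possible interpretation as an assumption: a single replicate is dropped only when it lies more than 0.2 Ct from every other replicate while the remaining replicates agree within 0.2 Ct of each other. A two-versus-two split therefore leaves all four values in place. The function name is hypothetical.

```python
def drop_outlier_replicate(cts, tol=0.2):
    """Return the technical replicates with at most one clear outlier removed.

    Assumed rule: a replicate is a clear outlier when it differs by more than
    `tol` Ct from every other replicate and the others agree within `tol`.
    """
    cts = list(cts)
    if len(cts) < 3:
        return cts  # too few replicates to identify an outlier
    for i, ct in enumerate(cts):
        others = cts[:i] + cts[i + 1:]
        far_from_all = all(abs(ct - other) > tol for other in others)
        others_agree = max(others) - min(others) <= tol
        if far_from_all and others_agree:
            return others
    # No single clear outlier (including the two-versus-two case): keep all.
    return cts
```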

Data analysis was performed using qbasePLUS (Biogazelle). All quantified genes were tested as normalizer candidates using geNorm. We did not include Single mitochondrial precursor in this test as it appeared to be co-regulated with 5-aminolevulinate synthase in these samples (both transcripts are mitochondrial precursors and showed similar non-normalized expression patterns; data not shown). The most stably expressed transcripts were 40S ribosomal and 5-aminolevulinate synthase, collectively displaying an M value of 0.645 and a coefficient of variation of 0.225, within limits typically observed for stably expressed reference genes in heterogeneous samples (M value ≤ 1 and CV ≤ 0.5) [70]. Additionally, malate dehydrogenase was identified as the third best candidate; although it was not used for normalization, it was identified as a non-significantly differentially expressed element in the microarray results.
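The geNorm stability measure M for a candidate reference gene is the average, over all other candidates, of the standard deviation across samples of the log2 expression ratio between the two genes; lower M indicates more stable expression. The following is a minimal sketch of that calculation, not the qbasePLUS implementation. The function name and the input layout (a dictionary of relative quantities per gene, in the same sample order) are assumptions.

```python
from math import log2
from statistics import stdev


def genorm_m_values(quantities):
    """Return {gene: M} for a dict mapping gene name to relative quantities per sample."""
    genes = list(quantities)
    m_values = {}
    for j in genes:
        pairwise_sds = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in each sample.
            log_ratios = [log2(a / b) for a, b in zip(quantities[j], quantities[k])]
            pairwise_sds.append(stdev(log_ratios))
        m_values[j] = sum(pairwise_sds) / len(pairwise_sds)
    return m_values
```

Two genes whose ratio is constant across samples contribute a pairwise standard deviation of zero to each other's M value.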

Normalized relative quantities were tested for normality through an Anderson-Darling test (Minitab 16). Not all GOIs were found to display normally distributed expression ratios, and therefore a non-parametric, one-tailed Mann-Whitney U test was used to determine significance of fold change between the groups (a one-tailed test was selected as the directionality was expected from microarray results).
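For groups of this size (9 and 10 biological replicates), a one-tailed Mann-Whitney U test can be evaluated exactly by enumerating all assignments of the pooled ranks to the first group. The sketch below does this with midranks for ties; it is an illustration only, the function names are hypothetical, and it does not reproduce the output of any particular statistics package.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = average_rank
        pos = end + 1
    return ranks


def mann_whitney_greater(x, y):
    """Return (U, p) for the one-tailed alternative that x tends to exceed y."""
    n1 = len(x)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u = rank_sum_x - n1 * (n1 + 1) / 2
    # Exact P value: fraction of rank assignments at least as extreme as observed.
    total = 0
    as_extreme = 0
    for combo in combinations(ranks, n1):
        total += 1
        if sum(combo) >= rank_sum_x - 1e-9:
            as_extreme += 1
    return u, as_extreme / total
```

The direction of the alternative would be set per gene from the microarray result by passing the group expected to be higher as `x`.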