Background

The advent of large-scale transcription profiling technologies, such as DNA microarrays [1] and RNAseq technology [2], has allowed the analysis of gene expression patterns at the genome level. DNA microarrays have been used for genetic mapping studies based on polymorphisms between parental genotypes [1]. When used to analyze genetically segregating populations, DNA microarrays have facilitated the discovery of gene expression markers [2]. Gene expression markers can be defined as transcripts encoded by genes whose relative messenger RNA expression level is inherited and segregates as a phenotypic trait [3].

Apple tree breeding is a slow and difficult process for reasons that include long juvenility periods, large size of mature plants, inbreeding depression, reproductive self-incompatibility, and complex phenotypes related to grafted trees [4]. The use of molecular markers for marker-assisted breeding and selection has the potential to expedite this process by increasing the percentage of desired genotypes and associated phenotypes early on in the breeding pipeline and assisting breeders in combining desirable traits from different parents into breeding progenies [5, 6]. In the present study, our objective was to identify genes whose steady-state expression level in healthy, uninfected apple shoot tips correlated with the inheritance of agriculturally important traits in an apple rootstock breeding population. These transcripts would have the potential to be used as molecular markers by themselves, or could be used to develop DNA polymorphism markers for marker-assisted selection and gene mapping in the population.

Genetics in apple is typically done in the F1 generation due to the self-incompatibility of apple [7]. The apple rootstock population used for this study was an F1 population from a cross of two highly heterozygous rootstock parents, ‘Ottawa 3’ (O3) and ‘Robusta 5’ (R5). This is a phenotypically diverse and well-characterized population that is segregating for numerous traits of interest to apple growers. The segregating traits include resistance to biotic stresses such as fire blight (Erwinia amylovora; [8]) and powdery mildew (PM, Podosphaera leucotricha; [9]) diseases and resistance to the woolly apple aphid (WAA; Eriosoma lanigerum; [8, 10]) pest.

We applied DNA microarray transcription profiling to 48 individual F1 trees from the O3 × R5 cross and identified transcripts with expression levels associated with PM disease and WAA pest resistance phenotypes. When the genes encoding these transcripts were mapped to the apple genome, they were found to be physically clustered. This is similar to the phenomenon described for single nucleotide polymorphisms and gene expression markers in Brassica napus[11]. The utility of using physically clustered, differentially expressed genes for DNA marker development will be discussed.

Results

Microarrays

RNA was isolated from healthy, uninfected, uninfested shoot tips collected from 48 individual F1 trees from the O3 × R5 cross population growing in a rootstock production stool bed in Geneva, NY. The RNA was used to probe 24 microarrays in multi-plex format, using two different color probes per array, so that all 48 RNA samples could be assayed on the 24 microarrays. The microarrays were clustered based on their expression profiles using the hierarchical clustering function in R (complete linkage, Figure 1). The array clustering groups did not consistently correspond with either PM resistance or WAA resistance, with closely clustered arrays often including F1 individuals with contrasting phenotypes for PM and WAA resistance (Figure 1).

Figure 1
figure 1

Clustering of the 48 arrays based on their overall similarity in gene expression patterns. The tree represented by each array is indicated by tree identification number. The powdery mildew and woolly apple aphid resistance phenotype for each tree is indicated.

Physical clustering of differentially expressed genes

A group of F1 individuals resistant to PM and a group of F1 individuals susceptible to PM were selected (Figure 1). Using standard differential expression analysis [12], thirty transcripts whose expression levels were differentially expressed between the PM-resistant and PM-susceptible phenotype groups were identified (q-value < 0.05; Table 1; Additional file 1: Table S1). The physical locations of the genes encoding these transcripts on the ~742 Megabase (Mb), 17-chromosome apple genome [7] were determined using the BLAST [13] server on the Genome Database for Rosaceae [14]. Twenty-four of the genes were located on chromosome 12 (Figure 2a), with nineteen being clustered within a 10 Mb region centered on the major, previously-identified PM quantitative trait locus (QTL) segregating in the in the O3 × R5 F1 breeding population (Figures 3a and 4).

Table 1 Differentially expressed genes for powdery mildew and woolly apple aphid resistance identified by q-value analysis
Figure 2
figure 2

Physical clustering of differentially expressed genes at the genome level. Genes that were differentially expressed between phenotypic groups of trees were mapped to each of the seventeen apple chromosomes, as indicated. Phenotypic groups were developed based on resistance to powdery mildew disease (a), resistance to woolly apple aphid (b) and several gene expression markers (GEMs, c-f).

Figure 3
figure 3

Physical clustering of differentially expressed genes at the chromosome level. Genes that were differentially expressed between phenotypic groups of trees were mapped on to the chromosome where they were most abundant for that trait, as indicated. The distribution along single chromosomes of genes that were differentially expressed between phenotypic groups of trees based on resistance to powdery mildew disease (a), resistance to woolly apple aphid (b), and four different gene expression markers (GEMs, c-f) are shown. Asterisk indicates chromosome segment containing the major, previously identified QTL for the trait (a, b) or containing the gene expression marker gene used as the molecular trait (c-e).

Figure 4
figure 4

Expression patterns of physically clustered differentially expressed genes. For each trait, a subset of all the differentially expressed genes was located within a 10 Megabase (Mb) window centered on the physical location of the trait of interest in the apple genome. For powdery mildew resistance, a 1.6 Mb expanded window shows details, including transcript identifier numbers. Differentially expressed genes correlating with powdery mildew and woolly apple aphid resistance included some having higher expression (positive correlation) and some having lower expression (negative correlation) in resistant trees. Similarly, differentially expressed genes correlating with gene expression markers included some having higher expression (positive correlation) and some having lower expression (negative correlation) in trees where the gene expression marker gene expression level was high. Chr, chromosome.

Similarly, a group of F1 individuals resistant to WAA and a group of F1 individuals susceptible to WAA [8, 10] were selected (Figure 1). Using standard differential expression analysis, seven transcripts that were differentially expressed between the WAA-resistant and WAA-susceptible groups were identified (Table 1; Additional file 1: Table S1). The genes encoding five of these transcripts lay on chromosome 17 (Figure 2b), all within the top 9 Mb of chromosome 17 (Figure 3b), and three of these were within about 1 Mb of a major, previously-identified WAA resistance QTL (Figures 3b and 4).

Additionally, we performed similar analyses using gene expression markers [3] with clear segregation patterns in the apple F1 population as molecular traits (Additional file 2: Figure S1). Trees were separated into phenotypic groups based on the expression level of the gene expression marker, with one group containing plants with high expression levels of the gene expression marker, and the other group containing plants with low expression levels of the gene expression maker. Then, genes that were differentially expressed between the two phenotypic groups were identified by standard differential expression analysis. In all cases examined, genes that were differentially expressed between phenotype groups based on gene expression markers were disproportionately located on single chromosomes (Figure 2c-f, and Additional file 1: Table S1), often clustering in the physical vicinity of the gene expression marker gene used to define the phenotypic groups (Figure 3c-e), or occasionally clustering at a separate location (Figure 3f; and Additional file 1: Table S1).

The expression levels of the physically clustered, differentially expressed genes had a mixture of positive and negative associations with their associated trait (Figures 4 & 5). For example, some of the physically clustered genes with expression levels associated with PM resistance had higher expression levels in the resistant trees (positive phenotype association), while others had lower expression levels in the resistant trees (negative phenotype association), and the expression levels of intervening genes had no phenotype association (Figures 4 & 5). In some instances, differentially expressed genes with positive and negative phenotype associations were in juxtaposition (Figures 4 & 5) with as little as 90 kb distance between them (Additional file 1: Table S1). Gene expression marker genes with expression patterns not associated with the PM resistance phenotype were visible within the PM QTL region (Figure 5).

Figure 5
figure 5

Expression heat map of genes in the area of the powdery mildew resistance QTL. Genes are arranged in their linear order along the chromosome, and trees are divided into groups according to powdery mildew resistance phenotype. Each column of colored blocks represents gene expression readings from one individual tree. Data for all genes queried by the microarray lying between positions 28.7-30.9 Mb of chromosome 12 are shown. Green blocks indicate trees where the expression of a given gene was lower than the average for that gene across all 48 trees; red blocks indicate plants where the expression of a given gene was higher than the average for that gene across all 48 trees. Genes differentially expressed between the powdery mildew disease resistance phenotype groups are indicated by sequence ID numbers. Gene expression markers with segregation expression patterns that did not correlate with powdery mildew disease resistance are denoted by diamonds. Chr, chromosome.

It is notable that the genes with trait-associated expression levels did not necessarily have the same expression pattern in all individuals in a phenotype group. For example, PM resistance-associated genes APPLE0F000001606, APPLE0FR00048809, and APPLE0F000052120 are visibly quite consistent in their expression within a phenotypic group, while APPLE0F000026140 and APPLE0F000002331 are less consistent within each phenotypic group (Figure 5).

Validation of gene expression level heritability and consistency

The heritability of expression level of selected genes with trait-associated expression levels was validated by quantitative polymerase chain reaction (qPCR) analysis using the O3 and R5 parents and a group of F1 individuals in the O3 × R5 F1 population growing in a location different from the trees used for DNA microarray analysis and sampled during a different year. qPCR analysis showed that PM resistance-associated gene APPLE0FR00048809 had higher expression in parent R5 compared to parent O3 (Additional file 2: Figure S2), just as predicted by the microarray experiment. Furthermore, expression of APPLE0FR00048809 among the O3 × R5 F1 population used for qPCR analysis segregated at a 1:1 ratio (Additional file 2: Figure S2), just as predicted by the DNA microarray data. In addition, a gene expression marker that had a distinctly bimodal expression in the DNA microarray analysis (APPLE0F000001974) also had bimodal expression in the O3 × R5 F1 population used for qPCR validation; this bimodal expression segregation was visible among PCR amplicons (Additional file 2: Figure S3). Finally, the relative gene expression level relationships between APPLE0FR00068101 and several genes with associated expression patterns and which exhibited continuous expression level distribution in the array data (APPLE00R00024612, APPLE0F000011491, APPLE0F000050102) were maintained in the qPCR relative expression data (Additional file 1: Tables S2 and S3).

Development of DNA markers based on genes with trait-associated expression level

As a proof of concept, DNA markers were developed for WAA resistance-associated gene APPLE0FR00068101. The sequence of APPLE0FR00068101 was used to identify Malus × domestica contig MDC015568.236 (6,831 bp long), which contained the best matching DNA sequence according to BLAST analysis. Several polymorphisms were identified within this region when comparing the O3 haplo-contigs with the R5 haplo-contigs. Of particular interest for easy marker development were two microsatellite regions between bases 2,500 - 3,500 of contig MDC015568.236, for which PCR primer pairs were designed (Waa68101-236ssr, forward primer 5′-GGGTTGAAGTGCGAGAC-3′, reverse primer 5′-CACGCGACGAGGTATTCCAAC-3′; and Waa68101-236Indel, forward primer 5′-CCAAATTATGCATACAGATG-3′, reverse primer 5′-GATTAATGATTAGAAGAAC-3′) and tested with parent DNA with annealing temperature gradient PCR (Additional file 2: Figure S4). Both markers were polymorphic between the parents, but only the Waa68101-236ssr was heterozygous in the R5 parent, showing bands at approximately 360 bp, 460 bp and 520 bp (Additional file 2: Figure S4). Segregation analysis in the O3 × R5 population showed very strong association of the Waa68101-236ssr 360 bp band with resistance (p = 0.0001). The Waa68101-236ssr SSR marker was more predictive of WAA resistance in the O3 × R5 population than the published interval containing the R5-derived Er2 gene delineated between SSR markers GD96 (MDC021359.285 at 11,796 Kb on Chr17) and GD153 (MDC013709.214 at 9,138 Kb on Chr17) [7, 10].

Discussion

The results of array clustering for the 48 microarrays indicated that overall gene expression patterns of individual plants were not robust indicators of PM or WAA resistance phenotype. In contrast, the differential gene expression analysis based on phenotypic groups of F1 individuals yielded relatively small numbers of genes that were differentially expressed between the phenotypic groups, and these differentially expressed genes displayed a remarkable degree of physical clustering on the apple genome. The clustered genes were typically in the physical vicinity of the major locus controlling the trait, in the case of PM and WAA resistance, or in the vicinity of the gene expression marker used as the molecular trait. Clusters at locations unlinked to their corresponding trait of interest (Figure 3f; and Additional file 1: Table S1) might represent the locations of other QTLs related to the phenotype. All of the phenotypes examined in this study were controlled by single, major, dominant QTL, which allowed detection of linkage using only 48 F1 individuals. Analysis of larger numbers of individuals would certainly be required in order to analyze multi-locus traits effectively.

The clustered differentially expressed genes are not necessarily involved in controlling their associated phenotype. For example, the differentially expressed genes associated with PM resistance did not show any obvious functional patterns or similarities (Table 1). It is also important to note here that differentially expressed genes we examined here were not selected based on their induced expression during pathogen or insect interaction. Rather, the differentially expressed genes represented transcripts whose steady-state expression levels in healthy tissue were associated with PM or WAA resistance phenotype status. It is possible that examining differential gene expression using infected or infested samples might mask the clustering due to the large numbers of genes being up- and down-regulated in response to the stress.

The clustering pattern of differentially expressed genes is consistent with the relative expression levels of these genes being inherited from a parent. This is different from genome neighborhood effects, where groups of linked genes are typically up- or down-regulated together [15]. Just as DNA polymorphism markers can be linked to a trait locus, expression patterns of some of the nearby genes are also linked. By grouping trees according to an inherited trait of interest, one might expect that the differentially expressed genes would be identified simply due to their decreased expression variation within a particular pool compared to non-correlating genes at loci unlinked to the trait of interest. However, it is remarkable that the differential expression patterns between the phenotypic groups included so few genes, and that so many of these were physically clustered. This suggests that heritable differences in gene relative expression were predominantly detected by the analysis, rather than genes whose expression levels might contribute to or be necessary for the development of the phenotype. Such genes would be expected to be scattered randomly across the genome. Our results are similar to those seen in Brassica napus using single nucleotide polymorphisms (SNPs) and gene expression markers [11]. However, the present study used a segregating apple tree F1 population, while the other study used a collection of accessions.

Validation of the expression patterns using qPCR indicated that most of the genes indeed had patterns of expression consistent with the array data. The congruence of DNA microarray and qPCR data for selected differentially expressed genes and gene expression markers provided strong validation for the DNA microarray data. qPCR validation was successful using a different set of individuals from the same cross in a different environment and year from those used for the microarray, indicating that the differentially expressed genes had relative expression levels consistent across different growing conditions and years and between different groups of individuals from the O3 × R5 F1 population. The development of PCR based molecular markers associated with several of the differentially expressed sequences was in many cases successful because sequence mutations such as large INDELs and microsatellite variation was discovered within or nearby the target genes. While several methods are available to detect polymorphisms in marker assisted breeding, markers based on the polymerase chain reaction are still the most accessible and least expensive for small scale breeding programs. The combination of expression analysis for target identification and sequence based marker development proved a good strategy as the PCR markers developed in this study have been routinely proven useful in apple rootstock breeding program in Geneva, NY. As RNAseq methods become more refined, it may be possible to find differentially expressed genes associated with traits of interest and at the same time leverage polymorphisms contained in the expressed sequences for haplotype specific breeding marker development.

Conclusion

We have shown in a segregating population from the cross of a highly heterozygous plant that gene expression analysis can result in identification of differentially expressed genes that are physically linked to one another. Gene expression heritability as a method to detect genes physically linked to trait loci of interest could be useful in crops for which few genetic and genomic information resources are available. Even in the absence of a genetic map, molecular markers linked to traits of interest could be developed using transcriptome profiles of segregating populations, since a substantial proportion of the differentially expressed genes would be expected to cluster in the vicinity of the trait of interest. While there may be no causal link between the differentially expressed genes and their associated traits, they do provide an excellent starting point for development of DNA markers linked to segregating traits of interest.

Methods

Plant materials

The trees used for the DNA microarray analysis were from a segregating F1 population from an O3 × R5 cross and were grown in an orchard in Geneva, NY [16]. Samples were taken in early summer of 2009 from healthy, uninfected, uninfested individual shoots from first-year growth of 48 plants in a propagation stool bed. Shoot tips samples comprised all shoot tissues up to and including the first fully-expanded leaf. Sampled shoots were carefully selected so that they were as similar to each other as possible in size and shape to minimize sampling variation. The samples were flash frozen in liquid nitrogen and stored at -80°C for later RNA isolation.

For the qPCR analyses, shoot tip samples were collected in late spring of 2013 from a separate group of 46 individuals belonging to the same O3 × R5 population from clonally propagated material in a replicate orchard in Geneva, NY. Shoot tips from the population parents (O3 and R5) were collected from trees at the apple collection of the USDA ARS Plant Genetic Resources Unit in Geneva, NY.

RNA isolation and microarray analysis

Total RNA was isolated from whole apple shoot tips as previously described [17]. The microarray data used in the present study were generated during a previously-reported study [18] and subjected to a new analysis. The contig sequences used for array probe development are accessible at the Gene Expression Omnibus (GEO) dataset website [19]. The array used was a second-generation array in a 12-plex array format containing 135,000 probes per plex, representing 26,017 transcripts, enabling us to query a relatively large number of samples. Each transcript was queried by 4–5 probes of 60–70 bases in length. The array included the best-performing probes from the first-generation array and was enriched for differentially-expressed genes based on analyses of the first-generation array [17]. The genes predicted to encode the 26,017 transcripts probed by the array represented were evenly and randomly physically distributed across the apple tree genome. The expression levels for each individual F1 tree were analyzed using a single array only; however, analyses were conducted using pooled data from trees with similar phenotypes, which represented pseudo-replicates in this context [20].

While the parents of the breeding population were different from the varieties used to design the DNA microarray, this did not interfere with probe performance or account for patterns. Any nucleotide polymorphisms between the probes and the samples, when present, did not affect hybridization, as the other probes for the same transcripts, which had no polymorphisms, gave similar intensity values (Additional file 1: Table S4). In addition, mismatches between probes and samples did not correlate with variation in between probe signals (Additional file 1: Table S4). For example, the probes for APPLE0FR00039157 had 1 or 5 mismatches to their target per probe, yet they produced data with similar signal intensity and standard deviations to the APPLE0FR00031686 probe set, which had no mismatches to their target (Additional file 1: Table S4).

Differential gene expression analysis

The gene expression data from the DNA microarray hybridization experiments were previously normalized using R software [18]. To identify differentially expressed genes based on PM resistance phenotype, for example, the trees were divided into two groups, one group containing the PM-resistant trees, and the other group containing the PM-susceptible trees. The mean of the log2 (expression) value for each transcript was then calculated separately in each phenotypic group, and then the M-value (log2 fold difference in expression) for each transcript was computed as the difference in the mean log2 (expression) value for each transcript between the two groups of trees. Empirical Bayes ANOVA analysis [21] was performed using the LIMMA (Linear Models for Microarray Data) package [22] as part of the R Bioconductor suite [23]. P-values (false positive rate) from this analysis were then converted to q-values (false discovery rate) using the methodology of Storey and Tibshirani [11] as implemented in the Bioconductor q-value routine [24]. Differentially expressed transcripts were identified as transcripts with statistically different levels of expression between the phenotypic groups (q-value < 0.05), regardless of the magnitude of the difference (M-value). Differentially expressed genes for WAA and the several GEM traits were identified using the same approach, using phenotypic groups defined by each trait.

Identification of gene expression markers

The first step was to calculate the mean of the log2 (expression) value for each transcript in the entire F1 population. Next, the divergence in expression for each individual tree compared to the average was determined for each transcript. GEMs were identified as transcripts that could divide the trees into groups of roughly equal size based on having at least a 1.5-fold difference in expression levels between the groups (Additional file 2: Figure S1).

Expression pattern validation by qPCR

Shoot tip samples were processed fresh, immediately after collection, using the ZR Plant RNA MiniPrep kit (Zymo Research, Irvine CA, USA) following the manufacturer’s instructions, with the addition of DNAse I (Invitrogen, Grand Island, NY, USA) to the RNA wash buffer in the kit as recommended in the manufacturer’s protocol. cDNA synthesis was performed using the QuantiTect Reverse Transcription kit (Qiagen, Germantown, MD, USA) according to the manufacturer’s instructions. PCR primers for selected differentially expressed genes and gene expression markers were optimized using parental genomic DNA with Annealing Temperature Gradient PCR (ATG-PCR; Additional file 1: Table S5). qPCR was performed in 25 μl reactions using LightCycler 480 SYBR Green I Master reaction mix (Roche, Indianapolis, IN, USA) and a LightCycler 480 instrument (Roche) A Basic Relative Quantification workflow was used that included fluorescence measurement at each PCR extension step and a final melting step measuring fluorescence from 95-55°C for Melt Curve Genotyping. An actin gene (Genbank accession number EB 136338, primers: 5′-GGCTGGATTTGCTGGTGATG-3′ and 5′-TGCTCACTATGCCGTGCTCA-3′) was used as the reference for Relative Quantification Analysis (RQA). The Crossing Point (Cp) values and ratios between target and references were calculated using the LC480 software and algorithms (Roche). Melt curves were analyzed for non-specific amplification peaks (Additional file 2: Figure S2). Amplicons were resolved on 1.5% ethidium bromide (EtBr) stained agarose gels and visualized with an AlphaImager HP gel documentation system (ProteinSimple, Santa Clara, CA, USA).

DNA marker development

The sequence of gene APPLE0FR00068101, whose expression was associated with WAA resistance, was compared to the ‘Golden Delicious’ apple genome hosted by the Genome Database for Rosaceae [14] using the National Center for Biotechnology Information (NCBI)’s BLASTN program [12] to identify four contigs (MDC015568.236, MDC013761.427, MDC015568.269, MDC000748.724) containing similar sequences (e-values between 1E-97 to 2E-89). All four contigs were located on apple chromosome 17 within a 74 kb interval (genome base pair positions 1,405,743-1,479,871; Additional file 2: Figure S5). Genomic sequences for parents O3 and R5 had been obtained by Next-Gen Illumina Hi-Seq paired end sequencing. Geneious bioinformatic software (Biomatters, San Francisco, CA, USA) was used to construct a local alignment of next-gen sequences to the Malus × domestica contig containing the complete predicted target gene sequence (MDP). Contig MDC015568.236 contained sequences most similar to the R5 next-gen sequences, and was therefore chosen for further analysis. Unique single nucleotide polymorphisms (SNPs), simple sequence repeats (SSRs), and haplotypes were identified for the R5 (WAA-resistant) parent. Several 18–21 bp PCR primers were designed to match unique SNP haplotypes at the 3′ end, and primer pairs were tested with parental DNAs using annealing temperature gradient PCR (45°C to 65°C) to verify genotype specificity, stability and reproducibility of amplicons. Amplicons were resolved on 2% EtBr-stained agarose gels and visualized with an AlphaImager HP gel documentation system (ProteinSimple). In addition, some primers were designed flanking microsatellite SSRs. Genotype-specific amplicons were then tested on segregating individuals in the O3 × R5 population and a diversity panel of apple rootstocks to verify genetic inheritance, linkage to other markers, and haplotype uniqueness.

Availability of supporting data

Microarray data are available through the GEO website using accession number GSE43268.