Background

Local adaptation, wherein populations have higher fitness in their ‘home’ environments than in non-native locales, is a topic of great interest in the field of evolutionary biology (e.g., [1]). The genetic basis of such adaptive divergence has not, however, been elucidated in the vast majority of non-model organisms. For plants, the selective pressures leading to local adaptation can include a variety of abiotic and biotic factors such as: soil type [2–4], water availability [5], photoperiod [6], temperature [7], herbivores [8], mycorrhizal associations [9], and proximity to agricultural fields [10]. Because these selective pressures are expected to produce characteristic patterns of genetic variation in and near genes conferring adaptive differences, population genetic approaches have the potential to provide insight into the genes, or at least genomic regions, responsible for producing locally adapted traits across the range of a species.

In the case of divergent selection, which would be expected to play an important role in the production of locally adapted populations, the focus is typically on measures of population genetic differentiation. More specifically, divergent selective pressures would be expected to produce elevated population structure in the vicinity of targeted genes relative to the genome-wide average (e.g., [11–14]). In contrast, balancing selection would be expected to result in much lower levels of population genetic differentiation [15, 16]. When combined with high-throughput genotyping approaches, such population genetic approaches have been used to identify genes thought to be involved in adaptation in a variety of species, including boreal black spruce [17], Atlantic cod [18], prairie-chickens [19], and moor frogs [20].

In addition to overall levels of population differentiation, clinal patterns of genetic variation can also be indicative of local adaptation (e.g., [21, 22]). A variety of environmental variables typically vary across the ranges of species, and thus there may be selection for different phenotypic values at the extremes of a species’ range. While allele frequencies at many loci might exhibit weak correlations across a given environmental contrast due to the joint effects of genetic drift and gene flow, alleles at loci that play an important role in local adaptation should clearly correlate with relevant environmental variables [21]. For example, adaptive clines in allele frequency have been identified in Arabidopsis thaliana for the flowering time genes FRIGIDA [23] and PHYTOCHROME C [24], in Populus tremula for the flowering time gene PHYTOCHROME B2 [25], in Drosophila melanogaster for the insulin-signaling gene INSULIN-LIKE RECEPTOR [26], and in Peromyscus polionotus for the coat color gene AGOUTI [27]. While the above studies have provided tremendous insight into the genetic basis of local adaptation, studies of non-model organisms will help to broaden our understanding of this fundamental evolutionary process. In the present paper, we report on range-wide patterns of phenotypic and genetic diversity in common sunflower, Helianthus annuus.

Sunflower is a member of the Compositae (a.k.a., the Asteraceae), which is one of the largest and most diverse families of flowering plants. The native range of common sunflower spans much of North America, and wild populations occur in habitats that are characterized by variation in a wide range of environmental variables, including: photoperiod, growing season, minimum/maximum temperatures, and precipitation. Common sunflower is also the wild progenitor of cultivated sunflower (also H. annuus), which is native to east-central North America [28–30] and is one of the world’s most important oilseed crops [31]. Cultivated sunflower shows significant phenotypic differences as compared to common sunflower, including branching, flowering time, plant height, and various seed traits [32].

Here, we describe patterns of phenotypic and genetic diversity within and among 15 wild sunflower populations across a latitudinal gradient in central North America. We grew and phenotyped individuals from these populations in a greenhouse environment and genotyped them using a single-nucleotide polymorphism (SNP) array targeting 384 loci distributed throughout the sunflower genome. We used these data to investigate geographic patterns of phenotypic differentiation, describe overall patterns of population genetic variation, and identify loci that harbor the population genetic signature of local adaptation. We also placed our population genetic results in the context of prior quantitative trait locus (QTL) mapping studies in sunflower to determine whether highly differentiated loci co-localize with known QTL regions.

Methods

Plant materials and phenotypic analyses

Seeds from 15 wild-collected populations of H. annuus were obtained from the USDA’s North Central Regional Plant Introduction Station (Ames, IA). These populations, which were sampled from a range of latitudes across central North America (Fig. 1; Table 1), were selected to represent truly wild populations that appear to be free from the effects of past introgression with cultivated sunflower (L. Marek and G. Seiler, personal communication). Care was taken to avoid sampling different subspecies of H. annuus (e.g., H. annuus ssp. texanus), as that could inflate genetic structure and/or phenotypic differentiation. Prior to germination, all seeds were cleaned with 3 % hydrogen peroxide, rinsed with deionized water, and placed on moist filter paper in a petri dish. To break dormancy, petri dishes were placed at 4 °C in a dark cold room for 14 days. After the cold treatment, they were moved into a growth room where they were maintained under 16 h days at 23 °C. Following germination, seedlings were planted in soil trays. Once established, these seedlings were transplanted into soil pots (900 Classic, Nursery Supplies Inc, Kissimmee, FL) and moved to the greenhouse, where supplemental lighting provided a consistent cycle of 16 h days and 8 h nights.

Fig. 1

Map of the locations of the 15 populations used in this study in the central USA and Canada. Map was constructed in R using the library ‘maps’ [65]

Table 1 Range-wide population sampling information

Plants were arranged in the greenhouse in four blocks, each of which contained five individuals from each of the 15 populations (75 individuals total per block). All plants were phenotyped for a variety of traits, including: days to four pairs of true leaves, days to flowering, plant height at senescence, branching architecture, seed size, and seed oil content/composition. Because wild sunflower is self-incompatible, manual crosses were performed to produce seeds. This involved intercrossing individuals within populations (i.e., bulked pollen collected from individuals within a population was used to pollinate individuals within that population), with inflorescences being bagged to prevent cross-contamination. Seeds were then collected at physiological maturity and phenotyped. Oil traits were assessed following established protocols [32]. Briefly, percent oil content was determined via pulsed nuclear magnetic resonance (NMR) analyses using a Bruker MQ20 Minispec NMR analyzer (Billerica, MA) that had been calibrated with known standards. Fatty acid composition was determined by gas chromatography (Hewlett-Packard, Palo Alto, CA) with known fatty acid standards (Nu-Check Prep, Elysian, MN).

All traits were tested for deviations from normality using the Shapiro-Wilk test in JMP 11 (SAS Institute, Cary, NC), applied to the distribution of trait values across all 286 fully grown individuals (14 of the originally planted individuals died early in development, but at least 12 individuals per population were analyzed; Table 1). Trait values were transformed using a Box-Cox transformation [33] as necessary. Restricted maximum likelihood was used with region as a fixed effect (blocks and a block-by-region interaction were included as random effects) to test for regional differences in trait values. For fatty acid traits, the date of fatty acid extraction was used as a blocking factor instead of greenhouse block because an inspection of the raw data indicated clear variation in extraction efficiency across days. Least-squares means were compared amongst regions using Tukey’s test.
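As an illustration of the Box-Cox step, the following minimal Python sketch applies the standard power transformation and picks the exponent by maximizing the usual profile log-likelihood over a grid. It is not the JMP procedure used in this study; the function names, the grid of candidate exponents, and the example values are hypothetical and chosen only for illustration.

```python
import math

def box_cox(values, lam):
    """Box-Cox power transformation of strictly positive trait values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limit of (x**lam - 1) / lam as lam -> 0 is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def profile_log_likelihood(values, lam):
    """Box-Cox profile log-likelihood, up to an additive constant.

    Assumes the transformed values are not all identical.
    """
    n = len(values)
    y = box_cox(values, lam)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n  # maximum-likelihood variance
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var_y) + jacobian

def best_lambda(values, grid=None):
    """Pick the exponent with the highest profile log-likelihood (hypothetical grid)."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))

# Hypothetical right-skewed trait values (exactly log-symmetric by construction).
trait = [math.exp(v) for v in (-2, -1, 0, 1, 2)]
lam = best_lambda(trait)
transformed = box_cox(trait, lam)
```

For data that are symmetric on the log scale, as in the hypothetical example, the selected exponent is zero, which corresponds to a simple log transformation.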

DNA extractions and SNP genotyping

Leaf tissue was harvested from the 286 fully grown individuals described above (Table 1) and DNA was extracted using the Qiagen DNeasy Plant Mini Kit (Valencia, CA). All DNA samples were quantified using a NanoDrop (Thermo Scientific, Wilmington, DE) and diluted to 50 ng/μl prior to genotyping. Each sample was then genotyped using a GoldenGate assay (Illumina, San Diego, CA) targeting 384 SNPs selected from the larger collection of sunflower SNPs described by Bachlava et al. [34]. These loci were chosen to provide even coverage of the 17 sunflower linkage groups (LGs), with an average of one SNP every 3.5 cM. Genotype calls were made using Illumina’s GenomeStudio (ver. 2011.1) followed by manual inspection. Loci that exhibited aberrant hybridization signals (perhaps due to presence/absence variation or the occurrence of duplicate genes), an overall lack of polymorphism (i.e., minor allele frequency < 0.05), and/or large amounts of missing data (i.e., fraction of missing data > 0.05) were removed prior to population genetic analysis. A total of 246 loci (average = 14.5 per LG; range = 11–20 per LG) were retained for further analysis (http://dx.doi.org/10.5061/dryad.6p1c4).
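The two numeric filtering criteria described above (minor allele frequency and fraction of missing data) can be sketched as follows. This is an illustrative Python sketch only, not the GenomeStudio workflow: the genotype coding (0, 1, or 2 copies of one allele, None for a missing call), the function names, and the example loci are assumptions, and the manual inspection of hybridization signals is not represented.

```python
def locus_summary(calls):
    """Return (missing_fraction, minor_allele_frequency) for one biallelic locus.

    calls: non-empty list of genotype codes, where 0/1/2 is the number of
    copies of an arbitrarily chosen allele and None marks a missing call.
    """
    called = [g for g in calls if g is not None]
    missing = 1.0 - len(called) / len(calls)
    if not called:
        return missing, 0.0
    allele_freq = sum(called) / (2 * len(called))  # diploid: two alleles each
    return missing, min(allele_freq, 1.0 - allele_freq)

def filter_loci(genotypes, min_maf=0.05, max_missing=0.05):
    """Keep loci with MAF >= min_maf and missing fraction <= max_missing."""
    kept = []
    for locus, calls in genotypes.items():
        missing, maf = locus_summary(calls)
        if maf >= min_maf and missing <= max_missing:
            kept.append(locus)
    return kept

# Hypothetical genotype calls for four loci in 20 individuals.
example = {
    "snp_a": [0, 1, 2, 1] * 5,          # polymorphic, complete data: kept
    "snp_b": [0] * 20,                  # monomorphic: removed
    "snp_c": [1] * 18 + [None, None],   # 10 % missing: removed
    "snp_d": [0] * 19 + [1],            # MAF = 0.025: removed
}
retained = filter_loci(example)
```

The thresholds default to the values stated in the text (0.05 for both criteria); everything else in the sketch is a placeholder.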

Population genetic analyses

Measures of genetic diversity, including the percentage of polymorphic loci, observed heterozygosity (Ho), and Nei’s unbiased expected heterozygosity (UHe; [35]) were calculated at the population level using GenAlEx (version 6.501; [36]). We also used GenAlEx to investigate genetic differentiation amongst populations by performing an analysis of molecular variance (AMOVA) with 999 permutations to determine the level of population structure in our dataset. Finally, the program STRUCTURE (version 2.3.4) [37] was used to investigate population genetic structure across the species range. Specifically, STRUCTURE was run using the admixture model from K = 1 to 17 population genetic clusters with a burn-in of 100,000 and 1,000,000 MCMC iterations (with 20 replicates for each K value). Results were imported into STRUCTURE Harvester [38] where the most likely value of K was determined using the DeltaK method [39]. STRUCTURE was additionally used to test individual subsets of the data to investigate finer levels of genetic structure.
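To make the DeltaK criterion concrete, the following Python sketch computes it from replicate log-likelihood values for each K. It assumes the common formulation in which DeltaK is the absolute second-order difference of the mean ln P(D) across successive K values, divided by the standard deviation of ln P(D) among replicates at that K; this is a simplified stand-in, not the STRUCTURE Harvester code, and the function names and input values are hypothetical.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Compute DeltaK for every K that has both K - 1 and K + 1 available.

    lnp_by_k: dict mapping K to a list of replicate ln P(D) values
    (at least two replicates per K).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # DeltaK is undefined at the smallest and largest K
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result

def most_likely_k(lnp_by_k):
    """Return the K with the largest DeltaK."""
    dk = delta_k(lnp_by_k)
    return max(dk, key=dk.get)

# Hypothetical ln P(D) values for two replicates at each of K = 1..4.
runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -792.0],
    4: [-785.0, -787.0],
}
best = most_likely_k(runs)
```

Because the statistic depends on the neighboring K values, it cannot be evaluated at the smallest or largest K tested, so K = 1 can never be selected by this criterion.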

The potential role of local selective pressures in shaping diversity at individual loci was investigated using multiple approaches. First, we used Arlequin (version 3.5.1.2; [13]) to run 20,000 simulations in order to obtain a null distribution for FST, which was then used to develop a 99 % confidence interval for high and low outlier identification. In general terms, over-differentiated loci are regarded as candidates for local adaptation, while under-differentiated loci are generally viewed as candidates for balancing selection [15, 16], or possibly a sweep across multiple populations [40]. BayeScan (version 2.1; [14]) was also used to test for selection by comparing the posterior probabilities of two models (selection vs. no selection) for each locus. Following Foll and Gaggiotti [14], loci whose posterior probability for the model including selection was greater than 0.91 were regarded as being ‘strong’ FST outlier candidates. We then mined the sunflower QTL literature to identify any QTL whose confidence interval co-localized with a putative local adaptation SNP identified in this study, as such overlapping loci might be particularly attractive candidate regions for future research. Co-localization information was obtained using previously published studies from a variety of sunflower crosses [32, 41–44].
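The logic of the FST outlier scan can be sketched in Python as follows. This is a deliberately simplified illustration: it uses a basic heterozygosity-based FST with equally weighted populations and takes the null distribution as a given list of values, whereas Arlequin generates its null distribution through its own simulations, which are not reproduced here. All function names and numbers are hypothetical.

```python
from statistics import quantiles

def fst(pop_freqs):
    """Per-locus FST = (HT - HS) / HT from population allele frequencies.

    pop_freqs: frequency of one allele in each population (equal weights).
    Returns None for a locus that is monomorphic across all populations.
    """
    n = len(pop_freqs)
    p_bar = sum(pop_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return None
    h_within = sum(2.0 * p * (1.0 - p) for p in pop_freqs) / n
    return (h_total - h_within) / h_total

def bounds_99(null_fst):
    """Lower and upper cut points (0.5 % and 99.5 %) of a null FST sample."""
    cuts = quantiles(null_fst, n=200)  # 199 cut points in steps of 0.5 %
    return cuts[0], cuts[-1]

def classify(value, low, high):
    """Label a locus as a high outlier, a low outlier, or neutral."""
    if value > high:
        return "high"
    if value < low:
        return "low"
    return "neutral"

# Hypothetical null distribution standing in for simulated FST values.
null = [i / 1000 for i in range(1, 1000)]
low, high = bounds_99(null)
label = classify(fst([0.2, 0.8]), low, high)
```

In this framing, loci labeled "high" correspond to the over-differentiated candidates for local adaptation and loci labeled "low" to the under-differentiated candidates discussed above.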

Results

Phenotypic diversity

We identified numerous traits that exhibited differentiation amongst the five sampled regions, with latitude being a significant factor in the partitioning of phenotypic diversity for traits such as flowering time, plant height, branching, and a number of seed oil traits (Table 2; Additional file 1). Individuals from the southern regions (Texas and Oklahoma, Regions 1 and 2; Table 2; Additional file 1) tended to flower later, grow taller, have thicker stems, and have a higher proportion of saturated fatty acids within their seeds compared to individuals from the northern regions found in Saskatchewan, North Dakota and Montana (Regions 4 and 5; Table 2; Additional file 1). The fatty acid composition data also showed some interesting trends, with the saturated type (i.e., palmitic and stearic acid) showing the same sort of regional differentiation as noted above. In contrast, the unsaturated types (i.e., oleic acid and linoleic acid) did not show significant differences between regions. Seed oil content showed no significant differences among regions across the entire range (Table 2; Additional file 1). Aside from the aforementioned differentiation in saturated fatty acid percentage in seed oils, regions were significantly differentiated for seed length with respect to latitude. While seed weight and seed width both exhibit some regional differences, the differences were not due to latitude as the most southern region was not significantly different from the most northern region for these two traits (Table 2; Additional file 1). Notably, the latitudinal trends found in saturated fatty acid content and flowering time are consistent with the results of previous studies [45, 46]. While total branching exhibited significant differences among regions, there was no clear trend with respect to latitude. However, plants from Texas and Oklahoma (Regions 1 and 2; Table 2; Additional file 1) had significantly more top branching compared to the three northern regions. 
Other plant architecture traits, such as branch length and the extent of secondary, tertiary, or higher-order branching, were significantly different between regions, but those differences likewise did not show a latitudinal pattern (Table 2; Additional file 1). Interestingly, no traits exhibited significant differentiation between all five regions (Table 2; Additional file 1).

Table 2 Phenotypic variability among five latitudinal regional groupings of sunflower populations

Population genetic structure

Calculation of population genetic statistics for each of the 15 populations revealed a substantial, albeit variable, amount of genetic diversity across the range of wild sunflower (Table 3). There was no trend towards either latitudinal extreme of the range having a reduced level of genetic diversity (Table 3). However, two populations (WY1 and ND1) exhibited a noticeably lower percentage of polymorphic loci compared to the other 13 populations. An analysis of molecular variance revealed that approximately 20 % of the observed genetic variation could be attributed to population level differentiation (data not shown). Of the remaining genetic variation, 76 % was found at the within-individual level, whereas only 4 % was found at the among-individual level. A STRUCTURE [37] analysis of the data coupled with the DeltaK method for determining the most likely number of population genetic clusters [39] identified K = 2 clusters (Fig. 2). The STRUCTURE bar plot for K = 2 revealed a north-south divide with the east-central portion of Region 3 corresponding to a transition zone (Fig. 2). An additional STRUCTURE run containing only the southernmost six populations also indicated that K = 2. For this level of K, TX1 was separated from the remaining five populations found in Texas and Oklahoma, although K = 6 showed a secondary peak (Additional files 2, 3, and 4). When the northernmost six populations were analyzed by STRUCTURE, K = 2 was again the most well-supported number of genetic groups. Similar to the result for the southern portion of the range, only a single population (ND1) in the northern portion of the range was separated from the other five populations at K = 2 (Additional files 5 and 6).
Additionally, since the initial full dataset STRUCTURE analysis suggested that two of the three middle latitude populations were more southern while the other population appeared more northern, we performed further STRUCTURE analyses to explore differentiation within the middle of the range. To study the middle latitude populations, we added NE1 and NE2 to the southern dataset and WY1 to the northern dataset for further testing. When we performed STRUCTURE analyses of these larger groupings, we found that K = 3 for the northern cluster. The three clusters corresponded to ND1, WY1, and the remaining populations. Additionally, we found that K = 2 for the southern cluster, with one cluster corresponding to NE2 individuals and the other containing the remaining seven populations.

Table 3 Mean and standard error (SE) of population genetic statistics for 15 wild sunflower populations
Fig. 2

Population genetic structure of wild sunflower individuals. a STRUCTURE bar plot of full dataset. Populations correspond to those in Table 1. b DeltaK plotted across all values of K tested. Figure constructed in STRUCTURE HARVESTER [38]

Outlier identification

Multiple outlier identification programs highlighted the existence of an overlapping set of loci that exhibit the signature of local adaptation (Table 4). Arlequin identified eight loci that were highly differentiated in a global FST calculation (all possible pairwise FST combinations; 99 % confidence intervals). These loci included: one SNP on LG 4 with no annotation; two SNPs located near the distal end of LG 6, one in HaFT2 [46, 47] and the other in a gene with homology to a mitogen-activated protein kinase kinase kinase 14; one SNP on LG 7 in a gene with high similarity to a gene in the armadillo repeat family of proteins in A. thaliana; one SNP on LG 10 in the GRAS/DELLA transcription factor GAI; two SNPs on LG 12, one corresponding to an EF-hand-like domain-containing gene, and the other corresponding to a protein of unknown function; and one SNP located on LG 14 in a gene with high similarity to Defective in Cuticular Ridges (DCR) in A. thaliana. BayeScan provided complementary outlier results by identifying three highly differentiated loci (SNPs within the DCR homolog, the GRAS/DELLA transcription factor, and the gene containing the EF-hand-like domain) already highlighted by Arlequin. Both Arlequin and BayeScan also identified four loci as significantly under-differentiated. There were two under-differentiated loci on LG 13, including one SNP in a gene with an alpha-beta plait nucleotide binding role and another SNP in a gene with homology to 5′-AMP-activated protein kinase. SNPs in a glycoside hydrolase and a guanylate binding gene also had exceptionally low FST, and were found on LGs 8 and 17, respectively (data not shown).

Table 4 Summary of candidate genes involved in local adaptation. FST values were determined by Arlequin and/or BayeScan and were cross-referenced against QTL information to determine the extent of QTL co-localization

Co-localization of SNP outliers with known QTL

The locations of our eight over-differentiated loci were compared to the locations of previously mapped sunflower QTL to identify traits potentially involved in local adaptation. On LG 4, an unannotated gene co-localized with a QTL for leaf number [44]. As noted above, the distal end of LG 6 contains two FST outliers: HaFT2 and a gene with a putative kinase function. Both of these co-localize with QTL related to flowering time in two sunflower mapping populations, ANN1238 × CMS 89 [32] and ANN1238 × Hopi [42]. This genomic region is known to contain multiple HaFT paralogs, including HaFT1, which has been shown to be important with respect to cultivated sunflower’s photoperiod response [46, 47]. In addition to co-localization with the flowering time QTL in this region, there are QTL for morphological traits (e.g., achene width, plant height, and number of ray flowers) and even a QTL for leaf fungal damage. The SNP outlier on LG 7, from an EST with homology to an ARM repeat protein, also co-localizes with QTL for flowering time, plant height, leaf number, and head herbivory [32, 44]. Interestingly, two loci with strong support from both Arlequin and BayeScan (the GRAS/DELLA transcription factor and the DCR homolog, which map to LGs 10 and 14, respectively) did not co-localize with any known QTL. One of the two outliers on LG 12, an unannotated gene, co-localized with QTL for leaf shape and number of heads [32]. Finally, the EF-hand-like domain-containing gene co-localized with a QTL for head total (one way of describing the degree of branching), as well as QTL for leaf and branch traits, found on LG 12 (Table 4).

Under-differentiated loci co-localized with QTL for a variety of different traits. Of particular interest were two low FST outliers located near each other on LG 13 that co-localized with a shared set of QTL that included: number of branches, number of heads, head and leaf herbivory, stem diameter, achene length, leaf area, and stem height [32, 42–44].

Discussion

Populations across the range of wild sunflower harbor an exceptional amount of phenotypic diversity. The extent to which those traits contribute to local adaptation is an important question that can be addressed in a number of ways including reciprocal transplants, common garden measurements, and population genome scans. In our analyses, many traits (e.g., flowering time, plant height, plant architecture, and seed oil composition) were differentiated in conjunction with latitude. As sunflower is a seed oil crop, there has been a considerable amount of research done to describe and uncover the genetic mechanism behind seed oil variation. In breeding lines, strong artificial selection has created divergent germplasm groups with vastly different oil profiles. In the wild, natural selection may act as a strong force in determining which relative amounts of saturated and unsaturated fatty acids are most beneficial for populations living in certain environments.

Common garden phenotypic variation

Seed oil composition exhibited significant latitudinal differentiation across the range. Previous studies of seed oil composition in a variety of species have revealed a negative correlation between latitude and saturated fatty acid content (i.e., the degree of saturation) at a relatively coarse geographic scale [45]. By quantifying the percentage of saturated fatty acids across the range of sunflower, we were able to identify a similar trend (Table 2; Additional file 1), albeit at a finer geographic scale. Given that these plants were grown in a common garden, we can infer that the observed differences have a genetic basis, and that functional polymorphisms in the oil biosynthetic pathway exist across the range of wild sunflower. The percentage of saturated fatty acids in seed oils is of considerable evolutionary importance with respect to germination under different environmental conditions. Saturated fatty acids are known to store more usable energy per carbon as compared to unsaturated fatty acids [45], but saturated fatty acids also have higher melting points than unsaturated fatty acids; the associated energy is thus less accessible in cooler temperatures. The resulting inference is that the production of unsaturated fatty acids in higher latitudes is advantageous because it ensures energy availability at lower temperatures [45]. Conversely, saturated fatty acids are favored in lower latitudes because they are more energy rich while still being available to germinating seeds due to the comparatively warmer temperatures.

Observed differences in flowering time can be interpreted in a similar framework. Growing seasons tend to be shorter in higher latitudes; thus, there is a premium on flowering early to allow seed set before the end of the growing season. Alternatively, in lower latitudes, there is typically a longer growing season that may select for later flowering plants that may grow to a larger size and produce more and/or higher quality seeds. It must, however, be noted that plant height and flowering time are developmentally correlated; as such, they form a suite of inter-related traits [48, 49]. The differentiation seen in this study confirms some of the patterns of diversity documented by Blackman et al. [46], with northern populations flowering significantly earlier compared to southern populations when grown at 16 h days. While common garden approaches do isolate the effects of genotype on trait variation, it should be noted that approaches like this do preclude the study of genotype-by-environment (G × E) interactions. Reciprocal transplants across the range would thus be useful to further characterize the relevance of the aforementioned traits in local adaptation. While not the focus of this study, it should be noted that altitude is also a possible cause of differentiation in a suite of traits, as shown by Kooyers et al. [50].

Population genetic structure

The STRUCTURE analysis of the full dataset revealed an overall north/south division in the natural range of wild sunflower, with a transitional zone occurring in the vicinity of Nebraska and Wyoming. Previous sampling of H. annuus genetic diversity had hinted at a similar north/south division [51], and our analysis builds on this finding by increasing the marker density and sampling density within each population. Historically, this latitudinal transect has seen similar patterns of genetic differentiation. For example, using transplant gardens, McMillian [52] showed that multiple grassland species exhibited heritable differences in flowering time in which northern populations flowered significantly earlier. When further STRUCTURE analyses were performed on northern and southern subsets of individuals, it was discovered that hierarchical structure exists in our dataset. In other words, the large north/south split identified in the full dataset may have obscured more subtle patterns that differentiate individual populations.

Candidate adaptive loci

In terms of population genetic differentiation, we identified interesting possible candidates for conferring local adaptation with respect to flowering time. We found two outlier SNPs on LG 6, one residing in a gene with putative kinase activity and the other in HaFT2. Both loci co-localize with previously identified QTL for flowering time [32, 42], in addition to other traits (Table 4). FT2 is a gene whose Arabidopsis homolog has been shown to play a major role in promoting flowering [53]. Moreover, the region of sunflower LG 6 where this gene resides has been previously shown to influence flowering time in domesticated vs. wild sunflower [32, 42, 47]. It should be noted that the mapping populations for these studies were derived from a wild × crop cross and a wild × landrace cross. The extent of linkage disequilibrium (LD) in this region is currently unknown, although previous work indicates that, on average, LD decays quickly in wild sunflower [54]. Studies of cultivated germplasm suggested that there is variation in LD across the sunflower genome [55]. In addition to mapping information, HaFT2 is an exceptional candidate for local adaptation due to previous gene expression work across the range of wild H. annuus [46]. In short days, a cline in gene expression was seen for HaFT2 in which northern individuals exhibited higher expression than southern individuals, consistent with this gene playing a role in adaptive differentiation [46]. Our results add to the observation that HaFT2 exhibits a latitudinal cline in gene expression that is consistent with the effects of selection by providing population genetic evidence of selection on this gene, as well.

We also uncovered SNPs with significantly elevated population differentiation values on other linkage groups. A strongly differentiated SNP on LG 14 resides in the sunflower homolog of Defective in Cuticular Ridges (DCR). In A. thaliana, mutants of DCR have altered trichome development during leaf growth [56, 57]. Trichomes serve a multitude of functions in plants including: reflectance of sunlight to prevent damage [58], retention of water [59], and defense [60]. As many of the aforementioned factors may correlate with growing season, it is difficult to draw any conclusions without additional data. We cannot conclude, for example, that the variants documented herein are in any way causal in nature. Rather, they provide us with a preliminary pool of candidate adaptive regions for further study. Furthermore, since we lack knowledge concerning the strength of linkage disequilibrium in these genomic regions, these SNPs may simply be linked to causal polymorphisms found in nearby genes.

These FST outliers form a list of possible candidate genes for future experiments. Importantly, the extent of linkage disequilibrium needs to be assessed in these genomic regions in order to determine the size of the region of elevated population structure. A possible explanation for the absence of co-localizing QTL for some SNPs is that no wild × wild mapping populations currently exist for sunflower. Alternatively, many subtle phenotypes (e.g., trichome density or morphology) and biochemical phenotypes have not been measured in mapping studies and thus could not have co-localized with regions of elevated population differentiation. Marker density has become the main limitation in genome scan studies of local adaptation in natural populations [61]. The advent of high-throughput methods such as restriction site associated DNA sequencing (RAD-seq) and genotyping by sequencing (GBS) has allowed researchers to obtain both large numbers of markers and an even genomic distribution [62–64].

Conclusions

In this study we used 246 loci to characterize the range-wide genetic diversity and structure of the wild progenitor of an economically important crop species. These markers clearly indicated a genetic disjunction between northern and southern populations that occurs around 40° north latitude, with populations in Nebraska appearing to be admixed (Fig. 2). This study also generated multiple candidate genomic regions for local adaptation as defined by the extent of their population genetic differentiation. We also considered the extent to which these genomic intervals co-localize with QTL identified in previous trait mapping experiments. These loci represent larger physical genomic intervals that will be the focus of future molecular evolutionary analyses, gene expression comparisons across the range, and field studies to further examine their putative role in local adaptation.