Background

Intraspecific gene flow can generate novel genetic pools and promote adaptive evolution in differentiated populations. Intraspecific variation and hybridization in natural populations with dissimilar evolutionary histories are widely acknowledged as crucial factors for different patterns of local adaptation, and understanding these evolutionary processes has become more relevant due to climate change [1]. Forest trees have been proposed to be susceptible to rapid climate change due to their long generation times. However, trees also present high levels of standing genetic diversity due to large population sizes, extensive levels of gene flow and wide distributions across different environments, which make them ideal systems to detect adaptive signals and to study the genetic basis of adaptation [2]. In the past, efforts to dissect the genetic basis of adaptation relied on population genetic analyses that focused mainly on the genetic differentiation of the populations without considering environmental heterogeneity among them [3]. Recent approaches in landscape genomics integrate multiple genetic, spatial, and environmental data to shed light on the genetic variants underlying adaptation to different climate scenarios.

Douglas-fir constitutes one of the world’s most important timber trees. It grows under a wide range of climatic conditions and has been part of the landscape of western North America since the Pleistocene. With warming environmental conditions, research on Douglas-fir has focused on testing the ability of the species to grow while withstanding changes in temperature (heat stress) and low soil water content (drought stress) [4,5,6,7,8]. Two varieties with adaptive variation are formally recognized: the coastal variety along the Pacific coast (P. menziesii var. menziesii), and the interior variety across the Rocky Mountains (P. menziesii var. glauca). Populations of Mexican Douglas-fir are small, extremely fragmented, and non-continuous in distribution, except in a larger area in the state of Chihuahua [9]. Compared to populations from northern Mexico, populations from central Mexico are genetically, morphologically, and phenologically more distinct from populations from the United States [10,11,12,13].

Fossil records exist for Douglas-fir from the early Miocene to the late Holocene [13, 14]. Fossils from the Miocene and Pliocene were located along the west coast (from British Columbia to California) and in the Columbia Plateau and Great Basin. Pleistocene fossils were detected in the Rocky Mountains and on the west coast; however, no fossil records were found in the Columbia Plateau and the western Great Basin [13, 14]. This change in the distribution of fossil records correlates with the Pliocene orogeny of the Sierra Nevada and Cascade Mountain ranges, which is thought to be the cause of the vicariant separation of the species into interior and coastal varieties [13, 14]. In addition, fossil records suggest that subsequent differentiation within varieties may have occurred due to climate change refugia during the Pleistocene glaciation [13, 15]. Later events of secondary contact in British Columbia and the Washington Cascades led to the formation of two inter-varietal hybrid zones in these geographic locations.

Demographic, phylogenetic and population molecular studies have shown differentiation between the varieties and populations of Douglas-fir [13, 16,17,18,19]. Genetic differentiation between the two varieties was reported using allozymes more than three decades ago [17]. Chloroplast and mitochondrial markers revealed Pleistocene divergence of the Mexican populations, which resulted in the classification of the interior variety into two different lineages: the Rocky Mountain and Mexican lineages [13, 16]. Population structure analyses using nuclear microsatellite markers identified refugial populations with intra-varietal and inter-varietal gene flow caused by hybridization and introgression [15, 19].

Studies that assess the relation between genetic diversity and environmental responses in Douglas-fir have been scarce. Recently, based on genome-wide sequencing approaches, genes associated with local adaptation have been identified [7, 20,21,22,23]. A comparative transcriptome analysis of Douglas-fir trees from two provenances, from a coastal and interior habitat, with contrasting natural environments was carried out to evaluate the effect of abiotic environmental factors on gene expression responses [21]. Targeted sequence capture and mixed effect models were used to detect high differentiation of drought tolerance genes between interior and coastal trees grown in experimental conditions [22]. In coastal populations, single nucleotide polymorphisms have been associated with cold-hardiness and phenology related traits [7, 20]. In this study, we genotyped trees of the two varieties to characterize the population structure across the natural, species-range distribution of Douglas-fir in North America; identify the evolutionary processes maintaining genetic variation and population structure; assess the levels of inter-varietal hybridization; identify candidate loci for climate adaptation; and predict the impacts of climate change in the future distribution of the species.

Materials and methods

Plant material and DNA extraction

Seeds and needle tissue were collected from 577 open-pollinated trees across the Douglas-fir natural distribution range, from Mexico to British Columbia, Canada (Table 1). Trees were sampled at a minimum distance of 20 m from one another to avoid relatedness among individuals. Prior to extraction, seeds were soaked in a solution of 70% water and 30% of 3% hydrogen peroxide for 12 h. Ten haploid megagametophytes for each family were pooled to infer the maternal genotype. DNA from megagametophytes was extracted using the Qiagen DNeasy mini-prep Plant kit and an Eppendorf automated pipetting workstation. When needle tissue was available, DNA was extracted using a modified CTAB method [24] or the MPBio (MPBiomedicals LLC, Ohio, USA) RapidPure DNA Plant kit. The CTAB method was modified to include a wash step of homogenized tissue with CTAB buffer prior to 65 °C incubation. DNA quality and concentration were evaluated with a NanoDrop Spectrophotometer, a Qubit 4.0 Fluorometer and agarose gels using Invitrogen’s E-Gel Power Snap system.

Table 1 Geographic location and mean environmental variables of individuals included in this study

SNP genotyping and filtering

Samples were genotyped using a custom-designed gene-based Illumina Infinium SNP array containing 16,146 Douglas-fir single nucleotide polymorphisms (SNPs) at the University of California-Davis Genome Center. This array was designed to represent genome-wide variation in the species by using 10X whole-genome re-sequencing data from individuals across the species’ geographic range as input for array construction [7, 8]. Genotypes were called using Illumina’s Genome Studio Genotyping Module v 2.0.5 (Illumina, 2016). Filtering criteria included a SNP call frequency > 0.85, individual call rate > 0.85, non-monomorphic and a minor allele frequency > 0.01.
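The filtering criteria above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0, 1 or 2 copies of the alternate allele with None for a failed call, and that SNPs are filtered before individuals; neither the encoding nor the order is stated in the text, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # 0, 1, 2 = copies of the alternate allele; None = no call


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls (0.0 for an empty list)."""
    return sum(g is not None for g in calls) / len(calls) if calls else 0.0


def minor_allele_freq(calls: List[Genotype]) -> float:
    """Minor allele frequency computed from the non-missing diploid calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def filter_genotypes(
    matrix: List[List[Genotype]],
    min_snp_call: float = 0.85,
    min_ind_call: float = 0.85,
    min_maf: float = 0.01,
) -> Tuple[List[int], List[int]]:
    """Return (kept individual indices, kept SNP indices).

    Rows are individuals and columns are SNPs. Assumed order: SNPs are
    filtered first, then individuals are filtered on the retained SNPs.
    """
    n_snps = len(matrix[0])
    kept_snps = []
    for j in range(n_snps):
        column = [row[j] for row in matrix]
        # A MAF above a non-negative threshold also excludes monomorphic SNPs.
        if call_rate(column) > min_snp_call and minor_allele_freq(column) > min_maf:
            kept_snps.append(j)
    kept_inds = [
        i
        for i, row in enumerate(matrix)
        if call_rate([row[j] for j in kept_snps]) > min_ind_call
    ]
    return kept_inds, kept_snps
```

Applying the SNP filter first is a design choice of this sketch; reversing the order can change which samples and markers are retained.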

Population structure and hybridization analyses

Principal component analysis (PCA) was carried out to infer population structure using the gdsfmt v.1.34 and SNPRelate v.1.32 packages [25]. Population structure was also inferred by a Discriminant Analysis of Principal Components (DAPC) using the find.clusters function of the adegenet (v.2.1.10) package [26]. The best number of genetic clusters was determined using the Bayesian Information Criterion (BIC). To estimate ancestral differences between varieties and populations, ADMIXTURE software v.1.3.0 [27] was employed using 10 independent runs for K values ranging from 2 to 10. Admixture proportions obtained for K = 2 and for the two K values with the lowest averaged cross-validation error were chosen. CLUMPAK v.1.1 was used to produce bar plots [28]. Pie chart maps of individual admixture proportions were constructed with QGIS, using the “around point” placement option for better visualization. Posterior probabilities of hybrid categories were calculated with NewHybrids v.2.0 [29]. All individuals classified as coastal, hybrid, and interior south in Additional file 1 were used. The individuals from the interior north were not included because they presented higher variation in the admixture proportions (Q values) among the estimated ancestral populations. One hundred randomly selected SNPs were used to estimate posterior probabilities with 50,000 sweeps for 3 replicates.
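The selection of K values described above (K = 2 plus the two K values with the lowest cross-validation error averaged over runs) can be sketched as follows. The function name and the example error values are hypothetical and serve only to illustrate the rule; they are not results from this study.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def select_k_values(
    cv_errors: Dict[int, List[float]],
    always_include: Sequence[int] = (2,),
    n_best: int = 2,
) -> Tuple[List[int], Dict[int, float]]:
    """Average the cross-validation error per K and pick the K values to report.

    cv_errors maps each K to the errors of its independent runs.
    Returns the sorted K values to report and the averaged error per K.
    """
    averaged = {k: mean(errors) for k, errors in cv_errors.items()}
    # Rank by averaged error; ties are broken in favour of the smaller K.
    ranked = sorted(averaged, key=lambda k: (averaged[k], k))
    chosen = set(always_include) | set(ranked[:n_best])
    return sorted(chosen), averaged


# Hypothetical errors for four K values, two runs each.
example = {2: [0.60, 0.62], 3: [0.55, 0.57], 4: [0.54, 0.54], 5: [0.58, 0.60]}
chosen, averaged = select_k_values(example)
```

If K = 2 is itself among the two best values, this sketch returns only two K values, which is one possible reading of the rule.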

Genetic diversity analyses

Population inbreeding coefficients were obtained from a kinship matrix with the popkin package v.1.3.23 [30]. Trees delimited in a region with similar topography and climate conditions as well as similar latitude or longitude were considered an independent population. Geographic positions of populations on the map were estimated from the centroid of all individuals in each population. The natural distribution map of the two varieties was based on the Atlas of United States trees [31]. The VCFtools program v.0.1.16 was used to determine the heterozygosity and pairwise fixation index (Fst) values [32]. Heterozygosity values were calculated per individual with the observed number of homozygotes [O(HOM)] and the number of non-missing genotypes [N(NM)] using the formula: [N(NM) – O(HOM)]/N(NM). Violin plots were created to visualize the results using the ggplot2 v. 3.4.2 R package [33].
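The per-individual heterozygosity formula given above, [N(NM) – O(HOM)]/N(NM), can be written as a short function. This is a minimal sketch; the function name and the example counts are hypothetical.

```python
def individual_heterozygosity(observed_hom: int, n_non_missing: int) -> float:
    """Observed heterozygosity of one individual.

    observed_hom  -- number of homozygous genotypes, O(HOM)
    n_non_missing -- number of non-missing genotypes, N(NM)
    """
    if n_non_missing <= 0:
        raise ValueError("N(NM) must be positive")
    if not 0 <= observed_hom <= n_non_missing:
        raise ValueError("O(HOM) must lie between 0 and N(NM)")
    return (n_non_missing - observed_hom) / n_non_missing


# Hypothetical counts: 11,000 called genotypes, 8,360 of them homozygous.
example_value = individual_heterozygosity(8360, 11000)
```

With the hypothetical counts in the example, 2,640 of 11,000 called genotypes are heterozygous, giving a value of 0.24.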

Isolation by distance and environment analyses

To evaluate the association between genetic and geographic distances, a Mantel test was performed using the gl.ibd function implemented in the dartR R package v. 1.0.2 with 999 permutations [34]. Isolation by environment and the association between geographic and environmental distances were analyzed with the vegan (v. 2.6-4) and geosphere (v. 1.5-18) R packages. The Spearman correlation method, 9999 permutations, and SNPs without missing data were used for these Mantel tests. Climatic data for the environmental variables (BIO1 and BIO12) were obtained from the WorldClim database (www.worldclim.org) at a resolution of 2.5 arc minutes using the raster and sp R packages and the coordinates of the trees collected.
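The logic of a permutation-based Mantel test with Spearman correlation can be sketched in a few lines. This is an illustrative standard-library implementation, not the code of the R packages named above; the function names are hypothetical, and the one-sided p-value convention (hits + 1)/(permutations + 1) is an assumption of this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def upper_triangle(m: Matrix) -> List[float]:
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; applied to ranks it yields Spearman's rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant distances")
    return sxy / math.sqrt(sxx * syy)


def mantel_spearman(a: Matrix, b: Matrix, permutations: int = 9999,
                    seed: int = 0) -> Tuple[float, float]:
    """Return (Spearman Mantel r, one-sided permutation p-value)."""
    n = len(a)
    x = ranks(upper_triangle(a))
    observed = pearson(x, ranks(upper_triangle(b)))
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(idx)
        # Permute rows and columns of b together to keep it a distance matrix.
        permuted = [[b[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(x, ranks(upper_triangle(permuted))) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Under this convention the smallest attainable p-value is 1/(permutations + 1), that is, 0.001 for 999 permutations and 0.0001 for 9999 permutations.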

Detection of loci under selection

Identification of SNPs under selection was carried out with the BayeScan (v. 2.0) and pcadapt (v. 4.3.3) packages [35, 36]. The parameters used for outlier detection with BayeScan were the following: 20 pilot runs of 5000 iterations, burn-in length of 50,000, thinning interval size of 100 and prior odds ratio of 100. SNPs were considered outliers based on a false discovery rate q-value threshold of 0.05. Pcadapt was used to detect signatures of selection using two principal components. A q-value threshold of 0.1 was used to choose outliers. SNPs were functionally annotated using KOBAS-i and the annotation file of the Douglas-fir genome v. 1.0 [37]. KOBAS-i was used with default parameters for Gene Ontology (GO) and KEGG pathway enrichment analyses. Climate data of 25 environmental variables were extracted from the ClimateNA application [38]. All climate data are based on annual averages for the years 1962–1990. The variables used in the analysis are related to monthly, seasonal, and annual temperature and precipitation measurements (Additional file 2). To identify SNPs involved in local adaptation to climate, the Bayenv2 program was used [39]. A covariance matrix was obtained from the average of 5 independent covariance matrices generated with different seed numbers and 100,000 iterations. Five independent runs were performed for the populations of Douglas-fir and 25 climate variables with different seed numbers and 500,000 iterations. Bayes factors (BF) were averaged over the independent runs. A BF value of 10 was considered as the threshold for outlier loci.
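The final step of the environmental association analysis (averaging Bayes factors over independent runs and applying the threshold of 10) can be sketched as follows. The data layout, the function name, the use of an arithmetic mean, and the inclusive comparison (BF ≥ 10) are assumptions of this sketch; the SNP and variable labels in the example are placeholders.

```python
from statistics import mean
from typing import Dict, List, Tuple

# One dictionary per independent run, mapping (SNP, climate variable) to its BF.
RunResult = Dict[Tuple[str, str], float]


def outlier_snps(bf_runs: List[RunResult], threshold: float = 10.0) -> Dict[str, List[str]]:
    """Average Bayes factors over runs and return the SNPs reaching the threshold.

    The result maps each outlier SNP to the sorted list of climate variables
    for which its averaged BF is at least `threshold`.
    """
    if not bf_runs:
        raise ValueError("at least one run is required")
    # Only SNP-variable pairs present in every run are averaged.
    keys = set(bf_runs[0])
    for run in bf_runs[1:]:
        keys &= set(run)
    hits: Dict[str, List[str]] = {}
    for snp, variable in keys:
        averaged = mean(run[(snp, variable)] for run in bf_runs)
        if averaged >= threshold:
            hits.setdefault(snp, []).append(variable)
    return {snp: sorted(variables) for snp, variables in hits.items()}


# Placeholder example with two runs and two climate variables.
runs = [
    {("snp1", "MAT"): 12.0, ("snp1", "MAP"): 3.0, ("snp2", "MAT"): 8.0},
    {("snp1", "MAT"): 9.0, ("snp1", "MAP"): 5.0, ("snp2", "MAT"): 11.0},
]
example_hits = outlier_snps(runs)
```

In the placeholder example only snp1 is retained, for the variable MAT, because its averaged BF is 10.5, whereas snp2 averages 9.5.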

Ecological niche modeling (ENM)

The ecological niche of Douglas-fir was modeled with the Wallace2 interactive web app [40], which incorporates the Maxent algorithm for estimating potential distribution using presence-only data [41, 42]. Additionally, ecological niche shifts (i.e., potential distribution gains and losses in response to climate change) were forecasted throughout the present century (years 2050 and 2070). Presence data for Douglas-fir were derived from the 540 geo-referenced trees retained after the SNP filtering procedure. From those, 236 duplicated coordinates were removed. The geographic database was complemented with Douglas-fir site records (n = 44) reported by Wei et al. (2011), which allowed us to fill some geographic gaps, especially for Mexican populations, increasing the overall presence data across the species range. The resulting 348 non-duplicated locations were then subdivided into three major regions (ancestral populations) and a hybrid zone according to PCA and admixture results: coastal (n = 250), interior-north (n = 41), interior-south (n = 48), and hybrid zone (n = 9). The coastal region included northern California to British Columbia; interior-north encompassed Idaho and Wyoming north to British Columbia; and interior-south comprised high elevations from Utah and Colorado to southern Mexico. The hybrid zone is located in British Columbia and Washington.

Nineteen bioclimatic variables derived from present-day averaged temperature and precipitation data available at WorldClim [43] were used as raster layers with a resolution of 2.5 arc minutes (≈ 5 km). Occurrence data were further processed to reduce sampling bias using a spatial thinning technique, where the minimum distance between occurrence locations (i.e., nearest neighbor distance) was 10 km, resulting in a thinned dataset of 263 localities. From those points, the spatial extent for niche model building and evaluation was determined, using a bounding box with a buffer distance of 1 geographic degree and 10,000 sample background points. The environmental space occupied by each of the regional Douglas-fir populations was characterized as an approximation to their Hutchinsonian niche, defined as “the n-dimensional hypervolume where a species can persist and reproduce in a mathematical space defined by non-depletable environmental gradients” [40, 44]. Wallace2 reduces the dimensionality of the predicted niche using three modules: (1) “environmental ordination”, a Principal Component Analysis that maximizes the variation contained in the predictor variables, allowing the identification of collinear variables and the retention of a reduced set of highly explanatory variables, (2) “occurrence density grid”, depicting the portion of the environmental space that is more densely occupied by the species, given the availability of environmental conditions present within the background extent, and (3) “niche overlap”, quantified as an overall index (overlap D) [45], ranging from 0 to 1, where 0 represents a null overlap and 1 is a complete ecological overlap between populations.
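The idea behind spatial thinning with a 10 km minimum nearest-neighbor distance can be illustrated with a simple greedy sketch. This is not the procedure implemented in Wallace2, which may retain a different subset of points; the function names, the spherical Earth radius and the order-dependent greedy rule are assumptions of this example.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_occurrences(points: List[Point], min_km: float = 10.0) -> List[Point]:
    """Greedy thinning: keep a point only if it lies at least `min_km`
    from every point already kept. The result depends on input order."""
    kept: List[Point] = []
    for lat, lon in points:
        if all(haversine_km(lat, lon, klat, klon) >= min_km for klat, klon in kept):
            kept.append((lat, lon))
    return kept


# Hypothetical coordinates: the second point is about 1 km from the first.
example = thin_occurrences([(45.0, -122.0), (45.01, -122.0), (45.5, -122.0)])
```

A greedy pass of this kind guarantees the minimum spacing but does not necessarily maximize the number of retained localities.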

A spatial partition of the occurrence data for training and testing the predictive models was performed using the “checkerboard 1” option, with k = 2 and an aggregation factor of 2. Then, niche models were built using the Maxent module. Maxent is a machine learning algorithm that models the potential distribution of a species in response to a set of environmental conditions (e.g., climate), which are constrained to be relatively uniform across the geographic space encompassed by the input locations [41, 46]. Increasingly complex models were obtained using linear, quadratic, hinge, and product feature classes, and their predictive performance was evaluated using a combination of metrics included in Wallace2 such as the Area Under the Curve (AUC), omission rate, Continuous Boyce Index (CBI), and Akaike Information Criterion (AIC). The best models per species and per variety were transferred to geographic space within the range of Douglas-fir in North America, as well as to future climate scenarios over the next 50 years. The threshold we used for deciding predicted presence (potential distribution) was the 10th percentile of training presence, which roughly corresponds to a 10% omission rate. Potential distribution gains and losses were calculated using the estimated area of present-day ecological niche extent as a baseline.
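The thresholding and area-change steps can be illustrated with a small sketch. The percentile rule used here (dropping the lowest 10% of training presences, rounded down) is one of several possible definitions and is an assumption, as are the function names and example values.

```python
from typing import List, Sequence


def percentile_training_presence(suitability: Sequence[float], percentile: float = 10.0) -> float:
    """Suitability value below which roughly `percentile` % of training
    presences fall (the lowest floor(n * percentile / 100) values are omitted)."""
    if not suitability:
        raise ValueError("at least one training presence is required")
    values = sorted(suitability)
    k = min(int(len(values) * percentile / 100.0), len(values) - 1)
    return values[k]


def predicted_presence(suitability: Sequence[float], threshold: float) -> List[bool]:
    """Binary prediction: a cell is suitable if it reaches the threshold."""
    return [value >= threshold for value in suitability]


def area_change_percent(baseline_area: float, future_area: float) -> float:
    """Gain (positive) or loss (negative) relative to the present-day baseline."""
    if baseline_area <= 0:
        raise ValueError("baseline area must be positive")
    return 100.0 * (future_area - baseline_area) / baseline_area


# Hypothetical suitability scores at ten training presences.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 1.0, 0.6]
threshold = percentile_training_presence(scores)
omitted = predicted_presence(scores, threshold).count(False)
```

In this hypothetical example the threshold is 0.2 and one of the ten training presences falls below it, which corresponds to the 10% omission rate mentioned above.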

Results

Data collection and genotyping

Assessments of genetic diversity, population structure and natural hybridization of Douglas-fir were based on the genotypes of 577 trees (37 populations) collected throughout the distribution of the two varieties including the hybrid zone located in British Columbia and the Washington Cascades (Fig. 1; Additional file 1; Additional file 3). After filtering steps (SNP and sample call frequency rates and minor allele frequency), 540 trees and 11,320 SNPs representing 70.1% of the SNPs in the array were kept for further analyses. The total genotype rate was 96%.

Fig. 1

Douglas-fir sampling localities. Circles represent each of the 37 sampled localities. Natural distribution range of the species (top right inset). The red dotted circle indicates the hybrid zone

Population structure and hybridization

Due to the broad geographic distribution of the two recognized varieties and the presence of contact zones, this study began by determining the population structure of the dataset. Principal component analysis with the SNPRelate package [25] distinguished individuals from coastal, hybrids, interior-north, and interior-south groups/varieties according to the samples’ geographic location (Fig. 2A). Principal component 1, which accounted for 16.7% of the variation in the dataset, separated interior populations from coastal populations and hybrids, suggesting coastal and interior-south are the most genetically differentiated. Genetic variation between coastal and interior-south groups was higher than variation within each of the groups. Individuals from the Mexican populations (TLX and PUE) clustered with interior-south individuals. Principal component 2, which accounted for 4.15% of the variation in the dataset, separated interior-north and hybrids from most interior-south and coastal individuals (Fig. 2A, B). Discriminant analysis of principal components (DAPC) grouped individuals into four optimal clusters (Fig. 2C). Individuals from interior-north and hybrids were clustered together, while interior-south individuals were separated into two clusters that distinguished trees from Arizona (ARD, SPR, OAC, CAN), New Mexico (SVC, SAM, RIT), Puebla (PUE), and Tlaxcala (TLX), from those in Utah (BCD) and Colorado (HOR; Fig. 2D).

Fig. 2

Population genetic structure of Douglas-fir. (A) Principal component analysis of coastal (n = 325) and interior (n = 188) varieties, including hybrids (n = 27), using 11,320 SNPs. (B) Distribution map of populations within PCA clusters. (C) DAPC for Douglas-fir populations. Black lines denote genomic distance between clusters. (D) Distribution map of clusters and populations determined by the DAPC analysis. Natural distribution range of the species (top right inset). The red dotted circle indicates the hybrid zone

Further analysis of the structure of the population with ADMIXTURE resulted in four ancestral populations (best K) and different levels of hybridization across the geographic distribution of the species (Fig. 3; Additional file 4; [27]). When considering two ancestral populations, the two varieties are mostly separated, and higher levels of admixture are observed in individuals of the interior variety from the north and in individuals from the hybrid zone (Fig. 3A). Although DAPC analyses clustered interior north and hybrids into one cluster, admixture proportions (Q values) revealed differences between the two groups (Fig. 3). For K = 3, the interior variety is separated into north and south, with an admixture zone appearing between them (Fig. 3B). On the other hand, a greater contribution in the levels of admixture by the ancestral population corresponding to the individuals of the interior variety from the north is observed in the hybrids (Fig. 3B). For K = 4, each variety was separated into two north to south ancestral populations, with regions of hybridization between them (Fig. 3C). Coastal Douglas-fir from British Columbia (BLU, TSO, PEM, MAN, and DEV) were differentiated from coastal individuals from the United States (COW, CNO, CAS, CONO, CNW, CSW, and COSO). Interior individuals from Alberta (BAN), Idaho (IDF, BOI), and Montana (MON, HEF, BIA) clustered together, whereas a different genetic cluster was composed by interior individuals in the South (BCD, PRC, HOR, ARD, SPR, OAC, CAN, SVC, SAM, RIT, PUE, and TLX). The most diverse events of admixture occurred among individuals located in the contact zones in British Columbia and the Washington Cascades (HOC, QUE, and CNW; Fig. 3C). The prevailing admixture component in these individuals was from the interior north variety, suggesting asymmetric introgression. 
The second most important admixture contribution in the hybrids corresponded to the northern ancestral population of the coastal variety in British Columbia, which was coincident with the geographic position of the contact zone. Coastal and interior-north individuals located near the contact zones also presented low levels of admixture among their populations (Fig. 3C). Genetic structuring of trees varied among methods. Coastal, hybrids, interior-north, and interior-south groups showed clustering variation in the PCA analysis, while in the DAPC analysis the hybrid and interior-north groups are clustered together and certain populations of the interior-south group appear largely differentiated from the rest of the populations. Nonetheless, in the admixture analysis the coastal, hybrid, interior-north, and interior-south groups presented particular admixture proportions for each of the K values evaluated (Fig. 3).

Fig. 3

Admixture analysis of Douglas-fir varieties. Clustering of individuals and geographic distribution of admixture proportions obtained with the Admixture software for (A) K2, (B) K3, and (C) K4. Natural distribution range of the species (top right inset). The red dotted circle indicates the hybrid zone

Hybrid classes

To get further insights into the attributes of the populations and especially the hybrids of the species, the NewHybrids program was used [29]. This program estimates the posterior probability of each individual of the population falling into different hybrid classes. All 325 coastal individuals, all 127 individuals from the interior-south group, and all 27 hybrids were used as input. The six genotype groups chosen were: pure interior parent, pure coastal parent, F1 hybrid, F2 hybrid, backcross with pure interior parent, and backcross with pure coastal parent. Almost all coastal and interior-south individuals fell into the pure coastal parent and pure interior parent categories, respectively (Fig. 4). The majority (87%) of hybrids were classified as F2s, and no individual was predicted to be a first-generation (F1) hybrid.

Fig. 4

NewHybrids analysis of Douglas-fir trees. Bar plot of posterior probabilities of category membership for each Douglas-fir individual. Categories are divided as: pure interior parent (Pure 1), pure coastal parent (Pure 0), F1 hybrid (F1), F2 hybrid (F2), backcross with pure interior parent (0 Bx) and backcross with pure coastal parent (1 Bx)

Genetic diversity, population differentiation, and isolation by distance and environment

Inbreeding, heterozygosity, population differentiation (pairwise Fst) and isolation by distance were estimated from the dataset in this study. The population inbreeding coefficients ranged from 0.08 to 0.92. The highest inbreeding coefficients were found in Mexican populations and southern populations of the interior variety (Fig. 5), while the lowest inbreeding coefficients corresponded to coastal and hybrid populations, most of them located in British Columbia (Fig. 5). Consistent with the inbreeding coefficients, the lowest heterozygosity values corresponded to the Mexican and interior-south populations (µ = 0.09) and the highest values to the coastal and hybrid populations (µ = 0.24 and µ = 0.23, respectively; Fig. 6A). The northern interior populations presented an intermediate mean of 0.17. The two populations with the lowest heterozygosity values corresponded to the populations of the interior variety from Mexico (Puebla and Tlaxcala), while the two populations with the highest values corresponded to coastal populations located in British Columbia (Additional file 5). Based on pairwise Fst values, genetic differentiation was higher between populations of different genetic groups than between populations within the same group (Fig. 6B; Additional file 6). The widest range in Fst values was found when comparing populations of the interior-south genetic group. The coastal and the interior-south populations were among the most genetically differentiated ones. Comparisons involving hybrids showed that these populations were more genetically similar to the interior-north populations, and more genetically distant from the interior-south populations (Fig. 6B). To examine whether nearby populations were more genetically similar, a Mantel test was conducted to explore the correlation between geographic and genetic distances. 
The Mantel test showed a significant strong positive correlation between genetic and geographic distance among all the populations of the species (Fig. 6C; r = 0.74 and p = 0.001). Isolation by environment analysis also showed positive correlations between genetic distance and annual mean temperature (Fig. 6D; r = 0.35 and p = 0.0001) and precipitation variables (Fig. 6E; r = 0.11 and p = 0.0001). A significant positive relationship between geographic distance and temperature (r = 0.219 and p = 0.0001) and precipitation (r = 0.213 and p = 0.0001) variables was observed as well (Additional file 7).

Fig. 5

Inbreeding coefficient analysis of Douglas-fir populations. Each circle represents one of the 37 collected populations. Circle size is proportional to the average inbreeding coefficient. Natural distribution range of the species (top right inset). The red dotted circle indicates the hybrid zone

Fig. 6

Heterozygosity, Fst and Mantel analyses. Violin plots of heterozygosity (A) and pairwise Fst (B) between coastal, interior-south, interior-north and hybrid groups. (C, D, E) Mantel test analysis of the correlations between geographic, genetic and environmental distances of Douglas-fir populations. Statistical analysis for group comparisons was performed using the SPSS software (version 29.0) through one-way ANOVA followed by Tukey’s test (p ≤ 0.05). Mean values sharing the same letter were not significantly different

Signatures of selection

Douglas-fir varieties differ in adaptive traits such as drought tolerance, frost hardiness and bud phenology, which allow populations to differentially respond to contrasting environmental conditions [47,48,49]. Signatures of selection in loci can reveal evolutionary processes of adaptive differentiation. To detect loci under selection in the 37 Douglas-fir populations, we used the BayeScan and pcadapt tools [35, 36]. In total, 390 SNPs were identified as outliers by both tools, which matched 300 predicted Douglas-fir genes (Additional file 8). Enrichment analysis based on GO and KEGG pathway terms identified metabolic process, cellular process, and primary metabolism among the most highly represented biological categories. DNA binding, catalytic activity and transcription regulation activity were the top three represented categories in the molecular function classification. For KEGG categories, the most highly represented pathways were phenylalanine metabolism, methane metabolism, and phenylpropanoid biosynthesis.

Signatures of local adaptation can be elucidated by analyzing markers that correlate with climatic data [39]. To evaluate (putative) signatures of local adaptation in Douglas-fir populations, 25 climatic variables related to temperature and precipitation were extracted from the ClimateNA database [38]. Bayes factors for all pure and hybrid populations identified 971 unique SNPs associated with the 25 climatic variables. These 971 SNPs were associated with 754 unique predicted genes (Additional file 8). The three most represented KEGG categories for these genes were metabolic metabolism, phenylpropanoid biosynthesis and phenylalanine metabolism. The most represented GO categories for molecular function were catalytic activity, binding, and protein binding. Response to stimulus, response to chemical compounds and metabolic processes were the top three biological processes. A total of 64 SNPs were both associated with environmental variables and detected as under selection (Fig. 7A). These SNPs were assigned to 36 annotated genes (Table 2). In addition, we evaluated associations between genotypes and environments in each variety separately using only the coastal or interior populations (Additional files 9 and 10). We detected 976 unique SNPs associated with environmental variables in either the interior (596 SNPs) or the coastal populations (380 SNPs). Only 18 SNPs were common between the two varieties (Fig. 7B). For the coastal populations, the top represented GO categories for biological process, cellular component and molecular function were response to stimulus, cell-part, and catalytic activity, respectively. For the interior populations, the top represented GO categories of the genes associated with environmental variables were response to stimulus, cell, and catalytic activity, respectively.

Fig. 7

Venn diagrams of the number of outlier SNPs and SNPs associated with environmental variables. (A) Venn diagram showing overlap between the SNPs detected by each method using all populations of Douglas-fir. (B) Venn diagram showing overlap between the SNPs detected by Bayenv among coastal and interior populations

Table 2 Genes associated with environmental variables and detected as outliers

Environmental niche modeling

The characterization of the environmental space occupied by Douglas-fir populations was based on pairwise comparisons that allowed us to identify the climate variables driving niche differentiation between clusters. We identified 11 highly correlated variables and 8 variables that showed a high predictive power as indicated by the explained variance on the PCA’s first two components. The most explanatory set of variables differed slightly among the pairwise comparisons (listed in Additional file 11). The environmental ordination analysis identified that precipitation of the coldest quarter (BIO19), temperature annual range (BIO7) and maximum temperature of warmest month (BIO5) segregated coastal from interior north populations. Similarly, minimum temperature of the coldest month (BIO6), mean temperature of the coldest quarter (BIO11), and precipitation of the coldest quarter (BIO19) separated coastal from hybrid populations. The rest of the relationships are shown in Additional file 12.

The highest bioclimatic niche overlap occurred between interior north–hybrid populations (D = 0.26, p = 0.05), followed by interior north–interior south (D = 0.14, p = 0.06) and coastal–interior south (D = 0.1, p = 0.08). Interestingly, the smallest overlap was found between coastal and both interior north and hybrid populations (D = 0.02 and 0.06, respectively), despite coastal being geographically contiguous to both (Additional file 13, Additional file 14A-F). Occurrence density grids, depicting a pairwise comparison of the portion of the environmental space that is more densely occupied by each population, further clarify niche overlap patterns (Additional file 15A-F).
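For readers unfamiliar with the overlap index, the following sketch shows how an overlap D between two occurrence density grids can be computed, assuming that D corresponds to Schoener's D (one minus half the summed absolute difference between normalized densities). This assumption, the function name and the toy grids are not taken from the analysis itself.

```python
from typing import Sequence


def schoener_d(density_a: Sequence[float], density_b: Sequence[float]) -> float:
    """Niche overlap between two density grids flattened to equal-length lists.

    Each grid is normalized to sum to 1 before comparison, so the result is
    0 for no shared environmental cells and 1 for identical densities.
    """
    if len(density_a) != len(density_b):
        raise ValueError("grids must cover the same environmental cells")
    total_a, total_b = sum(density_a), sum(density_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each grid must contain some occupied cells")
    difference = sum(
        abs(a / total_a - b / total_b) for a, b in zip(density_a, density_b)
    )
    return 1.0 - 0.5 * difference


# Toy grids over four environmental cells that share one occupied cell.
example_overlap = schoener_d([1, 1, 0, 0], [0, 1, 1, 0])
```

In the toy example the two grids share half of their normalized density, so the overlap is 0.5.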

Projections of potential distribution

Spatially explicit projections of niche models allow us to discern habitat suitability across the Douglas-fir range (Fig. 8A). For the coastal variety, a large tract of continuous suitable habitat occurs along the Pacific Northwest, especially in Washington, extreme southern British Columbia and Vancouver Island, and Northern California east of the Great Valley. For interior north populations, the highest suitability is predicted for most of Idaho (except for the Snake River plain), southeastern British Columbia, central Montana, and eastern portions of Washington and Oregon. Interior south suitability is scattered across the mountain ranges of Colorado, Utah, New Mexico, and Arizona, as well as high elevations in Mexico (Fig. 8A).

Fig. 8

Douglas-fir ecological niche suitability across western North America, with the inset showing eastern Mexico. (A) Present-day potential niche. (B) Potential niche under the RCP8.5 scenario in 2070

Projections of Douglas-fir potential distribution over the next 50 years showed a leading- and trailing-edge pattern. On one hand, the northernmost populations (i.e., coastal and interior north) are expected to experience a poleward shift with a concomitant overall gain in potential distribution. On the other hand, interior south populations, and especially Mexican populations, are expected to experience severe losses of suitable habitat, particularly at lower elevations, despite some upslope advances on the highest peaks. Depending on the Representative Concentration Pathway (RCP) scenario considered between 2050 and 2070, the overall distribution of Douglas-fir in North America could increase by 25–42% (Table 3). On a per-population basis, however, gains and losses are more pronounced. Under the RCP2.6 scenario in 2070, the coastal population is projected to experience at least a one-third increase in suitable conditions, whereas under the RCP8.5 scenario in the same decade, suitable conditions may more than double (Table 3). In contrast, interior south populations are expected to endure a drastic reduction of their contemporary suitable habitat, with losses reaching 65–78% in the worst-case scenarios (Table 3). Changes in suitability for the interior north populations are predicted to be mostly gains, although not as pronounced as those in the Pacific Northwest (Fig. 8B). Finally, we consider that the small sample size for the hybrid population precluded a robust estimation of its niche shifts over time.
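Percentage gains and losses of this kind can be derived by comparing present-day and projected suitability maps cell by cell. The sketch below is illustrative only: it assumes that both maps have already been thresholded into binary grids (1 = suitable, 0 = unsuitable) on the same equal-area grid, so that cell counts are proportional to area. The function name, the thresholding step and the toy grids are assumptions, not the procedure or data used for Table 3.

```python
def range_shift(present, future):
    """Summarise the change between two binary suitability grids.

    Both grids are flat lists of 0/1 values for the same cells.
    Returns counts of gained, lost and stable cells plus the net
    percentage change relative to the present-day suitable area.
    """
    if len(present) != len(future):
        raise ValueError("grids must have the same number of cells")
    pairs = list(zip(present, future))
    gained = sum(1 for now, later in pairs if not now and later)
    lost = sum(1 for now, later in pairs if now and not later)
    stable = sum(1 for now, later in pairs if now and later)
    present_cells = lost + stable
    if present_cells == 0:
        raise ValueError("no suitable cells in the present-day grid")
    net_change_pct = 100.0 * (gained - lost) / present_cells
    return {
        "gained": gained,
        "lost": lost,
        "stable": stable,
        "net_change_pct": net_change_pct,
    }


# Toy transect from south (left) to north (right); values are invented.
present_day = [1, 1, 1, 1, 0, 0, 0, 0]
projected = [0, 1, 1, 1, 1, 1, 0, 0]
print(range_shift(present_day, projected))
```

In this toy transect one cell is lost at the trailing edge and two are gained at the leading edge, giving a net change of +25% relative to the present-day extent. Reporting gained and lost cells separately keeps a large net gain from masking local losses.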

Table 3 Predicted shifts in Douglas-fir potential distribution over the next 50 years in North America under different representative concentration pathways (RCP)

Discussion

Role of selection and gene flow in maintaining population structure in the species

The environmental adaptation of long-lived trees is crucial for forest survival. The distribution of species in the genus Pseudotsuga is discontinuous but wide: they occur in North America, including Mexico, and in Asia. Douglas-fir, the only species from the Pinaceae family with 26 chromosomes, has one of the broadest ranges of any conifer from North America. It has been introduced into temperate regions since the mid-19th century. Natural populations of both varieties of Douglas-fir have high levels of genetic diversity, with the potential to confer resilience to varying climates. In this study, we used a custom-designed SNP array to study nuclear genome variation, its relation to space and the local adaptation of several natural populations of Douglas-fir.

The SNPs found in the present study revealed clear genetic structure between the two varieties, as previously reported [13, 15,16,17,18,19]. The best K-value for the number of ancestral populations in our admixture analysis was four. The number of ancestral populations previously reported with different genetic markers and methods varies; however, distinct ancestral populations separating northern and southern individuals have been reported within each variety, which indicates that the structure is geographically shaped. Indeed, we observed that Fst values in pairwise comparisons were higher between distant populations of the two varieties, suggesting reduced gene flow between these populations. A strong pattern of population genetic variation arising from spatially limited gene flow was also detected in the isolation by distance analysis. Geographic distance was significantly correlated with genetic distance, and this correlation was stronger than that between genetic distance and environmental variables; however, we also found a significant correlation between geographic and environmental distances, highlighting spatially and environmentally heterogeneous selection in this species.
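Isolation by distance is commonly tested by correlating a matrix of pairwise geographic distances with a matrix of pairwise genetic distances and assessing significance by permutation. The following minimal sketch assumes a simple Mantel test with Pearson correlation; the software and settings actually used in this study are not stated in this section. All names and the toy matrices (five hypothetical populations along a transect) are illustrative and are not data from this study.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Return the above-diagonal entries, optionally relabelling populations."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test: is genetic distance positively correlated
    with geographic distance more often than expected by chance?"""
    rng = random.Random(seed)
    geo_values = upper_triangle(geo)
    observed = pearson(geo_values, upper_triangle(gen))
    n = len(geo)
    hits = 0
    for _ in range(permutations):
        # Shuffle population labels of one matrix (rows and columns together).
        order = list(range(n))
        rng.shuffle(order)
        if pearson(geo_values, upper_triangle(gen, order)) >= observed:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


# Toy data: positions along a transect and distance-driven divergence.
positions = [0.0, 1.0, 2.0, 4.0, 7.0]
n_pops = len(positions)
geo_dist = [[abs(a - b) for b in positions] for a in positions]
gen_dist = [
    [0.0 if i == j else 0.02 * geo_dist[i][j] + 0.001 * ((i + j) % 3)
     for j in range(n_pops)]
    for i in range(n_pops)
]
r, p = mantel(geo_dist, gen_dist, permutations=999, seed=1)
print(round(r, 3), p)
```

Permuting whole population labels, rather than individual matrix entries, preserves the dependence among distances that share a population, which is why the Mantel procedure is preferred over an ordinary correlation test for distance matrices.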

Inter-varietal hybridization and introgression

Natural hybridization between the interior and coastal varieties was found in the contact zones in British Columbia and the Washington Cascades. Hybrids in both locations present a higher ancestry from the interior north variety, suggesting asymmetric introgression from the interior to the coastal variety [13, 15]. Mountain ranges are often a physical barrier to gene flow in widely dispersed species. Conifers, being wind-pollinated species, may be susceptible to elevational differences between populations [50]. Douglas-fir grows from sea level to over 3000 m of elevation, with northern interior populations growing at the highest elevations [8, 51]. Hybrids grow at elevations intermediate between those of northern coastal and northern interior populations, which may explain the higher interior north ancestry we observed, as gene flow from the coastal populations could be limited by the elevational gradient [52]. Previous studies have identified asymmetric introgression in hybrids between Pinus strobiformis and Pinus flexilis, where cold-resilient genes from P. flexilis were favored and maintained in hybrid populations [53]. A similar pattern may exist in our study system, where asymmetric introgression from interior Douglas-fir confers a fitness advantage in natural populations of hybrids [13, 15]. Previous work in Douglas-fir suggests that adaptive introgression from the interior variety may have resulted in natural populations of hybrids with increased water-use efficiency (WUE) and heat tolerance [8].

Our results also suggest that certain regions of the hybrid zone may not be of recent origin, as inferred from the lack of F1 hybrids. All hybrids found in this study were advanced-generation hybrids, probably resulting from crosses between two hybrid parents. This suggests that the hybrid zones could be maintained by some level of hybrid fitness, in which hybrids are able to mate and survive at higher rates than pure varieties in novel environments. Hybrid individuals may be able to occupy novel environments that differ from those experienced by their parents. This may occur through hybridization mechanisms such as transgressive segregation, which generates unique combinations of alleles and can produce extreme phenotypes that exceed those of the pure varieties [54]. Future research evaluating transgressive segregation in Douglas-fir hybrids may shed light on how these populations became established in harsh conditions such as those of the Washington Cascades and the mountains of British Columbia.

Candidate loci for local adaptation

Understanding the genomic basis of local adaptation in long-lived trees is crucial to predict their responses to the oncoming effects of climate change. Local adaptation emerges across heterogeneous environments, which, as previously shown, is the case for Douglas-fir due to its wide distribution. Using the Bayenv mixed model, we found inter- and intra-varietal differences in allele frequencies at gene loci that indicate local adaptation in Douglas-fir. Several of the genes associated with environmental variables and detected as outliers have functions linked to abiotic stress responses in Arabidopsis thaliana; they probably have similar functions in Douglas-fir and are therefore likely relevant to local adaptation. For example, we found that some genes associated with climatic variables (ECT2, SLO2, and FAB1A) respond to or are related to abscisic acid (ABA), which is crucial for abiotic stress responses (Table 2). The ECT2 (EVOLUTIONARILY CONSERVED C-TERMINAL REGION 2) gene encodes a YTH domain-containing reader protein that regulates transcriptional and post-transcriptional gene expression through recognition of m6A modifications. ECT2, ECT3 and ECT4 have genetically redundant functions in ABA response regulation, and their disruption destabilizes mRNAs of ABA signaling-related genes, resulting in ABA hypersensitivity [55]. The SLO2 gene encodes a pentatricopeptide repeat protein that functions as a mitochondrial RNA editing factor. Disruption of SLO2 function results in ABA hypersensitivity, insensitivity to ethylene and increased drought and salt tolerance [56]. The FAB1A gene encodes a 1-phosphatidylinositol-3-phosphate 5-kinase. Null mutants of phosphatidylinositol 3-phosphate 5-kinases in Arabidopsis showed delayed stomatal closure during ABA treatment and increased water loss [57].
Other genes detected as outliers respond to light stimuli (TEC3, BAH and TFIIS domain-containing protein, and DFL2), oxidative stress (PRXR1), and temperature (C2H2-like zinc finger protein). The transcription factor WRKY9 regulates salt tolerance and LAG13 is involved in hypoxia tolerance [58, 59].

As might be expected from its wide distribution, we also detected signals of local adaptation within populations of each variety of Douglas-fir. Most of the SNPs associated with environmental variables within each variety were not shared, which denotes differentiation and genetic diversity between them; however, the top represented GO term categories in each variety were similar, suggesting that genes associated with environmental variables in both varieties play related roles. Only 18 SNPs were shared between the two varieties. Among the genes that contain shared SNPs, we found genes relevant to stress responses in plants. For example, one of these genes is the previously mentioned SLO2 gene, which encodes a pentatricopeptide repeat protein [56]. Within this group we also found the gamma tonoplast intrinsic protein 2 (TIP2) gene, which is involved in ABA and salinity stress responses [60]. The D6 protein kinase like 2 and RAD1 genes, involved in phototropism and resistance to UV radiation, respectively, were detected as well [61, 62]. An AP2/ERF transcription factor, a member of the ethylene signaling pathway that confers resistance to heat and hydrogen peroxide stresses, was also detected with shared SNPs [63].

Future distribution of the species under climate change

A few studies have assessed the ecological niche of Douglas-fir, among other conifers, and its potential shifts in distribution in response to future climate change scenarios [64,65,66]. Overall, the evidence from our study supports the findings of those investigations, which can be summarized in two major points. First, the two main Douglas-fir varieties (i.e., coastal and interior) occupy widely separated portions of environmental space, which in turn is reflected in geographic clustering on a regional scale, consistent with their genetic structure. Second, climate change over the next decades is expected to have contrasting impacts on the distribution of the distinct populations. As the global climate becomes increasingly warmer, leading-edge populations of temperate tree species such as Douglas-fir are expected to migrate poleward [67,68,69], either to track suitable conditions or to colonize newly available areas, while trailing-edge populations will be forced into upslope migrations, which will nonetheless be greatly constrained by local geomorphological features. As a result, over the next 50 years, suitable habitat for Douglas-fir is expected to increase by at least 25% relative to its current extent. The potential gains are even larger when considering the extreme warming that could take place under an RCP8.5 scenario. Nevertheless, the timespan considered (five decades) is short relative to the longevity and generation time of Douglas-fir. Beyond that period, erratic oscillations in global climate patterns could disrupt any temporary equilibrium in ecological communities, including temperate coniferous forests.

Conclusions

Our range-wide analyses of population structure and genetic diversity in Douglas-fir varieties indicate high genetic variation, clearly structured populations, and differential adaptive potential to different climate scenarios as a result of spatially heterogeneous selection and dissimilar evolutionary histories. The genetic variation associated with climate data found here improves our understanding of the molecular mechanisms controlling adaptation in long-lived trees. The future distributions predicted in this study revealed different evolutionary paths for the Douglas-fir varieties, which constitutes valuable information for the conservation and management of the species.