Association genetics of phenolic needle compounds in Norway spruce with variable susceptibility to needle bladder rust

Key message Accumulation of phenolic needle metabolites in Norway spruce is regulated by many genes with small and additive effects and is correlated with the susceptibility against fungal attack. Abstract Norway spruce accumulates high foliar concentrations of secondary phenolic metabolites, with important functions for pathogen defence responses. However, the molecular genetic basis underlying the quantitative variation of phenolic compounds and their role in enhanced resistance of spruce to infection by needle bladder rust are unknown. To address these questions, a set of 1035 genome-wide single nucleotide polymorphisms (SNPs) was associated to the quantitative variation of four simple phenylpropanoids, eight stilbenes, nine flavonoids, six related arithmetic parameters and the susceptibility to infection by Chrysomyxa rhododendri in an unstructured natural population of Norway spruce. Thirty-one significant genetic associations for the flavonoids gallocatechin, kaempferol 3-glucoside and quercetin 3-glucoside and the stilbenes resveratrol, piceatannol, astringin and isorhapontin were discovered, explaining 22–59% of phenotypic variation, and indicating a regulation of phenolic accumulation by many genes with small and additive effects. The phenolics profile differed between trees with high and low susceptibility to the fungus, underlining the importance of phenolic compounds in the defence mechanisms of Norway spruce to C. rhododendri. Results highlight the utility of association studies in non-model tree species and may enable marker-assisted selection of Norway spruce adapted to severe pathogen attack. Electronic supplementary material The online version of this article (doi:10.1007/s11103-017-0589-5) contains supplementary material, which is available to authorized users.


Introduction
The chemical composition of tissues plays important roles in the defence of plants against herbivory and pathogens. Of the defence-related plant secondary metabolites, phenolic compounds are particularly important for quantitative resistance to fungal pathogens, which are the most important group of tree pathogens (Hammerschmidt 2005;Witzell and Martín 2008). Phenolic compounds comprise a structurally and functionally diverse group of metabolites characterised by aromatic hydrocarbon ring(s) and usually at least one attached hydroxyl group. Phenolics can have fungicide or antioxidant properties, can be involved in resistance mechanisms as precursors of defence-related compounds or polymers, and can modulate the activity of other phytochemicals (Schultz and Nicolas 2000;Treutter 2006). In addition, they can be incorporated into the cell wall and form mechanical barriers (Cvikrová et al. 2008). Phenolic metabolites are considered an important part of both constitutive as well as inducible defence mechanisms (Chong et al. 2009).
The fungus Chrysomyxa rhododendri (DC.) de Bary (De Bary 1879;Gäumann 1959;Ganthaler et al. 2014) is a frequent and serious pathogen affecting trees in large areas of the European Alps. The rust fungus undergoes a host shift between rhododendrons and Norway spruce and infects the current-year needles in the first weeks after flushing, causing an intensive yellow discoloration after 3-4 weeks and defoliation at the end of summer. In the past decade, following a long stable period, infection intensity and affected forest areas increased significantly (Ganthaler et al. 2014), and are thought to be promoted by the expansion of the telial host rhododendron, and by global warming and more favourable conditions for the pathogen (Ganthaler and Mayr 2015). In the investigation area Tyrol, more than 20,000 ha of spruce forest were repeatedly infected since 2009 (Fuchs et al. 2016). Infections cause anatomical, morphological and physiological modifications of attacked trees, leading to reduced timber yield and notable problems with natural regeneration and afforestation (Ganthaler et al. 2014). Dry mass accumulation in 3-year-old seedlings, for example, was reduced by 58% when infected in two consecutive years (Plattner et al. 1999). These problems are serious, as Norway spruce is a widespread and socioeconomically and ecologically important tree species in European subalpine forests, with important protective function. However, high variation of susceptibility of Norway spruce to C. rhododendri infection was repeatedly reported (Dufrénoy 1932;Oechslin 1933;Mayr et al. 2010) and interestingly, even in years with severe outbreaks individual trees with distinctly lower degree of infection than the surrounding trees were observed, suggesting that they have enhanced pathogen resistance. Importantly, this lower susceptibility clearly benefits the trees, which have higher net photosynthesis, lower cuticular conductance, and higher growth rates compared to highly infected trees (Mayr et al. 2001(Mayr et al. , 2010. The underlying resistance mechanisms, including metabolic background and genetic control, are not understood, but there is evidence that phenolic secondary compounds may limit the growth of rust fungi immediately after infection and prevent the development of infection symptoms (Hakulinen et al. 1999;Hjältén et al. 2007). Besides a direct fungicidal effect, phenolics may be incorporated in the plant cell wall and influence the interaction with the biotrophic fungus (Matern and Kneusel 1988).
Variation of plant secondary metabolites depends upon both, genetic and environmental factors (Hamilton et al. 2001;Andrew et al. 2007;Külheim et al. 2011). However, several investigations have demonstrated considerable heritability for constitutive and induced phenolic concentrations (Witzell and Martín 2008) and suggested a regulation by allelic variants of multiple genes, but few studies have addressed the molecular basis of quantitative differences (Keeling et al. 2008;Chan et al. 2010). Association mapping is a powerful tool to identify marker-trait associations (MTAs) in model as well as non-model tree species (Neale and Savolainen 2004;Budde et al. 2014). Natural populations adapted to extreme environments, like the alpine timberline, are ideal for the identification of ecologically relevant genetic variation. In addition, coniferous forest trees exhibit several important advantages such as showing high levels of genetic diversity, random mating and large populations, which lead to low inbreeding, highly efficient gene flow, low population structure and rapid decay of linkage disequilibrium (Neale and Savolainen 2004). In coniferous forest trees, associations have been reported for several phenotypic traits, including wood properties (González-Martínez et al. 2007;Dillon et al. 2010;Beaulieu et al. 2011;Westbrook et al. 2013), growth and wood chemistry (Lepoittevin et al. 2012), serotiny (Budde et al. 2014), carbon isotope discrimination (González-Martínez et al. 2008;Cumbie et al. 2011), cold hardiness and bud set timing (Eckert et al. 2009;Holliday et al. 2010), but rarely for cellular phenotypes such as metabolite concentrations (Eckert et al. 2012) or disease resistance (Quesada et al. 2010). For Norway spruce, only studies focusing on chlorophyll fluorescence, frost resistance, height, diameter, bud burst (Romsakova et al. 2012) and bud set (Chen et al. 2012a) are available. However, the genetic basis of variation in metabolite concentrations and its relation to foliar pest infections is largely unknown.
The present study was based on an initiative of forest authorities to identify trees with distinct lower degree of C. rhododendri infection and should provide deeper insights into the relationship between genetic background, phenolic composition and infection. Therefore, a genome-wide SNP array set with 3257 polymorphic SNPs was used to conduct association analysis of Norway spruce phenolic needle compounds and susceptibility to needle bladder rust in an unstructured population of 63 trees from different provenances in Tyrol, Austria. To the best of our knowledge, this is the first study reporting genetic markers for quantitative variation of stilbenes and flavonoids in Norway spruce. This work may help identifying favourable alleles for marker-assisted selection of Norway spruce adapted to severe pathogen attack.

Association population and sampling
Sixty-three Norway spruce trees from different locations in Tyrol, Austria were used (Supplemental Table S1). The individuals were identified by the Tyrolean Forest Department in cooperation with the Institute of Botany of the University of Innsbruck in order to have an association population with a high variation in the susceptibility to C. rhododendri, ranging from heavily damaged to nearly unaffected trees. Number of trees was limited by the complex logistic planning of contemporaneous sampling in impassable subalpine terrain. Trees were between 25 and 120 years old and all were located next to the alpine timberline between 1401 and 1814 m above sea level. From each of these trees, ten twigs of the five uppermost whorls were harvested in April 2013 and transported to the laboratory. The upper crowns were not shielded by neighbour trees from airborne spores and thus observed infection degree was expected to reflect the trees' susceptibility. Previousyear fully developed and healthy needles were cut randomly from the twigs and stored immediately at 80 °C for metabolic and genetic analyses.

Assessment of the infection degree
Chrysomyxa rhododendri infection degrees for all trees were determined by assessing the percentage of needle loss due to infection on sampled twigs on a scale of 1-5 (1: 0-20%, 2: 21-40%, 3: 41-60%, 4: 61-80%, 5: 81-100% needle loss) for the 4 years -2012(compare Oberhuber et al. 1999. Mean infection values for the individual trees were calculated and used for association analysis. In addition, the binary trait low/high susceptibility was applied: trees with at least 20% lower infection degree over the four analysed years compared to the immediately surrounding trees in the forest stand were defined as showing 'low susceptibility', trees with continuous high infection intensities representative for the infection degree in the study area as showing 'high susceptibility' (compare Table S1). Relative assessment of susceptibility in the forest stand and monitoring of infection degrees over several years in this context is important, as the percentage of infected needles is influenced by local spore densities and weather conditions during the infection period (Ganthaler and Mayr 2015). From each forest district involved in the study, at least one tree with high and one with low susceptibility were included.

Identification and quantification of phenolic compounds
Healthy 1-year-old needles of the year 2012 were freezedried for 72 h and homogenized for 8 min at 2000 rpm in a microdismembrator (Mikro-Dismembrator U, Braun Biotech International, Melsungen, Germany) using 7 ml Teflon grinding capsules and one agate ball of 1 cm diameter. To avoid warming of the sample, capsules containing the needles were submerged in liquid nitrogen for 2 min before grinding, and to avoid sublimation of water on the powder, the capsules were allowed to equilibrate with room temperature in a desiccator over silica gel before opening (Bailly and Kranner 2011). The powder was transferred into Eppendorf vials and 10 mg was extracted two times for 20 min each at 50 °C on a thermo mixer with 600 rpm, with 1 ml 95% (v/v) ethanol, containing 2 µMol l −1 orientin, pinosylvin and naringin as internal standards for quantification, followed by a centrifugation for 10 min at 12,000×g. The supernatants were merged and diluted 1:2 and 1:50 with ethanol and water to obtain a 50:50 ethanol/water (v/v %) extract and to consider the different concentration ranges of the metabolites in the extract. Twenty-one phenolic compounds (Table 1) were identified and quantified by liquid chromatography-mass spectrometry (LC-MS), using an ekspert ultraLC 100 UHPLC system coupled with a QTRAP 4500 mass spectrometer (both from AB SCIEX, Framingham, MA, USA). Individual metabolites were detected and quantified using calibration curves of authentic standards. For compounds separation, a reversed-phase UHPLC column (NUCLEODUR C18 Pyramid, EC 50/2, 50 × 2 mm, 1.8 µm, Macherey-Nagel, Düren, Germany) with a 4 × 2 mm guard column was used. Run time was set to 8 min and mobile phases were 0.1% formic acid (v/v) (A) and acetonitrile (B), starting with 5% B followed by a gradient to 70% B (5 min), rinsing at 100% B (5:01 to 6 min) and equilibration at 5% B (6:30 to 8 min). The injection volume was set to 1 µl, the flow rate to 0.5 ml min −1 and column temperature to 30 °C. Compounds were detected by the mass spectrometer operated in negative ion mode using multiple reaction monitoring (MRM; Supplemental Table S2). Ion spray voltage was set to 4.5 kV, gas 1-40 psi and gas 2-50 psi at a temperature of 500 °C. Both quadrupole mass analysers were operated at unit resolution. Peaks were automatically detected based on retention time and MRM transition. Peak areas were normalized relative to the internal standards to account for variations during sample preparation and analysis, and concentration was calculated according to the compound-specific calibration curves established with authentic standards using the software Analyst and MultiQuant (AB SCIEX, Framingham, MA, USA).
Correlations were analysed using the Pearson (normally distributed data) or the Spearman Rank (not normally distributed data) correlation coefficient. Comparisons of the subgroups showing high and low susceptibility were performed, after testing normality with the Kolmogorov-Smirnov test, with t test (normally distributed data) or the Mann-Whitney U test (not normally distributed data). All tests were performed at a probability level of 5% using SPSS (version 21; SPSS, IL, USA). All values are given as mean ± SE.

SNP microarray design
A custom Illumina InfiniumHD iSelect BeadChip comprising 3257 SNPs (assays) was developed by merging SNPs from a number of different resequencing and genotyping projects. The majority of SNPs (1742) were originally identified in and designed for Picea glauca and later also tested on and found to be variable in a small number of P. abies individuals (Pavy et al. 2013). Additional 583 SNPs came from a mRNA sequencing of a single individual using Illumina technology (Chen et al. 2012b), and 311 SNPs were chosen from an mRNA sequencing approach (Heer et al. 2016). Further 228 SNPs stemmed from the sequencing of pooled PCR products using Illumina nextgeneration sequencing technology (Chen et al. 2016), 178 SNPs originally identified in Picea glauca were tested on P. abies using Illumina's Golden Gate technology (Chen et al. 2012a), and 141 SNPs came from Sanger resequencing efforts, which had been sequenced and analysed in Heuertz et al. (2006), Chen et al. (2010), and Källman et al. (2014). In addition, 57 SNPs derived from the CRSP project headed by David Neale (http://dendrome.ucdavis.edu/ NealeLab/crsp/), and 17 SNPs were designed based on loci available at Genbank from a population resequencing study of different conifer species (Guillet-Claude et al. 2004). A detailed list of the compiled assays is given in Supplemental Table S3.

SNP genotyping
InfiniumBeadChips were manufactured by Illumina in a 24 × 1 format. For each sample, genomic DNA was extracted from freeze-dried needle tissue using a CTAB protocol (van der Beek et al. 1992) with minor modifications made for the processing of 96-well deep well plates. DNA concentrations were quantified on a 0.8% agarose gel. At the IMGM Laboratories GmbH, SNP genotyping was conducted according to the manufacturer's recommendations and the microarray signals were detected on Illumina's iScan System. All SNP data analyses were conducted using GenomeStudio v. 2011.1 (Illumina).

Population structure and relatedness
Three hundred and fifty-six polymorphic neutral SNPs were used to investigate population stratification and relatedness between individuals as they can lead to false positive detection during association analysis. The selection of neutral SNPs followed a two-step procedure: first, only those SNPs located outside of genes or within gene introns were selected; second, these markers were filtered for a minimum call rate of 60 trees out of 63 and a minor allele frequency (MAF) above 0.2. Population stratification was first investigated with the Bayesian model-based software STRUCTURE (Falush et al. 2003) which is used to infer distinct populations and to assign individuals to the identified populations. The model allows admixture and correlated allele frequencies and was run with a burn-in period of 10 4 and 50 5 of Markov chain Monte Carlo replications after burn-in (run length). Ten independent runs (iterations) were conducted for each putative number of cluster K. Sampling location information was considered by applying the prior model parameter (LOCPRIOR) to the population model (Hubisz et al. 2009) and possible K's tested ranged from 1 to 15 (number of locations). Alternative scenarios without LOCPRIOR were also tested. For each scenario, the Structure Harvester (Earl and vonHoldt 2012) was used to estimate the most probable number of K's using Evanno's method (Evanno et al. 2005). Population stratification was also studied by principal component analysis (PCA) using TASSEL (Bradbury et al. 2007), where the correlation matrix of genotype data was applied as a basis for analysis. In order to have a general overview on the relatedness between individuals showing high and low susceptibility, a cladogram was built up using neighbour joining implemented in TASSEL. Archaeopteryx plugin was used to draw the tree.

Association test
Association analysis was performed using TASSEL (Bradbury et al. 2007). Markers with MAF less than 15% and more than 10% of missing data were excluded for the association test. A final set of 1035 SNP was selected to calculate MTAs using three models to evaluate the effects of population stratification: first, a model without correction (Generalized Linear Model: GLM), then models correcting for population stratification estimated by STRUC-TURE (Q) and by PCA (P). Due to multiple testing, p value threshold was corrected with the standard Bonferroni procedure (*0.1 < p = 9.66 × 10 −5 ; **0.05 < p = 4.83 × 10 −5 ; ***0.01 < p = 9.66 × 10 −6 ). The amount of variation explained by a SNP (Rsq_Marker) was calculated for each significant association using a simple general linear model. The q-value for each marker was calculated to adjust for the false discovery rate (Storey and Tibshirani 2003) using "qvalue" version 1.40.0 (Dabney, A. and Storey, J) with R (http://www.r-project.org/). Q-value threshold of 10% was used to declare significant associations. Q-Q plots were used to assess the number and magnitude of observed associations between SNPs and traits under study, compared to the association statistics expected under the null hypothesis of no association. This procedure resulted in −log10 p values that were ranked in the order from smallest to largest on the y-axis and plotted against the distribution that would be expected under the null hypothesis of no association on the x-axis. Deviations from the identity line suggest either that the assumed distribution is incorrect (population structure not included in the model) or that the sample contains values arising due to other manner, as most likely by true associations (Burton et al. 2007;Pearson and Manolio 2008).
Finally, to create an overall composite measure for all chemical traits, a correlation matrix of the chemical data was used as a basis for a PCA performed in TASSEL. Numerical imputation by computing mean of respective traits was used to fill missing values. In order to explore the linkage disequilibrium (LD) among markers, we calculated the correlations between alleles at two SNP loci r 2 within contigs using TASSEL.

Variation in infection degree and phenolic needle metabolites
Within the association population, C. rhododendri infection degrees during the four observed years varied from Norway spruce trees with single infected needles to individuals with a needle loss of more than 65%. On average, trees classified as highly susceptible exhibited twice as high infections as trees with low susceptibility (Table 2). No correlation of the degree of infection with tree age or elevation was found (data not shown).
All analysed phenolic compounds were detected in the needle samples, but seven showed concentrations below the quantification limit of about 0.02 µMol g 1 dry weight: chlorogenic acid, gallic acid, kaempferol, quercetin, quercitrin, naringenin and taxifolin. Consequently, these compounds were excluded from further analysis. The remaining eight stilbenes, four flavonoids and two simple phenylpropanoids showed large variation in concentration within the association population, apparent by the minimum and maximum values given in Table 2, and several trees showed extremely high levels of individual compounds. The most abundant metabolites were shikimic acid, picein, astringin and catechin (the complete phenotype dataset is given in Supplementary Table S7).

3
The concentrations of several compounds were highly correlated to each other (Table 3), most of them positively. The highest correlation coefficients were found within stilbenes, where the compounds astringin, piceid, isorhapontin, resveratrol and piceatannol were positively correlated to each other, and within the cis-and trans-forms and arithmetic parameters of each compounds. Similarly, the flavonoids were correlated to each other; kaempferol 3-glucoside and catechin in addition with shikimic acid.

Population structure
When considering tree sampling locations as prior in population structure analysis, the likelihood (LnP) of K decreased with increasing K without reaching a continuous plateau that would be expected in the presence of a genetic structure (Supplemental Figure S1a). The number of genetic groups was also investigated with Evanno's method, where the Delta K plot showed low values in all K tested (Fig. S1a). Bar plots demonstrated that all individuals are admixed and none of them was clearly assigned to one group. The alternative scenario, without considering sampling locations, revealed similar results (Fig. S1b), suggesting that the most probable K is one. This conclusion is reinforced by the PCA using SNP data (Supplemental Figure S2), where no group was clearly defined. Finally, the cladogram showed that individuals with high and low susceptibility are evenly distributed (Fig. 1).

Association analysis model choice
As STRUCTURE (Q) and PCA (P) did not show any structure in our population, models correcting for population stratification estimated by both methods are not presented and the GLM model without correction was selected for association analysis. Q-Q plots (Fig. 2) show the quality of the model for each single and multivariate trait. The GLM performed well for all traits, and no substantial deviations from the identity line were found, underlining our assumption of an unstructured population. Some markers with the lowest p values are lower than expected as a true association can be underneath.
Within each associated marker, chemical concentrations or ratios differed between allelic variants (Fig. 3).  1 Cladogram built with 356 SNP markers using neighbour joining as clustering method. Trees with high susceptibility (with asterisk) and trees with low susceptibility show no grouping No significant genetic marker for the degree of infection or high and low susceptibility was detected, but the scatterplots of gallocatechin and isorhapontin ratio vs infection degree with marginal boxplots of the associated marker genotypes (Fig. 4a) revealed that some specific genotypes seem to have both the lowest infection degree and chemical concentration (gallocatechin − TT GQ03011-J06.1.333 ) or ratio (isorhapontin ratio − GA PBB-PF00847-11-1 ). Both markers are widely distributed in the studied population (Fig. 4b).
Markers significantly associated were mainly located within exons (14 markers). Others (5 markers) belong to introns, downstream regions or gene-empty regions (Table 5). Variation in seven exon-located markers affects the amino acid sequence (nonsynonymous substitution; missense change) as they are located mainly in the 1st or 2nd position of a codon. Nevertheless, variation in the remaining seven exon-located markers does not affect the amino acid sequence (synonymous substitution) as they are located mainly in the 3rd position of a codon.
Within all contigs, linkage disequilibrium decreased rapidly with physical distance (Supplemental Figure S3). However, within individual genes and contigs, LD showed strong heterogeneity as shown in the Fig. S3b, c in agreement with previous studies (Pavy et al. 2012).

Discussion
This is one of the so far very few available studies identifying DNA sequence variants correlated with concentration levels of secondary metabolites and pathogen susceptibility in a coniferous forest tree species. The important preconditions for association mapping, a population that lacks genetic structure and large phenotypic variability, were fulfilled. A total of 31 trait associations were found, explaining 22-59% of phenotypic variation, and several markers were shared by different traits. The phenolics profile of healthy needles differed between trees showing high and low susceptibility to the fungus.  (1-6) show markers associated to several traits, superscript letters (A-D) show markers associated to a multivariate trait b Bonferroni corrected threshold for multiple test, ***p value <0.01; **p value <0.05; *p value <0.1; NS not statistically significant c Rsq_marker, total explained phenotypic variation d ******q-value <1e-04; *****q-value <0.001; ****q-value <0.01; ***q-value <0.025; **q-value <0.05; *q-value <0.  percentiles. Dots represent outliers. Dotted lines frame traits associated to the same marker. Asterisks indicate markers significantly associated below the q-value

Different phenolic composition of trees showing high and low susceptibility
Trees with low susceptibility to C. rhododendri showed statistically significant lower concentrations of the compounds gallocatechin and cis-piceid as well as lower cis/transratios of astringin, isorhapontin and piceid in the needles compared to highly susceptible trees (see Table 2). Higher levels of piceid were found also in Norway spruce susceptible to E. polonica (Brignolas et al. 1995) and in cultivars of grapevine susceptible to downy mildew compared to resistant plants (Pezet et al. 2004). The higher levels of phenolic compounds in trees challenged with the fungus could be based on an induced systemic accumulation of phenolics due to repeated infection, as found for example in spruce infected by Sirococcus conigenus (Bahnweg et al. 2000), E. polonica (Brignolas et al. 1995;Evensen et al. 2000;Krajnc et al. 2014) or Heterobasidion spp. (Danielsson et al. 2011). Contrariwise, trees showing low susceptibility may incorporate soluble phenolics into the cell wall to isolate the biotrophic fungus (Matern and Kneusel 1988;Fossdal et al. 2012) by preventing nutrient uptake by haustoria in the initial infection phase, thereby avoiding the development of disease symptoms. In addition, they may activate a rapid modification of phenolics by isomerization, de-glycosylation, methoxylation or oligomerization (Chong et al.     Direct toxicity of hydroxystilbenes can be related to the capacity of metabolites to disrupt cell membranes, nuclear and mitochondrial membranes in fungal germ tubes (Pezet and Pont 1990). Furthermore, compounds may confer resistance as a group by synergy effects (Wallis et al. 2008) and the variability of the phenolic composition in time and space may challenge the fungus, like similarly suggested for leaf-eating invertebrates (Edenius et al. 2012).

Extraordinary high concentrations of several compounds in individual trees
Several analysed trees showed extraordinary high levels of individual compounds, among them piceatannol (5.63 µMol g −1 ), resveratrol (0.64 µMol g −1 ) and quercetin 3-glucoside (1.59 µMol g −1 , compare also maxima values in Table 2). As phenolic secondary metabolites mediate interactions with several pathogenic and herbivory organisms, these extreme phenotypes could be candidate trees for future experiments and forestation strategies. Furthermore, due to the broad pharmacological effects and difficult chemical synthesis of these compounds (see e.g. Lin and Yan 2014) the use of spruce material for extraction and economic exploitation could be considered. For example, trans-resveratrol was produced by the tree GRI-O in extraordinarily high amounts (0.64 µMol g −1 dried needle tissue), compared to grapes (0.22-0.44 µMol g −1 ) and red wine (up to 0.07 µMol ml −1 ), which are considered major sources of resveratrol (Lekli et al. 2010).

Correlations between compound concentrations
High correlations between the concentrations of individual phenolic compounds (see Table 3) reflect similarities of the chemical structures and shared steps in the metabolic pathway (compare Hammerbacher et al. 2011;Laboratories 2013). These findings are also reflected by shared genetic markers of related compounds (see Table 4).

Population structure
In Europe, Norway spruce genetic resources exhibit differences mainly between the Baltico-Nordic and the Alpine domain, while genetic differentiation among populations within these domains was found to be low (Heuertz et al. 2006). In particular, this is true for the central and western alpine range of Norway spruce, which was found to originate from a single phylogeographic lineage mainly (Gugerli et al. 2001;Tollefsrud et al. 2008). Our present analysis with individuals from the central Tyrolean Alps confirms the low genetic structure as no stratification could be found for 356 putatively neutral SNPs. This high number of SNPs was used to avoid ascertainment bias that could induce cryptic population structure (Lachance and Tishkoff 2013). Also, accounting for the geographic location of the trees (see Fig. S1a) did not affect the association results, as shown for other studies with weak population structure signals (Porras-Hurtado et al. 2013). Therefore, the association study could be done without taking any structure into account and the risk for detection of false positives was low. Furthermore, cladogram and PCA underlined that groups of trees showing high and low susceptibility did not belong to genetically related groups and do not originate from one common ancestor or a certain provenance (see Fig. 1), but that lower susceptibility occurs broadly across various locations and populations. A similar model was already explored by Jourdan et al. (2015) in an unstructured population. McKown et al. (2014) found in a P. trichocarpa association population that in all cases the simple P or Q models were sufficient and no traits required the more stringent K, Q + K or P + K models. Thus, McKown et al. avoided overcorrection by more stringent models that control for stratification such as MLM (Allwright et al. 2016). This is very comparable to our case, were no structure could be detected by several methods.

Significant genetic markers
Thirty-one significant genetic associations for phenolic compounds were identified, yet none for the traits susceptibility or infection degree that were both based on the phenotypic assessment in the field (see Table 4). The latter may be partly due to the difficult assessment of field susceptibility, which might be affected by local spore densities and weather conditions (Ganthaler and Mayr 2015), thus obscuring more direct relations of phenotypes showing low susceptibility to the set of SNP markers tested. However, respective gallocatechin, genotypes TT GQ03011−J06.1.333 seemed to have both the lowest infection degree and chemical concentrations (see Fig. 4). Complex traits, including phytochemicals and quantitative disease resistance, are usually regulated by many genes with small and additive effects and associated markers span coding as well as non-coding portions of genes (Gonzalez-Martinez et al. 2007;Quesada et al. 2010;Eckert et al. 2012). Accordingly, several genetic markers for stilbene and flavonoid concentrations were found and some were associated with several compounds (compare Table 4), likely due to the complexity and interconnection of phenolic pathways (Vogt 2010). In order to capture these correlations and complexity, multivariate traits were constructed by PCA, such as successfully done for other complex traits in conifers (González-Martínez et al. 2007;Eckert et al. 2009). As expected, most of the markers associated with PC4 were also associated with its main loading factors kaempferol 3-glucoside and quercetin 3-glucoside. However, shikimic acid had a negative contribution, as this product is upstream of the main flavonoid synthetic pathway (Vogt 2010). Genetic control by many loci is connected with relatively small individual effects (phenotypic variation R 2 explained by markers), similar or partly higher compared to other conifer studies (Table 4; González- Martínez et al. 2008;Eckert et al. 2009Eckert et al. , 2012Beaulieu et al. 2011). There are indications that in conifers stilbene synthesis is based on multiple copies of the same genes that are under the control of different promoters and can be regulated in response to different internal and external factors (Hammerbacher et al. 2011). Therefore, analysis is complicated by the influence of both genetic and environmental factors on metabolites (Fiehn 2002).
Significant SNPs were located in coding as well as non-coding regions, with partly synonymous and nonsynonymous polymorphisms (see Table 5). Markers located in untranslated regions (introns, downstream regions or geneempty regions) should also be considered due to their possible impact on gene expression through differential transcription, siRNA targeting or mRNA stability (Webb et al. 2009). Therefore, most of the SNPs are not directly responsible for the phenotypic variation but probably in LD with the causative change. To better understand the molecular function of marker linked genes, we examined their P-FAM domains. Pleiotropy could explain the diversity of genes detected, as most genes of interest are putatively involved upstream of the defence pathways and could enhance, modify or disrupt chemical profiles (Porth et al. 2014).

Signal transduction
Marker PGLM2-1119 associated with quercetin 3-glucoside is located on the gene MA_79559g0010 with a GHKL domain (Gyrase, Hsp90, Histidine Kinase, MutL), an evolutionary conserved protein domain (Dutta and Inouye 2000) that represents the structurally related ATPase domains of histidine kinase, DNA gyrase B and HSP90. Protein kinases play a central role in signalling during pathogen recognition and the subsequent activation of plant defence mechanisms (Romeis 2001). Moreover, marker PabiesPRR1-240 associated with trans-resveratrol is located in MA_71728g0010, a gene that could be involved in the defence pathway as a response regulator.

Transcriptional regulation
Transcription factors (TFs) are key players in plant innate immunity and activation of defence pathways. Marker GQ0015-B03.1.189, associated with both flavonoids and stilbenes, is located in the gene MA_804816g0010 that encodes a protein with a leucine-rich repeat (LRR) domain, frequently involved in plant defence (Shanmugam 2005;Tameling et al. 2006). In our study a nonsynonymous substitution results in an amino acid change from S (small and polar) to Y (big and aromatic), potentially altering the phenotype. Seven related Arabidopsis thaliana disease resistance protein genes (AT1G33560; AT5G66900; AT4G33300; AT5G04720; AT5G47280; AT5G66910; AT5G66890; AT3G26470) make this candidate interesting. Marker PBB-PF00847-11-1, associated with isorhapontin ratio, is located in the coding sequence of MA_37766g0010, a putative member of the Apetala2/ Ethylene-responsive element-binding protein family (AP2/ EREBP). In A. thaliana, ethylene response factors (ERF) are directly responsible for the transcriptional regulation of several jasmonate/ethylene-responsive defence genes (Pré et al. 2008). In our study, GA PBB-PF00847-11-1 genotypes seem to have both the lowest percentage of infected needles and isorhapontin cis/trans-ratio (see Fig. 4). Although these allelic variants represent a synonymous mutation, transcription, splicing, mRNA transport, and translation could be affected, possibly altering the phenotype and rendering the synonymous mutation non-silent (Goymer 2007).

Modification of compounds
One of the most promising markers found is MA_57678g0010-2862-[G_T], located in an intron of a gene belonging to the cytochrome P450 superfamily, the largest enzymatic protein family in plants and key players in plant development and defence (Xu et al. 2015). They are involved in multiple metabolic pathways and are important for breeding and biotechnology due to their capacity to modify and activate diverse secondary metabolites with ecological and pharmacological properties, including most terpenes, flavonoids and alkaloids (Villa-Ruano et al. 2015). In our study, trans-resveratrol and trans-piceatannol production were associated with this gene.

Conclusions
Association mapping of forest trees in their natural environment enables a deeper understanding of genetic adaptation, despite the complex genetic architecture of most analysed traits. This has potentially important implications for plant material selection, for example for highalpine afforestations and climate change adaptation strategies. Considering the broad ecological function of these compounds (compare Levin 1971), further research will benefit from the genetic knowledge gained. Results are likely to be useful in molecular marker-assisted selection and breeding for enhanced resistance and phytochemical production by increasing selection intensity, identifying more trees with low susceptibility, reducing the breeding cycle and overcoming temporal impediments such as age to trait expression. However, markers should be validated at least for the Alpine populations to exclude non-stable MTAs and reduce the risk of a break down with successive rounds of sexual reproduction and recombination. The present study utilized SNP markers that were found to be variable within Norway spruce and related conifers. Rare point mutation that could potentially cause pathogen resistance might have been withdrawn in this selection procedure. To date, the applied SNP array was the most affordable tool to explore the genetic background of the studied phenotypes. Further exploration should make use of a candidate gene approach or a high coverage of the whole genome. Moreover, further studies may benefit from a direct determination of the trait susceptibility and genotype environment interactions, for example by controlled inoculation tests on genetically identical clonal cuttings (Neale and Savolainen 2004), and the consideration of variations in the concentration of phenolic compounds during needle development.