The usefulness of multi-parent multi-environment QTL analysis: an illustration in different NAM populations

Commonly QTL detection in multi-parent population (MPPs) data measured in multiple environments (ME) is done by a single environment analysis on phenotypic values ‘averaged’ across environments. This method can be useful to detect QTLs with a consistent effect across environments but it does not allow to estimate environment-specific QTL (QTLxE) effects. Running separate single environment analyses is a possibility to measure QTLxE effects but those analyses do not model the genetic covariance due to the use of the same genotype in different environments. In this paper, we propose methods to analyze MPP-ME QTL experiments using simultaneously the data from several environments and modelling the genotypic covariances. Using data from the EU-NAM and the US-NAM populations, we show that these methods allow to estimate the QTLxE effects and that they give a more precise description of the trait genetic architecture than separate within environment analyses. The MPP-ME models we propose can also be extended to integrate environmental indices (e.g. temperature, precipitation, etc.) to understand better the mechanisms behind the QTLxE effects. Therefore, our methodology allows to exploit the full potential of MPP-ME data: to estimate QTL effect variations a) within the MPP between sub-populations due to different genetic backgrounds; and b) between environments.


I. Introduction
The use of multi-parent populations (MPPs) to investigate biological questions becomes progressively a regular practice in plant genetics and plant breeding. Different MPPs have been developed like the nested association mapping (NAM) populations (McMullen et al., 2009) or the multi-parent advanced generation inter-cross (MAGIC) populations (Cavanagh et al., 2008). The collections of crosses between a set of parents used in breeding programs can also be analyzed as MPPs (Würschum, 2012;Parisseaux and Bernardo, 2004). Different statistical approaches have been proposed to detect QTLs in NAM populations (Xavier et al., 2015), in MAGIC designs (Verbyla et al., 2014) or in MPPs composed of crosses (Jourjon et al., 2005).
The plant phenotype is the result of cumulative interactions between the genotype and the environment (Malosetti et al., 2013). Therefore, researchers have developed statistical procedures to detect QTLs taking the genotype by environment (GxE) interactions into consideration Korte et al., 2012). Several MPPs have been tested in multiple environments (MPP-ME) (Buckler et al., 2009;Giraud et al., 2014;Saade et al., 2016) but only few studies have proposed a proper MPP GxE QTL detection methodology (Piepho and Pillen, 2004;Verbyla et al., 2014). Most of the researches average the phenotypic values by calculating adjusted means or predictions across the environments that represent an average phenotypic value (Giraud et al., 2014;Poland et al., 2011;Buckler et al., 2009). In other articles, people performed separate analyses in each environment (e.g. Saade et al. (2016)).
We consider that the main interest of an MPP-ME QTL experiment is to estimate genetic (QTL) variations at two levels: a) within the MPP between sub-populations due to different genetic background; and b) between environments. In previous researches, we developed a framework for QTL detection in MPPs composed of crosses between a set of parents with different assumptions about the QTL effects (Garin et al., 2017(Garin et al., , 2018. The QTL effects were assumed to be more or less diverse/consistent in the different genetic backgrounds, which allowed to estimate QTL allelic variations within the MPP. In the present paper, we extended our methodology to the multi-environment context. We allowed the QTL effects to also vary between the environments to estimate QTL by environment (QTLxE) interactions. We further extended our models to integrate environmental information like temperature or precipitation to get a deeper understanding of the mechanisms behind the QTLxE effects. In the following sections, we present different methods for analyzing MPP-ME QTL experiments. We illustrate the usefulness of our methodology with examples coming from the EU-NAM and the US-NAM populations.

Statistical methodology
In Table 1, we present a list of models for analyzing MPP-ME data. The first set of models allows to model phenotypic variation. The second category represents genotypic models used to detect QTLs. The last section presents four methods for QTL detection in MPP-ME experiments. The models and their components are described in the following sections assuming that the data come from two environments.

Phenotypic models
Models 1 and 2 model phenotypic variations accounting for experimental design factors (des) such as replicates, incomplete blocks, row-columns, etc. In model 1, y icp is the plot data phenotypic value of genotype i from cross c. G ic is the genetic effect of genotype that can be separated into the tested genotypes and the check entries like in Boer et al. (2007). Finally, icp is the plot error. Model 1 allows to calculate the genotype best linear unbiased estimates (BLUEs) treating G ic as fixed. In model 1, we considered the data from each environment separately and calculated within environment genotype BLUEs (y E1 , y E2 ).
In model 2, we model the phenotypic variation of the two environments jointly. y icep is the phenotypic plot measurement of genotype i from cross c in environment e. The intercept µ e and the design term des (e) become environment specific. The term GE ice represents the genotype by environment interaction. In model 1 and 2, all terms were fitted as random except the genotype term G ic . The genotype BLUEs (y E(1,2) ) obtained with model 2 represent phenotypic values across environments.

Genotypic models
Single environment analysis

M1: MPP QTL analysis on BLUEs across environments
Genotypic models Models 3 and 4 are MPP QTL detection models for single and multi-environment data, respectively. In model 3, y ic is the phenotypic value of the i th individual in cross c. µ c is a cross intercept. The term G ic (3.1) describes the genotypic effect. G ic can be partitioned into a fixed QTL part ∑ q x T iq * β q and a residual part g ic . x T iq represents the expected number of QTL alleles received by individual i at position q and β q are the QTL allelic substitution effects.
The vector x T iq is of dimensions [1 × n al ] and varies according to the number of alleles (n al ) assumed at the QTL position. A first model called parental assumes that each parent contributes a unique allele to the MPP (n al = n p ). A second option called ancestral model assumes that genetically similar parents inherit their allele from a common ancestor. Parents are grouped in ancestral classes based on genetic similarity. We assume that each group represents one ancestral allele (n al = n a ). The final possibility is a bi-allelic model assuming that genotypes with the same SNP score transmit the same allele.
The elements of x T iq take values between 0 and 2. For the parental model, they represent the expected number of parental allele copies estimated using IBD probabilities computed by the package R/qtl (Broman et al., 2003). For the ancestral model the vector x T iq specifying the parental allele distribution is modified taking identical by state (IBS) parental genetic relatedness into consideration. We used the R package clusthaplo (Leroux et al., 2014) to evaluate the parent genetic similarities along the genome and infer common ancestral classes. For the bi-allelic model, x T iq is a scalar taking values 0, 1 or 2 corresponding to the number of IBS copies of the minor SNP allele. The model is estimated fixing one QTL allele as reference. In NAM populations, it is convenient to set the central parent allele as reference and to interpret the additive allelic substitution effect β q as the deviations with respect to the central parent.
G ic (3.1) is also composed of a residual genetic effect g ic that was not accounted by the QTLs. In the single environment analysis, g ic is not directly modelled. The residual genetic variation is modelled by the error term ic (3.2), which also contains the residual plot error. ij follows a normal distribution N(0, R). We assume that R = n c c=1 σ 2 c which means that the variance of the error term is different in each cross to take into consideration the heterogeneity that could exist between crosses.
Model 3 can be expressed in matrix notation (3.3) with y being the phenotypic values vector of dimension [N × 1]. X c is a [N × n c ] cross-specific intercept matrix with β c representing the vectors of cross intercepts. X Q is a [N × n al ] QTL incidence matrix and β Q the corresponding vector of QTL additive allelic substitution effects.
Model 4 is a modification of model 3 to analyze jointly the data from several environments. Compared to model 3, the cross effects µ ce , the QTL effects β qe (4.1), and the residual genetic effect g ice (4.2) are now indexed per environment. In expression 4.1, we modelled the environmental specific QTL allelic effects. These QTL allelic effects could also be partitioned into a main effect across environments and environment-specific components. This is a difference with respect to model 3. Another important difference between model 3 and 4, is the explicit modelling of the genotypic covariance between phenotypic observations measured on the same genotype in different environments via the term g ice (4.2). g ice is normally distributed N(0, σ 2 g ), which corresponds to a compound symmetry model assuming a uniform covariance between genotypes in all environments. More sophisticated variance covariance (VCOV) structures are possible like the unstructured model using a specific covariance term for each pair of environments.
The error term ice (4.3), is normally distributed N(0, R) with within environment cross-specific variance error terms R = n c c=1 σ 2 c(e) . For an illustration purpose, the VCOV matrix of phenotypic observations coming from two different crosses in two different environments take the following form: It represents a compound symmetry with heterogeneous cross-specific environment variances. In Model 4, the genotype by environment variance (σ 2 ge ) can not be distinguished from the error term (σ 2 c(e) ). Model 4.4 is a matrix expression of model 4. The vector of phenotypic values y = [y 1 y 2 ] T includes the phenotypic observations of the two environments. The cross term X c β c and the QTL term X Q β Q are extended to model environment-specific cross and QTL effects.

MPP-ME QTL detection methods
Here we present four methods to perform a QTL detection in MPP-ME experiments. The first three methods are two-stage analyses using first a phenotypic model to calculate genotype BLUEs that are used afterwards in a genotypic QTL detection model. The last method is a one-stage analysis. Method M1 performs a QTL analysis on genotype BLUEs calculated across environments. It uses the BLUEs calculated with model 2 as response variable in QTL model 3. Method M1 tests for association between the genotype and y E(1,2) , which represents a main phenotypic effect across environments. Therefore, M1 does not allow to estimate the QTLxE effects.
Method M2 allows to model the QTLxE interactions by performing separate QTL analyses in each environment. M2 uses the BLUEs calculated within each environment (y E1 , y E2 ) in QTL model 3. The methods M1 and M2 do not take the advantage of analyzing jointly the phenotype data measured in different environments, which has been shown to provide a greater understanding of the GxE interactions (Malosetti et al., 2004;Alimi et al., 2013). A proper MPP GxE QTL detection using mixed model allows to model the heterogeneity of genetic variance across environments and the genetic covariance between environments (Malosetti et al., 2013).
Method M3 analyzes jointly the within environment genotypes BLUEs (y E1 , y E2 ) using model 4 taking the covariance between the same genotype measured in different environments into consideration. The last method (M4) is a one-stage analysis on the plot data. For the one-stage analysis, we followed Boer et al. (2007) and separated the genetic effect terms of model 2 (G ic + GE ice ) into a check entries term (G i (e) ) and two terms to model the tested genotypes by the QTLs (∑ q x T iq * β qe ) and the residual genetic effect (g ice ). Method M4 allows to simultaneously estimates the non-genetic effects due to the experimental design (des (e) ) and the QTL variations.
The variance of the error term ic(e)p of models 1, 2 and M4 can be modelled by different VCOV structures taking for example spatial variations into consideration. In models 3, 4, and M4, the cross (µ ce ) and the QTL terms (β qe ) were fixed. The Wald test (McCulloch and Searle, 2001, 5.39) tests the global null hypothesis of all QTL allelic substitution effects being equal to zero. In models 4 and M4, the null hypothesis will be rejected if one allele is different from zero in at least one environment. The combination of four methods (M1-4) and three types of QTL effects (parental, ancestral, bi-allelic) represents 12 models for analyzing MPP-ME QTL experiments.

Plant material
To illustrate our methodology, we focused on two examples showing significant and observable QTLxE interactions. The examples came from the Flint EU-NAM and the US-NAM populations tested for a single trait in two environments.

EU-NAM data
The maize EU-NAM Flint population was composed of 811 double haploid (DH) lines coming from 11 crosses between UH007 and 11 peripheral parents representative of North Europe maize diversity (Bauer et al., 2013;Lehermeier et al., 2014). The EU-NAM Flint population was evaluated in six European locations for five traits. We used the raw phenotypic data provided by Lehermeier et al. (2014) http://www.genetics.org/content/198/1/3/suppl/DC1. We used the raw genotypic data provided by Bauer et al. (2013) available here http://www.ncbi.nlm.nih.gov/ geo/query/acc.cgi?acc=GSE50558, and the consensus map from Giraud et al. (2014) available at: http://maizegdb.org/data_center/reference?id=9024747. After quality control, we kept 5949 markers spread on 10 chromosomes for a total map length of 1584 cM. For the ancestral model, we clustered the parental lines at each marker position using a two cM window around the marker with the R package clusthaplo (Leroux et al., 2014). We detected an average of 6.7 ancestral classes along the genome.
From the possible combinations of traits and environments, we focused on dry matter yield (DMY, decitons per hectare, dt ha ) measured at La Coruña (Spain -CIAM) and at Roggenstein (Germany -TUM). Within a location, the trials were laid out as augmented p-rep designs with one third of the genotypes replicated. The genotypes were laid out with parents and checks in 160 incomplete block consisting of eight plots. Therefore, in model 1, 2 and M4 des(e) = rep l(e) + block m(le) to account for the l th replicate, and the m th block within replicate effects. The model 2 was equivalent to model 1 in Lehermeier et al. (2014). In M4, the variance of the error term was environment and cross specific (4.3). The average heritability on a line mean basis within cross over environments wash 2 = 47% (supplementary material S1). The Pearson correlation between the two within environment BLUEs was equal to 0.35.

US-NAM data
The maize US-NAM population was composed of 4421 recombinant inbred lines (RIL) coming from 25 crosses between B73 and 25 peripheral parents representative of the international maize diversity (McMullen et al., 2009). It was evaluated for 19 traits in up to 11 environments (Hung et al., 2012). We used the phenotypic data provided by Hung et al. (2012). The genotypic marker data and map data came from Ogut et al. (2015). We downloaded the data from http://www.panzea.org. We used the map with 1478 markers (1 mk/cM) and a total map length of 1434 cM. For the US-NAM data, we did not have the IBS marker data. Therefore, we did not calculate the ancestral We focused on days to anthesis (DTA) measured in 2007 in the North Carolina (NC) and the New York (NY) environments. Within an environment, the experimental design was a set design where each set contained all lines of a cross. Each set was randomized across environments as an α-design. The α-design was augmented by including the two parental lines of the cross within each incomplete block. The row and column effects were defined at the environment level (Hung et al., 2012). Therefore, in models 1, 2 and M4 des(e) = set l(e) + block m(le) + row n(e) + col o(e) to model the l th set, m th block within set, n th row and o th column effects. Model 2 was similar to the one used by Hung et al. (2012) removing the cross term. The set and the cross effects were not confounded because the parents and the checks were considered as being part of an 'extra' cross.
For the BLUEs computation in model 1 and 2, the VCOV structure of the error term was modelled by an environment-specific autoregressive correlation in the rows and columns (AR1 x AR1) (Gilmour et al., 1997). For the one-stage M4 analysis, the modelling of the spatial trend was however computationally too intensive, so we only used within environment cross-specific error terms (4.3). The average heritability on a line mean basis within cross over environments wash 2 = 78% (supplemental material S1). The Pearson correlation between the two within environment BLUEs was equal to 0.25.

QTL detection procedure
The QTL detection procedure was composed of a simple interval mapping scan to select cofactors followed by a composite interval mapping (CIM) scan to build a multi-QTL model. The final list of QTLs was evaluated using a backward elimination. The cofactors were selected with a minimum in between distance of 50 cM to avoid model overfitting. The QTLs were selected with a minimum distance of 20 cM. We fixed the cofactor and QTL detection thresholds to −log10(p − value) = 4. We performed the QTL detection extending the R package mppR (Garin et al., 2018) to the multienvironment situation. The mixed models were calculated using asreml-R (Butler et al., 2009). The R package programmed can be found here https://github.com/vincentgarin/mppGxE.

Cross-validation
We adapted the cross-validation (CV) procedure described by Utz et al. (2000) to the MPP-ME context to evaluate the performances of the QTL detections. For each combination of methods (M1-4) and type of QTL effects (parental, ancestral, bi-allelic), we ran three times a three-fold CV procedure, which gave us nine estimates for each parameter. We did not perform the CV for the one-stage (M4) QTL analysis of the US-NAM data due to computational limitations. We partitioned the full dataset at the within-cross level into an estimation set (ES) and a test set (TS). We used the estimated effects of the QTLs detected in the ES to predict phenotypic values of the ES and TS (e.g.ŷ TS = X TSβTS ).
We calculated the proportion of genetic variance explained in the ES by the QTLs (p ES ) using the Pearson correlation between the reference values y ES and the predicted valuesŷ ES . We calculated the proportion of genetic variance predicted in the TS by the QTLs (p TS ) using the Pearson correlation between the reference values y TS and the predicted valuesŷ TS . We used as reference values (y ES and y TS ) the within environment BLUEs (model 1) to compare our results across methods. Thep ES andp TS were computed within crosses per environment. We estimated thep ES andp TS at the whole MPP level by calculating the average within cross values.

Modelling QTL effect in relation to environmental information
The methods M2, M3 and M4 allow to detect environmental differences of the QTL effects but they do not allow to understand how environmental characteristics influence the QTL effects. A natural extension is to integrate environmental information to understand better the QTLxE interaction. Inspired by the model 16 proposed by Malosetti et al. (2013), we can reformulate the QTL part of G ijk (4.1) to describe the QTL effect of a single QTL (q * ) in term of an environmental covariate (Z e ).
In this formula the QTL part is the same as in 4.1 for all QTLs except for q * . For the QTL q * , we decompose its environmental effect (β q * e ) into a main effect component α q * and a component β q * describing the sensitivity to the environmental covariate Z e . In the MPP context, α q * and β q * will be vectors of dimension [1 × n al ] containing one element per QTL allele l. α q * l represents the QTL effect of allele l when Z e is equal to zero. β q * l is the environmental sensitivity of QTL allele l and represents the amount of change in trait quantity for one extra unit of Z e . α q * l and β q * l are both defined with respect to a reference allele (e.g. the central parent). Z e is the value of the environmental covariate (e.g. temperature). Finally, a eq * ∼ N(0, σ 2 aq * ) is the residual unexplained QTL effect. Table 2 contains the CV results for the two populations over the different combinations of methods (M1-M4) and QTL effects (parental, ancestral, bi-allelic). We did not detect large differences in terms of prediction power (p TS ) between the different methods. For example, for the US-NAM data in the first environment (NC), the averagep TS across crosses was equal to 15, 16 and 17 for the M1, M2 and M3 methods, respectively. Similarly, for the EU-NAM ancestral model in the second environment (TUM), thep TS was equal to 11, 11, 15 and 12 for the M1 to M4 methods, respectively. Concerning the number of QTLs, we observed that, on average, the separate single environment M2 method detected the lowest number of QTLs. The one-stage M4 analyses detected less QTLs than M1 and M3. In some cases, method M1 detected the largest number of QTLs like in the three EU-NAM complete data analyses.

Cross-validation results
Concerning the type of QTL effect (parental, ancestral, bi-allelic), we also did not detect any difference in terms of prediction power (p TS ). For the EU-NAM data, the average within crosseŝ p TS of the parental, of the ancestral, and of the bi-allelic model varied between: 8-16, 10-15, and 9-14, respectively. We noticed however, that we detected on average more QTLs with the bi-allelic and the ancestral models. -log10(p-values) scatter-plots In Figure 1, we plotted the -log10(p-values) of the CIM profiles obtained with the method M4 with respect to the methods M1 to M3 for all complete data QTL analyses. We could observe that, in general, the -log10(p-values) were larger in M4. The differences between the M4 and the M2 profiles were the largest. Concerning M4 versus M1, an important fraction of the -log10(p-values) were superior in the M4 profiles with respect to the M1 profiles. However, for the most significant -log10(p-values), the M1 method gave sometimes slightly larger -log10(p-values) than the M4 method, for example for the EU-NAM parental model. The -log10(p-values) obtained with M3 and M4 were similar.

Detected QTLs
In Figure 2, we plotted the -log10(p-values) CIM profile of the US-NAM M4 parental QTL analysis with a representation of the genetic effect significance per parent and per environment along the genome. On that plot, we observed that the QTLs on chromosomes eight, nine, and ten were detected with a large significance. Looking at the genetic effect distribution along the genome, we also noticed that these QTLs potentially had a different parental allelic series between the environments. The information about the significance of the QTL genetic effect along the genome should however be taken with caution because it is based on an incremental and conditional Wald test that can change given the order of the tested parameters (Butler et al., 2009).
In Figure 3, we plotted the -log10(p-values) CIM profile of the EU-NAM M4 ancestral QTL analysis with a representation of the genetic effect significance per parent and per environment along the genome. We noticed that the QTL on chromosome six had an interesting allelic series. Indeed, many parents were grouped in the same ancestral class and the QTL had a genetic effect specific to the second environment (TUM). The rest of the detected positions for all methods and type of QTL model combinations can be found in the supplementary material S2.

Estimation of the QTL allelic substitution effects
We represented the QTL allelic series of the QTL detected on chromosome six in the EU-NAM ancestral model and the QTLs detected on chromosomes eight, nine and ten in the US-NAM data in Figure 4. The estimated QTL effects were conditioned on the list of cofactors detected with the corresponding final model. The numerical QTL allelic substitution effect values can be found in the supplementary material S3. In the text, the standard errors of the allelic effects are given in parentheses. In Figure 4, we observed the differences in terms of QTL effect estimation between method M1 using BLUEs representing an average phenotypic effect across environments, and the method M4, which allows to estimate environment-specific QTL effects. The results obtained with method M4 were comparable to the ones obtained with methods M2 and M3. A full comparison between the estimated QTLs effects obtained with all the methods can be found in the supplementary material S4.
The QTL detected on chromosome six with the ancestral model in the EU-NAM population (Figure 4-A), was the most illustrative example of environment-specific QTL effects. In the second environment (TUM), an ancestral allele inherited by parents D152, EC49A, F03802, F2, F283, UH006 and DK105 had a strong negative effect of −7.2(0.8) dt ha with respect to the ancestral group containing parents UH007, EZ5, and UH009. In the first environment (CIAM), the effect of the main ancestral group was substantially reduced (−1.2(0.8) dt ha ) and non-significant. In the M1 method, the main ancestral allele took an average values (−4.1(0.7) dt ha ) across the two environments.
In example 4B, we observed that the three most significant parental alleles had an effect that was stronger in the second environment (NY) compared to the first one (NC). The estimated allelic substitution effects (in days) of parental alleles IL14H, MS71, and P39 were -1.3(0.1) vs -0.8(0.1), -1.2(0.1) vs -0.5(0.1), and -1.6(0.2) vs -1.0(0.2), respectively. The estimated QTL effects obtained with M1 were again approximately averaged between the two environments with -1.1, -1.1 and -1.3 days respectively. Similarly, in example 4C, we observed that several parental alleles had a stronger positive expression in NY compared to NC. For example, the allelic substitution effect of Ki11 was equal to +1.9(0.2) days in NY and +0.3(0.2) days in NC. The allelic substitution effect of CML277 was equal to +1.9(0.2) days in NY while it was equal to +0.8(0.2) days in NC. Finally, in example 4D, the most significant parental alleles had a stronger effect in the second (NY) environment. For example, the estimated allelic substitution effect of CML277 was equal to +3(0.2) days in NY and +1.4(0.2) days in NC. Once again, in the examples 4C and 4D, the allelic substitution effects estimated with method M1 were approximately averaged across the two environments.

QTL effect in relation to environmental information
To illustrate the extension of our models with environmental covariates, we re-analyzed the effect of the QTL detected on chromosome six (84.2 cM) in the EU-NAM with the M3 ancestral model including the effect of water precipitation (Z e ). This QTL had five alleles: allele A (UH007-central parent), allele B (D152, EC49A, EP44, F2, F64, UH006), allele C (F03802, F283), allele D (UH009, DK105), and allele E (EZ5). We used the final QTL model and the average water precipitation in mm (Wat.(mm)) at each location between July and August obtained from https://en.climate-data.org/. To increase the range of the environmental covariate, we included four environments (La Coruña, Roggenstein, Einbeck, Ploudaniel). We considered the precipitation of the driest location (La Coruña -35 mm) as the reference level. The precipitation in the other environments were expressed as the difference with respect to the reference.
In Table 3, we can observe the estimates of the QTL main effects (α) and QTL x environment effects (β) on the trait. We noticed that, in the driest environment (La Coruña), the allele B reduced the yield by 0.75 dt ha compared to allele A. When the level of precipitation increased, this difference was accentuated by 0.06 dt ha * Wat.(mm). In Table 4, we calculated the difference in yield between an homozygous genotype with allele A versus B in the four environments (2 * (α B + Z e * β B )). In this table, we observed that the difference was equal to 1.5 in the driest reference environment (La Coruña). The extra yield given by allele A increased with more precipitation (e.g. 6.6 dt ha at Roggenstein).

IV. Discussion
Several MPPs have been characterized in multiple environments but, most of the time, the QTL analyses were performed on genotype BLUEs calculated across environments, which represent average phenotypic values (Giraud et al., 2014;Buckler et al., 2009). We called that method M1. M1 does not estimate the QTLxE effects. Therefore, M1 does not allow to use the full information potential of MPP-ME QTL experiments, measuring QTL variations: a) within the  MPP between sub-populations due to different genetic backgrounds; and b) between environments.
An alternative to estimate the QTLxE effects in MPPs is to perform separate QTL analyses within each environment (M2). However, this method does not model the covariance due to the repeated measurements on the same genotype. Therefore, we proposed two methods (M3 and M4) that allowed to model properly the QTLxE effects in MPPs. Those methods used simultaneously the phenotypic data from multiple environments taking into consideration the covariance existing between the same genotype measured in different environments. With respect to M3, M4 also integrated the sources of variation due to experimental design elements performing a one-stage analysis on the plot data. Using M3 and M4 we were able to detect important QTLs and we showed that these QTLs had environmental specific allelic effects. Finally, we extended our models to integrate environmental information and better characterize the QTLxE effects.
Comparison of the detected QTLs with the existing literature Using the different methods presented, we detected several interesting QTLs for DMY and for DTA. The QTL on chromosome six at 82.1 cM detected by the ancestral model in the EU-NAM population ( Figure 4A) was also detected by Giraud et al. (2014). They detected a QTL at 83.5 cM, 90.5 cM and 90.7 cM using the connected, the 2cM-LDLA and the 1mk-LDLA models, respectively.
The QTLs detected for DTA in the US-NAM population on chromosomes eight, nine, and ten, at 66, 47, and 42 cM, respectively ( Figure 4B, C and D), were also detected by Buckler et al. (2009). They detected corresponding QTLs on chromosomes eight, nine, and ten, at 67, 45.2, and 42.9 cM, respectively. According to Buckler et al. (2009), the QTL on chromosome eight is close to the vegetative to generative transition 1 (vgt1) gene (Salvi et al., 2002). Concerning the QTL on chromosome ten (42 cM), Giraud et al. (2014) also detected a QTL with a strong effect on flowering time between 45 to 50 cM. According to them, it corresponds to the ZmCCT gene Ducrocq et al. (2009).

Estimation of the QTLxE effects
In Figure 4 and appendix S4, we showed that important QTLs had environment-specific allelic substitution effects. In those cases, the QTL effects estimated with method M1 were inaccurate. Those QTL effects were overestimated in one environment and underestimated in the other. Therefore, in presence of QTLxE effect, the use of method M2, M3, or M4 is necessary to estimate properly the QTL environmental differences.
The QTL detected on chromosome six at 82.1 cM in the EU-NAM population ( Figure 4A) is a good illustration of the possibility to estimate QTL effect variations at two levels: within the MPP between sub-populations, and between environments. At that position, the QTL effect was rather consistent within the MPP because an important group of parents showed a negative effect that could be due to a common ancestral allele. For that QTL, we could also observe environmental variation because the allelic effects were only significant in the second environment. This example illustrates the interest of using a sound MPP GxE QTL detection methodology.

QTL effect in relation to environmental information
In Tables 3 and 4, we showed that methods M3 or M4 could be extended to integrate environmental information to better characterize the QTLxE effects. The estimation of the water precipitation effect on a single QTL was a simple case with a unique environmental covariate but we could imagine more complex models with more QTLs and/or environmental covariates (Malosetti et al., 2004). We assumed a linear relationship between the QTL effect and the environmental covariate. More complex relationships such as a quadratic form or splines could also be assumed . The ultimate goal of such an approach is to unravel the physiological mechanisms behind the QTL effects. This possibility to integrate environment information make methods M3 or M4 more attractive than M2.

Full data analyses and cross-validation results
In terms of prediction power, the CV results (Table 2) did not allow to make a difference between the four methods. Thep TS were mostly similar. We showed in the full data analyses and the CV that, on average, method M2 detected less QTLs. The prediction power (p TS ) of M2 was also reduced with respect to the other methods, even if the differences were small. According to our results, the use of separate within environment analyses is therefore not an optimal strategy. The joint analysis of multi-environment data, as performed in M3 and M4, accounted better for the shared effects across environments, which was beneficial for the QTL analysis.
In the full data analysis, we also noticed that in all EU-NAM analyses (parental, ancestral, bi-allelic), method M1 detected the largest number of QTLs. In those cases, the extra power of M1 could be explained by the fact that this method uses a reduced number of degrees of freedom (df) to estimate the QTL effect. Indeed, in M1, the QTL term uses n al − 1 df, while in M2, M3 or M4, the QTL term uses N Env * (n al − 1) dfs. When the QTLxE effect is strong, the loss in power due to the extra df for the QTLxE term is compensated by a better modelling of the GxE effect. However, when the QTL effect is consistent across the environments, the additional modelling of the QTLxE variation penalizes the test for a QTL. With consistent QTL effects, method M1 is more parsimonious.
We would like to emphasize that we selected examples with significant observable GxE interactions but that these situations did not represent the majority of the cases we tested. This observation is in agreement with Buckler et al. (2009) who found that, for flowering time traits in the US-NAM population, the environment-specific QTL effects were small compared to the main QTL effects across environments. The potential weakness of QTLxE effects with respect to the main QTL effects in the US-NAM and EU-NAM populations, could explain why, in some cases, method M1 obtained better results than the GxE analyses (M2-M4). The existence of GxE effects could be more important in MPP-ME QTL experiments where the environmental conditions are more contrasted, for example, in experiments testing the same population in control versus heat or salt stress conditions (Saade et al., 2016). Table 2, method M3 detected on average more QTLs than M4. Looking at the list of detected QTLs (appendix S2), we noticed that, in the EU-NAM analyses, the positions of the QTLs in M3 and M4 were consistent. In the US-NAM analyses, the positions of the QTLs with a large significance were consistent between M3 and M4. However, several QTLs with a low or medium significance were either only detected in M3, or distant by 10-15 cM between the two methods.

As observed in
The difference between M3 and M4 could be explained by the modelling of the experimental design variation in M4. Piepho and Pillen (2004) showed that even if the variance of the experimental design factors were smaller than other elements like the genetic covariance, it could still reduce substantially the QTL effect when it was used in a one-stage analysis. This reduction of the QTL effect could explain why we detected less QTLs in M4 with respect to M3. We should still remember that in the US-NAM analysis we used an AR1 x AR1 covariance structure for the error term to calculate the within environment BLUEs used in the M3 analyses. Due to computational limitations, we did not use the AR1 x AR1 covariance structure in M4. This could also explain the differences between M3 and M4 in the US-NAM analyses.

Other extensions
To present our methodology, we used examples from NAM populations but our methods and the reasoning behind it are also valid for any MPP composed of crosses like diallel population or factorial designs used in breeding programs. Our methods can also be adapted to the multi-trait situation. The analysis of longitudinal traits measured at different time points using a VCOV reflecting the time dependence could be a possibility. For an illustration purpose, we only used data coming from two environments but we could increase this number. However, the fitting of mixed models on large datasets can be computationally intensive. For example, it took us four days to perform the M4 one-stage QTL detection in the US-NAM population on a personal computer (Intel Core i7-3770 CPU 3.4 GHz).

Conclusions
We proposed mixed model methods to detect QTL in MPP-ME data. These models analyzed jointly data from several environments and modelled the genetic covariance due to repeated measurements on the same genotype. Our methods allowed to estimate the QTLxE effect while the methods using genotype BLUEs calculated across environments did not. However, the methods focusing on the main QTL effects remain useful if the QTL effects are consistent across the environments. Moreover, we showed that our methods could be extended to integrate environmental information and understand better the mechanisms behind the QTLxE effects. The methods we proposed are therefore an interesting tool to exploit the full information potential of MPP-ME data. They allow to estimate the QTL variations: a) within MPP between sub-populations due to different genetic backgrounds; and b) between environments.