Multi-parent multi-environment QTL analysis: an illustration with the EU-NAM Flint population

Key message Multi-parent populations multi-environment QTL experiments data should be analysed jointly to estimate the QTL effect variation within the population and between environments. Abstract Commonly, QTL detection in multi-parent populations (MPPs) data measured in multiple environments (ME) is done by analyzing genotypic values ‘averaged’ across environments. This method ignores the environment-specific QTL (QTLxE) effects. Running separate single environment analyses is a possibility to measure QTLxE effects, but those analyses do not model the genetic covariance due to the use of the same genotype in different environments. In this paper, we propose methods to analyse MPP-ME QTL experiments using simultaneously the data from several environments and modelling the genotypic covariance. Using data from the EU-NAM Flint population, we show that these methods estimate the QTLxE effects and that they can improve the quality of the QTL detection. Those methods also have a larger inference power. For example, they can be extended to integrate environmental indices like temperature or precipitation to better understand the mechanisms behind the QTLxE effects. Therefore, our methodology allows the exploitation of the full MPP-ME data potential: to estimate QTL effect variation (a) within the MPP between sub-populations due to different genetic backgrounds and (b) between environments. Electronic supplementary material The online version of this article (10.1007/s00122-020-03621-0) contains supplementary material, which is available to authorized users.


Introduction
The use of multi-parent populations (MPPs) becomes progressively a regular practice in plant genetics and plant breeding. Different MPPs have been developed like the nested association mapping (NAM) populations (McMullen et al. 2009) or the multi-parent advanced generation inter-cross (MAGIC) populations (Cavanagh et al. 2008). The collections of crosses between a set of parents used in breeding programs can also be analysed as MPPs (Parisseaux and Bernardo 2004;Würschum 2012). Different statistical approaches have been proposed to detect QTLs in MPPs composed of biparental crosses (Jourjon et al. 2005), in NAM populations (Li et al. 2011) or in MAGIC designs (Verbyla et al. 2014).
The plant phenotype is the result of cumulative interactions between the genotype and the environment (Malosetti et al. 2013). Therefore, researchers have developed statistical procedures to detect QTLs taking the genotype by environment (GxE) interactions into consideration (Boer et al. 2007;Korte et al. 2012). Several MPPs have been tested in multiple environments (MPP-ME) (Buckler et al. 2009;Giraud et al. 2014;Saade et al. 2016), but only few studies have proposed a proper MPP GxE QTL detection methodology (Piepho and Pillen 2004;Verbyla et al. 2014). Most of the researches average the phenotypic values by calculating adjusted means across the environments that represent an average genotypic value (Buckler et al. 2009;Giraud et al. 2014;Poland et al. 2011). In other articles, the authors performed separate analyses in each environment (e.g. Saade et al. 2016).
We consider that the main interest of an MPP-ME QTL experiment is to estimate genetic variation at two levels: (a) within the MPP between sub-populations due to different genetic backgrounds and (b) between environments. Therefore, in this article, we extended the framework we developed Communicated by Laurence Moreau.
1 3 for QTL detection in MPPs composed of biparental crosses (Garin et al. 2017(Garin et al. , 2018 to the multi-environment scenario. We allowed the multi-allelic QTL effects to vary between the environments to estimate QTL by environment (QTLxE) interactions. We further extended our models to integrate environmental information like temperature or precipitation to get a deeper understanding of the mechanisms behind the QTLxE effects. In the following sections, we present different methods to analyse MPP-ME QTL experiments and illustrate our methodology with examples coming from the EU-NAM Flint population.

Statistical methodology
In the next sections, we present four methods for QTL detection in MPP-ME experiments with increasing complexity. Methods one to three are two-stage analyses combining genotype adjusted means computation and a QTL detection model. Method four is a one-stage analysis on the plot data that simultaneously estimates experimental design and QTL effects. The underlined terms are considered as random with normal distribution and proper variance, the others as fixed.

M1: average effect across environments analysis
Method M1 performs a QTL analysis on the genotype best linear unbiased estimates (BLUEs) calculated across environments with the following model where y icep is the phenotypic value of genotype i from cross c in environment e measured in plot p. is the intercept. E e is the environment effect. D (e) represents the within environment experimental design factors such as replicates and blocks. G ic is the genotype effect. The term GE ice represents the genotype by environment interaction. Finally, icep is the plot error. Depending on the modelling objective or the assumptions about the design, G ic and D (e) can be considered as fixed or random. Model 1 models jointly the phenotypic measurements of the N e environments and calculates the genotype BLUEs ( Ḡ ic ) treating G ic as fixed. Those BLUEs represent the genetic main effect across environments.
Then, we can further analyse the genotype BLUEs Ḡ ic using the following QTL model: where Ḡ ic is the genotype BLUE of individual i in cross c. It can be partitioned into a cross effect C c , and a QTL x ia is the number of QTL alleles (a) received from a parent or ancestor (see below) by individual i at the QTL position. a represents the allelic substitution effect. The number of alleles ( n a ) at the QTL varies according to the user assumption. A first model called 'parental' assumes that each parent contributes a unique allele to the MPP (Blanc et al. 2006) ( a = 1, … , n p ). A second 'ancestral' model assumes that genetically similar parents inherit their alleles from a common ancestor (Bardol et al. 2013) ( a = 1, … , n anc ). Finally, a bi-allelic model assumes that genotypes with the same single-nucleotide polymorphism (SNP) score, transmit the same allele ) ( a = 1, 2).
x ia takes values between 0 and 2. For the parental model, they represent the expected number of parental allele copies estimated using identity by descent (IBD) probabilities. The ancestral model is an haplotypic IBD model where parental alleles corresponding to the same ancestral class are merged. We used the R package clusthaplo (Leroux et al. 2014) to determine ancestral classes along the genome based on local identical by state (IBS) genetic similarities. For the bi-allelic model, x ia is equal to the number of SNP minor alleles. We estimated the models setting the NAM central parent allele as reference, and we interpreted the allelic substitution effects a as deviations from the reference allele.
G ic is also composed of a residual genetic variation term g ic and a non-genetic variation and plot error term ic . For BLUEs, g ic and ic are confounded. Therefore, g ic is modelled as part of the non-genetic and plot error variance. We assume that ic follows a normal distribution N(0, 2 c ) . The error variance 2 c is assumed to be cross-specific to account for the heterogeneity that could exist between crosses (Garin et al. 2017;Xu 1998). In M1, since Ḡ ic represents the main genotypic effect across environments, it is not possible to estimate the QTLxE effects. M1 was used in Buckler et al. (2009), Poland et al. (2011), or Giraud et al. (2014.

M2: within environment analyses
Method M2 models the QTLxE effects by performing separate QTL analyses in each environment on the genotype BLUEs calculated with model 1 using single environment data. Therefore, the terms E e , and GE ice are dropped. The design and error terms are not anymore indexed per environment. Using this model, it is possible to calculate within environment genotype BLUEs Ḡ ice . In a second step, we can further analyse each environment-specific BLUEs vector by performing N e QTL analyses using model 2. Such separate within environment analyses are a first possibility to estimate the QTLxE effects (e.g. Saade et al. 2016).

M3: two-stage multi-environment analysis
By analyzing environment data, separately M2 does not model the covariance due to repeated measurements on the same genotype. According to Piepho (2005), ignoring the between environment genetic covariance can increase the false positive rate. From a general point of view, using a correct variance covariance (VCOV) structure improves the quality of the test to detect QTLs (van Eeuwijk et al. 2010). Therefore, in method M3, we analyse jointly the within environment genotype BLUEs Ḡ ice by extending model 2 to the multi-environment situation Compared to model 2, the cross effects C ce , the QTL allelic effect ae , the residual genetic variation GE ice , and the nongenetic variation plus plot error term ice are indexed per environment. Here, we model environment-specific QTL allelic effects. For example, parental allele a in environment e and e ′ . The QTL effects could also be partitioned into a main effect across environments and environment-specific components, an important difference with respect to M2.
Another important difference between models 2 and 3 is the possibility to model both GE ice and ice by the use of N e within environment genotype BLUEs vectors. GE ice and ice are confounded and are modelled jointly. The confounded term ( GE ice + ice ) can be modelled by assuming a genotypic main effect across environments ( GE ice ∼ N(0, 2 g ) ) and a homogeneous or heterogeneous residual term, making the variance covariance matrix across environments of the compound symmetry (CS) class (Boer et al. 2007) (Boer et al. 2007). When we assume cross-specific residual terms ( ice ∼ N(0, 2 ce ) ), which contain constant genotype by environment interaction contributions and a plot error-related cross-specific error term, we take into consideration covariances between environments and heterogeneities within environments.
The combination of the CS and the environment crossspecific errors (ECSE) is a first possibility to model the VCOV structure in an MPP GxE QTL model. It is a simple extension of the VCOV of model 2 accounting for the environment effect on the error term and for the genotypic covariance between environments. The VCOV matrix of phenotypic observations coming from two different crosses in two different environments takes the following form:

M4: one-stage multi-environment analysis
The last method M4 is a one-stage analysis on the plot data using an extension of model 1 where the genotype effect G ic is replaced by the environment-specific cross effect ( C ce ) and QTL effect ( x ia * ae ) The definition of the QTL effect and of the VCOV (CS + ECSE) is the same as in model 3. The extension of models 2, 3, and 4 to a multi-QTL model can be done replacing the QTL term by ∑ q x ia(q) * ae(q) . M4 allows the simultaneous estimation of the non-genetic effects due to the experimental design ( D (e) ) and of the QTL variation. Similar one-stage analysis models for a two crosses NAM population and a MAGIC population were presented in Piepho and Pillen (2004) and Verbyla et al. (2014), respectively.

Significance of the QTL effect
The Wald test (W) (McCulloch and Searle 2001, 5.39) can be used to test the null hypothesis of all QTL allelic substitution effects being equal to zero. W ∼ 2 df with the degrees of freedom (df) equal to n a − 1 . In models 3 and 4, the null hypothesis will be rejected if one allelic substitution effect is different from zero in at least one environment. The combination of four methods (M1-4) and three types of QTL effects (parental, ancestral, bi-allelic) represents 12 models to analyse MPP-ME QTL experiments.

QTL detection procedure
The QTL detection procedure was composed of a simple interval mapping scan to select cofactors followed by a composite interval mapping (CIM) scan to build a multi-QTL model. The final list of QTLs was evaluated using a backward elimination. The cofactors were selected with a minimum in between distance of 50 cM to avoid model overfitting. The QTLs were selected with a minimum distance of 20 cM. After QTL detection, we determined the number of unique QTL positions across methods. We considered that QTL positions detected by different methods represented the same QTL if they were separated by less than 10 cM. We performed the QTL detection and the estimation of the QTL effects using the central parent as reference.
We fixed the cofactor and QTL detection thresholds to − log 10(p value) = 4. The choice of a fixed value for the QTL detection threshold allowed us to avoid computationally intensive permutation tests. Even though there exist alternative strategies for threshold determination using simulation-based critical values (Lander and Botstein 1989) or analytical methods (Rebai et al. 1994), those methods have been mostly used in biparental crosses or in association panels. According to Rebai et al. (1994), the MPP situation is more complex. Therefore, in most of the MPP QTL detection studies, authors have used permutation tests or Bonferroni correction. The choice of − log 10(p value) = 4 as threshold was influenced by values obtained by permutation in similar models. For example, in Giraud et al. (2014), the thresholds calculated for the same population and trait took values between 3.65 and 4.36. In our example, the use of a Bonferroni correction would give a threshold of 5.08. Varying the threshold values between 3 and 5 would have a small impact on the final results by only changing the list of positions with small or medium effects.

Modelling QTL effect in relation to environmental information
When a QTL position showing QTLxE effect is identified, it is possible to extend methods M3 and M4 to better understand the QTLxE effect by integrating environmental information. Inspired by model 16 from Malosetti et al. (2013), we can reformulate the QTL part of models 3 and 4 to describe the QTL effect in terms of sensitivity to the environmental covariate Z e (e.g. temperature) The environmental allelic effect ( ae ) is decomposed into a main effect a and a component a describing the sensitivity to the environmental covariate Z e . In the MPP context, a and a will be vectors of dimension [1 × n a ] containing one element per QTL allele. a represents the QTL effect of allele a when Z e is equal to zero. a is the environmental sensitivity of allele a and represents the amount of change in trait quantity for one extra unit of Z e . a and a are both defined with respect to a reference allele (e.g. the central parent). Finally, ae is the residual unexplained QTL effect.

Computation
To perform the QTL detection, we extended the R package mppR (Garin et al. 2018) to the multi-environment situation https ://githu b.com/vince ntgar in/mppGx E. The mixed models were calculated using asreml-R (Butler et al. 2009). To (5) x ia * ae = x ia * ( a + Z e a + ae ) ensure transparency and the reproducibility of this research, all data files, scripts and software are available at https :// githu b.com/vince ntgar in/mppGx E_data.

Plant material
To illustrate our methodology, we focused on examples showing significant and observable QTLxE interactions coming from the EU-NAM Flint population tested in two environments. The maize EU-NAM Flint population is composed of 811 double haploid lines coming from 11 crosses ( N c ∈ [17 − 133] ) between UH007 and 11 peripheral parents representative of North Europe maize diversity (Bauer et al. 2013;Lehermeier et al. 2014). The EU-NAM Flint population was evaluated as testcross in six European locations for five traits. We used: (a) the raw phenotypic data provided by Lehermeier et al. (2014) available at http:// www.genet ics.org/conte nt/198/1/3/suppl /DC1; (b) the raw genotypic SNP marker data provided by Bauer et al. (2013) available at http://www.ncbi.nlm.nih.gov/geo/query /acc. cgi?acc=GSE50 558; and (c) the consensus map from Giraud et al. (2014) available at http://maize gdb.org/data_cente r/ refer ence?id=90247 47.
We performed a quality control by removing the markers with a minor allele frequency < 0.05 or > 10% missing values. In the situations where several markers were located at the same position, we kept the most polymorphic one. After quality control, we kept 5949 markers spread on 10 chromosomes for a total map length of 1584 cM. For the parental and the ancestral models, missing genotypic scores were imputed during the computation of the IBD probabilities. For the bi-allelic model, we imputed randomly the 1.2% missing values. For the ancestral model, we clustered the parental lines at each marker position using a two cM window around the marker with the R package clusthaplo (Leroux et al. 2014). We detected an average of 6.7 ancestral classes along the genome.
From the possible combinations of traits and environments, we focused on biomass dry matter yield at the whole plant level (DMY, decitons per hectare, dt ha −1 ) measured at La Coruña (Spain -CIAM) and at Roggenstein (Germany -TUM). This combination of trait and pair of environments was the one that showed the largest potential GxE effect with the lowest Pearson correlation between environmentspecific BLUEs.
Within a location, the trials were laid out as augmented p-rep designs with one-third of the genotypes replicated. The genotypes were laid out with parents and checks in 160 incomplete block consisting of eight plots. Therefore, in models 1 and 4, D (e) = rep l(e) + block m(le) to account for the lth replicate, and the mth block within replicate effects. The replicate and block terms were considered as random with a unique variance term. To manage the check entries, we followed Boer et al. (2007) and partitioned the genotype term of models 1 and 4 into check and entries.
For the heritability computation (supplementary material S1), we assumed a cross-specific GxE random term for GE ice (model 1). We noticed that for two crosses (EZ5 and F283), we could not get an estimate of the cross-specific GxE variance. This is due to the partial genotype replication between environments. Therefore, it seems more appropriate to assume a homogeneous genotype by environment variance term for GE ice as we did in model 1, 3, and 4. The average heritability on a line mean basis within crossover environments was 47% . The Pearson correlation

QTL analyses
In Fig. 1, we plotted the CIM profile of the EU-NAM Flint DMY M4 ancestral analysis. Figure 1b shows the QTL allelic effect significance per parent and environment along the genome. The significance of the QTL genetic effect along the genome should, however, be taken with caution because it is based on a conditional Wald test that can change given the order of the tested parameters (Butler et al. 2009). We noticed that the QTL on chromosome six had an interesting allelic effect series. Indeed, many parents were grouped in the same ancestral class and the QTL had a genetic effect specific to the second environment (TUM). Figure 2 represents the number of specific and common QTLs for DMY between methods (M1-M4) for each type of QTL effects (parental, ancestral, bi-allelic). The lists and plots of all detected QTL positions can be found in the supplementary material S2. We detected 11, 13 and 22 unique QTL positions across methods using the parental, ancestral and bi-allelic models, respectively. We detected the largest number of QTLs with M1, testing on QTL main effects across environments. Taken separately, the within environment M2 analyses detected the lowest number of QTLs. However, considered together, the two within-environments M2 analyses detected as many or more QTLs than M1. For example, in the parental model, M1 detected eight unique QTLs, while M2-E1 and M2-E2 detected nine QTLs. M3 and M4 detected less QTLs than M1 and M2. For example, in the bi-allelic model, we detected 12, 11, 10, and nine unique QTLs with M1, M2, M3 and M4, respectively.
From a general point of view, the methods showed a moderate level of consistency concerning the detected QTL positions. The number of positions common to at least two QTLs detected by two methods were the same if they were separated by less than 10 cM (colour figure online) methods decreased from the parental to the bi-allelic model. For example, the percentage of common QTLs between M1 and M2 decreased from 55 to 23 from the parental to the bi-allelic model. The overlap was stronger between M3 and M4 with percentage of common QTLs between 85 and 58 from the parental to the bi-allelic model. Few positions were detected consistently across methods and models. For example, a QTL on chromosome 10 around 45 cM was detected by the four methods in all QTL type models. The QTL on chromosome 6 between 82 and 86 cM was also detected by all methods and QTL types.
Looking at the environment-specific method M2, we noticed that more QTLs were detected in the CIAM (M2-E1) compared to the TUM (M2-E2) environment. For example, the parental model detected six QTLs in CIAM and only three QTLs in TUM. The QTLs detected with M2 were mostly environment-specific. Only two QTLs were common to M2-E1 and M2-E2, for example the QTL on chromosome 10 at 45.2 cM detected by the bi-allelic model.

− log 10(p values) scatter plots
In Fig. 3, we plotted the − log 10(p values) of the CIM profiles obtained with M4 with respect to methods M1 to M3. We could observe that, in general, the − log 10(p values) were larger in M4. The differences between the M4 and the M2 profiles were the largest. Concerning M4 versus M1, an important fraction of the − log 10(p values) were superior in M4 compared to M1. However, for the most significant − log 10(p values), M1 gave sometimes larger − log 10(p values) than M4, for example for the parental model. The − log 10(p values) obtained with M3 and M4 were similar.

Estimation of the QTL allelic substitution effects
In Fig. 4, we represented the allelic effect series of the QTL detected on chromosome six at 82.1 cM in the EU-NAM Flint ancestral model. The estimated QTL effects were conditioned on the cofactors detected with the corresponding final model. The allelic substitution effect values can be found in the supplementary material S3. In the text, the standard errors (SE) of the allelic effects are given in parentheses. In Fig. 4, we observed the differences in terms of estimated QTL allelic effects between M1 using BLUEs representing a main genotypic effect, and M4, which estimates environment-specific QTL effects. The QTL allelic effects were standardized by using the ratio to their standard deviation. A similar comparison between the QTL allelic effects obtained with all methods can be found in the supplementary material S4. The results obtained with M4 were comparable to the ones of M2 and M3.
The QTL detected on chromosome six (Fig. 4) was the most illustrative example of environment-specific QTL effects. In the second environment (TUM), an ancestral allele inherited by parents D152, EC49A, F03802, F2, F283, UH006 and DK105 had a strong negative effect of − 7.2 dt ha −1 ( SE = 0.8 ) with respect to the reference ancestral group. In the first environment (CIAM), the effect of the main ancestral group was substantially reduced to − 1.2 dt ha −1 ( SE = 0.8 ) and non-significant. In the M1 method, the main ancestral allele took an average value ( − 4.1 dt ha −1 , SE = 0.7 ) across the two environments.

QTL effect in relation to environmental information
To illustrate the extension of our models with environmental covariates (5), we re-analysed the effect of the QTL detected on chromosome six (84.2 cM) with the M3 ancestral model including the effect of water precipitation ( Z e ). This QTL had five alleles: allele A (UH007, central parent), allele B (D152, EC49A, EP44, F2, F64, UH006), allele C (F03802, F283), allele D (UH009, DK105) and allele E (EZ5). We used the final QTL model and the average water precipitation in mm at each location between July and August obtained from https ://en.clima te-data.org/. To increase the range of the environmental covariate and estimate more precisely the QTL sensitivity, we calculated our model including the yield performances of two extra environments (Einbeck and Ploudaniel). We considered the precipitation of the driest location (La Coruña, 35 mm) as the reference level. The precipitation in the other environments was expressed with respect to the reference. Table 1 contains the estimates of the QTL main effects ( ) and QTLxE effects ( ) on DMY. We noticed that, in the driest environment (La Coruña), the allele B reduced the yield by 0.75 dt ha −1 compared to allele A. When the level of precipitation increased, this difference was increased by 0.06 dt ha −1 mm −1 . In Table 1, we also notice a large effect and SE for allele E, which is probably due to its low frequency. In Table 2, we calculated the difference in yield between an homozygous genotype with allele A versus B in the four environments ( 2 * ( B + Z e * B ) ). We observed that the difference was equal to 1.5 in the driest reference environment (La Coruña). The extra yield given by allele A increased with more precipitation (e.g. 6.6 dt ha −1 at Roggenstein).

Discussion
Several MPPs have been characterized in multiple environments but, most of the time, the QTL analyses were performed on genotype BLUEs calculated across environments representing average genotypic values (M1). M1 ignores the QTLxE effects. Therefore, it does not use the full information potential of MPP-ME QTL experiments. An alternative to estimate the QTLxE effects in MPPs is to perform separate QTL analyses within each environment (M2). However, this method does not model the covariance due to the repeated measurements on the same genotype. Therefore, we proposed methods M3 and M4, which used simultaneously the phenotypic data from multiple environments taking into consideration the genotypic covariance between environments. With respect to M3, M4 integrated the variation due to experimental design elements performing a onestage analysis. M3 and M4 can also integrate environmental information to characterize the QTLxE effect. We evaluated the methodology by analyzing DMY for the EU-NAM Flint population characterized in two environments. As we could observe in Fig. 2, the differences concerning the list of QTL positions detected by the different methods were important. A significant amount of QTL positions were only detected by a single method. Common QTLs were generally detected in a narrow windows of 0-2 cM. A closer look to the environment-specific analyses (M2) also showed differences between the two environments concerning the QTL locations. The method variability could be explained by their different properties and could reveal some complementarity.

M1: QTLs with consistent effects across environments
In Fig. 4, we illustrated the main limitation of M1: the inaccurate estimation of environment-specific QTL effects. For that particular QTL, assuming that the environmental specificity detected by M2, M3 and M4 was true, we could show that M1 estimated an 'average' effect between the two environment-specific allelic effects. In such a case, the QTL effects would be overestimated in one environment and underestimated in the other. Therefore, in the presence of QTLxE effect, the use of methods M2, M3 or M4 is necessary. The QTL detected on chromosome 6 is a good illustration to estimate QTL variation between environments and between MPP sub-populations.
The use of M1 could, however, still be interesting to detect QTLs with a consistent effect across environments. In Fig. 2, we observed that M1 detected more or an equal number of QTLs than the other methods. In Fig. 3, we also noticed that for the most significant QTLs, M1 obtained the largest − log 10(p values). M1 could have an increased detection power because it uses a reduced number of df to estimate the QTL effect. Indeed, in M1, the QTL term uses n a − 1 df, while in M3 or M4, the QTL term uses N e * (n a − 1) df. When the QTLxE effect is strong, the extra df are necessary to capture the environmental variation. In that case,  The QTL allelic effects (A-E) are expressed in terms of main effect across environments ( a ) and sensitivity ( a ) to water precipitation (mm) with their standard errors (SE). The Wald statistics of each effect and their significance (P(Wald)) are also listed The previous argument can be illustrated by calculating the W significance of an environment-specific QTL and a consistent QTL with a main effect and an environment-specific effect. We performed those calculations for the QTL detected on chromosome 6 at 84.2 cM (Q6, environmentspecific), and the QTL detected on chromosome 10 at 42.3 cM (Q10, consistent). For Q6, W increased from 44.3 to 89.2 when we moved from the main effect ( df = 4 ) to the environment-specific ( df = 8 ) model and the − log 10(p value) increased from 8.3 to 15.2. For Q10, we could also observe an increase of W from 64.1 to 78.4 by moving from the main effect ( df = 9 ) to the environment-specific ( df = 18 ) model. However, in that case, the − log 10(p value) decreased from 9.7 to 8.8. This shows that using extra df to capture QTLxE variation penalizes the statistical test of QTLs with consistent effects across environments.

M2: separate GxE analyses
M2 is a first attempt to estimate the QTLxE effects in MPPs. We noticed that M2 detected the largest number of individual QTL positions in the parental and ancestral models. We could also observe that those QTLs were mostly specific to one environment. Therefore, M2 can be informative concerning the QTLs environment specificity. However, the separate analyses prevent from borrowing information between environments, which decreases the inference power concerning the GxE effects.

M3 and M4: joint GxE analyses modelling the VCOV
The main advantage of methods M3 and M4 over M2 is the possibility to model the VCOV taking into consideration the genotypic covariance existing between environments. This strategy can improve the quality of the QTL detection and provide a greater understanding of the GxE interactions (Alimi et al. 2013;Malosetti et al. 2004).
On the one hand, modelling the genotypic covariance can change the structure of the VCOC, which can make it more sensitive to picking up QTLs. The use of a multivariate test (M3) is also considered to be generally more powerful than a univariate one (M2) when phenotype are positively correlated (Stephens 2013). For example, in the M3 or M4 joint analyses, some QTLs that were considered to be environment-specific or non-significant in M2 showed some significant allelic effects in both environments. For example, the QTL on chromosome 7 at 128 cM detected in the M4 ancestral model (Fig. 1) was not significant in the two environment-specific analyses. The use of a multivariate analysis accounting for the genotypic VCOV could increase to power to detect those QTL effects in both environments.
On the other hand, removing the between environments genotypic covariance can also reduce the QTL effect by modelling some variation that would otherwise be wrongly considered as QTL variation. This could explain the reduced number of unique QTLs detected with M3 and M4. Indeed, Piepho and Pillen (2004) demonstrated that including the genotypic covariance between environments could substantially reduce the QTL effect estimate. They also showed that modelling simultaneously the variation due to the experimental design and the genotypic covariance could further reduce the QTL effect estimate. Such a result could explain that M4 detected slightly less QTLs than M3.
Another important advantage of M3 and M4 over M2 is the possibility to increase the inference power of the MPP-ME QTL analysis. The realization of a unified QTL analysis compared to N e separate analyses makes the interpretation of the results easier. The possibility to estimate both the QTL main effect across environments and the QTL environment specificity is another important difference between M3-M4 and M2. Finally, as illustrated in Tables 1 and 2, M3 and M4 can be extended to integrate environmental information to better characterize and understand the QTLxE effects. The estimation of the water precipitation effect on a single QTL was a simple case, but we could imagine more complex models with more QTLs and/or environmental covariates. Introducing environmental covariate to perform the genome scan will increase the computation time. Therefore, in such a case, we advise to adopt the strategy of Millet et al. (2016) by selecting first QTLs and then consider candidate environmental indices calculated from an eco-physiological model by a backward elimination on possible QTL-environmental covariate combinations.

QTL allelic model
In the results of Fig. 2, we noticed that the bi-allelic model detected a larger number of unique QTL positions. Those QTLs were also more specific to a single method. For example, 27% of the QTLs detected with the bi-allelic model were

Other extensions
For M3 and M4 VCOV, we used a compound symmetry with environment cross-specific errors. More sophisticated structures are possible like the unstructured model using specific genotypic covariance terms for each pair of environments. We could also use a VCOV with cross-specific genotypic covariance terms ( 2 gc ), given that there is enough between environment replications to estimate each cross-specific component.
To present our methodology, we used examples from a NAM population but our methods and the reasoning behind are also valid for any MPP composed of biparental crosses like the designs used in breeding programs. Our methods could also be adapted to the multi-trait situation. The analysis of longitudinal data measured at different time points using a VCOV reflecting the time dependence could be a further possibility.
For an illustration purpose, we used data coming from two environments, which corresponds to the situation of several MPPs tested in contrasting (Herzig et al. 2018) or treatment versus control environments (Garin 2019). The number of environments could be increased, but the fitting of mixed models on large datasets can be computationally intensive. For example, it took us seven hours to perform a M4 parental one-stage QTL detection on a personal computer (Intel Core i7-3770 CPU 3.4 GHz). The use of parallel or cloud computing could be a solution to reduce the computation time. In our article, we privileged an 'exact' (full REML) mixed model computation at each position. To reduce the computational time, 'approximate' mixed model computation like the EMMA algorithm (Kang et al. 2008;Millet et al. 2019) could also be developed.

Conclusions
We compared four methods to detect QTLs in MPP-ME data. M1 performed QTL detection on genotypic BLUEs 'averaged' across environments, which prevents from estimating the QTLxE effects. However, M1 stays interesting to detect QTL with consistent effects across environments. M2 environment-specific analyses are a first possibility to get information about the QTLxE effects, but they have a reduced inference power. To address the weaknesses of M1 and M2, we proposed M3 and M4, which model MPP-ME data jointly accounting for the genotypic covariance between environments. Those methods could increase the QTL detection power by reducing the error term or reduce the false positive detection by modelling genotypic variation that is wrongly assigned to the QTLs. M3 and M4 can also be extended to integrate environmental information and better understand the mechanisms behind the QTLxE effects.
Author contribution statement All authors developed the theoretical framework, edited the manuscript, read and approved the final version. VG wrote the software used for data analysis, analysed data and wrote the manuscript. FvE coordinated the research.

Compliance with ethical standards
Conflict of interest The authors declares that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.