Assessing the between-country genetic correlation in maize yield using German and Polish official variety trials

Key message We assess the genetic gain and genetic correlation in maize yield using German and Polish official variety trials. The random coefficient models were fitted to assess the genetic correlation. Abstract Official variety testing is performed in many countries by statutory agencies in order to identify the best candidates and make decisions on the addition to the national list. Neighbouring countries can have similarities in agroecological conditions, so it is worthwhile to consider a joint analysis of data from national list trials to assess the similarity in performance of those varieties tested in both countries. Here, maize yield data from official German and Poland variety trials for cultivation and use (VCU) were analysed for the period from 1987 to 2017. Several statistical models that incorporate environmental covariates were fitted. The best fitting model was used to compute estimates of genotype main effects for each country. It is demonstrated that a model with random genotype-by-country effects can be used to borrow strength across countries. The genetic correlation between cultivars from the two countries equalled 0.89. The analysis based on agroecological zones showed high correlation between zones in the two countries. The results also showed that 22 agroecological zones in Germany can be merged into five zones, whereas the six zones in Poland had very high correlation and can be considered as a single zone for maize. The 43 common varieties which were tested in both countries performed equally in both countries. The mean performances of these common varieties in both countries were highly correlated. Supplementary Information The online version contains supplementary material available at 10.1007/s00122-022-04164-2.


Introduction
Variety examination offices evaluate the performance of newly bred varieties for their value of cultivation and use (VCU) before their addition to the national list and admission for commercial use in a country. The main objective of testing is to assess the relative phenotypic performance of the new varieties and only release the best varieties for commercial use. For example, every year on average 20 new maize varieties are registered in Germany. Before registration, they are tested for 2 to 3 years at up to 25 locations in the main growing area of maize in Germany. Similarly, in Poland, on average 30 new maize varieties are registered, after testing in official trials at up to 32 locations over 2 to 3 years. According to Laidig et al. (2008), 50% of the candidate varieties are usually withdrawn by the breeder after the first testing year, and only 25-30% reach the last year and are able to get registration.
Maize is grown across a wide range of environments, and the growth of plants and the harvested product depend on the climatic and environmental conditions. An important objective of plant breeders is to develop broadly adapted varieties for a wider target region. On the basis of similarity in the agroclimatic conditions, a larger region can be subdivided into zones. These zones can extend beyond the political or national boundaries and can be regarded as mega-environments in which genotypes perform relatively homogeneously (Gauch and Zobel 1997). Kleinknecht et al. (2013) and Piepho and Möhring (2005) presented an approach showing how best linear unbiased prediction (BLUP) can be used for selection of genotypes for zones or mega-environments. The BLUP method allows the borrowing of information across zones or mega-environments to exploit genetic correlation between zones (Buntaran et al. 2019(Buntaran et al. , 2020, thereby providing more accurate estimates of genotype performance as compared Communicated by Daniela Bustos-Korts. 1 3 to best linear unbiased estimation (BLUE), which cannot exploit such correlation.
The predictive ability of models can be improved by incorporating soil or environmental covariates (van Eeuwijk et al. 2016). Usually, in the analysis of multi-environment trial (MET) data, the variety-specific regression terms for covariates are taken as fixed effects (Denis 1988). However, when the target region is divided into zones and variety effects are modelled as random to borrow strength across zones, then variety-specific regression coefficients must be modelled as random, giving rise to random coefficient models (Longford 1993;Buntaran et al. 2021).
This paper aims to assess the genetic correlation between the maize agroecological zones of Germany and Poland using official variety trials of maize from 1987 to 2017. The 43 common varieties allow the assessment of the genetic correlation between the two countries and among agroecological zones in these countries. Environmental covariates are also incorporated into the statistical models to achieve better fit and predictions. The rest of the paper is structured as follows. First, we describe the datasets from German and Poland official registration trials for maize. The models are then outlined in detail, followed by an illustration of results for maize trials from Germany and Poland. Finally, a discussion on the results is presented, focusing on the correlation between agroecological zones of Germany and Poland.

Datasets
In this study, datasets from the official grain maize variety trials of the Bundessortenamt (Hannover) in Germany from 1987 to 2016 and the Research Centre for Cultivar Testing (COBORU) in Poland from 1994 to 2017 were used. The German trials were conducted for assessing the value for cultivation and use (VCU), whereas the Polish trials were both VCU and post-registration trials (PDO). The trait yield in dt/ha is considered for the analysis. All trials from Germany were laid out as randomised complete block designs with three replications, while trials from Poland were laid out in a 1-resolvable design with three replicates. The crop management in the trials included standard fertilisation adapted to conditions in each location. The applied levels of nitrogen fertilisation are given in Fig. 1.
The datasets were non-orthogonal because new varieties were tested for two or three years, and after each year, some varieties were withdrawn, whereas new varieties entered the trials. In any particular year, the same set of varieties was tested at each location in that year. Therefore, data within years are balanced for varieties and location; however, data are unbalanced for varieties by years. Some basic information about the datasets is given in Table 1, whereas the year-wise yields for both countries are plotted in Fig. 2. The lines in Fig. 2 represent year mean yield for each country. An increasing trend can be seen in both countries. Between 1987 and 2016, there were 43 varieties tested in both countries. A list of common varieties along with testing year is given in Table 2.
Based on the agroclimatic condition and soil type, the maize-growing area in Germany has been classified into 22 agroecological zones (Graf et al. 2009;Maiskomitee 2022). Similarly, the maize-growing area in Poland is classified into six zones (Fig. 3). Each location used for official maize trials was assigned to one of these zones. Zones with fewer locations were merged with the neighbouring zone because some zones were represented by only a few locations, as depicted in Fig. 3. Also, the merging of the zones was performed after analysing the data and looking into the genetic correlations between zones (results not shown). For example, Zone 1 and Zone 3 in Poland have a higher correlation as compared to the correlation between Zone 1 and Zone 2. Therefore, Zone 1 and Zone 3 were merged to form one zone. This zone merging resulted in five zones in Germany and four zones in Poland, which were subsequently used in the analysis.

Country and agroecological zone levels analyses
Since locations within countries are taken to be a representative sample within countries, Model 1 can be extended as follows, including a country main effect and its interaction with other main effects.
where c l is the effect of the lth country (l = 1, … , n l ) , s jl is the effect of the jth location nested within the lth country, (gc) il is the interaction effect of the ith genotype and the lth country, (yc) kl is the interaction effect of the kth year and the lth country, (gyc) ikl is the interaction effect of the ith genotype, kth year and the lth country, and (gsy) � ijlk is the sum of the interaction effect of the ith genotype, jth location within the lth country and the kth year, and the residual.
In general, to get the genotype-by-country or genotypeby-zone means, we can model the main effects for genotype and country, and their interaction as fixed effects, and the rest of the effects as random, i.e. location, year and any interaction effects with year and location. Stacking effects of the same type into vectors (e.g. all location main effects, etc.), we may assume that these are independently normally distributed with zero mean and variance-covariance structures, s , y , sy , yc , gs , gy , gyc and gsy , respectively.
In the case of exploiting the genetic correlation between countries, the vector g of genotype main effect is modelled as random with the assumption g ∼ N(0, g ) . In this case, the variance of the main effect is equivalent to the genetic covariance between the two countries. For the country level analysis, the fixed genotype effect (FG) model and the random genotype effect (RG) model were fitted. (2) The different agroecological zones for maize in Germany and Poland can be used for the agroecological zonelevel analysis. In this case, the country effect (c l ) in Model 2 is replaced by the zone effect (z l ) , and all other effects stay the same. Thus, Model 2 becomes where z l is the effect of the lth zone. As some agroecological zones were represented by only a few locations, we grouped a few of the neighbouring zones into mega-environments, which resulted in five zones in Germany and four zones in Poland (see Fig. 3). Another extension of Model 3 can be made by considering location nested within zones and zones nested in countries. However, we are not considering this model because our objective here is to determine the correlation between the zones in the countries without considering the effect of the country. For zone-level analyses, only the RG model was fitted since the aim is to exploit the genetic correlation between agroecological zones. Furthermore, in the RG model, different variance-covariance structures for genotype and genotype × country (gc) or genotype × zone (gz) effects were fitted. The models with fixed genotype and random genotype effects are given in Table 3,  while Table 4 contains an overview of variance-covariance structures for each term in the models.

Genetic and non-genetic trend analysis and incorporation of nitrogen application as a covariate in the country level analysis
Model 2 can be augmented for conducting genetic and non-genetic trend analysis as demonstrated by Piepho et al. (2014), Laidig et al. (2014) and Hadasch et al. (2020). The (3) Table 3 Two fixed genotype effect (FG) and six random genotype effect models The index i refers to the ith genotype, j refers to the jth location, and l refers to the lth country or zone. The country term, c , is replaced with z for the agroecological zone level analysis

Model
Fixed part Random part Same random part as in FG Same random part as in RG RC1 Same fixed part as in RGC Same fixed part as in RGC Same fixed part as in RGC (p il + q il x 1jl ) + y k + (yc) kl + (gy) ik + (gyc) ikl + s jl + (sy) jlk + (gs) ijl inclusion of covariates for nitrogen application and year of trial allows a correction for non-genetic trend. Equally importantly, fitting a covariate for the first trial year of a variety allows the estimation of genetic correlation not overly driven by large variation across years generated by genetic gain in the trial period, but more driven by the genetic correlation of common varieties released in a single year. The extended model is where x 1jl is location-specific nitrogen application, x 2i is the first trial year of a variety, x 3i is the calendar year of testing of a variety. The notation for fixed regression terms involving the covariate is 1 x 1jl + l x 1jl Table 4 Variance-covariance structures for each random term in the models The index i refers to the ith genotype, j refers to the jth location, and l refers to the lth country or zone. The country term, c , is replaced with z for the agroecological zone level analysis. The symbol ⊗ represents a Kronecker product of matrices, while symbol ⊕ represents a direct sum of matrices Random term Variance-covariance structure Remarks g var(g) = g = 2 g Identity gc var(gc) = gc = 2 gc Identity g(c) = g + gc var(g(c)) = (g(c)) = Factor analytic order 1 without diagonal variances Heterogeneous country-specific residual variance Random coefficient regression for the genotype-by-country term Random coefficient regression for the genotype and genotype-by-country terms 1 , 2 and 3 are the fixed effects for the slopes of nitrogen application, first trial year of a variety and calendar year of testing of a variety, whereas l is the country-specific slope of the nitrogen application of the lth country. The nitrogen covariate was mean-centred and then divided by 500, since this scaling resulted in non-negative variance estimates for the random coefficient models. The first year of testing and calendar year were mean-centred.
The fixed and random effects genotype models are used in Model 4. The FGC is Model 4 with fixed genotype effects, and the RGC is Model 4 with random genotype effects. For the random genotype effects, the model can be extended to a random coefficient model, which includes the interaction of genotype × nitrogen application and genotype × country × nitrogen application. In this paper, four random coefficient models are fitted. RC1 is a model with random coefficients. A random coefficients model was fitted for the genotype term, g i , and the genotype × country term, (gc) il . Hence, this model has genotype and genotype × country-specific coefficients for the intercepts and slopes. RC2 is a reduced model of RC1, where the random regression coefficients in the genotype main effect are dropped. RC3 is another modification of RC1, in which the random regression coefficients in the genotype × country effect are dropped. RC4 is a modification of RC1 with random regression coefficients for the genotype and genotype-by-country terms with Σ reg ⊗ c l variance-covariance structure (where symbol ⊗ represents the Kroneker product of matrices). Thus, the variance structure of the intercepts and slopes is countryspecific and allows for covariances between the slopes and intercepts for each of the countries. The details of variance-covariance structures for random coefficient models are explained in the next section and summarised in Table 4. All models were fitted in R (R Core Team, 2022) using ASReml-R package version 4.1.0.130 (Butler et al. 2017).

Variance-covariance structures in relation to the random effects of genotype
Genotypes in trials can be regarded as a random sample from a population of genotypes and to exploit the genetic correlation between countries and to extend the variance-covariance structure implied by the Model 2, we merged the genetic main effect g i and the genotype × country interaction (gc) il into a composite genetic effect (g(c)) il nested within countries, that is, If we consider agroecological zones to exploit the genetic correlation between zones, the factor ( c ) is replaced by the factor ( z).
The variance-covariance structure in Eq. 5 is based on the common genotypes in both countries, i.e. (g i(ger) , g i(pol) ) , and can have different structures. The compound symmetry (CS) structure in Eq. 6 is implied by Model 2 if both g i and (gc) il have constant variance. The term 2 g + 2 gc on the diagonal of the matrix in Eq. 6 is the variance of genotypes within one country, and the covariance of the same genotype between countries is 2 g , which is on the off-diagonal of the matrix.
The CS structure assumes a common variance within countries and a common covariance between countries, which is very restrictive and, in many cases, is an unrealistic assumption. The unstructured (UN) variance-covariance given in Eq. 7 is more flexible than the CS structure in the sense that it allows different variances and a separate parameter ll ′ that specifies the correlation between countries l and l ′ .
In the case of two countries, the UN structure is easy to fit. However, this variance-covariance structure becomes very complex if there are several countries or agroecological zones. For example, with five zones, 15 variance-covariance component estimates need to be estimated in the unstructured model. A less computationally expensive structure allowing for heterogeneity of variances and covariances is the factor analytic (FA) model. The first-order FA (FA1) model is composed of covariance terms that are defined as the product l l ′ , where l is the loading for the lth country/ zone. The variance for the lth country/zones is represented by the term 2 l and an additional variance component l . It is also possible to have an FA structure by omitting the term l , which here is called reduced rank FA or FA-0. The FA1 structure for five zones needs only five l terms and five l terms as shown in Eq. 8, which is less expensive than the UN structure. When Eq. 8 uses the reduced rank FA order 1 structure (FA-01), then it needs only five l terms. In Model 4, we include the nitrogen application as a covariate. Thus, when the genotype effect is random, the genotype-specific regression must be modelled as random effects as well, as demonstrated by Buntaran et al. (2021). Therefore, the term g i can be expanded as a random coefficient of genotype × nitrogen application, g i = a i + b i x 1jl , where a i is the random intercept for the i th genotype and b i is the random slope for the i th genotype. In this case, the variance-covariance structure of g has variance estimates for the random intercept and random slopes with a covariance between random intercept and random slope. Furthermore, in the random coefficient model, it is important to have the UN structure for g to ensure invariance with respect to translation and scale transformation of the covariate (Longford, 1993;Wolfinger, 1996;Piepho and Ogutu, 2002). The random coefficient can also be applied for the genotype × country × nitrogen application term, which is expanded as (gc) il = p il + q il x 1jl , where p il is the random intercept for the i th genotype in the l th country and q il is the random slope for the i th genotype in the l th country.

Country level analysis
The fit statistics for all models listed in Table 3 are reported in Table 5. The Akaike information criterion (AIC) based on the full maximum likelihood method is used to compare different fixed effects terms in the models, while the AIC based on the restricted maximum likelihood (REML) was used to compare models with different variance-covariance structures for the random effects. For all models, the AIC was calculated (the smaller AIC is a better fit) using the infoCriteria function from the asremlPlus library.
The infoCriteria function could not compute the full likelihood of FG and FGC models (as the iterations did not converge), so we could not compare these two models. Furthermore, the main purpose of the analysis is to borrow strength between countries, so the comparison of the RG and RGC models is more essential. The complex variance-covariance structure in the RG model did not improve the fit statistics since the AIC based on REML was slightly higher for FA-01 and UN structures than for CS. Therefore, the simpler CS structure was sufficient to explain the variations of gc interaction effects.
For the RGC model, the UN structure had the smallest REML-based AIC, although the value was only slightly smaller than that for the CS structure. On the other hand, the FA-01 structure had a far larger AIC compared to the UN and CS structures. Compared to the RG model, the RGC model had a smaller AIC-full maximum likelihood based on the same variance-covariance structure for gc . Thus, the covariates improved the fit statistics. Among the random coefficient models, the RC4 model had the smallest AIC based on both REML and full maximum likelihood. The RC4 model fitted slightly differently from the other random coefficient models because it had the Σ reg ⊗ c l structure as shown in Table 4, which combined the random coefficient regression for the g and gc terms in a single term g(c) . The AIC between RC1 and RC2 were the same, which showed that dropping the random coefficient term in the genotype did not change the fit statistics. However, when the random coefficient was only retained for the genotype term, the AIC increased.  Table S1). In these two models, the genetic correlations between the two countries, g(c 1,2 ) , were very similar, i.e. 0.890 and 0.884, for the RGC-UN and the RC4 models, respectively. This suggests that the performance and the rank of the overlapping genotypes between the two countries were quite similar. Moreover, the genetic and non-genetic trends and the nitrogen application effects were similar for the two models. Both the regression coefficient of the nitrogen application and the genetic trend were positive. The regression coefficient for non-genetic trend was negative but non-significant. However, the genetic trend and nitrogen application coefficients were positive and larger than the non-genetic trend, so overall the yield was still increasing. Figure 4 depicts the country pair-wise scatterplots of genotype estimates of genotye × country interaction effects from the FG, FGC, RG-UN, RGC-UN and RC4 models. There is a clear distinction between the fixed genotype effect models (FG and FGC) and the random genotype effect models (RGs and RC models). The estimates in the random genotype effect models were more similar between the two countries compared to the fixed genotype effect models, as a result of borrowing information, i.e. the genetic correlation exploited between the two countries. Moreover, we can see that the correlations between mean yield from the two countries from random genotype effect models were close to 1. The high correlation implies that the performance and ranking of the common genotypes were very similar between Germany and Poland.

Agroecological zone level analysis
The zone-based analysis per country was performed prior to the joint analysis of agroecological zones between countries which examines the correlations between zones within a country. The analysis was performed using the RGC model and replacing country effect ( c l ) with zone effect ( z l ). The estimates of variance components from the analysis are given in Table 7. The variance estimate of the genotype × zone effect in Poland was zero, which means that there is no necessity for agroecological zonation or division for maize. However, a small variation in genotype × zone effect can be seen in the German data. The estimates of genotype variance in each zone and zone correlations for German data are given in supplementary Table S2. The small genotype × zone variance estimate in Table 7 for both countries can be explained by the fact that the genotype variance estimates were very similar across zones and there were strong correlations between zones. Table 6 Estimates of covariates and variance components of two best model (RGC and RC4) and associated standard errors (s.e.) Significance level: P < 0.05 (*) and not-significant (ns) † The subscript in random effects "1" represents Germany and "2" represents Poland ‡ The residual is the sum of the three-way interaction (gsy) and the residual As there were not enough genotypes tested for several years across zones in the German data to fit more complex models, the heterogeneous diagonal zone-specific variance structure for genotype × year × zone effect was used. By comparison, in the Polish data, the unstructured variance could be fitted for the genotype × year × zone effect. However, to be consistent for both countries, we also fitted the heterogeneous diagonal zone-specific structure for the genotype × year × zone effect in the Polish data.
Since the variance estimate of genotype × zone effects in the Polish data was zero, in the joint analysis of German and Polish agroecological zones, we merged all zones of Poland into one zone. Thus, the dataset for the joint analysis consisted of five zones in Germany and Poland as one zone. The analysis based on agroecological zones was performed using the RGC model. The estimates of variance components from zone-based analysis using the RGC model are given in supplementary Table S3. The genetic correlations between German  Fig. 5. In most of the cases, the genetic correlations are high between the German zones; however, only German zone D3 has a high correlation with Poland. The genetic correlation between zones was based on the performance of the common genotypes. Although the zones are far from each other on the map, the performances of the common genotypes between these zones were very similar, making the genetic correlation high.

Discussion
A high genetic correlation was observed between Germany and Poland (Table 6), and this is reflected in the similarity of genetic ranking based on the mean yield of common varieties from Germany and Poland reported in Fig. 4. The fitting of heteroscedastic variance-covariance structures to the genotype × country classification provides a view of the similarities between countries. The fit in terms of AIC was not improved for the heterogeneous UN model; however, higher-order models should be preferred to avoid underfitting and the resulting bias (Piepho, 2008). The incorporation of covariates allows for improving models as given in Table 5. The random coefficient models enable the user to fit genotype and genotype × country regressions. However, the results show that random coefficient models could not improve the fit because of the high similarity between Germany and Poland, and genotypes performing similarly in both countries. Because of the high genetic correlation between Germany and Poland, one country can benefit from using maize trial data from the other country and vice versa.
The stratification of locations into zones and the use of random effects models for the genotype × zone classification allows for the borrowing of strength across zones when estimating mean yield per zone. However, to attain reliable estimates of genotype effects, it is necessary for the zone to be represented by a suitable number of locations (Kleinknecht et al. 2013). Many zones in Germany and Poland were not represented by enough locations (Fig. 3); therefore, these zones were merged with the neighbouring zones after assessing the genetic correlations between zones. This resulted in five zones in Germany and four zones in Poland, subsequently used for zonespecific analysis. The heteroscedastic variance-covariance structures used for the genotype × zone classification provided estimates of similarities between zones. The variance estimate of the genotype × zone effect in Poland was zero (Table 7), which means that there is no necessity for agroecological zonation or division for maize. However, a small variation in the genotype × zone effect can be seen in German data. The estimates of genetic correlation between Germany and Poland zones (Fig. 5) indicate that the German zone D3 is highly correlated with Poland. The German zones are also very similar to one another in terms of the mean yield comparisons among genotypes. The high genetic correlation between German and Polish zones does not necessarily mean that zones are agroecologically similar, although climatic and soil conditions in these zones are distinct from one another (Graf et al. 2009). It is well possible that the mean yield of genotypes in the zones are not significantly affected by agroecological differences among the zones, so a high genetic correlation can occur despite such differences, especially when the genetic variance is not very large compared to variance components for genotype × environment interaction effects.