Plant material
The three landraces Kemater Landmais Gelb (KE, Austria), Petkuser Ferdinand Rot (PE, Germany), and Lalin (LL, Spain) were chosen for the production of DH lines because they showed phenotypic variation for early development as well as low levels of linkage disequilibrium (LD) and population structure within populations. They were selected from a set of 35 European maize landraces covering a broad geographical region of Europe that was described in detail by Mayer et al. (2017). Together, they represented 95.0% of the molecular variance of the full set of 35 landraces. From the selected landraces, 1015 DH lines (516 KE, 432 PE, 67 LL) were produced and multiplied using the in vivo haploid induction method (Röber et al. 2005). Phenotyping of lines per se (LP) was conducted in 2017 and 2018. Testcrosses (TC) of a subset of 378 DH lines from landraces KE and PE were evaluated in 2018. To ensure successful TC evaluation, the shortest, earliest, and late-maturing lines as well as lines with a high score for lodging were not included in the TC production. The dent line F353 (Institut national de la recherche agronomique, INRA) was used as the female parent in TC production to ensure uniform seed quality across DH lines and because variation in tassel architecture of DH lines hampered detasseling.
Analysis of genotypic data and population structure
The 1015 DH lines and 144 S0 plants (48 per landrace) from the landraces KE, PE, and LL were genotyped using the 600 k Affymetrix® Axiom® Maize Array (Unterseer et al. 2014). Only markers assigned to the best quality class (Unterseer et al. 2014), with a call rate of ≥ 0.9 and with a known physical position in the B73 reference sequence [AGPv4, (Jiao et al. 2017)], were used for the analyses. One S0 plant from landrace PE was excluded due to an insufficient call rate (< 0.9). Assignment of lines to their respective landrace was performed using the ADMIXTURE software tool (Alexander et al. 2009) in supervised mode with three pre-defined groups (KE, PE, and LL) that were determined from S0 plants. DH lines with less than 75% concordance with the landrace to which they were assigned by pedigree records were excluded from further analysis. Markers and individuals with > 10% missing values were removed. In DH lines, markers and individuals with > 5% heterozygous genotype calls were discarded, and all remaining heterozygous calls were set to missing values. Missing values in the DH lines were imputed separately for each landrace using BEAGLE 5.0 (default parameters) (Browning et al. 2018). Missing values in the S0 plants were imputed, and two gametes were phased from each S0 plant separately in each landrace using BEAGLE 5.0 (iterations = 50, phase-segment = 10, phase-states = 500) and a reference panel consisting of the corresponding DH lines. Pairwise modified Rogers’ distances [MRD; (Wright 1978)] were calculated, and DH lines showing a pairwise MRD of < 0.05 were assumed to be duplicates and excluded from further analyses. Markers that overlapped between DH lines and S0 gametes were then identified. Quality filtering and imputation resulted in 941 DH lines (501 KE, 409 PE, and 31 LL) and 286 S0 gametes (96 KE, 94 PE, and 96 LL) genotyped with 499,574 common markers.
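The duplicate check described above can be illustrated with a minimal Python sketch. It assumes fully homozygous, imputed DH lines coded 0/1 at biallelic markers, in which case the MRD reduces to the square root of the proportion of markers at which two lines differ. The function names and the input layout are hypothetical and are not taken from the original analysis pipeline.

```python
from itertools import combinations
from math import sqrt


def modified_rogers_distance(a, b):
    """MRD between two fully homozygous lines at biallelic markers.

    a, b: equal-length sequences of 0/1 allele codes without missing values
    (assumption: genotypes have already been filtered and imputed).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("lines must share a non-empty marker set")
    # For a biallelic marker, summing squared frequency differences over both
    # alleles gives 2 * (p_a - p_b)^2, which cancels the 1 / (2m) factor of MRD.
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return sqrt(squared / len(a))


def flag_duplicates(genotypes, threshold=0.05):
    """Return (name1, name2, mrd) for all pairs of lines with MRD < threshold.

    genotypes: dict mapping line name -> sequence of 0/1 codes (hypothetical layout).
    """
    pairs = []
    for (n1, g1), (n2, g2) in combinations(sorted(genotypes.items()), 2):
        d = modified_rogers_distance(g1, g2)
        if d < threshold:
            pairs.append((n1, n2, d))
    return pairs
```

With roughly 500,000 markers, a threshold of 0.05 corresponds to two lines differing at fewer than 0.25% of the markers under this coding. The sketch only flags pairs; which member of a pair is dropped is a separate decision.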
We performed a principal coordinate analysis [Gower (1966), R-package ape] based on MRD for DH lines and S0 plants. The MRD matrices of DH lines and S0 plants were hierarchically clustered using the unweighted pair group method with arithmetic mean (UPGMA) implemented in the hclust function in R and are displayed as 1-MRD. In order to estimate the proportion of molecular variance explained by the three landraces under study, an analysis of molecular variance [AMOVA; Excoffier et al. (1992)] was performed to partition the molecular variation into within- and between-landrace components. This analysis used the panel of 35 European landraces described by Mayer et al. (2017) for comparison. In addition, a second AMOVA decomposing the variance within and between DH lines and S0 gametes was performed to investigate how much of the molecular variance lies within and between those two groups.
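The variance partitioning in a one-factor AMOVA can be sketched in plain Python as follows. The sketch follows the sums of squared deviations of Excoffier et al. (1992) and assumes that squared pairwise distances (for example squared MRD) are used as input. The text does not state which distance entered the AMOVA, so this choice, as well as the function name and input layout, is an assumption made for illustration.

```python
from collections import defaultdict


def amova_one_factor(dist, labels):
    """Partition molecular variance among and within groups.

    dist: symmetric square matrix (list of lists) of pairwise distances
    labels: group label of each individual, in matrix order
    Returns (sigma2_among, sigma2_within, phi_st).
    """
    n = len(labels)
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    g = len(members)
    if g < 2 or n <= g:
        raise ValueError("need at least two groups and more individuals than groups")

    def ssd(indices):
        # Sum of squared distances over all pairs, divided by the set size.
        total = 0.0
        for pos, i in enumerate(indices):
            for j in indices[pos + 1:]:
                total += dist[i][j] ** 2
        return total / len(indices)

    ssd_total = ssd(list(range(n)))
    ssd_within = sum(ssd(idx) for idx in members.values())
    ssd_among = ssd_total - ssd_within

    sigma2_within = ssd_within / (n - g)
    # Average group size corrected for unequal group sizes.
    n0 = (n - sum(len(idx) ** 2 for idx in members.values()) / n) / (g - 1)
    sigma2_among = (ssd_among / (g - 1) - sigma2_within) / n0

    total_var = sigma2_among + sigma2_within
    phi_st = sigma2_among / total_var if total_var > 0 else float("nan")
    return sigma2_among, sigma2_within, phi_st
```

The returned ratio is the proportion of the total molecular variance attributable to differences between groups. Significance testing by permutation, which is commonly part of an AMOVA, is not included in this sketch.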
Field experiments and phenotypic analysis
Line per se (LP) performance was evaluated in Germany during 2017 using ten separate 10 × 10 lattice designs in four locations (1000 entries: 958 DH lines plus checks) and during 2018 using eight 10 × 10 lattice designs in three locations (800 entries: 756 DH lines plus checks). A randomly chosen subset (five 10 × 10 lattice designs, 458 and 468 DH lines plus checks in 2017 and 2018, respectively) was evaluated in two locations in Spain in both years. The trial locations were Einbeck (EIN, Germany, 2017 + 2018), Roggenstein (ROG, Germany, 2017 + 2018), Bernburg (BBG, Germany, 2017), Klein Wanzleben (KLW, Germany, 2018), Oberer Lindenhof (OLI, Germany, 2017), Golada (GOL, Spain, 2017 + 2018), and Tomeza (TOM, Spain, 2017 + 2018). See Table S1 for a detailed description of the test locations [geographical coordinates, elevation, precipitation, temperature; the climate data were obtained from the Bavarian State Research Center for Agriculture, Landwirtschaftliches Technologiezentrum Augustenberg, and Menne et al. (2012)]. Each combination of year and location was considered to be one environment in later analyses. The number of lines tested had to be reduced between 2017 and 2018 due to seed shortage and the exclusion of lines that did not pass the quality control described above for the genotypic data analysis. In 2017, 14 flint [CH10 provided by Agroscope Changins-Wädenswil (Switzerland); D152, DK105, UH006, UH007, and UH009 provided by the University of Hohenheim (Germany); EP1 and EP44 provided by Misión Biológica de Galicia, Consejo Superior de Investigaciones Científicas (CSIC, Spain); F03802, F2, F283, F64, and F7 provided by Institut national de la recherche agronomique (INRA, France); EC49A provided by Centro de Investigaciones Agrarias Mabegondo, Instituto Galego da Calidade Alimentaria (CIAM-INGACAL, Spain)] and one dent (F353, INRA, tester in testcross evaluation) inbred line served as checks and were included as duplicate entries.
The checks were chosen in order to exhibit variation in plant development at early growth stages and flowering time. In 2018, the number of checks was reduced to four lines (DK105, EP1, F2, and F353) included in each lattice design per location (eight in Germany, five in Spain). In both years, the three landraces were included as quadruplicate entries. Plots were single rows 3 m in length with a distance of 0.75 m between rows and twenty plants per plot, corresponding to a sowing density of about 9 plants m−2.
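As a quick check of the stated LP sowing density, each single-row plot covers 3 m × 0.75 m, so that

$$\frac{20\ \text{plants}}{3\ \text{m} \times 0.75\ \text{m}} = \frac{20\ \text{plants}}{2.25\ \text{m}^{2}} \approx 8.9\ \text{plants m}^{-2}$$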
The testcrosses (TC) were evaluated in four 10 × 10 lattice designs in four locations in Germany in 2018 (EIN, KLW, ROG, OLI). In the TC trials, testcrosses of lines DK105, EP1, and F2 as well as testcrosses of the two landraces KE and PE and two commercial hybrid varieties (CH1 = KWS Stabil, CH2 = KWS Figaro) were planted as checks. The testcrosses of landraces KE and PE were planted in one lattice only, while all other checks were planted in every lattice. In TC, plots were double rows 5 m in length at locations ROG and OLI and 6 m in length at locations KLW and EIN, in both cases with 0.75 m distance between rows. Sowing density followed local practice at the experimental stations and varied between 9 and 11 plants m−2. Fertilization and plant protection were carried out according to standard agricultural practices in both the LP and the TC trials.
In the LP trial, a total of 25 morphological, agronomic, and early-development-related traits were measured (Table S2 provides detailed information on trait × environment combinations). The traits that were scored in ≥ 10 environments included emergence (EME, ratio of emerged plants to sown seeds, %), early vigor (EV, at three different growth stages V3, V4, and V6, 1–9 score, 1 = very poor vigor, 9 = very vigorous), early plant height (PH, at V4 and V6, average over three measured plants per plot, cm), final plant height (PH_final, cm), and female flowering (FF, d). Root lodging at the R6 stage (RL, 1 = no lodging, 9 = all plants showing severe lodging) was scored in six environments; tillering (TILL, 1 = no tillers, 9 = all plants showing many and long tillers) and male flowering (MF, d) were scored in five environments. The anthesis-silking interval (ASI, d) was calculated for the environments in which both MF and FF were scored. Ear height (EH, cm) was measured in four environments. In the Spanish environments, physiological traits like the maximum efficiency of photosystem II [Fv/Fm, using a fluorometer (OS-30p, Opti-Sciences Inc., USA)] were measured at stages V4 (2017 + 2018) and V6 (only 2017), and leaf greenness (SPAD) was measured by a chlorophyll content meter (CCM-200, Opti-Sciences Inc., USA; V3, V4 in both years, V6 only 2017). Reaction to stress was scored as cold tolerance (CT, 1–9 score, 1 = low cold tolerance, 9 = high cold tolerance; symptoms were chlorosis and necrosis on the leaves) after a very cold night with a slight frost at OLI 2017, drought/heat tolerance (DT, 1–9 score, 1 = low drought/heat tolerance, 9 = high drought/heat tolerance; symptoms were dry leaves and tassels) at EIN 2018, and rust susceptibility (binary) at TOM 2018. Traits related to tassel architecture were measured in ROG 2018. 
Tassel length was measured from the lowest tassel branch to the tassel tip (TL, cm), spike length was measured as the length of the top spike (SL, cm), the number of branches was counted (NB), and the tassel angle was scored on a 1–9 scale (TA, 1 = completely upright, 9 = branches horizontal). In the TC trial, EME, EV, PH, EH, PH_final, FF, TILL, and RL were scored as described for LP. In addition, TC plots were harvested with a forage harvester to measure total dry matter yield (TDMY, dt/ha) and dry matter content (DMC, determined by near-infrared spectroscopy or drying, %).
The statistical model for estimating genotype and genotype × environment interaction variance components for lines derived from the same landrace was
$$y_{ijkopst} = \mu + m_{i} + \delta_{ij} l_{j} + g_{{k\left( {ij} \right)}} + u_{o} + \delta_{ij} lu_{jo} + gu_{{ko\left( {ij} \right)}} + k_{p\left( o \right)} + r_{{s\left( {op} \right)}} + b_{{t\left( {ops} \right)}} + \varepsilon_{ijkopst} \tag{1}$$
where i = 1, 2, 3 denotes three groups, i.e., DH lines from landraces (DHL), checks (CH), and landrace populations (LR_S0); j = 1, 2, 3 denotes the three landraces KE, PE, and LL; µ is the overall mean; \(m_{i}\) is the effect of group i; \(l_{j}\) is the effect of landrace j in group i = 1; \(\delta_{ij}\) is a dummy variable with \(\delta_{ij}\) = 1 for i = 1 and j = 1, 2, 3 and \(\delta_{ij}\) = 0 otherwise; \(g_{{k\left( {ij} \right)}}\) is the effect of line k nested in group i and landrace j; \(u_{o}\) is the effect of environment o; \(lu_{jo}\) is the interaction of landrace j and environment o; \(gu_{{ko\left( {ij} \right)}}\) is the interaction effect for genotype k and environment o. The effects \(k_{p\left( o \right)}\), \(r_{{s\left( {op} \right)}}\), \(b_{{t\left( {ops} \right)}}\), and \(\varepsilon_{ijkopst}\) refer to the effect of the lattice (nested in environments), replicate (nested in lattices in environments), incomplete block (nested in replicates in lattices in environments), and the residual error, respectively. All effects except \(m_{i}\) and \(l_{j}\) were treated as random. Genotype and genotype × environment (\(gu_{{ko\left( {ij} \right)}}\)) variance components were modeled individually for the three landraces (j = 1, 2, 3), assuming that DH lines across and within landraces were unrelated. Residuals were assumed to be normally distributed with mean zero and two heterogeneous variances, one for \(\delta_{ij} = 1\) and one for \(\delta_{ij} = 0\) assigning the same residual variance to all three landraces in all environments. Raw data and outliers were manually curated by inspection of residual plots. Since genotyping and the first year of phenotyping were carried out in parallel, some lines were evaluated in the field during 2017 that did not pass quality control in the genotypic data analysis. Measurements for those entries were treated as missing values in the data analysis. 
The same model was used for the analysis of TC experiments, except that i = 1, 2 referred to DHL and CH and j = 1, 2 referred to landraces KE and PE. Restricted maximum-likelihood estimation implemented in the ASReml-R package (Butler et al. 2009) was used for estimating variance components and their standard errors. Differences among means \(l_{j}\) were tested with pairwise t-tests using the R-package asremlPlus. Trait heritabilities were calculated on an entry-mean basis within landraces (Hallauer et al. 2010), and standard errors of heritability estimates were derived from standard errors of corresponding variance components using the delta method (Holland et al. 2010). Heritabilities and variance component estimates exceeding twice their standard errors were considered significant. Best linear unbiased estimates (BLUEs) of the genotype mean for each trait and DH line were obtained from a simplified version of the model in Eq. (1), dropping factors \(m_{i}\), \(\delta_{ij} l_{j}\) and \(\delta_{ij} lu_{jo}\) and treating genotype (\(g_{k}\)) as a fixed effect. This model was also used to form linear contrasts used to test for significant differences (t-tests) between original landraces and the mean of the corresponding DH library (LP and TC) and between the mean of the two check hybrids and the mean of the DH library (TC only). We calculated the predicted response from selection within DH libraries (LP and TC) according to Falconer and Mackay (1996) as \(\Delta G_{\left( \alpha \right)} = i_{\left( \alpha \right)} h\sigma_{G}\), where \(i_{\left( \alpha \right)} =\) selection intensity for selection with \(\alpha = 10\% \left( {i_{{\left( {10\% } \right)}} \approx 1.76} \right)\), \(h =\) square root of heritability, and \(\sigma_{G} =\) genetic standard deviation. 
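The selection-response calculation can be sketched in Python as follows. The selection intensity is derived from truncation selection on a standard normal distribution. The heritability function uses a common textbook form of entry-mean heritability for balanced multi-environment trials. The exact expression used in this study, which was computed within landraces from ASReml-R variance components, may differ, and all function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(alpha):
    """Standardized selection differential when the best fraction alpha is
    selected by truncation from a normally distributed trait."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha)  # truncation point
    return std_normal.pdf(z) / alpha


def entry_mean_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Assumed textbook form: genotypic variance over the variance of an
    entry mean across n_env environments with n_rep replicates each."""
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))


def selection_response(alpha, h2, var_g):
    """Predicted response: Delta G = i(alpha) * h * sigma_G."""
    return selection_intensity(alpha) * sqrt(h2) * sqrt(var_g)


if __name__ == "__main__":
    # Purely illustrative variance components, not estimates from this study.
    h2 = entry_mean_heritability(var_g=4.0, var_ge=2.0, var_e=8.0, n_env=4, n_rep=2)
    print(round(selection_intensity(0.10), 3), round(h2, 3),
          round(selection_response(0.10, h2, 4.0), 3))
```

For α = 10%, the normal-theory intensity is about 1.755, in line with the rounded value quoted in the text. The formula assumes an effectively infinite population; for small libraries such as LL, finite-population corrections would give a somewhat lower intensity.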
To account for mean differences and different selection responses, we calculated the usefulness criterion (Schnell 1983) as \(U_{{\left( {10\% } \right)}} = \bar{x} \pm \Delta G_{{\left( {10\% } \right)}}\) where \(\bar{x} =\) mean of the respective DH library. Phenotypic correlations among traits were calculated from BLUEs as Pearson correlation coefficients within libraries in LP and TC, respectively. For evaluating the prospects of selection on LP performance in this material, we calculated Spearman rank correlations for the same traits across LP and TC. To adjust for multiple testing, Bonferroni–Holm correction was applied for significance tests of phenotypic correlations in each DH library (Holm 1979). For estimating genetic covariances and genetic correlations between traits, the model in Eq. (1) was expanded to a bivariate model with pairs of traits. Genetic correlations were considered significant if they exceeded twice their standard error. The same method was applied for estimating genetic correlations between LP and TC performance.
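The Bonferroni–Holm adjustment mentioned above can be illustrated with a short Python sketch of the step-down procedure of Holm (1979). It returns adjusted p-values that can be compared directly with the chosen significance level. The function name is hypothetical, and the sketch is not the code used in the original analysis.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

In the setting described here, the function would be applied separately to the set of correlation tests within each DH library, so that the number of tests m refers to one library at a time.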
In summary, high-quality phenotypic line per se data are available from up to 11 environments for 899 DH lines (471 KE, 402 PE, and 26 LL) and for a subset of 378 lines (190 KE, 188 PE) that were evaluated as testcrosses in four environments. For all lines, data on almost 500,000 SNP markers are available.