Long-term breeding progress of yield, yield-related, and disease resistance traits in five cereal crops of German variety trials

Key message Considerable breeding progress in cereal and disease resistances, but not in stem stability was found. Ageing effects decreased yield and increased disease susceptibility indicating that new varieties are constantly needed. Abstract Plant breeding and improved crop management generated considerable progress in cereal performance over the last decades. Climate change, as well as the political and social demand for more environmentally friendly production, require ongoing breeding progress. This study quantified long-term trends for breeding progress and ageing effects of yield, yield-related traits, and disease resistance traits from German variety trials for five cereal crops with a broad spectrum of genotypes. The varieties were grown over a wide range of environmental conditions during 1988–2019 under two intensity levels, without (I1) and with (I2) fungicides and growth regulators. Breeding progress regarding yield increase was the highest in winter barley followed by winter rye hybrid and the lowest in winter rye population varieties. Yield gaps between I2 and I1 widened for barleys, while they shrank for the other crops. A notable decrease in stem stability became apparent in I1 in most crops, while for diseases generally a decrasing susceptibility was found, especially for mildew, brown rust, scald, and dwarf leaf rust. The reduction in disease susceptibility in I2 (treated) was considerably higher than in I1. Our results revealed that yield performance and disease resistance of varieties were subject to considerable ageing effects, reducing yield and increasing disease susceptibility. Nevertheless, we quantified notable achievements in breeding progress for most disease resistances. This study indicated an urgent and continues need for new improved varieties, not only to combat ageing effects and generate higher yield potential, but also to offset future reduction in plant protection intensity. Supplementary Information The online version contains supplementary material available at 10.1007/s00122-021-03929-5.


Introduction
Cereals are the most grown crops in Europe. They covered 53.8% of annual crops in the EU-28 in 2020 (https:// ec. europa. eu/ info/ news/ eu-agric ultur al-outlo ok-arable-landarea-conti nue-its-decli ne_ en). Also in Germany, more than 50% (2014-2018) of total arable land is cultivated with cereals (StatJ 2019). In the German market, 29% of cereal grain is used for human food, 44% for livestock feeding, 13% for industrial uses, 10% for bioenergy, and 2% for seed (BLE 2019). Breeding progress has substantially increased potential yields of cereals in Germany by about 0.6%-1.3% per year in variety trials over the last 30 years, while on-farm yields only increased by about 0.5%-1%, leading to increasing yield gaps . In this regard, insufficiently controlled fungal pathogens are one of the reasons for hampering higher on-farm yield progress.
Despite intensive crop protection measures and progress in resistance breeding, considerable yield losses for cereals due to diseases and lodging are reported by numerous Communicated by Hermann Buerstmayr.
While most of the studies evaluated yield loss due to diseases, we found only few studies evaluating the breeding progress for disease resistance and lodging. Most of them focused on winter wheat (Ahlemeyer and Friedt 2011;Leisova-Svobodova et al. 2020;Voss-Fels et al. 2019;Zetsche et al. 2020;Zhang et al. 2020), a few on barley (e.g. Grausgruber et al. 2002;Ahlemeyer et al. 2008), one on triticale (Losert et al. 2017) but none on rye. Only few studies on breeding progress built on historic variety trial time series data (e.g. Mackay et al. 2011;Laidig et al. 2021). Most studies were conducted using historic varieties released over a period of several decades and growing them over 2 to 3 years under natural infection (e.g. Losert et al. 2017;Ahlemeyer et al. 2008;Berry et al. 2015) or artificial inoculation (e.g. Zetsche et al. 2020).
In this regard, one needs to note that there is a major difference in assessing the breeding progress of diseases from trials with historic varieties (also called 'vintage trials') and breeding progress based on historic data. In general, results from trials with historic varieties showed a stronger improvement of disease resistance compared to the progress assessed by historic data (Laidig et al. 2021). This difference can be ascribed to the dynamic virulence of biotrophic pathogen populations. Varieties tend to partially or fully lose their resistance against specific pathogens with increasing age. In extreme cases, the resistance against a specific disease can break down completely (Mackay et al. 2011), as was the case for yellow rust in some cultivars due to the advent of the aggressive Warrior race (Hovmøller et al. 2016). Piepho et al. (2014) described the quantitative assessment of loss of resistance with increasing variety age as "age effect".
Breeding towards varieties with higher resistance against biotic stress is gaining more and more importance (Figueroa et al. 2018;Audenaert et al. 2014;Singh et al. 2019; Miedaner and Wilde 2019). On the one hand, climate change is expected to alter the relative importance of the fungal diseases strongly (Miedaner and Juroszek 2021). On the other hand, the global demand for agricultural products is continuously increasing, while the public debate on the use of pesticides and the negative environmental impact of intensive crop production in the industrial countries is increasing. Therefore, it is a persistent challenge to realize higher yields with less negative environmental impact, i.e. a sustainable intensification of crop production. It is an important goal of agricultural policies in the European Union, including Germany, to reduce the surplus of nitrogen fertilizer and the application of pesticides to mitigate the negative environmental impact and improve the sustainability of plant production (BMEL 2019; EU 2019). The European Commission aims to reduce the overall use of chemical pesticides and especially the use of more hazardous pesticides by 50% by 2030 (EU 2020), which will most likely go along with a continuous reduction of available effective plant protection compounds.
Against this background, new improved varieties are one of the most important and economically favourable resources to cope with the demand for increased food production, for a growing world population and the challenges of climatic change and the agro-political goals for a more sustainable plant production. Numerous studies on breeding progress for higher yield and improved resistance against biotic and abiotic stress have been reported as described above, mostly for winter wheat, but only few for other cereal crops. To the best of our knowledge, so far, no long-term study was published that quantified and compared long-term trends in breeding progress and ageing effects for cereal crops over that many agronomical traits, diseases, years, sites, and genotypes under such a wide range of pedo-climatic conditions as included in this study.
We provide a comprehensive overview of the long-term breeding progress and ageing effects of yield, yield-related and disease resistance traits from five cereal crops based on data from official variety trials carried out across Germany from 1988 to 2019 under two treatment intensities. In particular, we investigated (i) the trends in yield-related parameter, (ii) the trends for ageing effects of varieties, (iii) the genotypic and environmental variation and (iv) we compared the results across crops.

Cereal variety trials
The study was based on data from official variety trials for cereal crops carried out across Germany from 1988 to 2019 under two treatment intensities. The underlying variety testing procedure will be explained in the following. In the European Union, newly bred genotypes have to be officially tested for their value for cultivation and use (VCU) before they can be released for commercial use. In Germany, all cereal crops are tested by the Federal Office of Plant Varieties (German Bundessortenamt; bundesortenamt.de) for three years at multiple locations covering all pedo-climatic conditions of their typical growing regions. Additionally, at least three reference varieties are included in each series, which are identical over sites and within each series. Wellestablished varieties are chosen as references, representing the actual state of breeding progress. The references are updated on a regular basis, ensuring at least partial overlap of sets of references used in successive years. For the 1 3 period before 1990, only data from West German locations are available. After each trial year, varieties, which do not prove additional VCU, are withdrawn from the series. In consequence, only about one third to one fourth of the varieties that originally entered the testing scheme, will be finally tested over the entire three years. Trials are conducted with up to three intensities of different application rates for nitrogen, fungicides and growth regulators, where intensity 1 (hereinafter referred to as I1) generally receives no fungicides or growth regulators and less nitrogen; only in winter rye growth regulators are also applied in I1, but at a lower rate than in intensity 2 (hereinafter referred to as I2). From 1991 onwards, varieties are tested with only two intensities. From 2005 onwards, the nitrogen application rates were standardized and both intensities received the same nitrogen rates (See Fig. 1). In order to be able to consider two intensities in every year over the entire time series, we averaged data of intensity 2 and intensity 3 for the period 1988 to 1990 and used this average as intensity 2. Herbicides and insecticides are applied for all intensities at the same level according to demand. In I2, fungicide application, fungicide selection and application rate are based on the average disease severity of the majority of varieties. The basis for treatment is thus neither the most susceptible nor the most resistant variety. Good local agricultural practice is used for treatment of ears, which are often not yet showing symptoms. Weather conditions are much more decisive for the timing of treatment. Trials are laid out in split-plot designs with main plots arranged in complete blocks. The treatments are applied to main plots, and the varieties are arranged in subplots. Subplots within main plots are either laid out as randomized blocks, or as alpha-lattice designs. The harvested average plot size is about 10 m 2 . Winter barley two-row and six-row varieties are grown in separate trials until 2006 and afterwards together in the same trial. Nevertheless, we considered two-and six-rowed types from 1988 on as grown in separate trials. Winter rye hybrid and population varieties are grown in the same trial and also treated identically during 1988-2019. However, we analysed both types as separate trials.
We evaluated data over a time period of 32 years from 1988-2019 for five cereal crops, i.e. winter wheat (WW), winter rye hybrid (WR Hyb) and population (WR Pop) varieties, winter triticale (WTI), winter barley (WB) tworowed (2r) and six-rowed (6r) varieties and spring barley (SB). Varieties in all crops were line varieties except for WR. These crops account for 48% (2015-2019) of total arable land in Germany. WW is the most important crop (26%), followed by WB (11%), WR (5%), WTI and SB, with 3% each. For WTI, the first separate trial series started in 1988.

Fig. 1
Applied average nitrogen rate in intensity 1 (I1) and 2 (I2) and treatment frequency index (TFI) for herbicides applied to I1 and I2, fungicides and growth regulators applied in I2 and for WR in I1. WW Winter wheat, WTI Winter triticale; WR Winter rye; WB Winter barley, 2r two-row, 6r six row varieties; SB Spring barley 1 3 To have a comparable time period in all crops, we started the entire study in this year.
In each crop, we assessed all traits for I1 and I2 as described in Table 1. Grain yield (YLD), ear density (EAD), single ear weight (EAW), plant height (HGT), days from sowing to ear emergence (EAE), lodging (LDG) and powdery mildew (MLD) were available in each crop. The other traits were crop-specific: in WW brown (leaf) rust (BNR), Septoria leaf blotch (Septoria tritici blotch, STB), Septoria nodorum blotch (SNB), yellow (stripe) rust (YLR) and tan spot (DTR); in WR stem buckling (SBL), BNR and Rhynchosporium (RYS); in WTI BNR, STB, YLR and RYS; in WB and SB SBL, ear buckling (EBL), RYS, net blotch (NTB) and dwarf leaf rust (DLR). Pathology for lodging and disease traits were described in Supplementary Material SM1; for more detailed information on cereal diseases see, e.g. AHB (2020) and Bailey et al. (2003). All evaluated traits were considered important for determining VCU and for the description of released varieties in the annually published "Descriptive Variety List" (BSL 2020). In the early years of the data used in this study, both Septoria diseases were estimated in trials as one joint rating, because the visual symptoms were hard to differentiate. Later on, STB became predominant while SNB lost its importance in leaf and ear infections.
We used only data from varieties tested for at least three years to achieve a good representation of the trial conditions and build on a solid database (Mackay et al. 2011). Data included in this study are shown in Table 2. The data set was highly non-orthogonal with respect to variety-year combinations, whereas the variety/location combinations were orthogonal within year and trial series, i.e. all varieties were grown together at all locations within the same year and trial series. The data were checked for recording errors and outliers by calculating standardized residuals based on Eqs. (1) and (2). We excluded observations with standardized residuals greater than + / − 5.0 from further analysis.
Stem stability and disease severity was assessed visually at a 1-to-9 scale by crop experts in the field according to the guidelines of the Federal Plant Variety Office (Bundessortenamt 2000). For LDG a score of 1 corresponded to "no lodging, all stems upright", 3 to "inclination of stems by 30 degrees from upright position or one fourth of the plot shows stronger lodging", 5 to "inclination of stems by 45 degrees from upright position or one half of the plot shows clustered lodging", 7 to "inclination of stems by 60 degrees from upright position or total lodging on three quarters of the plot" and 9 to "total lodging". SBL had to be observed at the lower two third part of the stem before harvest, and EBL buckling at the upper third part before harvest. For both traits score 1 corresponded to "missing", 3 to "low", 5 to "medium", 7 to "strong" and 9 to "very strong" expression. Even values of scores corresponded to intermediate states.
Diseases were scored at two to three different growth stages for each disease following the BBCH-code (Hack et al. 1992). It should be stated clearly that in the VCU trials of this study, plants were not inoculated with diseases, but disease severity was caused by natural field infection. If disease severity was assessed before BBCH 32 (stem elongation, two nodes detectable), it was scored at the whole-plant level. After growth stage BBCH 32, severity was assessed at the two adjacent leaves showing the highest severity. For all plants throughout the trial, severity was rated from the same two leaves. Then, the whole-plot severity was assessed taking into account diseased leaf area and frequency of diseased plants. The transformation of the average percentage of disease-infected leaf area of a plot into a 1 to 9 scale corresponded approximately to a logarithmic transformation according to following scale: score 1 0%, 2 0% -2%, 3 2% -5%, 4 5% -8%, 5 8% -14%, 6 14% -22%, 7 22% -37%, 8 37% -61%, 9 61% (Bundessortenamt 2000, Sect. 4.1). The recorded score represented the average disease severity of the plot.
Disease severity was scored individually for each replication, intensity level and variety. For the variety registration process, only observations of that growth stage were used, which showed the most clearly visible severity differentiation (i.e. not necessarily the maximum) among varieties. According to Zadoks and Schein (1979, p. 64), this method could be considered as the "critical time method." Observed disease severity scores at this growth stage were averaged over replications. Only these average values were available for the present study's analyses. Throughout this paper we used the following terms: (i) "disease severity" to describe each individual variety's actual visually observed diseased leaf or spikelet area, expressed in the above described "scores", (ii) "trial disease severity (TSv)" to describe the average disease severity within a trial, calculated as the mean severity score over all varieties in the specific trial, (iii) "variety disease susceptibility (VSc)" to refer to the average disease severity of a variety, calculated as the mean severity score across all diseased trials in which this specific variety was grown. Disease susceptibility of a variety might also be considered as the inverse of its "disease resistance". A trial was considered as non-diseased with respect to a specific disease, if no disease symptoms for the specific disease were visible for all varieties in this trial, or, if only a few varieties showed a severity score of at most 2 and the others of 1. We further categorized the traits into two groups, (i) the measured traits YLD, EAD, EAY, HGT and EAE as "continuous" traits and (ii) the visually observed traits as "score" traits. For the score traits, the number of trials from which observations were available could be notably smaller than for continuous traits, because scores were only recorded from those trials which were diseased or showed lodging (Fig. 2). The fertilizer, fungicide and growth regulator rates were recorded in detail for each individual trial and intensity. Nitrogen application rates were accumulated as total kg N ha −1 . The preceding crops' residual nitrogen supply was additionally considered in our analysis, based on Table 7 in DUEV (2017), and added to the applied mineral N rate. The nitrogen equivalent of sporadically applied organic fertilizer was considered and added according to the applied mineral N rate. Unfortunately, no data of plant-available mineralized nitrogen in the soil (Nmin) before sowing was available. Fungicide and growth regulator application rates were standardized using the treatment frequency index (TFI) following Roßberg (2006). Many studies used the TFI to assess plant protection intensities in crop production (e.g. Klocke et al. 2020;Strehlow et al. 2020). Here, the TFI described the amount of plant protection products applied to a specific land unit relative to the application amount recommended by the approval authority for each individual plant protection product for the specific crop. A TFI of 1 might derive from the application of a single plant protection product in recommended full dose, but might also derive from the application of two plant protection products, each applied at half the recommended dose. The TFI was often derived separately for fungicides, herbicides, insecticides and growth regulators. Accordingly, we derived separate TFIs for fungicide, growth regulator and herbicide applications in our study. The average rate of applied nitrogen, fungicides, growth regulators and herbicides are shown in Fig. 1.

General remarks
Stem stability and diseases were assessed on an ordinal 1-to-9 scale, but we analysed them as being on a quantitative scale. This was standard procedure in the analysis of variety trials and, in fact, decisions for registration of new varieties were based on such analyses (e.g. Zhang et al. 2007). Whereas the scoring data cannot strictly meet the usual assumptions of normality and homogeneity of variance, residual analysis revealed no gross departures (we refer to this later in section "Statistical limitations of this study", and so we analysed these data as if they were on a quantitative scale.

Basic model
For a given observation, we used the standard three-way model with factors genotype, location and year given by Laidig et al. (2008) (1) where y ijk was the mean yield of the ith genotype in the jth location and kth year, μ was the overall mean, G i was the main effect of the ith genotype, L j was the main effect of the jth location, Y k was the main effect of the kth year, (LY) jk was the jkth location × year interaction effect, (GL) ij was the ijth genotype × location interaction effect, (GY) ik was the ikth genotype × year interaction effect, and (GLY) ijk was a residual comprising both genotype × location × year interaction and the error of a mean arising from sampling the replications. All effects except μ, and Y k were assumed to be random and independent with constant variance for each effect.

Trend for breeding progress based on variety means and gaps
The trend of breeding progress was based on variety means using a quadratic model given by jk y ijk was the mean of genotype i, averaged over all n i trials in which the ith genotype was present, the covariate r i was the first trial year of the ith genotype. If a variety was in trial for more than 3 years, which was the case for reference varieties, we used only the trials within the first 3 testing years to assess a varieties performance before registration, such that the means of all varieties were based on 3 years. We explain the reasons for using this approach rather than estimating adjusted least square means for varieties in section "Limitations of this study". For score traits, the frequency of diseased trials, and hence the number of observations for individual variety means, might be different from year to year. We therefore applied a weighted regression analysis with weight w i = n i . There were m groups of genotypes with the same first testing year r i , represented by categorical variable C m(i) , where the ith genotype had been assigned to the mth group. C m(i) was a random deviation of the mth group from the quadratic regression line with variance 2 C , and V i was a random deviation of the ith genotype from group C m(i) with constant variance 2 V . The change between 2017 and 1988 was calculated for I2 and I1 by the difference of the prediction for year 2017 and 1988 given by (1) and (2). It should be noted that 2017 was the last "first testing year" in the data 1988-2019.
Further, gaps between predictions for I2 and I1 (I2-I1) at first trial year 1988 and 2017 were calculated to indicate the difference between intensity levels in 2017 and 1988 (treated with fungicides and growth regulators vs untreated). (2)

Trends for ageing effects of varieties
To assess trends of ageing effects for varieties we included all traits. Ageing trends were considered for varieties with more than 10 trial years, which were only reference varieties, so that an ageing effect may be observable. We could not assume that ageing effects at I2 were fully controlled by fungicide and growth regulator application. Therefore, we calculated three different trends: (1) trend for the response at I2, (2) trend for the response at I1 and (3) trend for the difference of the response at both intensities (I2-I1). Further, we assumed that besides the ageing trend, a time trend might be present overlaying the ageing trend such that both trends were confounded. We applied mixed models with linear regression coefficients given by expanding Eq. (1) by a fixed linear regression term as where a ik = t k − r i was the age of variety i at year k, and h = 1, 2 denoted the intensity, ih was the fixed intercept of the regression slope of variety i and intensity h, ih was the regression coefficient for ageing effect of variety i and intensity h. Equation (3)  We further assumed that ih = h + V ih , where h was a common regression coefficient for ageing and V ih was a random deviation from the common regression coefficient of variety i and intensity h, which was considered to be independently distributed with variance 2 Vh (Piepho and Ogutu 2002). Then the fixed linear regression term of Eq. (4) reduced to When h r i − h r i was added to Eq. (5) and rearranged, then we got where was the apparent intercept of variety i and ih = ih + h r i was the apparent ageing effect taking into account a time trend.
We considered not only the ageing effect of both intensities separately, but further also the difference of ageing trend I2-I1, given by If we denoted ̃i 2 −̃i 1 by Δ̃i and ̃i 2 −̃i 1 by ̃i , then the fixed regression terms for ageing trend between I2 and I1 was given by From Eq. (7), we saw that the apparent intercept was a fixed effect, given as the sum of a constant and the product of the regression coefficient of the time trend and the variety's first trial year. We modelled fixed intercepts for varieties to eliminate trends due to genetic progress. The apparent ageing trend was composed of the ageing trend and the time trend. Unfortunately, the intercept ih and the ageing trend h were not estimable, because they were linearly dependent. This is an inherent property of long-term variety trial data ).

Genotypic and environmental variation
We estimated variance components to quantify genotypic, environmental and interaction of genotype environmental variation of crops and their traits using Eq. (1). Estimates of long-term variance components might be biased if time trends are present in random effects. Consequently, trends in traits might result in larger components for genotypic variance. Therefore, we extended Eq. (1) by regression terms to model time trends. To remove possible bias due to trends, we assumed a nonlinear trend in the genotypic effects given by where r i was the first testing year of genotype i and H i was a random deviation from the genotypic trend. Further, in the year effect, we assumed a linear time trend by where t k was the testing year k, and Z k was a random deviation from the linear non-genetic trend.
Then we estimated the variance components of the random effects of the extended model by assuming that the effects were independent and identically normally distributed.

Treatment intensities
For a valid classification and better interpretation of the following results, the trait-specific treatment intensities are graphically provided in Fig. 1. From 2005 onwards, nitrogen application rates were identical in both intensities for all crops. Accordingly, N rates increased in I1 and were reduced in I2 by about 10 to 50 kg ha −1 , respectively, depending on the crop (Fig. 1). Furthermore, nitrogen application levels were lowest in SB and highest in WW. In all crops, except WR, a rather linear increase in nitrogen application rates can be observed. In WR application rates did not increase after 2005. Fungicide applications showed a decreasing trend until about 2000 followed by increases especially after 2011, however, at different levels in the different crops. Here again, in SB the TFI was only about half as high compared to WW. The TFIs of growth regulators increased rather linearly and approximately doubled over the investigated time period, with WR having the highest TFI of all crops and being the only crop where I1 received growth regulator treatment. Only in SB growth regulator TFI increased rather marginally. Herbicide application increased sharply in the first few years and then showed only a slightly increasing trend without much differentiation between crops.

Trend for breeding progress based on variety means and gaps
The number of observations (trials) for a specific variety mean varied considerably among stem stability and diseases, because not at each trial an incidence was observed. As can be seen from Fig. 2, the frequency of occurrence was very different between traits. LDG occurred very frequently, whereas some diseases occurred only less than half as frequent, e.g. YLR. However, varieties with the same first trial year had always identical numbers of observations. As mentioned earlier, we calculated trial means only on basis of a variety's first three trial years. Figure 3 illustrates the breeding progress, where insets at the top right of the individual subplots indicate the absolute changes during 1988-2017 in I1 (left number) and I2 (right number). Landmark varieties were displayed to provide orientation across different traits, as they are popular varieties with considerable acreage in practical farming during a longer period with well-known susceptibility characteristics. Additionally, Table 3 shows trait levels of first trial year 1988 and 2017 in I1 and I2 and the difference between levels in 1988 and 2017 in I2 and I1. Further, gaps between levels I2 and I1 (I2-I1) in 1988 and 2017 and the difference between gaps 2017-1988 were displayed. Most of the trends were linear or showed a slight deviation from linearity, while the shape of nonlinearity differed between traits (data not shown). Nonlinear shapes were diminishing increasing (e.g. in WW and WR Hyb for YLD), concave (e.g. in WB and SB for DLR) or convex (e.g. in WR Hyb for LDG, in WW for STB).
Contrary to WR Hyb, EAD in WR Pop increased by only 98.9 ears m −2 in I1 and 81.4 ears m −2 in I2. The increase in I1 was generally notably higher than in I2, which resulted in a reduction in gaps 2017 between − 14.8 ears m −2 (WB 6r) and − 33.5 ears m −2 (WTI). EAY increased in all crops in I1 and I2, but not in WR, where EAY decreased. Figure 3a further shows increasing gaps between 1988 and 2017 in the range of 0.01 g (WTI) and 0.08 g (WB 6r and SB) per ear. HGT I1 was considerably reduced in WTI (− 14.0 cm) and WW (− 7.2 cm), whereas in WR and WB 6r HGT I1 increased slightly. HGT in I2 changed more than in I1 resulting in narrowing HGT gaps. The trend pattern for EAE was nearly the same in I2 and I1, therefore only the trends in I1 were plotted in Fig. 3a. In WW the period between sowing and ear emergence was reduced by 8.6 days followed by SB with 7.9 days, while not much change was found in WR and WTI.
Next, we looked at traits for stem stability LDG, SBL and EBL shown in Fig. 3b. When considering score traits, we should note that signs of changes and gaps need a reversed interpretation. For score traits, a positive change indicated increased susceptibility, i.e. a negative progress. Furthermore, negative gaps indicated a widening of gaps over time, i.e. that susceptibility in I1 increased stronger than in I2. LDG in I1 improved in WW (− 0.2) and WTI (− 0.6), while for other crops susceptibility in I1 increased between 0.2 (WB 6r) and 0.7 (WB 2r and SB). When comparing differences of gaps between 2017 and 1988, we found a widening among crops ranging from 0.3 in WTI to 1.5 in WB 6r, HGT Plant height (cm); EAE Days from sowing to ear emergence; I1 Intensity 1; I2 Intensity 2; ns non-significant; *Significant at 5% level; **Significant at 1% level; ***Significant at 0.1% level 1 3 which means that LDG increased up to 1.5 scores during 1988 and 2017, and stronger in I1 than in I2. Susceptibility of varieties for SBL and EBL increased in WR, WB and SB in I1 and I2 resulting in increasing gaps.
Finally, we considered the diseases (Fig. 3b, Table 3). Trends in I1 represented breeding progress for disease resistance when no fungicides and growth regulators were applied, but the trend for I2 represented breeding progress under fungicide and growth regulator treatment. MLD was observed in all crops, showing a decreasing trend in I1 and I2, except in WTI, where the susceptibility level increased by 1.2 in I1, but decreased by − 0.3 in I2. Susceptibility across crops in I1 decreased in the range of − 1.0 (WW) and − 0.6 (WB 2r), which showed that susceptibility levels for MLD were considerably higher in WR and WB compared to WW, WTI and SB. In all crops, the gaps decreased, but it widened in WTI by −1.5 scores, which means that the difference in susceptibility between I2 and I1 was 1.5 scores larger in 2017 compared to 1988. We found a strong decrease in susceptibility for BNR in WR Hyb (− 1.3 for I1, − 1.0 for I2), but in WTI susceptibility increased (0.8 in I1, 0.1 in I2). However, gaps decreased in WW, WR, but increased in WR Hyb by 0.3 scores. Notably, WR Hyb showed the highest BNR susceptibility level and WTI the lowest. STB susceptibility decreased only slightly in I1 in WW and WTI; however, in I2, the decrease was larger in both crops. As shown in Table 3, gaps in WW and WTI widened. SNB was only observed in WW showing a significant decrease of − 0.6 in I1 and − 0.5 in I2. YLR did not change in I1 in WW and WTI, and decreased slightly in WTI I2 (− 0.4), but gaps increased in WW (0.4) and WTI (0.3).
Breeding toward lower susceptibility for RYS I1 was successful in all crops, especially in WB 2r (− 1.1) and WB 6r (− 1.0). NTB susceptibility decreased by − 0.4 in WB 2r and 6r for I2, and by − 0.7 in WB 2r and − 0.6 in WB 6r in I2. While susceptibility also decreased by − 0.4 in SB I2, it increased by 0.5 in SB I1, so that the gap in SB increased by 0.9. DLR I1susceptibility declined by − 1.1 in   WB 2r and − 0.9 in WB 6r, but increased by 0.5 in SB I1. In WB 2r, WB 6r and SB I2, DLR susceptibility reduced by − 0.4, − 0.7 and − 0.7 in I2, respectively. In SB the gap remarkably increased by − 1.1, while in WB a considerable decrease was found. As Fig. 3b indicates, DLR susceptibility noticeably dropped until 2010 and 2015 in WB and SB, respectively, and then increased strongly. Table 4 shows the results of the estimated ageing effects of reference varieties over 10 years in I1, I2 and the absolute difference I2−I1. Further details of the regression coefficients for ageing effects are shown in Supplementary Material SM3, Table S1. While the number of varieties differed strongly between crops ranging from 7 in WR Pop to 49 in WW, differences in average age were much smaller ranging from 7.2 years in WW to 9.9 years in WR Pop. Besides ageing effects in I2−I1 d 21 ), we showed the effect in I2 (d 2 ) and I1(d 1 ) separately, since they differed greatly in terms of sign and magnitude. Knowing ageing effects in I2 and I1 was helpful to interpret the ageing effects d 21 . In most cases, the difference between the change estimated in I1 and I2 was very similar to the differences d 21 . However, standard errors for individual regression coefficients of d 1 and d 2 were larger than for d 21 , because random effects for the differences I2−I1 had smaller variances than the corresponding random effect of I1 and I2, separately (data not shown). In the following we focused mainly on the ageing effect d 21 .

Trends for ageing effects of varieties
We first looked at continuous traits YLD, EAD, EAY and HGT. We should note that for continuous traits positive ageing effects for d 21 mean that the decrease d 1 was larger than for d 2 . For YLD, we found significant negative ageing effects d 21 in all crops except in WR. In WR negative ageing effects of similar strength were found for d 1 and d 2 , which led to an ageing effect d 21 of nearly zero. WTI and WW showed the largest ageing effects of d 21 = 5.67 dt ha −1 and d 21 = 2.84 dt ha −1 , respectively. The individual changes for d 2 and d 1 indicated that also in the treated intensity I2 considerable decreases appeared in the different crops, but generally smaller than in I1. This gave evidence that the ageing effect d 21 differed from the ageing effect d 1 . For EAD all ageing effect d 21 were non-significant. For the yield component EAY, a similar ageing pattern as for YLD was apparent. HGT showed significant negative trends in WW (d 21 = − 1.34 cm), WB 2r (d 21 = − 1.66 cm) and WB 6r (d 21 = − 1.06 cm). When comparing the ageing effects in I2 and I1 for these crops, it became apparent, that HGT reduction in I2 was larger than in I1 (WW, WB 2r) or in the case of WB 6r, HGT increase in I2 (d 2 = 1.07 cm) was smaller than in I1 (d 1 = 1.99 cm). This means that over 10 years HGT reduced on average by d 21 = − 1.34 cm in WW, d 21 = −1.66 cm in WR Hyb and d 21 = − 1.06 cm for WR Pop.
Next, we considered the score traits LDG, SBL and EBL, which represented the physical stem stability at the stage of ripeness. Please note that for score traits, negative ageing effects in I2−I1 (d 21 ) meant that increase of susceptibility in I1 (d 1 ) was higher than for I2 (d 2 ). A positive value for ageing effect d 1 and d 2 indicated increasing susceptibility, while a negative value for d 21 indicated increasing susceptibility. For nearly all crops, we observed an increased susceptibility towards LDG. Moreover, WB and SB showed increasing susceptibility for SBL and EBL.
Among diseases, ageing effects for MLD were present in all crops, except in WR and WB 2r. A notably large decrease was revealed for WTI (d 21 = − 2.03). Furthermore, WW and WTI were subject to large BNR ageing effects of d 21 = − 1.31 and d 21 = − 1.24, respectively. In addition, the susceptibility for STB and YLR increased with age in both crops, while susceptibility for RYS was rather stable except for WB 2r (d 21 = − 0.47). Ageing effects for NTB and DLR were identified in WB 6r (d 21 = − 0.66, d 21 = − 0.29) and SB (d 21 = − 0.50, d 21 = − 0.37), respectively. Figure 4 shows variability of TSv scores I1 (grey circles) for crops and traits across years, where the red and blue line represent the year means in I1 and I2, respectively. Nondiseased trials are represented by circles along the 1-score line. All plots show a very large within-year variation of TSv compared to the smaller between-year variation. The line pattern in I1 was very similar for the same diseases and crops, e.g. STB in WW and WTI, NTB, and DLR in WB and SB. The simultaneous epidemic occurrence for YLR in WW and WTI was highlighted. The gradual adaptation of MLD and BNR to WTI from 1995 and 2000 on, respectively, is clearly visible in Fig. 4.

Genotypic and environmental variation
Variance components for continuous and score traits were considered separately and displayed in Fig. 5a, b, respectively. The detailed percentage numbers are listed in Supplementary Material SM4, Table S2. To get an overview on the variation pattern, we averaged variance components across crops and traits for both groups of traits. We found notably different patterns of variation between the two groups of traits. The environmental variation (Y + L + YL) accounted for about 80% on average for continuous and about 60% for score traits, while the residual variance (RES) for score traits was about twice as large (23%) as for continuous traits (12%). The influence of years for continuous compared to score traits was three times higher (12% vs. 4%). We found about the same relation between both groups of traits for the genotype-environment (GY + GL) vs. environmental variation of 5% vs. 2%, respectively. The genotypic variation was low and accounted for only 6% for continuous and 9% for score traits. We first considered the variation pattern for continuous traits (Fig. 5a). The genotypic variation was very small for EAE and YLD, but large for HGT and moderate for EAD and EAY. For HGT, WW and WTI genotypes showed considerably higher variation than the other crops. In all crops, the component for location (L) was higher than that for years (Y), except for HGT. The highest variation for HGT was found in WW genotypes. For all crops and traits, the share of GY and GL variation was nearly negligible. The residual variation for EAE and HGT was considerably smaller than for other continuous traits.
Finally, we looked at score traits (Fig. 5b). We found the largest genotypic variation for MLD, BNR and YLR, especially in WW, WTI and SB, but only a rather low variation for STB, SNB and NTB. In all crops, the highest average variation could be attributed to the YxL interaction followed by the residual error. Especially in WTI, the residual error was very large (e.g. for MLD, BNR). The variation of the year effect (Y) was much smaller than of the location effect (L). The same relation was evident for the interaction of GY and GL, except for YLR. For YLR larger components were estimated than for any other disease, which were 6% and 7% (GY) and 8% and 8% (GL) in WW and WTI, respectively.

Discussion
The primary focus of this study was to quantify trends of breeding progress in cereal crops towards more resistant varieties and to compare the achievements between crops. Yield, yield-related traits and disease traits interact in a complex agro-biological system. They are important traits for breeders to enhance varieties for improved performance and better resilience towards biotic and abiotic stress. Furthermore, they are essential criteria in official variety trials for assessing VCU.

Trend for breeding progress based on variety means and gaps
We found a notable increase in YLD in all crops in I1 and I2, especially in WB 2r and 6r. The increase in YLD I2 in WW and SB was lower than in WR Hyb and WB, which may be attributed to the fact that besides YLD, considerable progress in baking and malting quality in WW and SB has been achieved (Laidig et al. 2017a, b), hampering part of progress in yield potential in these two crops. Further, nitrogen level in WB and SB increased during 1988-2019, while in WW and WTI it was rather constant (Fig. 1). Comparing YLD gaps (I2−I1) in 1988 and 2017, they were only reduced in WW, WTI and WR, but not in WB and SB (Table 3). One possible reason might be the difference in the development of nitrogen application rates in I2 versus I1 in the different crops as shown in Fig. 1. The differences before 2005 were smallest in WB and SB, while after 2005 only the nitrogen levels in WB and SB continuously increased, but in the other crops were rather constant.
In all crops, increase for EAD was larger in I1 than in I2. As Fig. 3 and Table 3 show, the gaps decreased from 1988 to 2017. This observation may be mainly due to the different nitrogen application rates before 2005. Lower N rates in the early years limited EAD in I1, while this limitation diminished in later years. However, higher nitrogen rates also might have led to higher susceptibility for specific fungal diseases (see, e.g. Veresoglou et al. 2013), which then led to a lower increase for EAY in I1 versus I2 over time.
HGT I1 was considerably reduced in WW and WTI. However, in WR Hyb and Pop, even a slight increase was found. In WB and SB, nearly no change in HGT is shown in Fig. 3, which might be due to the fact that no major dwarfing genes were available (Berry et al. 2004). Overall, HGT reduction in I2 was stronger due to the application of growth regulators (Froment and MacDonald 1997), e.g. in WB 2r, HGT was reduced by − 8.0 cm for I2 versus − 2.3 cm in I1. It is remarkable that in WR HGT increased despite increased growth regulator application in I2 and, but at a lower rate, in I1. Photosynthesis is the primary determinant of crop yield (Simkin et al. 2019). In rye, the culm serves as increasingly important reserve storage under stress conditions and main photosynthesis organ (Nalborczyk et al. 1981) with its contribution to overall photosynthesis being higher than in all other cereals (Takeda and Udagawa 1976). Hence, continuous selection for grain yield and stress tolerance may be considered as the main driver for the increase in plant height in both WR variety types. This explanation has a genetic reason but increasing trends of ageing effects for HGT in I1 and I2 (Table 4) suggest that besides a genetic reason also non-genetic effects may be involved leading to increasing trends for HGT. Berry et al. (2015) evaluated the effect of breeding on HGT in WW during 1970-2013 in UK variety trials. They found a gradual HGT reduction until 1990 and then no significant trend between 1990 and 2013. They argued that breeders probably did not find an advantage in further reducing HGT to increase lodging resistance, because shorter plants would counteract in efforts towards further raising yield. We found a similar trend pattern for WW as depicted in Fig. 3, where a strong decrease until 1991 and then nearly no reduction was visible. Berry et al. (2004) stated "that the minimum height that is compatible with high yield is being approached in some cereals. Therefore, breeders must exploit the large amount of genetic variation in the strength of the stem and anchorage system to continue producing lodging-resistant varieties." Nevertheless, in WR, the tallest cereal investigated by far in this study, the gibberellinsensitive dwarfing gene Ddw1 is currently under research  to genetically reduce plant height and improve lodging resistance in future hybrid rye varieties (Rye-sus.eu 2021). The relation between grain yield and HGT was quantified in a study by Casebow et al. (2016). They found a quadratic relationship between grain yield and crop height manipulated by the reduced-height genes with highest yields around 80 cm. They further found an inverse quadratic relationship for nitrogen concentrations in grain at a minimum by about 80 cm.
In all crops, a trend towards shortening of EAE became obvious, i.e. earlier ear emergence. Our results were supported by a study of Voss-Fels et al. (2019) including 191 winter wheat varieties released during the last 50 years, mainly in Germany, and grown under contrasting agrochemical inputs over two growing seasons. Their results showed an increase in days from EAE to 1 July under all input levels. This result suggested that besides climate change also genetic effects may have increased days of EAE to 1 July. Bönecke et al. (2020) also reported a reduced number of days until ear emergence in German wheat since the middle of last century. However, in their study, the period did not shorten when expressed in thermal time, which indicates intermingled effects of genetic and climatic change regarding EAE over time.
LDG occurred in all crops showing an increasing trend towards higher LDG in I1, whereas in I2 the trend was more or less decreasing. LDG is a complicated phenomenon that is influenced by many factors including plant height, stem strength, wind, rain, topography, soil type, previous crop, husbandry and diseases that attack the basal shoots already during early vegetative growth (Berry et al. 2004). The highest LDG level was found for WR Hyb and Pop (Table 3), which corresponds closely to increased HGT despite increased application of growth regulators in WR. Further, a strong increase of EAD is likely another factor for increased lodging as reported by Matsuyama and Ookawa (2020) in WW varieties. They found that decreasing seed rates increased stem stability and resistance against lodging. From this, one can conclude that increased seed rates and EAD increase the susceptibility for LDG. Additionally, the tendency towards more lodging in I1 in cereals might be due to the increased nitrogen application in I1 (Fig. 1) and thus higher EAY and/or increasing EAD as was observed in WR and WB. The decreasing trend for LDG in I2 could be attributed to the higher application intensity and efficacy of growth regulators (Fig. 1). Overall, progress in lodging tolerance in WW was reported by Berry et al. (2004), Berry et al. (2015) and Zhang et al. (2020) mainly in consequence of reduced plant height and improved application of growth regulators. On the other side, higher nitrogen application rates increasing yield potential have raised the risk for lodging, which might also explain the increasing trend for LDG in other cereals.
A serious increase in SBL occurred in WB and SB in I1, but not in I2, which might be attributed to the stem shortening effect. EBL increased in WB 6r and SB in I1 and I2 indicating that growth regulator treatment had no reducing effect for this trait. Generally, Fig. 3 shows that stem stability decreased in the untreated trials.
Before discussing the progress of individual diseases towards higher resistance in the different cereals, we should briefly highlight the special situation in WTI. This cereal resulted from the hybridization of wheat and rye and has only been increasingly cultivated in Germany since the 1990s (Oettler 2005). Originally, WTI was completely resistant to the biotrophic pathogens MLD, BNR and YLR because of the absence of adapted pathogen populations. Therefore, no resistance breeding efforts took place. In the official trials, BNR occurred firstly in 1988, YLR in 1989, and MLD in 2001 and BNR occurred only sporadically on highly susceptible genotypes or at single locations (Fig. 4). A regular appearance or even devastating epidemics were observed for BNR since 1998, for MLD already with the first appearance in 2001, and for YLR in 2014 with the emergence of the Warrior race (see Fig. 4), and these years were also the start of natural resistance selection in the nurseries. From the beginning, there were some varieties protected by wheat genes (Pm, Lr, Yr, Audenaert et al. 2014) until the corresponding virulences spread in the newly emerged triticale pathogen populations as a result of the increasing acreage of WTI. At the beginning of this study (1988) WTI was grown in Germany on 20,000 ha, compared to 358,200 ha at the end (2019). In addition, WTI varieties might also contain specific resistance genes from WR. Before the start of resistance selection, resistances in WTI were chance events. Varieties without such resistance genes were susceptible to a much higher degree than WW varieties because they were not protected by quantitative resistances as these were accumulated in modern WW varieties over many years of breeding (Miedaner and Flath 2007;Serfling et al. 2017). The breeding progress for the resistances to biotrophic diseases estimated here was biased for WTI, because epidemics occurred not from the beginning of the study period onwards and were mainly episodic (Fig. 4). Starting with a rating of 1.0 to 1.7 for all biotrophic diseases in WTI in 1988, the breeding progress could be only very restricted (MLD in I2,YLR in I1 and I2) or even negative (MLD in I1,BNR in I1 and I2).
MLD can infect all cereals due to its different formae speciales. Nevertheless, it is one of the least important pathogens regarding its yield loss effects. This is mainly due to successful breeding by using quantitative resistances, at least in WW (Miedaner and Flath 2007). MLD susceptibility was considerably improved in the range of − 0.6 (WB 2r) to − 1.0 (WW), except in WTI and WR. During 1988-2019, the effect of fungicides reduced MLD infection nearly completely in all crops (conf. Figures 3, 4), even in WTI.
The same degree of control in I2 was also found for BNR in WW, WTI and WR. While susceptibility to BNR was strongly improved in WR Hyb by − 1.3 scores, it still maintains at a high average score of 4.1 in I1. The susceptibility level for STB is the highest compared with other diseases, and the application of fungicides is less effective. In WR Pop no breeding progress was found for MLD and BNR as might be explained by the low breeding activities since the upcoming of widely used hybrid varieties.
YLR occurred only episodically in wheat (1989-1990, 1999, 2001, 2014-2016) (Fig. 4) as in 1999 due to the breakdown of the gene Yr17 (Bayles et al. 2000), but became increasingly important in the last decade due to the advent of the aggressive Warrior race in 2011 (Hovmøller et al. 2016). Reduction in susceptibility for YLR was low in WW (− 0.3) and WTI (− 0.1). The latter could be attributed to the change in susceptibility of some varieties according to the rapidly changing YLR races from the NW European population to the Warrior and Warrior (-) races (Miedaner and Juroszek 2021). However, this was the average change over 31 years, where varieties were mostly resistant against YLR until 2013. The epidemics of YLR led to an increased application of fungicides in WW (see Fig. 1) to such an extent that YLR could be controlled nearly completely during YLR epidemic years.
Susceptibility trend for RYS decreased in WR, WB and SB in I1 and I2. In recent years RYS I2 was nearly fully controlled in WB and SB, but not in WR, where the susceptibility level was notably higher compared to WB and SB (Table 3). Presumably, breeding activities did not focus on improving RYS susceptibility in WR hybrid varieties. In the WR population varieties, a remarkable breeding progress was detected for RYS in I1 and I2; however, only a low number of varieties was included in the analysis.
NTB decreased in both intensities in WB, but not in SB I1 where an increase of susceptibility by 0.5 was estimated. As shown in Figs. 3, 4, DLR was fully controlled in I2, but in I1 the susceptibility in WB 6r and SB decreased considerably until 2008 and then a steep increase followed until 2019. This u-shaped trend cannot be ascribed to changing sensitivity of varieties but should rather be attributed to a lower DLR infestation in the respective years.
Generally, stem stability decreased whereas disease susceptibility was improved for diseases in most crops, but not so for YLR in WW. The application of fungicides and growth regulators decreased LDG, but was less effective in reducing SBL and EBL. Fungicide application was very effective for most diseases and in most crops, except for STB in WW and RYS in WR. Figures 3a and 4 show very similar susceptibility pattern for WR Hyb and Pop varieties, except for BNR where WR Pop varieties were considerably less susceptible in I1 and I2 than WR Hyb varieties. WTI showed a trend pattern for most diseases which was different from the other crops due to the gradual adaptation of wheat and rye diseases to WTI during the study period.
Across all crops, considerable gaps between I2 and I1 in 1988 and 2017 are shown in Fig. 3 and Table 3, with most gaps widening. Decreasing gaps were found only for YLD, EAD and RYS in WW, WTI and WR. This widening of gaps indicated that progress in the treated intensity was higher than in the untreated, which does not mean that no breeding progress was achieved in I1. A stronger convergence between trends could have been expected because nitrogen application was reduced in I2 and increased in I1, while fungicide application level did not change much and only growth regulator application increased (Fig. 1).

Trends for ageing effects of varieties
In Table 4 the changes due to ageing effects over 10 years were shown in I1, I2 and the difference in I2−I1. The sign of individual ageing effects depended on the intensity, traits and crops. Considerable ageing trends for yield and diseases have been reported, e.g. by Mackay et al. (2011) andLaidig et al. (2021) in WW variety trials. Both studies assumed that in the treated intensity no ageing effect was present, which means that genetic and non-genetic trends in I2 and I1 were identical. Further because there was no ageing trend in I2, the difference between I1 and I2 was an ageing effect of I1 on its own. Nevertheless, inspection of our data raised concern that this assumption may not hold true for all traits and all crops. Besides the ageing effect, we therefore additionally included a non-genetic time trend in Eqs. (3-10), and further considered intercepts of individual varieties as fixed to take into account genetic trends in reference varieties, i.e. that genetic trends are removed from trends of ageing effects. Unfortunately, it was not possible to estimate both, trends of ageing effects and time trends separately, because they were linearly dependent ). Nevertheless, the results confirmed our approach; this could be demonstrated for HGT in WR Hyb and Pop, where ageing effects were found in I1 and I2, but not in I2−I1. It was unlikely that HGT of varieties were altered with increasing variety age, but other factors could influence HGT, e.g. trends of treatment intensity (Fig. 1), pre-cropping or soil tillage (Supplementary Material SM2, Fig. S1). The result for HGT in WR suggested that time trends were similar in I2 and I1 such that the ageing effect I2−I1 was not significant. For LDG the trends for ageing effects in crops given in Table 4 might be attributed to other factors. As such, we found a significant decrease of stem stability in all crops in I1, but not in I2. This stability of LDG I2, independent of variety age, could be ascribed to the increasing application rate of growth regulators as depicted in Fig. 1. In another case, e.g. MLD and BNR in WTI, ageing trends I2−I1 were underestimated, because in I2 an increasing susceptibility of varieties was estimated (d 2 = 0.56 and d 1 = 2.48 for MLD, d 1 = 1.35 and d 2 = 0.14 for BNR), resulting in reduced d 21 = − 2.03 for MLD and − 1.23 for BNR.
The largest trend in ageing effects were observed for BNR in WW (d 21 = − 1.31) and WTI (d 21 = − 1.24) (except for MLD (d 21 = − 2.03) in WTI). Concerning climate change, all future scenarios predict a further increase in BNR incidence and severity in NW Europe due to the thermophilic nature of the pathogen (Miedaner and Juroszek 2021). This is likely going to aggravate the existing BNR challenges and newly registered varieties may become susceptible to BNR more quickly, even leading to increasing ageing effects in future.
A further example for increased susceptibility of varieties was that WW pathogens adapted and became visibly virulent in the VCU trials of WTI from the years 1999 and 2001 onward for BNR and MLD, respectively (Figs. 3, 4). The same but inverse effect occurred in WB 6r for RYS; in this case susceptibility I1 (d 1 = − 0.22) and I2 (d 2 = − 0.25) decreased significantly resulting in non-significant I2−I1 (d 21 = − 0.07). We found no sound reason to explain the decreasing susceptibility in I1 and I2. We assume that this effect could be ascribed to other factors which acted in the same direction and about the same magnitude in both intensities.
As we have seen from the above, considerable ageing effects of different magnitude, direction and causes were present. Not only increasing sensitivity or even loss of resistance might be responsible for ageing effects, but also other factors might be involved like changing treatment intensities, altered management practices, climatic changes, or decrease in atmospheric nitrogen and sulfur dioxide deposition (Storkey et al. 2015;Chandramohan and Shaw 2013). Hence, to ensure that reference varieties represent the actual state of breeding progress, it is reasonable to replace reference varieties with newer ones not only to keep pace with breeding progress, but also to avoid biased trends due to increasing ageing effects.

Genotypic and environmental variation
We estimated variance components removing genetic and non-genetic trends and assumed homogeneity, i.e. that components do not change over time. However, this assumption might not fully apply. Hadasch et al. (2020) showed that time trends of variance components were present in WW and WR variety trials, however, of lower magnitude indicating that increasing climatic conditions were likely to influence environmental variability of trials over time.
The large share of environmental variation of 80% for continuous and 62% for sore traits demonstrated the wide range of pedo-climatic conditions under which the trials in this study were carried out. However, the overall genotype and the genotype-environment variation was relatively low and accounted for only about 10% over all crops and traits. A low genotypic variation was estimated for YLD, which accounted for only 2% compared to 9% for score traits over all crops (see Supplementary Material SM4, Table S2). This means that varieties differed only slightly in yield potential, the most important trait. Only genotypes with high yield performance were finally released leading to less variation in this trait. The larger genotypic variation in WW compared to other cereals could mainly be explained by the different yield levels of quality groups (Laidig et al. 2017a). Another reason for the larger genotypic variation of score traits, compared to YLD, was likely due to the fact that only trials with visible disease infection were included which may have led to increased differentiation between genotypes. Additionally, selection intensity is usually much higher for YLD than for disease resistances, thus providing still susceptible genotypes in the VCU trials but not low yielding ones. This restricts the variance for YLD. As heritability estimates are usually higher for disease resistances than for YLD, we can conclude that the latter has a much more complex genetic architecture.
More environmental variation was explained for continuous traits than for score traits. However, we found twice as high residual variation in score traits, which indicates that scores were subject to larger assessment errors than continuous traits. Figure 4 depicted a very large variation of TSv within years, because disease infestation was conditional on the presence of inoculum and the fulfillment of disease specific micro-climatic conditions, i.e. temperature and moisture requirements (Caubel et al. 2017). The concurrent presence of inoculum and fulfilment of climatic requirements were very likely to differ much stronger between trials than between years. These conditions also interacted much stronger with varieties, compared to conditions responsible for yield formation and interaction between varieties as shown by Fig. 3.

Limitations of this study
The disease and stem stability traits were assessed by visual observations on an ordinal 1-to-9 scale, which is an appropriate and widely used method in plant breeding and official variety trials (Zhang et al. 2007). For diseases, this roughly follows a logarithmic transformation of the underlying percentage area of diseased leaves or spikelets (Bundessortenamt 2000, Sect. 4.1). In fact, not every field trial showed an effect on stem stability or incidence for a specific disease, i.e. there were quite some trials in which all varieties had a score of 1. For the assessment of trends of a specific disease, we included only those trials for which 1 3 a severity score for this particular trait was 2 or higher. We did so, because the susceptibility of a variety could only be investigated in trials with visible symptoms. As such, the investigated score traits differed regarding their frequency of occurrence as shown in Fig. 2. For LDG in WR about 80% of trials showed lodging, where for YLR in WTI only 20% were diseased.
In this paper, we analysed ordinal scores on a 1-to-9 scale as if they were metric data. This approach clearly constituted an approximation as the assumptions underlying our linear mixed models could not strictly be met. In a previous study (Laidig et al. 2021), diagnostic residual plots for six WW diseases indicated, however, that no gross departures from assumptions of normality and homogeneity of variance were observed. Thus, we believed that our results were based on the best possible analyses, given the nature of the data. There were, of course, several dedicated statistical methods for ordinal data as alternatives to our approach (Breslow and Lin 1995;Shah and Madden 2004;McCullagh and Nelder 1989, § 9.2.4;Thöni 1985). As outlined in Laidig et al. (2021), however, none of the seemingly obvious alternative routes of analysis were viable options for our data. All of them required a larger sample of independent and identically distributed observations per treatment (variety-by-environment combination), and we did not have such data.
We evaluated breeding progress based on variety means using Eq. (2) and did not estimate variety means (genetic trend) by least square means eliminating year effects, because variety means based on scores may be biased downwards due to existing ageing effects . For score traits, which are bounded below at 1, this ageing effect, combined with the bounded range, might result in least square estimates below 1 or in even negative scores, which are not interpretable (Laidig et al. 2021). In the previous section "Effect of variety ageing" we showed that considerable ageing effects occurred for reference varieties, e.g. in WTI for MLD and BNR, in WW for BNR. We therefore used only the first three trial years of reference varieties to eliminate the ageing effects possibly occurring in later years. A further complication in estimating trends by Eq. (2) might occur because variety means with successive first trial years were autocorrelated. Variety means of successive first trial years had two main effects from two trial years in common and varieties whose first trial year was two years different had one trial year in common. We therefore included an autocorrelation of first order AR(1) in Eq. (2). The estimated changes were often either not interpretable or not estimable, especially if autocorrelation was close to 0 or 1. We therefore did not apply the AR(1) option as Piepho et al. (2015) recommended not using AR(1) models in such cases as we found them. Year effects in variety means were not completely eliminated as in least square estimates, where year was a fixed factor, but years and trials within years were averaged and confounded with the genotypic effect.
Overall, due to the huge extent of the data, respective analysis and results, we were not able to go into such details as would be possible in individual crop-by-crop studies. Instead, in this study, we provided a comprehensive overview on the general breeding progress and ageing effects of cereal crops with regard to yield-related and disease resistance traits. Distinct follow-up studies focusing on single crops including plant physiological or rather agronomic aspects are planned.

Conclusions
In this study we investigated major traits relevant for variety registration and information of growers regarding variety choice in cereal crops. For the first time, results of long-term trends were quantified and compared between five cereal crops based on genotypes grown under two different treatment intensities in a wide range of pedo-climatic conditions.
Considerable increasing yield trends was found suggesting very successful breeding progress in cereal crops, where the highest gains were achieved in winter barley and the lowest one in winter rye for population varieties. Yield gaps between treated and untreated intensity (without fungicides and growth regulators) increased from 1988 to 2017 in the barleys, but decreased in other cereals.
In contrast to increasing yield trends, however, stem stability decreased in all crops during 1988 and 2017 despite successful breeding activities for reduced plant height, except in winter rye. Hence, breeding towards higher stem stability requires more attention in the future. With the exception of rye, the potential of further reducing plant height without impairing further yield increases appears to have been exhausted in cereal crops. Our results demonstrated progress in improving disease resistances of varieties to mildew, brown (leaf) rust, scald and dwarf leaf rust but no or only a low increase for resistances to Septoria leaf blotch, yellow rust and net blotch. For stem stability and disease resistances, the declining trends in the treated intensity were stronger than in the untreated.
Our results revealed considerable trends due to ageing effects in yield, stem stability and diseases not only in the untreated but also in the treated intensity. The strongest decline for yield occurred in winter wheat and winter triticale. This decline was mainly attributed to a gradual loss of disease resistance and decreasing stem stability. The strongest increase in disease susceptibility due to ageing was shown for brown rust, followed by yellow rust and Septoria leaf blotch. Our results strongly demonstrated that evident ageing trends are calling for continued breeding efforts and release of new improved varieties to maintain performance at the same level.
The very large environmental variation of about 80% for yield and 60% for stem stability and diseases compared to genotypic variation of 2% and 9%, respectively, indicated the wide range of environmental and pedo-climatic conditions included in our study. For all crops, the year-to-year variation in yield was about four times that of stem stability and diseases, while the residual variation in yield was only half that of stem stability and diseases, suggesting that score traits are subject to larger assessment errors especially when the average infection severity is low.
This study gives important and novel evidence on the long-term breeding progress achieved, and on the performance-reducing ageing effects in all relevant cereal traits. The results and conclusions can therefore provide evidence supporting the ongoing discussion on the appropriateness of chemical plant protection in cereal crop management. Further intensification of resistance, breeding efforts will be necessary in the future in order to maintain the productivity of cereals with lower crop protection intensity and availability.