Introduction

Screening of breeding populations has been used to develop disease-resistant plant varieties that underpin food and fiber production around the world (Beebe and Corrales 1991; Johnson 1992; Bernardo and Yu 2007; Boyd et al. 2013; Nelson et al. 2018). Many commodity crop breeding programs have established dedicated disease screening facilities in areas where pathogen pressure is high and disease symptoms are readily assessed (Nene 1988; He et al. 2013; Araus and Cairns 2014). Once resistant varieties are deployed across the landscape, pathogens evolve to overcome the mechanisms underlying plant resistance and new strains emerge; in-field screening of diverse breeding populations where outbreaks occur is often the first response (Singh et al. 2008; Gururani et al. 2012). Alternatively, as new pathogen variants are observed, they are used to screen breeding populations in controlled greenhouse or laboratory environments. This approach has been used to identify resistant and susceptible plant varieties and to understand the molecular interactions between hosts and pathogens of interest (Neik et al. 2017; Karki et al. 2021). Characterization of host and pathogen interactions has advanced with improved phenotyping methods that provide increasingly detailed descriptions of symptom expression and of the molecular changes that result from challenge or infection. Qualitative resistance, in which one or a small number of genes are assumed to regulate disease incidence, was the dominant paradigm for some time, but these more refined observations of disease symptoms have shown that disease may be under polygenic or quantitative control, with many interacting genetic factors determining its development. Quantitative, or partial, resistance has been proposed to describe how many genes each contribute small effects to disease severity (Palmer et al. 2004; Poland et al. 2009; St. Clair 2010), and evidence and theory indicate that polygenic resistance may be more durable (Gou et al. 2023).

Fusarium verticillioides Nirenberg (previously F. moniliforme Sheldon, teleomorph Gibberella moniliformis Wineland) is a common pathogen of maize (Zea mays L.) wherever the crop is cultivated and is the primary causal agent of Fusarium ear rot (FER) (van Egmond et al. 2007; Zila et al. 2014). Several Fusarium species can damage tissue in nearly all parts of the maize plant. Symptoms reflecting the host and pathogen interaction are highly variable among tissue types and reflect the phenotypic diversity present among maize varieties, pathogen isolates, and environments, which change from season to season (Covarelli et al. 2012; Gai et al. 2018). F. verticillioides is a soil-borne, hemibiotrophic pathogen closely related to other Fusarium species that infect maize (Proctor et al. 2010; Duncan and Howard 2010; Beccaccioli et al. 2021). Most Fusarium species, including F. verticillioides, can persist in the soil or on plant debris for long periods as hyphae, conidia, and chlamydospores. After infecting a new plant, they first feed on living tissue (often asymptomatically in the case of FER) before switching to a necrotrophic phase and feeding on dead tissue. In this way, F. verticillioides can infect a plant through roots, stems, leaves, and ears before spreading systemically within the host and causing disease symptoms in a different tissue (Murillo-Williams and Munkvold 2008). It also exploits other common maize pests, such as insects and birds, by invading the wounds they create and then spreading into the plant (Reid et al. 2002; Alakonya et al. 2008). Perhaps the most important component of FER is the production of the mycotoxin fumonisin, a secondary metabolite produced by F. verticillioides and other Fusarium species. Consumption of fumonisin-contaminated food and feed has profound health implications for livestock, in which it causes various diseases and cancers, and fumonisin is also considered a 'probable carcinogen' in humans (Rheeder et al. 2002; Bush et al. 2004). These features make Fusarium one of the most important fungal genera in the world and the FER pathosystem an important target for research (Dean et al. 2012).

Host resistance to FER is considered highly quantitative (Robertson-Hoyt et al. 2006; Chen et al. 2012) and is therefore difficult for breeders to incorporate into breeding populations, as many markers must be tracked and accumulated into new varieties. Nevertheless, quantitative disease resistance is the type most widely used by maize breeders because it is more durable than qualitative resistance (Kou and Wang 2010; Yang et al. 2017). Plant breeders have used disease screening to demonstrate repeatable differences among large populations of maize varieties and have shown that variability in disease symptoms may be attributed to genetic differences within both host and pathogen populations (McDonald and Linde 2002; Holland et al. 2020). To better understand variation within pathogen populations, plant pathologists have challenged resistant and susceptible host varieties with diverse pathogen populations and identified pathogenicity genes that regulate virulence (Buerstmayr et al. 2020; Chitwood-Brown et al. 2021). Efforts have been made to develop reliable and efficient artificial inoculation methods that better control environmental variation, increase heritability estimates, and allow greater replication in controlled screening systems. Although plant breeders and pathologists have worked together to understand the pathosystem, evaluating resistance and virulence simultaneously in diverse populations remains experimentally challenging. Because the number of host and pathogen combinations grows multiplicatively with the size of each population (158 maize varieties and 50 Fusarium isolates in this study already give 7900 combinations), few empirical estimates exist of the genetic variation attributable to hosts, pathogens, and their interactions (Hickey et al. 2014; Butoto et al. 2022). While it may be infeasible to observe all host and pathogen combinations in any given experiment, genomic data can be used to predict all interactions indirectly.

Owing to advances in genotyping, genomic data can now feasibly be obtained for large pathogen and breeding populations, and this information may be used to explore pathosystems (the relationships between hosts and pathogens). Pathosystems are evolutionary systems that drive an arms race between resistance (host) and pathogenicity (pathogen), in which interactions between gene products produce various patterns of resistance and disease. Specific host and pathogen genes involved in disease have been identified by knocking genes in or out to determine their roles in resistance (Staskawicz et al. 1995; Hammond-Kosack and Jones 1997). More recently, genome-wide marker sets have been used to define connectivity among individuals explicitly as relatedness estimates, and incorporating this information into linear models provides indirect predictions for any genotyped individual. As shown for physiological epistasis, additivity at the gene level may give rise to complex interactions at the crop level or, in this case, the pathosystem level (Messina et al. 2011, 2018). Relatedness estimates derived from genomic marker data collected from large breeding populations are used extensively to improve the accuracy of varietal predictions and to predict individuals that have been genotyped but not phenotyped (Meuwissen et al. 2001; Su et al. 2014; Bernardo 2021). In this study, we extend this approach to both host and pathogen populations to provide models that may be used to predict host resistance, pathogen virulence, and the effect of any unique host-by-pathogen combination. Genome-wide association studies (GWAS) have been used to identify markers that tag genes associated with resistance in breeding populations (Zila et al. 2013, 2014; Gurung et al. 2014) and pathogen populations (Demirjian et al. 2023). These studies rely on relatedness matrices derived from genome-wide marker panels to account for polygenic effects and to provide more reliable estimates of marker effects (Yang et al. 2014). Integrating phenotypic data from disease screening with relatedness matrices from both host and pathogen populations into a unified genomic prediction model may be used to: 1) identify alleles to fix in breeding populations, 2) improve the accuracy of disease severity predictions, and 3) quantify the relative importance of host, pathogen, and their interactions for predicting disease development.

Here, we present an approach for generating genetic parameter estimates along with predictions of genetic effects for hosts (Breeding Values, BVs), pathogens (Virulence Values, VirVs), and their interactions using genomic data from both sides of the FER pathosystem. Prediction models developed to examine genotype-by-environment interactions in plant (Jarquín et al. 2014) and animal (Přibyl et al. 2013) breeding programs are adapted here to incorporate genomic relatedness matrices from diverse maize and Fusarium populations, along with a similarity matrix for their interactions. Heritability and stability estimates from linear mixed models incorporating standard experimental design effects, marker-derived similarity estimates, and a reaction norm model for interactions are presented to show how genetic parameter estimates for resistance and virulence change when genomic information is incorporated.

Materials and methods

Experiments were undertaken to provide the phenotypic data required to compare statistical models that incorporate genome-wide marker sets for both host and pathogen populations. For model comparisons, estimates of genetic parameters, cross-validation model accuracies, and likelihood estimates of model fit are provided. Genomic relatedness matrices describing the structure of host and pathogen populations along with predictions of resistance or virulence are provided to summarize relatedness and the results of the screening experiment.

Host population

The highly diverse population of maize varieties selected for this study was grown in Citra, FL (University of Florida Plant Science Research and Education Unit), in three seasons from 2022 to 2023 (planting dates in March–April and August–September). Both field and sweet corn varieties (all maize genotypes are listed in Supplemental Table 1) were chosen to maximize population diversity based on the phylogenetic distances between varieties determined in previous research (Hu et al. 2021; Colantonio 2022). Twenty seeds were sown in each of three replicated blocks of the field trial to provide 30 ears of suitable quality for inoculation and subsequent phenotyping of disease severity. The 30 ears of each variety were harvested 21–24 days after 50% of the plants had reached silking and were stored at room temperature overnight prior to inoculation. Each set of 30 ears was divided into 10 sets of three, and each set of three ears was inoculated with one of the Fusarium isolates. This provided three replicates of each variety-by-isolate combination, with each variety inoculated with 10 different isolates in a circular design (Fig. 1), in which each red cell identifies a host and pathogen combination that was tested. Isolates were assigned in sequence; once the 50th Fusarium isolate had been used, the sequence restarted with the first isolate and continued until every maize variety had been inoculated with 10 different isolates. For randomization, Fusarium isolates and maize varieties were randomly assigned treatment numbers from 1 to 50 and from 1 to 158, respectively, so that the host and pathogen pairs to be tested were identified prior to inoculation.
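The sequential assignment rule can be sketched in Python. This is a minimal sketch, assuming that isolates are consumed in blocks of 10 in treatment-number order (one reading of "used in sequence"); the function name `circular_design` is hypothetical, and the treatment numbers stand in for the randomized labels.

```python
from collections import Counter

def circular_design(n_hosts: int, n_isolates: int, per_host: int) -> dict[int, list[int]]:
    """Assign isolates to hosts by cycling through isolate numbers in sequence.

    Host h (1-based treatment number) receives the next `per_host` isolates;
    after isolate `n_isolates`, numbering wraps back to isolate 1.
    """
    design = {}
    next_isolate = 0  # 0-based position in the isolate cycle
    for host in range(1, n_hosts + 1):
        design[host] = [(next_isolate + k) % n_isolates + 1 for k in range(per_host)]
        next_isolate = (next_isolate + per_host) % n_isolates
    return design

design = circular_design(n_hosts=158, n_isolates=50, per_host=10)
pairs = {(h, p) for h, isolates in design.items() for p in isolates}
usage = Counter(p for isolates in design.values() for p in isolates)
print(len(pairs), 158 * 50)                       # planned vs. possible pairs
print(min(usage.values()), max(usage.values()))   # times each isolate is used
```

Under this reading, every variety receives 10 distinct isolates and each isolate is used 31 or 32 times. Shifting the starting isolate by one per variety instead would give the other common form of circular design; which form matches Fig. 1 is not stated in the text.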

Table 1 Comparisons of genetic parameter estimates, prediction accuracies, and model accuracies
Fig. 1

Display of the circular design used for phenotyping. Red cells indicate host and pathogen pairs that were phenotyped with three replicates. Numbers correspond to the treatment numbers randomly assigned to host varieties and pathogen isolates prior to inoculation.

Pathogen population

The Fusarium population was initially thought to consist of 46 isolates of F. verticillioides and one isolate each of F. circinatum, F. temperatum, F. proliferatum, and F. subglutinans, for a total of 50 isolates used in the screening. Whole-genome sequencing and subsequent analysis revealed that three additional isolates were not F. verticillioides (FV62720 is F. subglutinans; FV32966 and FV32964 are F. proliferatum), leaving 43 F. verticillioides isolates and seven out-species isolates. The isolates were gathered from across the US, primarily from the ARS culture collection (NRRL); additional isolates were provided by other researchers or collected locally from fields in Florida. Isolates were cultured on potato dextrose agar (PDA) or, if collected directly from the field, isolated from plant material. A single-spore culture was derived from each isolate and grown on a PDA plate before storage at −80 °C. To prepare spore suspensions, stored isolates were cultured on PDA for 10 days at room temperature, and four 1-cm plugs from each plate were added to 45 mL of 1/3-strength potato dextrose broth (PDB), which was shaken continuously at 300 rpm for five days. The suspensions were then filtered through sterilized cheesecloth to remove agar and mycelia. Spore concentrations were quantified, and suspensions were diluted or concentrated as needed to 5 × 10⁵ spores/mL.
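Adjusting a suspension to the working concentration follows the conservation relation C1V1 = C2V2. A minimal sketch, assuming spore concentrations have already been measured (the counting method is not specified here) and using hypothetical volumes; `adjust_suspension` is an illustrative helper, and concentrating by pelleting and resuspending is an assumption.

```python
TARGET = 5e5  # spores per mL, the working concentration stated in the text

def adjust_suspension(measured_conc: float, volume_ml: float,
                      target_conc: float = TARGET) -> tuple[str, float]:
    """Return the action and final volume that bring a suspension to target_conc.

    Spore number is conserved (C1 * V1 = C2 * V2), so the final volume is
    measured_conc * volume_ml / target_conc.
    """
    final_volume = measured_conc * volume_ml / target_conc
    if final_volume >= volume_ml:
        return "dilute", final_volume       # add (final_volume - volume_ml) mL of diluent
    return "concentrate", final_volume      # e.g. pellet spores, resuspend in final_volume mL

print(adjust_suspension(2e6, 40))    # ('dilute', 160.0)
print(adjust_suspension(2.5e5, 40))  # ('concentrate', 20.0)

# Spores delivered to each ear with the 0.75 mL inoculation volume
spores_per_ear = 0.75 * TARGET
print(spores_per_ear)  # 375000.0
```

At 5 × 10⁵ spores/mL, the 0.75 mL dose used for inoculation corresponds to roughly 3.75 × 10⁵ spores per ear.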

Inoculations and incubation

Inoculation of maize ears with Fusarium isolates began by cleaning the husk with 70% ethanol, drawing a circle on the husk slightly below the midpoint of the ear, and puncturing the ear with a sterile 18G needle to a depth of 1 in. (2.54 cm) or until impassable resistance was reached. This needle was removed, and a second needle attached to a Prima Tech® 5cc Adjustable Dose Vaccinator was inserted into the hole. The primed vaccinator was then used to dispense 0.75 mL of the spore suspension into the wound. The needle was then removed, and the ear was enclosed in a pre-labeled plastic bag that was sealed to prevent contamination. Inoculated ears of the same maize variety were placed in plastic bins with the inoculation wound facing down and incubated until disease severity was assessed. Bins were kept in a growth chamber with a 12/12 h light/dark cycle at 30 °C, conditions near optimal for F. verticillioides growth (Samapundo et al. 2005). After seven days of incubation, ears were removed from the bins and bags and husked manually; shanks and silks were removed, and any damage not associated with the inoculation was trimmed off with hand shears. Ears were discarded when poor pollination (< 50%) resulted in a lack of kernels, when insects had caused severe physical damage, or when multiple or different pathogen infections were observed that may have led to an inaccurate phenotype.

Phenotyping

A circular design, which has been shown to increase efficiency per unit of observation and to improve the accuracy of genetic parameter estimates in breeding populations (Huber et al. 1992), was used to assign Fusarium isolates to maize varieties (Fig. 1). Of the 7900 potential variety-by-isolate combinations, 1400 were observed and scored, with three ears providing replication for each observed combination. Disease severity was assessed by counting the number of kernels with visual disease symptoms. Visual estimates of disease severity have been used in genomic prediction models for FER, although different rating scales are used. Here, infected kernels were counted rather than scored on ordinal scales (e.g., 1–7 or 1–9) to improve repeatability. Qualitative differences in color, texture, and depth of infection were noted; however, these observations were inconsistent and were not used to estimate disease severity.

Genomic data

Genotypic data for the maize population came from two sources: Bukowski et al. (2018; Maize 282, AGPv3) for the field corn varieties and Colantonio (2022) for the sweet corn varieties. Raw reads from the sweet corn population were aligned to the B73v3 reference genome before variants were called. Because variants for the field corn population had already been called against B73v3, the variants of both populations could be merged directly (Purcell et al. 2007). After the two genotypic datasets were merged, 155 K SNPs were shared by at least 40% of the individuals.

The genomes of all Fusarium isolates were sequenced using paired-end Illumina short-read sequencing, and reads were aligned to a Fusarium verticillioides reference genome (assembly ASM14955v1) for variant calling. Read quality was assessed using FastQC (Andrews 2010), and adapters were trimmed using Trim Galore. The reference genome was indexed and reads were aligned using BWA (Li and Durbin 2009), and alignments were sorted using Samtools (Li et al. 2009) prior to variant calling with GATK.

Genomic relatedness matrices

Marker data were used to derive similarity matrices for hosts and pathogens, which were incorporated into mixed models as genomic relationship matrices (GRMs). All 155 K SNPs available for the maize population were used, and the pathogen (Fusarium) VCF generated with GATK contained 5.7 M SNPs. SNPs in both datasets were filtered using the ASRgenomics package (Gezan et al. 2021): markers were removed if their minor allele frequency (MAF) was less than 0.05 or if more than 50% of SNP calls were missing. All individuals were retained to obtain relatedness estimates for their respective GRMs. After filtering, 154,661 maize markers and 530,394 Fusarium markers remained.
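The two marker filters can be expressed as a column mask over a genotype matrix. A minimal NumPy sketch, assuming dosage coding from 0 to the ploidy level with `np.nan` for missing calls, and assuming a marker is dropped if it fails either threshold; it is not the ASRgenomics implementation, and `filter_markers` is a hypothetical name.

```python
import numpy as np

def filter_markers(geno: np.ndarray, ploidy: int,
                   min_maf: float = 0.05, max_missing: float = 0.5) -> np.ndarray:
    """Boolean mask of markers (columns) passing MAF and missingness filters.

    geno: individuals x markers, allele dosage 0..ploidy, np.nan for missing calls.
    """
    n_called = (~np.isnan(geno)).sum(axis=0)
    missing_rate = 1.0 - n_called / geno.shape[0]
    allele_sum = np.nansum(geno, axis=0)
    # Allele frequency among called genotypes (nan where nothing was called)
    p = np.divide(allele_sum, n_called * ploidy,
                  out=np.full(geno.shape[1], np.nan), where=n_called > 0)
    maf = np.minimum(p, 1.0 - p)
    return (n_called > 0) & (maf >= min_maf) & (missing_rate <= max_missing)

# Toy diploid data: monomorphic, polymorphic, polymorphic, mostly missing
maize = np.array([[0, 2, 1, np.nan],
                  [0, 2, 1, np.nan],
                  [0, 2, 2, np.nan],
                  [0, 1, 0, 2]])
# Toy haploid (0/1) data: polymorphic, then monomorphic among called isolates
fusarium = np.array([[0, 1], [1, 1], [0, 1], [1, np.nan]])

print(filter_markers(maize, ploidy=2))
print(filter_markers(fusarium, ploidy=1))
```

The same function serves both populations because the allele frequency is divided by the ploidy level, so diploid maize dosages and haploid Fusarium 0/1 calls are handled consistently.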

Following quality control, the maize VCF files were read into the AGHmatrix package (Amadeu et al. 2016) to produce the GRM using the VanRaden method (VanRaden 2008). To produce the GRM for the haploid Fusarium isolates, PLINK (Purcell et al. 2007) was used with the 'haploid' option, producing relatedness estimates from the 0/1 gene content coding described for the male X chromosome (Druet and Legarra 2020). Diagonal and off-diagonal elements of the maize and Fusarium GRMs are summarized in Supplemental Fig. 1. When isolates of the seven out-species were included, estimates of inbreeding and relatedness coefficients in the pathogen GRM greatly exceeded theoretical limits. Rather than discarding these isolates and their phenotypic data, a hybrid GRM containing relatedness estimates for F. verticillioides along with the out-species was created using the ASRgenomics package (Legarra et al. 2009; Gezan et al. 2021). The identity matrix for all Fusarium isolates was blended with the marker-derived GRM for F. verticillioides isolates so that the other species entered the model as uncorrelated effects. This allowed all isolates to be included in the pathogen relationship matrix while avoiding the bias associated with relatedness estimates computed across species. The ASRgenomics R package was used to blend, bend, and then align the design and genomic relationship matrices using a lambda value of 0.95 to produce the hybrid genomic relationship matrix for Fusarium. The resulting maize and Fusarium GRMs are presented in Figs. 3 and 4.
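The GRM and hybrid-matrix steps can be sketched with NumPy. This sketch assumes VanRaden's first method for the diploid maize matrix and applies the same centring and scaling to 0/1 haploid gene content for Fusarium (an analogue, not PLINK's exact computation). It also interprets the λ = 0.95 blend as λG + (1 − λ)I and bends by flooring eigenvalues. With an identity "pedigree" for all isolates, the Legarra et al. (2009) hybrid reduces to a block-diagonal matrix: the blended GRM for F. verticillioides and an identity block for the out-species. Function names are hypothetical, and ASRgenomics' own defaults may differ.

```python
import numpy as np

def vanraden_grm(M: np.ndarray, ploidy: int = 2) -> np.ndarray:
    """VanRaden-style GRM from an individuals x markers dosage matrix coded 0..ploidy.

    Assumes no missing calls (impute beforehand). ploidy=2 is VanRaden's (2008)
    first method; ploidy=1 applies the same centring and scaling to 0/1 haploid
    gene content (an assumed analogue, not PLINK's code).
    """
    p = M.mean(axis=0) / ploidy                     # allele frequency per marker
    W = M - ploidy * p                              # centre each marker column
    return W @ W.T / (ploidy * np.sum(p * (1.0 - p)))

def bend(G: np.ndarray, floor: float = 1e-4) -> np.ndarray:
    """Raise eigenvalues below `floor` so the matrix is positive definite."""
    vals, vecs = np.linalg.eigh(G)
    return vecs @ np.diag(np.maximum(vals, floor)) @ vecs.T

def hybrid_pathogen_matrix(G_fv: np.ndarray, n_out: int, lam: float = 0.95) -> np.ndarray:
    """Block-diagonal relationship matrix for all isolates.

    Rows/columns: F. verticillioides isolates first, then n_out out-species isolates.
    The blend lam * G + (1 - lam) * I is an assumed reading of the lambda = 0.95 step.
    """
    n_fv = G_fv.shape[0]
    G_blend = bend(lam * G_fv + (1.0 - lam) * np.eye(n_fv))  # blend, then bend
    H = np.eye(n_fv + n_out)       # out-species: unrelated effects with unit variance
    H[:n_fv, :n_fv] = G_blend      # cross-species blocks stay zero
    return H

rng = np.random.default_rng(1)
maize = rng.integers(0, 3, size=(6, 200)).astype(float)    # toy diploid dosages
G_maize = vanraden_grm(maize, ploidy=2)

fv = rng.integers(0, 2, size=(5, 300)).astype(float)       # toy haploid 0/1 calls
G_pathogen = hybrid_pathogen_matrix(vanraden_grm(fv, ploidy=1), n_out=2)
```

Keeping the cross-species blocks at zero is what lets the out-species contribute phenotypic information without borrowing relatedness estimates computed across species.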

Statistical models

Mixed linear models of increasing complexity were used to estimate variance components and to predict host, pathogen, and host-by-pathogen interaction effects. Variance components estimated by restricted maximum likelihood (REML) were used to derive genetic parameters approximating the relative importance of the host and pathogen main effects and their interactions. The complete linear model was defined as:

$${y}_{ijn} = \mu + {H}_{i} + {P}_{j} + H{P}_{ij} + {E}_{n}$$

where \(y_{ijn}\) is the phenotypic observation on ear \(n\) of host variety \(i\) inoculated with pathogen isolate \(j\); \(\mu\) is the overall mean lesion size estimate; \(H_i\) is the random effect of host variety \(i\), assumed independent and identically distributed following a normal distribution with a mean of zero and variance \(\sigma^2_H\), \(H_i \sim N(0, \sigma^2_H)\); \(P_j\) is the random effect of pathogen isolate \(j\), \(P_j \sim N(0, \sigma^2_P)\); \(HP_{ij}\) is the random interaction effect associated with the combination of host \(i\) and pathogen \(j\), \(HP_{ij} \sim N(0, \sigma^2_{HP})\); and \(E_n\) is the residual error associated with each individual maize ear, \(E_n \sim N(0, \sigma^2_E)\). In total, nine models were fit, with either design or relationship matrices associated with the host, pathogen, and interaction effects. The simplest models (Models 1 and 2 in Table 1) included only the host or only the pathogen effect, associated with a design matrix: \(y_{in} = \mu + H_i + E_n\) for the host and \(y_{jn} = \mu + P_j + E_n\) for the pathogen. The design matrix for the host or pathogen was then replaced by a genomic (or hybrid) relationship matrix (Models 3 and 4 in Table 1). The next models (Models 5 and 6 in Table 1) included both host and pathogen effects, associated with design matrices and relationship matrices, respectively: \(y_{ijn} = \mu + H_i + P_j + E_n\). The final three models follow the complete linear model above. In Model 7, host, pathogen, and interaction (\(HP_{ij}\)) effects were all estimated using design matrices. In Model 8, host and pathogen effects were associated with relationship matrices while the interaction remained a design matrix, and in Model 9, all genetic effects were associated with relationship matrices.

In models that incorporate relatedness among experimental treatments, the variance of genetic effects is structured by GRMs that connect observations through the correlation of host (\(G_H\)) and pathogen (\(G_P\)) effects. This replaces a design matrix alone, which associates observations with experimental treatments and assumes that treatments are independent and unrelated. When GRMs replace design matrices, the vector of genetic effects for factor \(i\) is normally distributed with a mean of zero and a covariance matrix given by the GRM scaled by the corresponding genetic variance, \(\sim N(0, G_i\sigma^2_i)\). For the host-by-pathogen interaction effects, a reaction norm model was used. In this model, the covariance matrix describing the similarity of each host and pathogen pair was computed as the Hadamard (element-wise) product, represented by \(\circ\), of the relatedness matrices \(G_H\) and \(G_P\) expanded using the design matrices associated with each factor. To obtain a covariance matrix aligned with these interaction effects, each GRM was expanded to match host and pathogen pairs with their corresponding interaction: \(G_{HP} = (Z_H G_H Z_H') \circ (Z_P G_P Z_P')\), where \(Z_H\) and \(Z_P\) are the design matrices aligning the host and pathogen levels of each relatedness matrix with the associated interaction term (Jarquín et al. 2014; Crossa et al. 2022).
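The reaction-norm covariance can be illustrated with toy matrices. A minimal sketch, assuming hypothetical labels, relationships, and variance components. The kernels are built at the level of individual records (ears); building them over unique host-by-pathogen pairs would give the same entries. `incidence` is a hypothetical helper, and the element-wise `*` on NumPy arrays is the Hadamard product.

```python
import numpy as np

def incidence(levels: list, labels: list) -> np.ndarray:
    """Records x treatment-levels 0/1 design matrix."""
    column = {label: k for k, label in enumerate(labels)}
    Z = np.zeros((len(levels), len(labels)))
    Z[np.arange(len(levels)), [column[level] for level in levels]] = 1.0
    return Z

# Hypothetical relationship matrices for 3 hosts and 2 isolates
hosts, isolates = ["h1", "h2", "h3"], ["p1", "p2"]
G_H = np.array([[1.0, 0.5, 0.1],
                [0.5, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
G_P = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# One record per inoculated ear (hypothetical)
obs_host = ["h1", "h1", "h2", "h3"]
obs_iso = ["p1", "p2", "p1", "p2"]

Z_H = incidence(obs_host, hosts)
Z_P = incidence(obs_iso, isolates)
K_H = Z_H @ G_H @ Z_H.T        # host similarity between records
K_P = Z_P @ G_P @ Z_P.T        # pathogen similarity between records
G_HP = K_H * K_P               # Hadamard product: reaction-norm interaction kernel

# Implied covariance of the records under the complete model (hypothetical variances)
s2_h, s2_p, s2_hp, s2_e = 0.30, 0.10, 0.15, 0.45
V = s2_h * K_H + s2_p * K_P + s2_hp * G_HP + s2_e * np.eye(len(obs_host))
print(G_HP)
```

Two records share interaction covariance only to the extent that both their hosts and their pathogens are related. For example, the records (h1, p1) and (h2, p1) have an interaction similarity of 0.5 × 1.0 = 0.5. This similarity is what allows the model to predict interaction effects for host and pathogen pairs that were never inoculated together.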

Estimates of genetic parameters, prediction accuracies, and model accuracies

Models including treatments as standard design effects were compared with models including genomic relationship matrices that make the similarities among treatments explicit. To compare the efficacy of the models and to understand the impact of incorporating host-by-pathogen interaction effects, genetic parameters, log-likelihoods, and tenfold cross-validation model accuracies were estimated for each of the increasingly complex models. The models were used to generate genetic parameter estimates, predictions of breeding values (BVs) for hosts, and predictions of virulence values (VirVs) for pathogens. Predictions derived from standard design effects were also compared with predictions from a reaction norm model, which uses a marker-derived covariance matrix approximating the similarities between all possible host and pathogen pairs to predict host-by-pathogen interaction values (HPIVs). The proportion of variance attributable to genetic effects, or heritability, was estimated for both host (\(H^2\)) and pathogen (\(P^2\)) populations as the ratio of estimated genetic variance to total variance: \({H}^{2}=\frac{{\sigma }_{h}^{2}}{\left({\sigma }_{h}^{2}+{\sigma }_{p}^{2}+{\sigma }_{hp}^{2}+{\sigma }_{e}^{2}\right)}\); \({P}^{2}=\frac{{\sigma }_{p}^{2}}{\left({\sigma }_{h}^{2}+{\sigma }_{p}^{2}+{\sigma }_{hp}^{2}+{\sigma }_{e}^{2}\right)}\), where \(\sigma_h^2\) is the variance associated with differences among host varieties, \(\sigma_p^2\) is the variance associated with differences among pathogen isolates, \(\sigma_{hp}^2\) is the variance associated with interaction effects (deviations of specific host-by-pathogen combinations from their host and pathogen main effects), and \(\sigma_e^2\) is the residual error variance.
To provide a comparable estimate of the variance accounted for by host-by-pathogen interactions, the interaction variance \(\sigma_{hp}^2\) was likewise divided by the total estimated variance: \(H{P}^{2}=\frac{{\sigma }_{hp}^{2}}{\left({\sigma }_{h}^{2}+{\sigma }_{p}^{2}+{\sigma }_{hp}^{2}+{\sigma }_{e}^{2}\right)}\). For models incorporating GRMs, prediction accuracy was estimated using tenfold cross-validation as the average correlation between observed and predicted values (Resende Jr et al. 2012). Briefly, the data were divided into two subsets: a training set containing 90% of the individuals and a validation set containing the remaining 10%. Phenotypes in the validation set were predicted using the model fitted to the training set. This process was repeated 10 times, each time predicting a new 10% of the individuals, until all individuals had been predicted (Kohavi 1995; Usai et al. 2009). In addition, likelihood ratio tests (LRTs) were performed in ASReml to compare the goodness of fit of each applicable model (Models 1–6) with that of the complete model (Model 9), which included all similarity matrices.
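The tenfold cross-validation loop can be sketched for a single-factor genomic model. This is a minimal sketch on simulated data, assuming a GBLUP-style predictor with a known residual-to-genetic variance ratio; in the study, variances came from REML fits in ASReml. `kfold_indices`, `gblup_predict`, and `cv_accuracy` are hypothetical names.

```python
import numpy as np

def kfold_indices(n: int, k: int = 10, seed: int = 0) -> list:
    """Shuffle 0..n-1 and split into k non-overlapping, roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def gblup_predict(K, y, train, test, ratio):
    """Predict test phenotypes from training records with a kernel (GBLUP-style) model.

    ratio = residual variance / genetic variance, treated as known here.
    """
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + ratio * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

def cv_accuracy(K, y, k=10, ratio=1.0, seed=0):
    """Average correlation between observed and predicted values across k folds."""
    correlations = []
    for test in kfold_indices(len(y), k, seed):
        train = np.setdiff1d(np.arange(len(y)), test)   # remaining 90% when k = 10
        pred = gblup_predict(K, y, train, test, ratio)
        correlations.append(np.corrcoef(y[test], pred)[0, 1])
    return float(np.mean(correlations))

# Simulated data: 200 individuals, 500 markers, heritability of about 0.5
rng = np.random.default_rng(42)
M = rng.integers(0, 3, size=(200, 500)).astype(float)
p = M.mean(axis=0) / 2
W = M - 2 * p
K = W @ W.T / (2 * np.sum(p * (1 - p)))             # VanRaden-style GRM
g = W @ rng.normal(0.0, 0.05, 500)                  # simulated genetic values
y = g + rng.normal(0.0, g.std(), 200)               # phenotypes
print(round(cv_accuracy(K, y), 2))
```

Predictions for the validation fold rely entirely on the off-diagonal block `K[test, train]`. This is why cross-validation accuracy can only be computed for models that include a relationship matrix.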

A ‘type-B’ genetic correlation is a statistic that is often used to estimate the stability of genetic entries or the impact of changes in genotype performance or rankings when tested in different environments (Burdon 1977). Here, we provide genetic correlation estimates for host and pathogen to describe how host breeding values for FER resistance change when varieties are infected with different pathogen genotypes (rgH) or how predictions of pathogen virulence change when isolates are used to infect different host genotypes (rgP). These intra-class correlation statistics were used to provide population-wide estimates of stability for the host: \({r}_{gH}=\frac{{\sigma }_{h}^{2}}{\left({\sigma }_{h}^{2}+{\sigma }_{hp}^{2}\right)}\), and pathogen: \({r}_{gP}=\frac{{\sigma }_{p}^{2}}{\left({\sigma }_{p}^{2}+{\sigma }_{hp}^{2}\right)}\).
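Because every proportion shares the same denominator, the stability correlations can be recovered directly from reported variance proportions. A minimal sketch; `variance_ratios` is a hypothetical helper. The example uses the Model 8 proportions reported in the Results (H² = 0.30, P² = 0.13, HP² = 0.09), with the residual proportion of 0.48 taken as the remainder implied by the proportions summing to one.

```python
def variance_ratios(s2_h: float, s2_p: float, s2_hp: float, s2_e: float) -> dict:
    """Heritabilities, interaction proportion, and type-B correlations from variances."""
    total = s2_h + s2_p + s2_hp + s2_e
    return {
        "H2": s2_h / total,
        "P2": s2_p / total,
        "HP2": s2_hp / total,
        "rgH": s2_h / (s2_h + s2_hp),   # stability of host breeding values
        "rgP": s2_p / (s2_p + s2_hp),   # stability of pathogen virulence values
    }

r = variance_ratios(0.30, 0.13, 0.09, 0.48)
print({name: round(value, 2) for name, value in r.items()})
```

The rounded correlations (rgH ≈ 0.77, rgP ≈ 0.59) reproduce the Model 8 values. Because the total cancels, the type-B correlations depend only on the main-effect and interaction variances.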

Results

Phenotyping results

Figure 2 presents the distribution of disease severity, with Fig. 2A showing severity by Fusarium isolate and Fig. 2B showing severity by maize variety. The distribution of diseased kernel counts was nearly normal for both Fusarium isolates and maize varieties. The means of the Fusarium and maize phenotypes were 23.4 and 23.2, with standard deviations of 5.0 and 7.6 and coefficients of variation of 21% and 33%, respectively.
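The coefficients of variation follow from the reported means and standard deviations (CV = SD / mean):

```python
summary = {"Fusarium isolates": (23.4, 5.0), "maize varieties": (23.2, 7.6)}  # (mean, SD)
cv_percent = {group: 100 * sd / mean for group, (mean, sd) in summary.items()}
for group, cv in cv_percent.items():
    print(f"{group}: CV = {cv:.0f}%")
# Fusarium isolates: CV = 21%
# maize varieties: CV = 33%
```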

Fig. 2

Histograms summarizing phenotypic data as treatment means. (A) Mean disease severity, scored by kernel counting, for each Fusarium isolate. (B) The same phenotypic data presented for each maize variety.

Model variance estimates

Table 1 summarizes the factors included in each model along with estimates of genetic parameters, cross-validation model accuracies, the percentage of variance attributable to error, and log-likelihood estimates. Intra-class correlations estimating the stability of disease severity are also provided for both host and pathogen. Models progress from simple to complex, with GRMs and additional factors incorporated in subsequent models.

The first two models were the simplest single-factor models, each including effects for either host or pathogen and using standard design matrices to associate phenotypic data with independent experimental treatments. The resulting heritability estimates of 0.33 for maize and 0.07 for Fusarium were biased upward because the interaction variance was excluded from these models. Incorporating marker data as GRMs in the next two models (Models 3 and 4) produced the same heritability estimate for maize (0.33) and increased the estimate for Fusarium to 0.15.

The next model (Model 5) removed the GRMs and used design matrices to relate both host and pathogen treatments to phenotype, resulting in heritability estimates (0.34 for host and 0.07 for pathogen) nearly identical to those of the first two models, which also excluded genomic data. When the GRMs were added back (Model 6), host heritability fell to 0.31, and the proportion of variance attributable to pathogen isolates rose to 0.13. This value was nearly the same as the pathogen heritability estimate from Model 4, which also included a GRM. The subsequent models included factors for host-by-pathogen interaction effects.

In Model 7, the genomic matrices were again removed, and the host-by-pathogen interaction was introduced as a design matrix defining each host and pathogen pair as a unique interaction. Host heritability was 0.33, and the proportion of variance attributable to pathogen isolates returned to 0.07. The proportion of variance attributable to host-by-pathogen interaction effects (HP²) was estimated to be 0.10. The population-wide intra-class correlation estimates, rgH and rgP, indicate the importance of host-by-pathogen interactions. They describe the stability of host resistance or pathogen virulence when evaluated across a range of genotypes of the other species. The breeding values of maize varieties were more stable when inoculated with different isolates (rgH = 0.77) than the virulence values of isolates were when used to screen a diverse set of maize varieties (rgP = 0.40).

The penultimate model (Model 8) incorporated the genomic relationship matrices for both main effects (host and pathogen) but left the host-by-pathogen interaction term unchanged. Host heritability (H²) decreased slightly to 0.30 and P² increased to 0.13, while the proportion of phenotypic variance accounted for by interaction effects was 0.09. Incorporating the GRMs for the host and pathogen main effects produced the same stability estimate for maize varieties (rgH = 0.77) and a higher estimate for pathogen isolates (rgP = 0.59).

The final model (Model 9), which included genomic relationship matrices for both host and pathogen as well as a similarity matrix for the host-by-pathogen interactions, produced similar heritability estimates for the host (0.29) and pathogen (0.11). The inclusion of the interaction similarity matrix increased HP² to 0.15. Both genetic correlation estimates decreased (rgH = 0.66; rgP = 0.43), indicating lower stability of host resistance and pathogen virulence.

Model accuracies estimated with tenfold cross-validation were comparable only across models that used relationship matrices, because a GRM is required to produce indirect predictions for treatments whose phenotypic data were withheld. Accuracy estimates for the models incorporating similarity estimates (Models 3, 4, 6, and 9) are shown in Table 1. Across these models, cross-validation accuracy estimates for host genotypes were moderate and stable (0.33, 0.29, 0.30, and 0.34). Accuracies for pathogen predictions were lower but also stable across models (0.18, 0.21, 0.15, and 0.20). The accuracy of interaction predictions in the final model was estimated to be 0.31.

Twice the negative log-likelihood of each model is provided in Table 1 as an indication of model fit. The likelihood ratio (LR) statistics and associated p values for each model comparison are presented in Table 2. Models 7 and 8 were excluded because they had the same number of parameters as Model 9. The LRTs showed that the complete model (9) provided a significantly better fit than each of the models compared with it. The lowest LR statistics came from the more complex models (5 and 6), indicating that models including both host and pathogen effects fit the data better than those including only one of them. LR statistics for models including pathogen parameters were higher than those for models including host parameters, but they changed little when genomic data were included. For example, Models 2 and 4 both included pathogen parameters, and although Model 4 also included genomic data, their LR statistics relative to Model 9 differed little.
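The LR statistic for a reduced model against the complete model is typically the difference in their −2 log-likelihoods, referred to a chi-square distribution with degrees of freedom equal to the difference in the number of parameters; the models must be nested and, under REML, share the same fixed effects. The Python sketch below is a generic illustration, not the study's code: the function name and the −2 log-likelihood values are hypothetical. When a single variance component is tested against zero, the null value lies on the boundary of the parameter space, and a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom (which halves the naive p value) is a common adjustment; whether the study applied it is not stated here.

```python
from scipy.stats import chi2


def likelihood_ratio_test(neg2ll_reduced, neg2ll_full, df, boundary=False):
    """LR statistic and p value for a reduced model nested within a full model.

    neg2ll_* are -2 log-likelihoods (smaller is better); df is the difference
    in the number of estimated parameters. With boundary=True and df == 1,
    the 50:50 mixture of chi2(0) and chi2(1) is used for one variance component.
    """
    if df < 1:
        raise ValueError("df must be at least 1 for nested models")
    lr = neg2ll_reduced - neg2ll_full
    if lr < 0:
        raise ValueError("the full model should not fit worse than a nested model")
    p = chi2.sf(lr, df)
    if boundary and df == 1:
        p *= 0.5
    return lr, p


# Hypothetical -2 log-likelihoods, not values from Table 1.
lr, p = likelihood_ratio_test(neg2ll_reduced=1012.4, neg2ll_full=1000.0, df=1)
print(f"LR = {lr:.1f}, p = {p:.2g}")
```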

Table 2 Results of likelihood ratio tests comparing Models 1–6 to Model 9

Genomic relationships

Figure 3 shows a heatmap of the maize genomic relationship matrix (GRM) derived from the filtered SNP markers, along with the breeding value predictions for lesion size from Model 9. Some diagonal elements of the maize GRM, which reflect inbreeding, exceeded the theoretical limit of 2 (detailed in Supplemental Fig. 1A); this was expected because the varieties are highly inbred and two varieties (A680 and NC310) are closely related to B73. The off-diagonal relatedness coefficients were primarily centered around zero, with few between-variety relatedness estimates greater than 1 (Supplemental Fig. 1B). Clustering the maize varieties by relatedness revealed no association with the resistance breeding value (BV) predictions from the complete model (9) shown in the heatmap annotation; however, field corn varieties had a higher average BV (1.97) than sweet corn varieties (−0.81). Annotation row colors become lighter as BVs decrease (susceptible) and darker as they increase (resistant).
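One way diagonal elements above 2 can arise is sketched below, assuming a GRM built with VanRaden's first method (the study's exact construction is described in its Methods, not here). Under this scaling, allele frequencies are estimated from the sample itself, so a line that is homozygous for alleles that are rare in the sample can receive a diagonal element well above 2. The toy panel of four lines and four markers is hypothetical. The second helper rescales a GRM to a correlation matrix, one alternative similarity matrix (Xue et al. 2016), which bounds every diagonal element at 1.

```python
import numpy as np


def vanraden_grm(M):
    """VanRaden (method 1) GRM from an n x m marker matrix coded 0/1/2."""
    p = M.mean(axis=0) / 2.0      # allele frequencies estimated from the sample
    Z = M - 2.0 * p               # center each marker by twice its frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def to_correlation(G):
    """Rescale a relationship matrix so that every diagonal element equals 1."""
    d = np.sqrt(np.diag(G))
    return G / np.outer(d, d)


# Hypothetical panel: three lines fixed for the common allele, one inbred line
# fixed for the allele that is rare (frequency 0.25) at every marker.
M = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [2, 2, 2, 2],
], dtype=float)

G = vanraden_grm(M)
C = to_correlation(G)
print("GRM diagonal:", np.round(np.diag(G), 2))
print("correlation-scaled diagonal:", np.round(np.diag(C), 2))
```

Here the inbred line's diagonal element is 6.0 while the other lines have about 0.67; after rescaling, every diagonal element is 1 and the inbred line's correlation with each other line is −1.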

Fig. 3

Genomic relationship matrix heatmap of maize genotypes. Relationships between maize varieties are shown on a scale from 0 to 3 (white to red). Genomic estimated breeding values (BV) for each maize variety are shown alongside the heatmap on the left.

Figure 4 provides the analogous heatmap for the Fusarium isolates, estimated as a hybrid relationship matrix (HRM) that incorporates all F. verticillioides isolates along with the additional Fusarium species (out-species). Because the out-species formed a distinct outgroup that produced highly skewed inbreeding and relatedness coefficients, they were incorporated into the HRM with relationship coefficients to all other isolates set to 0, as would be the case when independent, unrelated treatments are included in the experimental design. Estimated virulence values (VirVs) are shown alongside the heatmap to indicate the change in lesion size expected when the respective isolate is used for inoculation.
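The treatment of the out-species described above amounts to a block-diagonal matrix: the F. verticillioides isolates keep their genomic relationships, while each out-species isolate receives a diagonal element of 1 and zero relationships with every other isolate, as stated in the Fig. 4 caption. The minimal Python sketch below assumes the Fv genomic block has already been computed; `build_hrm` and the example values are hypothetical. Because an identity block is positive definite, the combined matrix stays positive definite whenever the Fv block is, which is convenient for mixed-model software that inverts the relationship matrix.

```python
import numpy as np


def build_hrm(G_fv, n_out):
    """Combine an Fv genomic block with n_out unrelated out-species isolates.

    Out-species isolates get a diagonal of 1 and relationships of 0 with all
    other isolates, i.e. they enter as independent, unrelated treatments.
    """
    G_fv = np.asarray(G_fv, dtype=float)
    if G_fv.shape[0] != G_fv.shape[1] or not np.allclose(G_fv, G_fv.T):
        raise ValueError("G_fv must be a symmetric square matrix")
    n_fv = G_fv.shape[0]
    return np.block([
        [G_fv, np.zeros((n_fv, n_out))],
        [np.zeros((n_out, n_fv)), np.eye(n_out)],
    ])


# Hypothetical genomic block for three Fv isolates plus two out-species isolates.
G_fv = np.array([
    [1.05, 0.30, -0.10],
    [0.30, 0.98, 0.05],
    [-0.10, 0.05, 1.10],
])
H = build_hrm(G_fv, n_out=2)
print(H.round(2))
```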

Fig. 4

Hybrid relationship matrix heatmap of Fusarium isolates. Relationships between Fusarium isolates are shown on a scale from 0 to 2 (white to red). Virulence values (VirVs) for each Fusarium isolate are shown alongside the heatmap on the left. The inbreeding and relationship coefficients for isolates from other Fusarium species were set to 1 and 0, respectively, specifying no recent relationship with any other isolate.

Discussion

The distribution of mean Fusarium ear rot (FER) severity estimates was approximately normal across both the Fusarium isolates and the maize varieties (Shapiro–Wilk test p values: 0.91 and 0.41, respectively). Although no susceptible or resistant check varieties were included as controls, the normal distribution and genetic parameter estimates provide evidence that the phenotyping strategy captured genetic variation in FER resistance among the maize varieties. These 158 varieties were selected to provide a genetically diverse maize population. Principal component analyses (PCAs) were derived from the genomic marker data and plotted for both the Fusarium and maize populations. Two PCA plots were drawn for the maize population. The first (Supplemental Fig. 3) included the three varieties (B73, A680, and NC310) that were more closely related to each other than to the rest of the varieties, and the second (Supplemental Fig. 4) excluded these varieties to better show the two clearly distinguished clusters representing the sweet and field corn groups. The Fusarium PCA (Supplemental Fig. 4) showed that, as expected, all out-species isolates except Ft25622 clustered together, apart from the F. verticillioides (Fv) isolates, which formed two groups. One of these groups was of interest because it included several NC and FL isolates, potentially reflecting geographic differentiation. The genetic variation attributable to isolate virulence provides evidence that incorporating pathogen isolates in genomic selection (GS) models may improve the accuracy of disease severity predictions. Although the plants from which the ears came were grown in the field, phenotyping (inoculation, incubation, and scoring) was done in a controlled and uniform environment to reduce sources of error.
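The marker-based PCAs described above can be sketched as a singular value decomposition of the column-centered marker matrix. The Python example below is illustrative only: it centers markers without standardizing them (many pipelines also divide by each marker's standard deviation), the two simulated groups with contrasting allele frequencies are hypothetical stand-ins for clusters such as the sweet and field corn groups, and `marker_pca` is a hypothetical name.

```python
import numpy as np


def marker_pca(M, n_components=2):
    """Principal component scores and variance shares from 0/1/2 marker data."""
    Z = M - M.mean(axis=0)                        # center each marker
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)               # share of total marker variance
    return scores, explained[:n_components]


# Two hypothetical groups of 20 lines with contrasting allele frequencies.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.1, size=(20, 200))
group_b = rng.binomial(2, 0.9, size=(20, 200))
M = np.vstack([group_a, group_b]).astype(float)

scores, explained = marker_pca(M)
print("variance explained by PC1 and PC2:", np.round(explained, 2))
print("PC1 group means:", scores[:20, 0].mean().round(2), scores[20:, 0].mean().round(2))
```

The sign of a principal component is arbitrary, so only the separation of the groups along PC1, not which group is positive, carries meaning.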

A portion of this study assessed our ability to predict all varieties and isolates from a highly reduced number of phenotypic data points, using genotyping to provide connectivity among both the host plant varieties and the pathogen isolates. The circular design presented in Fig. 1 shows how the reduced set of host–pathogen interactions was tested, allowing predictions for all varieties and isolates evaluated. Incorporating relatedness estimates through the GRM allowed information to be shared across treatments within a GS model framework. This design captured 3661 disease severity data points out of a possible 23,700, an approximately sixfold reduction in the time, labor, materials, and other expenses associated with screening the complete set of varieties against all isolates. This lack of phenotypic data for specific host–pathogen combinations was countered by obtaining SNP marker sets from full genome sequences for all individuals in the study. Although data from the complete experiment are not available for direct comparison, conclusions about the success of using a reduced dataset can be drawn from changes in estimates of heritability, error variance, and model accuracy. Alternative similarity matrices, such as correlation matrices, may be used in place of standard genomic relationship matrices by scaling the GRM (Xue et al. 2016), and a range of other GRM or similarity matrix estimation methods may be explored (Su et al. 2012; Vitezica et al. 2013).
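The exact circular design is given in Fig. 1; the sketch below is only a hypothetical cyclic allocation in which each variety is paired with k consecutive isolates around a circle, so that every isolate is tested on several varieties and the two populations stay connected. If each host-by-isolate combination is counted once, the stated total of 23,700 implies 150 isolates (158 × 150), and 23,700 / 3661 is about 6.5, consistent with the approximately sixfold reduction. With k = 23, the allocation below yields 3634 pairs, close to but not exactly the 3661 observed, so it should not be read as the study's allocation; `circular_design` and the choice of k are hypothetical.

```python
def circular_design(n_hosts, n_isolates, k):
    """Pair host h with isolates h, h+1, ..., h+k-1 (wrapping around the circle)."""
    if not 0 < k <= n_isolates:
        raise ValueError("k must be between 1 and n_isolates")
    return [(h, (h + j) % n_isolates) for h in range(n_hosts) for j in range(k)]


n_hosts, n_isolates, k = 158, 150, 23   # k is a hypothetical choice
pairs = circular_design(n_hosts, n_isolates, k)
full = n_hosts * n_isolates

isolates_used = {iso for _, iso in pairs}
print(f"{len(pairs)} of {full} combinations tested "
      f"({full / len(pairs):.1f}-fold reduction)")
print("every isolate tested at least once:", len(isolates_used) == n_isolates)
```

Overlapping windows are what make the design connected: neighboring varieties share most of their isolates, so a relationship matrix can propagate information to the untested combinations.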

Across the values in Table 1, several trends are apparent. First, the variation accounted for by host effects changes little across the models: host heritability ranged from 0.29 to 0.34, all within overlapping margins of error. Second, the variation accounted for by pathogen effects ranged from 0.07 to 0.15 and increased when additional genomic data were included in the models (Model 2 vs. 3, Model 5 vs. 6, and Model 7 vs. 8). Third, the interaction effect changed from 0.10 to 0.09 to 0.15 in Models 7, 8, and 9, respectively. These comparisons indicate that non-additive genetic effects are non-negligible in this pathosystem, particularly for the pathogen and the interaction effect. In all models, the proportion of variance attributable to the host (maize) was greater than that of the pathogen and interaction combined. Variance attributable to the pathogen and to the host-by-pathogen interaction was similar, with the interaction slightly higher in Model 9 (0.11 vs. 0.15). This suggests that maize exerts greater genetic control over disease severity in this pathosystem. Incorporating additional relationship matrices that estimate dominance or other non-additive effects offers further ways to evaluate the fit of prediction models (Muñoz et al. 2014). Crucially, the overall variation attributed to genetic effects increases when the interaction term is included. This is seen when comparing Model 6 with Model 9: the total genetic contribution is estimated at 0.44 (0.31 + 0.13) in Model 6 and increases to 0.55 (0.29 + 0.11 + 0.15) in Model 9 when the interaction is included. Including the host–pathogen covariance matrix also increased the variance attributable to the interaction effect, from 0.09 in Model 8 to 0.15 in Model 9, with no loss of total variance attributable to genetic effects (0.52 to 0.55).

Another trend connected to the importance of genetic effects is the decrease in error variance as model complexity increases. The error variance percentage, calculated by dividing the error variance by the sum of all variance components, decreased from 67% and 85% of the total in Models 3 and 4, respectively, to 46% in Model 9. Across all models, error variance generally decreased as more data were included. The greatest change in error estimates resulted from including interaction effects in the model, with the proportion of error variance falling from 55% in Model 6 to 46% in Model 9, again underlining the gains from including the genomic interaction component.
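The totals above are sums of proportions of phenotypic variance, and the error percentage is the error component divided by the sum of all components. The Python sketch below reproduces the totals from the rounded proportions in the text and shows the error-share calculation on illustrative raw variance components; the component values are hypothetical (chosen to mirror the rounded Model 9 proportions, not taken from the study), and the function names are ours. Because the published proportions are rounded, a share computed as one minus the genetic total (0.45 for Model 9) can differ slightly from the reported 46%, whether through rounding or through other random terms in the model.

```python
def genetic_total(h2, p2=0.0, hp2=0.0):
    """Sum of host, pathogen, and interaction proportions of phenotypic variance."""
    return h2 + p2 + hp2


def variance_shares(components):
    """Divide each variance component by the sum of all components."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


# Rounded proportions quoted in the text.
print("Model 6:", round(genetic_total(0.31, 0.13), 2))        # host + pathogen
print("Model 8:", round(genetic_total(0.30, 0.13, 0.09), 2))
print("Model 9:", round(genetic_total(0.29, 0.11, 0.15), 2))

# Hypothetical raw variance components (arbitrary units), not the study's.
shares = variance_shares({"host": 2.9, "pathogen": 1.1, "interaction": 1.5, "error": 4.5})
print("error share:", round(shares["error"], 2))
```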

From these trends, several conclusions can be drawn. One is that including an estimate of the host-by-pathogen interaction effects improves the model, as evidenced by the increase in variance explained by genetic effects, the reduction in error variance, and the improvement in log-likelihood estimates. Another is that less genetic variation is attributable to the pathogen genotypes than to the host genotypes, with the estimated contribution from the interaction effects intermediate between the two. Model accuracy, as determined by tenfold cross-validation for each model that could be evaluated (Models 3, 4, 6, and 9), was very consistent for the host (0.32, 0.29, 0.30, and 0.31), whereas accuracy for the pathogen was lower and slightly less consistent (0.18, 0.21, 0.15, and 0.20). Model accuracies for host and pathogen were highest or second highest in Model 9, where the accuracy of the interaction predictions (0.31) was also nearly as high as that of the host. Accuracies in this range are unsurprising, as FER severity is known to be strongly influenced by environmental effects. Taken together with the variance components, the validation accuracies suggest that it would be more accurate to predict how a genotyped maize variety will respond to unknown Fusarium genotypes than the reverse; however, the additive interaction and pathogen (P2) effects suggest that a potentially substantial gain in accuracy is available if the pathogen population is characterized and genotyped. This provides evidence as to why genomic selection models for FER resistance may have overestimated potential gains. The pathogen and interaction effects could also account for some of the substantial environmental effects for which FER is well known. These environmental differences may be confounded with distinct pathogen populations that develop under different climatic conditions. Natural infections likely consist of multiple pathogen genotypes (field-wide) that may perform better or worse in different environments, and failing to account for host–pathogen interactions across seasons or years would result in unexpected changes in the responses of less stable maize varieties.

Other genomic selection models for FER have generated higher heritability estimates for maize populations than those presented here (ranging from 0.3 to 0.7), and there are several possible explanations. First, the experimental population is relatively small (158 individuals) and was chosen from the breeding population to maximize genetic diversity. Second, the circular phenotyping design substantially reduced the number of data points; although heritability did not appear to be heavily penalized, we expect a larger or more complete phenotypic dataset to increase heritability estimates. In better-characterized pathosystems, the circular design may incur no penalty when the pathogen population is included. Third, as discussed previously, other models are known not to have achieved their predicted gains in resistance to FER, so a lower heritability may, in fact, be more accurate. We suggest several steps for future work to obtain improved and accurate selection models. The first is to increase the size of the maize (or other host) population, particularly if similar levels of genetic diversity are to be included. The second is to identify genotypes that show consistent and opposing phenotypes (controls) under the same phenotyping methods. The third is to expand phenotyping in proportion to the genetic diversity of both populations.

As small populations were used to provide a proof of concept, this study does not attempt to identify individual maize genotypes that are resistant to FER or Fusarium genotypes that are always virulent. Instead, this study focused on alternative prediction models for genomic selection that incorporate both sides of a pathosystem to quantify the relative importance of host, pathogen, and host-by-pathogen interaction effects. Disease screening typically focuses on either the host or the pathogen, and traditional methods do not provide sufficient information to estimate the weight of the interaction component. The results show that including both the pathogen population and the interaction component in prediction models results in a moderate increase in the total variance attributable to genetic effects and in model accuracy.

Although our total variance accounted for by genetic and interaction effects (0.55) is difficult to compare with other studies, predicting host–pathogen interactions by incorporating genomic data from both host and pathogen has been attempted only a handful of times in the literature and never in a pathosystem of real-world significance. In one of the few plant pathosystem studies using similar methods, the proportions of variance explained were 0.44 for the pathogen, 0.02 for the host, and 0.05 for the interaction, for a total variance attributable to genetic and interaction effects of 0.53 (Wang et al. 2018). A previous study of the phenotypic variation attributable to HIV viral strain or human host genotype found that human HLA genes explained 8.4% of the variance, viral phylogenetics explained 28.8%, and together they explained 29.9% (Bartha et al. 2013). Compared with the estimates presented here, it is unsurprising that the values differ for each category, as pathosystems depend to varying degrees on genotypic differences in both the host and the pathogen. Studies applying similar methods to other pathosystems should likewise expect different proportions in their populations.

Isolate virulence values (VirVs) of pathogen genotypes are not currently used by plant breeders; however, these data could be highly useful if applied. Plant varieties are assessed and bred across a wide range of environments and then grown in the locations where they perform best. Similarly, breeding for disease resistance with a diverse set of pathogens could identify which varieties perform better against certain strains, and growing those varieties where disease pressure from similar pathogen genotypes is high could reduce disease severity. Being able to assess and include both sides of the pathosystem allows varieties to be targeted to certain environments (containing specific pathogen genotypes) and may help extend the durability of resistance by breeding against diverse pathogen populations. Screening with genetically diverse isolates reduces the chance of breeding for isolate-specific resistance, which could lead to an outbreak of a pathogen variant that uses distinct virulence mechanisms. It also improves our understanding of what level of resistance is possible given a specific group of pathogen genotypes and therefore improves the accuracy of breeding values. More broadly, adding data such as interaction values to multi-environment or multi-pathogen trials can reduce prediction error in breeding programs. Because the pathosystem is a complex adaptive system, including both the host and pathogen genomes may help decrease average prediction errors over time. Increasing prediction accuracies may require further incorporation of pathosystem dynamics, similar to what has been shown for crop improvement under abiotic stress (Messina et al. 2018, 2022). Using cross-validation to estimate model accuracy will become increasingly important when applying this type of prediction model to larger datasets, specifically to reduce environmental error.

Conclusions

Making practical use of the proposed GS model requires integrating the results into a breeding program with distinct host and pathogen populations and identifying markers associated with FER resistance, both of which will require screening a broader set of germplasm in additional experiments. This research provides estimates of the gains obtained by including host, pathogen, and host-by-pathogen effects in the genomic selection model, along with the protocol used to produce phenotypic data for disease severity. Other phenotyping methods, specifically for inoculation and scoring disease severity, could alter confidence in the presented results or further increase heritability estimates and prediction accuracy. The methods used to convert genomic information into relatedness estimates will allow others to build on this study by combining genotypic data from the host and pathogen populations used in additional screening. Genomic connectivity will allow the community working to develop resistance to FER and reduce fumonisin contamination to steadily increase the number of varieties and isolates tested, which will better approximate the true genetic parameters underlying the FER disease severity phenotype. Additionally, using larger populations to identify QTLs associated with host resistance, pathogen virulence, and host-by-pathogen genomic interactions could validate the interaction components described by these models. Integrating pathogen populations into genomic prediction models can improve variance component estimates and prediction accuracies and should therefore be further validated in pathosystems with known mechanisms of resistance and pathogenicity.