Introduction

Barley is one of the most important cereal crops globally, ranking fourth in the world in terms of quantity produced and area of cultivation (Capettini et al. 2010; Zhou 2010). It is also the second most important crop in Australia, with average grain production of about 9 million tonnes per annum, 70% of which is exported (ABARES 2019). Australia makes up over 30% of the world malting barley trade and approximately 20% of the world feed barley trade (Barley 2022). Although barley yield in Australia increased from 1.2 t/ha in 1969 to 2.5 t/ha in 2015, this rate of genetic gain is still slower than that of other regions, especially the European Union (EU) (from 2 to 5–8 t/ha) (Schils et al. 2018; Spragg 2016). This is probably caused by low soil fertility and by lower and highly variable precipitation, together with many stresses that reduce productivity in Australia (Hochman et al. 2013; Richards 1991; Van Gool and Vernon 2006). Thus, breeding for higher yield in Australia is a challenging task, as grain yield has a relatively low heritability due to high genotype × environment interaction (GEI) (Slafer 2003). Nevertheless, there have been continuous efforts to increase barley grain yield in this highly variable environment (Brown et al. 1988; Donald 1979; Eagles and Moody 2004; Finlay and Wilkinson 1963).

The most significant breakthrough in barley yield improvement was achieved by the introduction of the Green Revolution genes sdw1.d, sdw1.c (originally named denso), uzu1.a, and erectoides (ari-e.GP), which resulted in shorter plants, better lodging resistance, and higher harvest index, thus leading to higher yield (Jia et al. 2016; Mickelson and Rasmusson 1994; Nadolska-Orczyk et al. 2017). However, there is evidence that the best alleles conditioning short stature in barley tend to become fixed in modern barley breeding germplasm (Jia et al. 2009). In addition, some of these genes (i.e. denso and uzu1.a) were also reported to be associated with unwanted traits such as lowered malt quality or temperature sensitivity (Dockter et al. 2014; Hellewell et al. 2000; Wang et al. 2010b), and therefore there have been attempts to replace them with other plant height-conditioning genes (Dockter et al. 2014).

The other promising avenue employed to improve yield is to improve yield component traits such as the number of spikes per plant (or tiller number), grain number per ear, and thousand-grain weight (Jedel and Helm 1994; Peltonen-Sainio et al. 2007). Depending on the genotypes and the environments tested, the interaction among these three components was found to be negative (Bulman et al. 1993; Hadjichristodoulou 1990), positive (Saade et al. 2016), or either, depending on the environment (Markova et al. 2015; Wiegmann et al. 2019). This indicates that yield components also have strong GEI, and that there is plasticity in the yield components to compensate for yield reduction (Sadras and Rebetzke 2013). Nevertheless, improving one trait without compromising the others has been shown to be feasible (Griffiths et al. 2015; Zhou et al. 2016). Genotypic selection for yield component quantitative trait loci (QTL) in tandem with phenotypic yield selection may be more beneficial to improving yield than yield testing alone. QTL affecting the three aforementioned yield components have been defined and mapped in barley (Cu et al. 2016; Maurer et al. 2016; Mikołajczak et al. 2016; Sharma et al. 2018; Walker et al. 2013; Wang et al. 2019; Xu et al. 2018; Zhou et al. 2016). In recent studies, many “hotspots”, QTL regions controlling multiple traits, were identified. In a few cases, these hotspots reside near key genes controlling plant stature, flowering time, and spike morphology, such as vrs1, nud, VRN-H3 and sdw1/denso (Wang et al. 2019). In other cases, they co-locate with barley orthologs of genes controlling grain size and weight in other cereal species such as rice, maize, and wheat, demonstrating the conservation of gene function within the cereal family. Functional genomics and analysis of natural or induced mutants have revealed that genes involved in spike morphology, such as Vrs1 (Komatsuda et al. 2007), Vrs5 (Ramsay et al. 2011), INT-C/HvTB1 (Lundqvist 1997), HvAP2 (Houston et al. 2013), and HvCKX (Zalewski et al. 2010, 2012, 2014), determine yield components and grain yield.

For a specific trait, the genetic improvement of crop plants is accomplished by stacking and accumulating favourable alleles within gene pools. In barley, domesticated barley (H. vulgare L. ssp. vulgare) and wild barley (H. vulgare L. ssp. spontaneum) constitute two primary sources of advantageous alleles in breeding (Wendler et al. 2014). Beneficial alleles from wild barley have been repeatedly used to improve yield in elite cultivars (Kalladan et al. 2013; Lakew et al. 2013; Nevo and Shewry 1992). To capture a wide range of genetic diversity from wild barley for better identification of beneficial alleles, a Nested Association Mapping (NAM) population was created by crossing 25 genetically diverse wild barley accessions originating from the Fertile Crescent region to the malting cultivar ‘Barke’ (Maurer et al. 2015). NAM populations combine the advantages and minimise the disadvantages of the two traditional approaches for identifying quantitative trait loci, namely bi-parental and association mapping populations. NAM populations have higher allelic diversity and more recombination events than biparental populations, while reducing the confounding effect of population structure seen in association mapping populations (Gage et al. 2020; Hu et al. 2018; Kitony et al. 2021). The HEB-25 population (HEB stands for Halle Exotic Barley) has been utilized to study flowering time, salinity and drought tolerance, grain size and weight, and disease resistance in barley in a wide range of environments across the globe (Büttner et al. 2020; Herzig et al. 2018; Maurer et al. 2015; Maurer et al. 2016; Pham et al. 2019; Saade et al. 2016; Sharma et al. 2018; Vatter et al. 2017, 2018). This population harbours a reservoir of beneficial alleles that can be exploited for variety development and yield improvement in previously tested environments. Recently, the HEB-25 population was evaluated to identify QTL relating to phenology at Charlick, South Australia (Pham et al. 2020). Here, we report the results of a Genome-Wide Association Study (GWAS) of spike morphology-related traits, measured in the same field trials described by Pham et al. (2020), that might contribute alleles to increase yield in environments such as those in southern Australia.

Materials and methods

The plant materials

The wild barley HEB-25 NAM population was developed by crossing and then backcrossing 25 diverse wild barley accessions (Hordeum vulgare ssp. spontaneum and ssp. agriocrithon) to the cultivar Barke. Details of the population design and development were described previously (Maurer et al. 2015). In summary, there were 25 subfamilies, with the number of lines within each family ranging from 22 (family no. 18) to 75 (family no. 3) (Maurer et al. 2015, Additional file 1). On average, there were 52 lines per family, with each BC1S3 plant having an expected segregation of 71.875% homozygous Barke loci, 21.875% homozygous wild barley loci, and 6.25% heterozygous loci. The lines sown in 2015 and 2016 were BC1S3:8 and BC1S3:9 lines, respectively.
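The expected BC1S3 proportions quoted above follow from simple Mendelian bookkeeping, which can be checked with a short sketch. It assumes a single locus that is polymorphic between Barke and the wild donor, no selection, and exact arithmetic; the function name is illustrative only.

```python
from fractions import Fraction


def bc1_selfing_expectation(n_selfings: int) -> dict:
    """Expected genotype proportions at a polymorphic locus after one
    backcross to the recurrent parent followed by n rounds of selfing."""
    # After BC1, half of the loci are heterozygous and half are
    # homozygous for the recurrent parent (Barke).
    het = Fraction(1, 2)
    hom_recurrent = Fraction(1, 2)
    hom_donor = Fraction(0)
    for _ in range(n_selfings):
        # Each selfing fixes half of the remaining heterozygous loci,
        # split evenly between the two homozygous classes.
        fixed = het / 2
        het -= fixed
        hom_recurrent += fixed / 2
        hom_donor += fixed / 2
    return {
        "homozygous_barke": hom_recurrent,
        "homozygous_wild": hom_donor,
        "heterozygous": het,
    }


expected = bc1_selfing_expectation(3)
for genotype, share in expected.items():
    print(f"{genotype}: {float(share) * 100:.3f}%")
```

Three rounds of selfing reproduce the 71.875%, 21.875% and 6.25% figures given in the text.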

Genotyping of NAM lines

The population was genotyped with the Illumina iSelect 50 K chip (Bayer et al. 2017), and a total of 32,995 SNPs meeting the quality criteria (polymorphic in at least one HEB family, < 10% failure rate, < 12.5% heterozygous calls) were utilised for GWAS in this study.

A quantitative identity by state (IBS) approach described by Maurer et al. (2015) was used to define the SNP matrix. Missing SNPs were imputed using the mean score of polymorphic flanking markers (matrix E, Maurer & Pillen, 2019, https://doi.org/10.5447/ipk/2019/20).
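The flanking-marker imputation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, for illustration only, that marker scores are coded 0 (homozygous Barke), 1 (heterozygous) and 2 (homozygous wild), that markers are already ordered along one chromosome, and that every observed neighbour counts as a valid flank (the polymorphism filter mentioned above is omitted).

```python
import numpy as np


def impute_from_flanking(scores):
    """Fill missing scores (NaN) for one line on one chromosome with the
    mean of the nearest observed marker on each side."""
    out = np.asarray(scores, dtype=float).copy()
    observed = np.flatnonzero(~np.isnan(out))  # indices with real calls
    for i in np.flatnonzero(np.isnan(out)):
        left = observed[observed < i]
        right = observed[observed > i]
        flanks = []
        if left.size:
            flanks.append(out[left[-1]])   # nearest observed marker to the left
        if right.size:
            flanks.append(out[right[0]])   # nearest observed marker to the right
        if flanks:                          # stays NaN if nothing is observed
            out[i] = np.mean(flanks)
    return out


# Hypothetical scores for a single line; NaN marks a failed call.
raw = [0, np.nan, 2, np.nan, np.nan, 2, np.nan]
imputed = impute_from_flanking(raw)
print(imputed)
```

A missing call between a Barke and a wild flank receives the intermediate score 1, which is why the resulting SNP matrix is quantitative rather than strictly categorical.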

HEB-25 field trials

Field trials were conducted at Charlick, Strathalbyn in South Australia in 2015 (35° 19′ 20″ S, 138° 53′ 24″ E) and 2016 (35° 19′ 22″ S, 138° 52′ 56″ E) with 941 and 1294 lines of the HEB-25 population, respectively. The lines tested in 2015 were a subset of the population comprising 18 families, for which seed was available at the time of sowing in 2015. The experimental design was fully described by Pham et al. (2020). In summary, the design for 2015 was an unreplicated augmented block design with lines grouped by family. There were 1176 plots arranged into 12 bays × 98 columns with two plots for check varieties per column. For 2016, a partially replicated randomized design was implemented with 147 out of 1294 lines replicated, and the same check lines sown in 2015 were used again at a frequency of one check plot every 10 test lines. Check lines used in both years included ten Australian varieties (Admiral, Capstan, Commander, Compass, Fleet, Flagship, Hindmarsh, Gairdner, Navigator, and Keel) and the parental line Barke. Plots in both years were 2.24 m² double rows, 3.20 m in length, separated by 0.5 m to reduce competition between plots. The two rows within a plot were 0.20 m apart.

The sowing and harvesting dates for the 2015 trial were June 16th and December 15th, respectively. The sowing and harvesting dates for the 2016 trial were May 18th 2016, and December 20th 2016, respectively.

Phenotypic data

The seven traits measured in this study were ear length (EL), awn length (AL), grain number per ear (GPE), grain area (GA), grain width (GW), grain length (GL), and thousand-grain weight (TGW). For the first three traits, phenotypic values for each genotype were calculated as the average of measurements taken from 15 randomly chosen ears per plot. Ear length was measured in cm from the bottom to the top of the ear, and awn length was measured in cm from the top of the ear to the tip of the awn. GPE was measured as the total grain count of each ear. The latter four traits were measured on 150–300 seeds using GrainScan software (Whan et al. 2014) and expressed in mm for GL and GW, mm² for GA, and grams for TGW.

Within and between year phenotypic analysis

Within each of the years, the analysis of individual grain traits was conducted using a linear mixed model (LMM) that appropriately partitioned and accounted for all genetic and non-genetic sources of variation (Pham et al. 2020) specified as the following:

If \({\varvec{y}}=\left({y}_{1}, \dots , {y}_{n}\right)\) is a vector of \(n\) phenotypic trait responses, then the LMM was defined as

$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{Z}}{\varvec{u}}+{{\varvec{Z}}}_{g}{\varvec{g}}+{\varvec{e}} \tag{1}$$

where \({\varvec{X}}{\varvec{\beta}}\) was the fixed component of the model and contained a population type factor to estimate the overall mean of the HEB-25 population as well as separate means for each of the parents and controls involved in the experiment. This component also contained terms to model distinct spatial trends such as linear column or row effects if present in the field. The term \({\varvec{Z}}{\varvec{u}}\) was the random component containing factors to model possible sources of non-genetic variation including non-linear row or column effects. This component may also have contained terms to model potential outliers using the methodology outlined by Gumedze et al. (2010). Additional extraneous spatial variation was captured with the residual model error term, \({\varvec{e}}\), which was assumed to be distributed \({\varvec{e}} \sim N(0, {\varvec{R}})\) where \({\varvec{R}} = \sigma^{2} {{\varvec{\Sigma}}}_{r} \otimes {{\varvec{\Sigma}}}_{c}\) is a separable correlation structure with \({{\varvec{\Sigma}}}_{r}, {{\varvec{\Sigma}}}_{c}\) parameterized as an auto-regressive structure of order 1 in the row and column direction, respectively. The underlying genetic variation of the HEB-25 lines was modelled using the random component term \({{\varvec{Z}}}_{g}{\varvec{g}}\) where the genetic random effects, \({\varvec{g}}\), form a vector of length \(r\) and were assumed to be distributed \({\varvec{g}} \sim N(0, {\sigma }_{g}^{2}{{\varvec{I}}}_{r})\). This assumes a common genetic variance across the HEB-25 population. Under this LMM structure the effects, \(\left({\varvec{u}}, {\varvec{g}}, {\varvec{e}}\right)\), were considered to be mutually independent. 
From each of the fitted models, the vector of best linear unbiased predictions (BLUPs) \(\widetilde{{\varvec{g}}}=({\widetilde{g}}_{1}, \dots , {\widetilde{g}}_{r})\) of the HEB-25 lines, as well as their prediction error variances, were used to calculate broad sense generalized heritabilities with the formula developed by Cullis et al. (2006), namely

$${H}_{g}^{2}=1-\frac{PE{V}_{ave}}{2{{\widehat{\sigma }}_{g}}^{2}}$$

where \(PE{V}_{ave}\) is the average of the prediction error variances of all elementary contrasts between the progeny lines and \({\widehat{\sigma }}_{g}^{2}\) is a REML estimate of the genetic variance of the progeny obtained from the fitted model.
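As a numerical illustration of the generalized heritability formula, the sketch below computes \(PE{V}_{ave}\) from a prediction error variance-covariance matrix of the BLUPs and then applies the expression above. The matrix and the genetic variance are made-up values, and the function names are illustrative; in practice these quantities would come from the fitted mixed model.

```python
import numpy as np


def mean_pev_of_differences(pev_matrix):
    """Average prediction error variance over all pairwise contrasts
    g_i - g_j, using var(g_i - g_j) = C_ii + C_jj - 2 * C_ij."""
    c = np.asarray(pev_matrix, dtype=float)
    d = np.diag(c)
    diff_var = d[:, None] + d[None, :] - 2.0 * c
    upper = np.triu_indices_from(c, k=1)  # each pair counted once
    return diff_var[upper].mean()


def generalized_heritability(pev_ave, sigma2_g):
    """H^2 = 1 - PEV_ave / (2 * sigma_g^2), following Cullis et al. (2006)."""
    return 1.0 - pev_ave / (2.0 * sigma2_g)


# Hypothetical PEV matrix for three lines and a hypothetical genetic variance.
pev = np.array([
    [0.20, 0.05, 0.05],
    [0.05, 0.20, 0.05],
    [0.05, 0.05, 0.20],
])
sigma2_g = 1.0

pev_ave = mean_pev_of_differences(pev)
h2 = generalized_heritability(pev_ave, sigma2_g)
print(f"PEV_ave = {pev_ave:.3f}, H2 = {h2:.3f}")
```

The factor of 2 in the denominator reflects that the variance of a difference between two independent genetic effects is twice the genetic variance.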

To understand the genetic relatedness of the traits between the 2 years, model (1) was extended to a multi-year LMM. For each trait, the fixed component of the multi-year LMM consisted of an interaction of a two-level year factor with a population factor to ensure HEB-25 progeny means, parents and controls were estimated independently for each year. Extraneous sources of environmental variation modelled with fixed and random terms in the single-year models were also added to the multi-year LMM. The multi-year LMM residual was assumed to be distributed \({\varvec{e}} \sim N\left( {0, \oplus_{i = 1}^{2} {\varvec{R}}_{i} } \right)\) where \(\oplus_{i = 1}^{2} {\varvec{R}}_{i} = {\text{diag}}\left( {{\varvec{R}}_{i} } \right)\) (Butler et al. 2018) with \({\varvec{R}}\) defined previously. Most importantly, the random genetic effects were assumed to have a multiplicative structure with distribution \({\varvec{g}} \sim N\left( {0,{\varvec{G}} \otimes {\varvec{I}}_{r} } \right)\) where \({\varvec{G}}\) is a 2 × 2 variance-covariance matrix with diagonal elements reflecting the underlying genetic variation of the HEB-25 lines in each year and an off-diagonal covariance reflecting the genetic relationship of the lines between years.
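The between-year genetic correlation implied by \({\varvec{G}}\) is the off-diagonal covariance scaled by the two yearly genetic standard deviations. The sketch below shows that conversion for a made-up \({\varvec{G}}\); the numbers are purely illustrative and are not estimates from this study.

```python
import numpy as np


def genetic_correlation(G):
    """Correlation implied by a 2 x 2 genetic variance-covariance matrix:
    cov(year1, year2) / sqrt(var(year1) * var(year2))."""
    G = np.asarray(G, dtype=float)
    return G[0, 1] / np.sqrt(G[0, 0] * G[1, 1])


# Hypothetical estimates: genetic variances 4.0 and 2.25, covariance 1.8.
G_example = np.array([
    [4.0, 1.8],
    [1.8, 2.25],
])
r_g = genetic_correlation(G_example)
print(f"between-year genetic correlation = {r_g:.2f}")
```

A value near 1 would indicate that line rankings are preserved between years, whereas lower values point to genotype × year interaction.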

Computations and heritability

All phenotypic models were fitted using the LMM software ASReml-R (Butler et al. 2018), available as a package in the R statistical computing environment. ASReml-R uses a residual maximum likelihood approach to estimate mean and variance parameters (Patterson and Thompson 1971).

For individual traits analysed using the LMM, broad sense generalised heritabilities were calculated using the formula described by Cullis et al. (2006).

Genome-wide association study

QTL detection and cross-validation were conducted as described by Büttner et al. (2020). The detection rate was calculated as a measure of validity and significance, and each marker detected in at least 20 cross-validation runs was declared significantly associated with the trait. Significant marker-trait associations were grouped into a single QTL if the significant SNPs were less than 5 cM apart and showed the same direction of additive effect, i.e. the exotic alleles at both SNPs either increased or decreased the trait of interest. We applied the cumulation method to estimate parent-specific QTL effects, as presented in Maurer et al. (2017). This procedure was conducted within each cross-validation run, and the mean across runs was taken as the final parent-specific QTL effect estimate. The required genetic positions of the 50 k markers were estimated as described in Büttner et al. (2020). All SNPs from the respective QTL interval were fitted in a linear model to estimate the phenotypic variance (Vp) explained by the QTL in the whole dataset.
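The grouping rule for significant markers can be sketched as follows. This is one possible reading of the rule rather than the pipeline of Büttner et al. (2020): it assumes that markers are chained along a chromosome, so a marker joins the current QTL when it lies less than 5 cM from the previous significant marker and shares its effect sign. The marker names, positions and effects are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    chrom: str
    pos_cm: float
    runs_detected: int   # number of cross-validation runs with a detection
    effect: float        # additive effect of the exotic allele


def group_into_qtl(markers, min_runs=20, max_gap_cm=5.0):
    """Keep markers detected in >= min_runs runs, then chain neighbours on
    the same chromosome that are < max_gap_cm apart with equal effect sign."""
    significant = sorted(
        (m for m in markers if m.runs_detected >= min_runs),
        key=lambda m: (m.chrom, m.pos_cm),
    )
    groups = []
    for m in significant:
        if groups:
            last = groups[-1][-1]
            same_sign = (m.effect > 0) == (last.effect > 0)
            close = m.chrom == last.chrom and (m.pos_cm - last.pos_cm) < max_gap_cm
            if close and same_sign:
                groups[-1].append(m)
                continue
        groups.append([m])  # start a new QTL
    return groups


markers = [
    Marker("m1", "2H", 10.0, 60, 0.5),
    Marker("m2", "2H", 12.5, 35, 0.3),
    Marker("m3", "2H", 14.0, 25, -0.2),  # opposite sign: separate QTL
    Marker("m4", "2H", 40.0, 10, 0.9),   # below detection threshold
    Marker("m5", "3H", 14.5, 50, -0.4),
]
qtl_names = [[m.name for m in g] for g in group_into_qtl(markers)]
print(qtl_names)
```

Under this reading, an opposite-sign marker lying between two same-sign markers splits them into separate QTL; a different tie-breaking rule would change that behaviour.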

Specifically, preceding marker-based analysis, the BLUPs for individual traits within each year were de-regressed using the formula described by Garrick et al. (2009), namely

$${g}_{i}^{*}=\frac{{\widetilde{g}}_{i}}{1-PE{V}_{i}/{\widehat{\sigma }}_{g}^{2}}, \quad i=1, \dots , r$$

where \({\widetilde{g}}_{i}\) and \(PE{V}_{i}\) are the BLUP and prediction error variance of the \(i\)th line, respectively.
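The de-regression step divides each BLUP by its reliability, \(1-PE{V}_{i}/{\widehat{\sigma }}_{g}^{2}\), which undoes the shrinkage applied by the mixed model. A minimal sketch with made-up values is given below; the function name and the guard against non-positive reliabilities are additions for illustration.

```python
import numpy as np


def deregress_blups(blups, pev, sigma2_g):
    """Return g*_i = g~_i / (1 - PEV_i / sigma_g^2) for each line."""
    blups = np.asarray(blups, dtype=float)
    pev = np.asarray(pev, dtype=float)
    reliability = 1.0 - pev / sigma2_g
    if np.any(reliability <= 0):
        # A PEV at or above the genetic variance carries no usable signal.
        raise ValueError("reliability must be positive for every line")
    return blups / reliability


# Hypothetical BLUPs and prediction error variances for two lines.
blups = [0.8, -0.4]
pev = [0.2, 0.5]
deregressed = deregress_blups(blups, pev, sigma2_g=1.0)
print(deregressed)
```

Lines predicted with less precision (larger PEV) are shrunk more strongly by the LMM, so their BLUPs are scaled up more by de-regression.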

For each set of de-regressed BLUPs, a GWAS was conducted using a two-stage multiple regression procedure similar to the methods described by Liu et al. (2011) and Maurer et al. (2016). In the first stage, a set of SNP co-factors was sought from an initial saturated additive SNP model specified as

$${{\varvec{g}}}^{*}={\varvec{1}}\mu +{{\varvec{X}}}_{p}{{\varvec{\tau}}}_{p}+{\sum }_{i=1}^{t}{{\varvec{M}}}_{i}{q}_{i}+{{\varvec{e}}}^{*}$$

where \({{\varvec{\tau}}}_{p}\) was a vector of fixed effect parameters used to estimate the NAM sub-population means and \({{\varvec{X}}}_{p}\) was the associated indicator matrix that maps trait responses to the appropriate sub-population. In this saturated SNP model, \({{\varvec{M}}}_{i}\) was the \(i\)th genetic marker covariate containing quantitative allelic values spanning the NAM genotypes and \({q}_{i}\) was its associated effect size. The model error, \({{\varvec{e}}}^{*}\), represented residual genetic error and was assumed to be distributed \({{\varvec{e}}}^{*} \sim N(0, {\sigma }^{2}{{\varvec{I}}}_{r})\). To determine the set of SNP co-factors, a forward–backward stepwise regression approach was implemented in which inclusion or exclusion of individual SNPs was determined by minimising the Bayesian Information Criterion. From the determined set of \(c\) SNP co-factors, a genome-wide scan was then conducted with each marker individually added to a multiple regression model of the form

$${{\varvec{g}}}^{*}={\varvec{1}}\mu +{{\varvec{X}}}_{p}{{\varvec{\tau}}}_{p}+{{\varvec{M}}}_{k}{q}_{k}+{\sum }_{j=1}^{c}{{\varvec{M}}}_{j}{q}_{j}+{{\varvec{e}}}^{*} \tag{2}$$

where \({{\varvec{M}}}_{k}\) is the marker being assessed; co-factors located less than 1 cM from the marker being assessed were excluded from the co-factor set. The Bonferroni-Holm method (Holm 1979) was used to provide a family-wise adjusted \(p\)-value for multiple testing of marker-trait associations, with markers accepted as significant if \({p}_{\text{BON-HOLM}}<0.05\). The approximate proportion of genetic variance explained by a marker was determined by estimating \({R}^{2}\) after fitting that marker alone in a linear model.
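For readers unfamiliar with the Bonferroni-Holm step-down adjustment, the sketch below shows how adjusted \(p\)-values can be computed from the raw marker \(p\)-values of one genome scan. It is a generic implementation of Holm (1979) and not the code used in this study; the example \(p\)-values are invented.

```python
import numpy as np


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                         # smallest p-value first
    scaled = (m - np.arange(m)) * p[order]        # multipliers m, m-1, ..., 1
    # Enforce monotonicity with a running maximum, then cap at 1.
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted             # restore the input order
    return adjusted


# Hypothetical raw p-values for four markers.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw_p)
significant = adjusted < 0.05
print(adjusted, significant)
```

Compared with a plain Bonferroni correction, the step-down multipliers make the procedure uniformly more powerful while still controlling the family-wise error rate.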

Candidate genes for QTL detected by GWAS were identified using the BARLEYMAP pipeline with the MorexV3 map (Cantalapiedra et al. 2015). In addition, we compared the genomic positions of the detected QTL with the positions of 164 barley orthologs of cereal genes for grain size and weight reported by Wang et al. (2019). Genes were suggested as candidates if they were within 4 cM upstream or downstream of a QTL, reflecting the LD decay of 7.85 cM reported in HEB-25 by Vatter et al. (2017).

Results

Trait variation, trait heritability, and multiyear relationship

Among the investigated traits, large ranges in phenotypic values were observed, with the maximum values often two to three times larger than the minimum values (Fig. 1 and Table 1). HEB-25 lines surpassing the recurrent parent ‘Barke’ and the ten check varieties were observed for all measured traits except for TGW, GW, and GA in 2015 and AL, GA, and GW in 2016 (Fig. 1 and Online Resource 1).

Fig. 1 Ridge plots of the seven traits measured in 2015 and 2016. The circle and triangle indicate the trait scores of Barke (recurrent parent) and Compass (an Australian high-yielding variety), respectively

Table 1 Summary of simple statistics for the field experiments in 2015–2016

When a Student's t-test was used, a significant difference between the 2 years was observed for all traits measured (P < 0.05, data not shown). The mean trait values for AL, EL, GA, and GL in 2015 were higher than those in 2016, while the reverse trend was found for GPE, TGW and GW. The significant difference between the 2 years for the seven traits measured was probably attributable to differences in cultural practice, environment and genetic composition between the 2 years. The trial in 2015 was sown 1 month later than the usual sowing date for the location (mid-May) as there was too little rain to support germination in May 2015. Growing season rainfall (GSR) in 2015 and 2016 at the tested location was 229 mm and 502 mm, respectively, representing a reduction of 26% and an increase of 60% compared to the 26-year average for the location (Online Resource 2). Finally, seeds of only 18 families were sown in 2015 as the other seven families were still under assessment in quarantine.

In general, heritabilities were relatively high, ranging from 0.49 to 0.84 (Table 2). Heritabilities were higher for 2016 compared to 2015, possibly due to the smaller number of lines tested in 2015 compared to 2016. The lowest heritability was found for GPE in 2015 (0.49), and the highest heritability was found for GW in 2016 (0.87). The difference in magnitude of heritability between the 2 years was greatest for GPE and GW.

Table 2 Summary of QTL and total phenotypic variation explained by the QTL detected for seven traits in 2015–2016

Although the heritabilities of the traits were consistent between the 2 years, the between-year genetic correlations ranged from 0.57 to 0.69 except for GW (0.84), reflecting a reduced genetic similarity in each of the studied traits between the years (Table 2). Consequently, each trait was analysed separately within each year in all subsequent analyses.

Within-year trait relationships

Correlations among the seven measured traits are shown in Fig. 2. In both years, TGW had a strong positive correlation with GW and GA and weaker positive correlations with GL and AL. TGW was correlated with GPE and EL in 2016 only.

Fig. 2 Correlation matrices for the seven traits measured in the field trials in 2015 and 2016. The distribution of each variable is shown on the diagonal. Bivariate scatter plots with a fitted line are displayed below the diagonal. The correlation coefficients, with significance levels shown as stars, are displayed above the diagonal. Significance levels: “***” P < 0.001, “**” P < 0.01, “*” P < 0.05

GA correlated positively and strongly with GL, GW, and TGW. Both AL and GL showed a negative correlation with GPE. AL had a low but significant positive correlation with TGW, GA and GL but not with GW. Although slightly differing in magnitude, a similar correlation trend among these variables was also observed in the field trials in Germany and Scotland for the HEB-25 population, as shown by Sharma et al. (2018).

GWAS results

The summary and the full GWAS output for the seven traits in both years are listed in Table 2 and Online Resource 3, respectively.

GWAS-AL: there were 13 and 14 QTLs identified for awn length in 2015 and 2016, respectively, with four QTLs common to both years. The QTLs that explained the most phenotypic variation (Vp) in 2015 and 2016 were QAl.HEB-25-3H.2 (43.5 cM, 15%) and QAl.HEB-25-1H.3 (97.9 cM, 20%), respectively.

GWAS-EL: there were 10 and 13 QTLs identified for EL in 2015 and 2016, respectively, with three QTLs common to both years. The QTLs that explained the most Vp in 2015 and 2016 were QEl.HEB-25-2H.1 (63.55 cM, 11.4%) and QEl.HEB-25-3H.1 (40.7 cM, 21.3%), respectively. Among the three common QTLs for EL, QEl.HEB-25-4H.1 had wild alleles that increased EL in both years, while the remaining two QTLs on chromosomes 5H and 7H had wild alleles with mixed effects in both years.

GWAS-GPE: there were 7 and 14 QTLs detected for grain number per ear in 2015 and 2016, respectively, with two QTLs shared between the 2 years. Both common QTLs had wild alleles from all families that reduced GPE. The QTLs that explained the largest Vp were QGpe.HEB-25-7H.1 (11.3%) and QGpe.HEB-25-2H.1 (21.7%) in 2015 and 2016, respectively. The QTLs at which wild alleles increased GPE the most were QGne.HEB-25-3H.3 (52.6 cM) in 2015 (up to 1.4 grains on average) and QGne.HEB-25-6H.1 (116.75 cM) in 2016 (up to 2.3 grains on average) (Fig. 3).

Fig. 3 Family-specific effects of alleles from 25 families that were identified to be associated with grain number per ear across 2 years of trials

GWAS-TGW: 8 and 23 QTLs were identified for TGW in 2015 and 2016, respectively. TGW QTLs were mapped to all chromosomes. The QTLs that explained the largest Vp were QGpe.HEB-25-4H.4 (20.6%) and QGpe.HEB-25-6H.1 (20.9%) in 2015 and 2016, respectively. There were three QTLs common to both years, with the two QTLs QTgw.HEB25-6H.1 and QTgw.HEB25-7H.2 having wild alleles that reduced TGW in both years in most of the HEB-25 families. In contrast, wild alleles at the common QTL QTgw.HEB25-5H.1 increased TGW in both years in all families. The QTL that increased TGW the most was QTgw.HEB25-6H.3, at which wild alleles from family no. 25 only increased TGW by up to 5.35 g. However, it was only detected in 2016, and alleles from all families other than no. 25 reduced TGW.

GWAS-GL, GW, GA: there were 13 QTLs detected for GL in each of 2015 and 2016, with only one QTL common to both years. These QTLs were mapped to all chromosomes and individually explained 1–35.2% of Vp. There were 12 and 16 QTLs detected for GA in 2015 and 2016, respectively, with five shared between the 2 years. Individual QTLs explained from 5.9 to 35.5% of the phenotypic variance. The QTL that explained the most Vp in both years for GL and GA was QGl.HEB25-1H.1, which co-localized with the thresh-1 locus (Schmalenbach et al. 2011).

There were 17 and 21 QTLs detected for GW in 2015 and 2016, respectively, with six shared between the 2 years. These QTLs were mapped to all chromosomes and individually explained 6.2 to 27.1% of the phenotypic variance. The QTL that explained the most Vp for GL and GA in both years was QGl.HEB25-1H.1, which co-localized with the HvAPO2/BFL locus.

Hotspot for grain size and weight

When all trait QTLs detected were plotted on the barley genetic map, several regional ‘hotspots’ associated with multiple traits became apparent. There were 18 ‘hotspots’, defined as a window of 7.85 cM within which four or more QTLs coincide, as this window was reported to be the linkage disequilibrium decay for this population by Vatter et al. (2017) (Fig. 4). There were three hotspots on each of the five chromosomes 1H, 2H, 4H, 5H and 7H, two hotspots on chromosome 3H and one on chromosome 6H. The hotspot 1_2 was found to be associated with five traits, three of which had QTLs within this hotspot in both years. Another hotspot, 3_1, was linked with QTLs for six of the traits measured in this study. The rest of the hotspots were associated with three to four traits.
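The hotspot definition can be made concrete with a small sketch. The exact windowing procedure is not specified in the text, so the code assumes a greedy left-to-right scan: starting from each QTL, all QTLs on the same chromosome within 7.85 cM are collected, and a hotspot is recorded when at least four are found. The QTL positions and trait labels below are hypothetical.

```python
def find_hotspots(qtls, window_cm=7.85, min_qtls=4):
    """qtls: iterable of (chromosome, position_cM, trait) tuples.
    Returns (chromosome, start_cM, end_cM, sorted traits) per hotspot."""
    by_chrom = {}
    for chrom, pos, trait in qtls:
        by_chrom.setdefault(chrom, []).append((pos, trait))

    hotspots = []
    for chrom in sorted(by_chrom):
        items = sorted(by_chrom[chrom])
        i = 0
        while i < len(items):
            # Extend the window from the i-th QTL up to window_cm away.
            j = i
            while j < len(items) and items[j][0] - items[i][0] <= window_cm:
                j += 1
            if j - i >= min_qtls:
                members = items[i:j]
                traits = sorted({trait for _, trait in members})
                hotspots.append((chrom, members[0][0], members[-1][0], traits))
                i = j        # continue after this hotspot
            else:
                i += 1
    return hotspots


# Hypothetical QTL list pooled over traits and years.
qtl_list = [
    ("1H", 45.0, "GL"), ("1H", 47.2, "GA"), ("1H", 49.9, "TGW"),
    ("1H", 52.0, "GW"), ("1H", 90.0, "EL"),
    ("3H", 40.7, "EL"), ("3H", 43.5, "AL"),
]
hotspots = find_hotspots(qtl_list)
print(hotspots)
```

Because QTLs from both years are pooled, a hotspot with four or more QTLs can still involve fewer than four distinct traits, which is consistent with hotspots being associated with three to four traits.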

Fig. 4 Barley grain and ear dimension QTLs: genetic distributions and overlaps. The genetic positions of all QTLs identified here were based on the genetic maps developed by Maurer et al. (2015) and Büttner et al. (2020). The QTLs affecting four or more traits were grouped into QTL hotspots, which are shown in the rectangular boxes. Also plotted are genetic map locations for barley orthologs of known cereal grain trait genes (Wang et al. 2019) and other developmental genes known to affect yield in barley. Bars indicate the detection rate of each QTL detected for each trait; green and orange represent 2015 and 2016, respectively

Discussion

Phenotypic variation compared to the control varieties reveals promising candidate lines for breeding

For all seven traits except for GW, there were individuals with phenotypic values greater than most of the ten check varieties (Online Resource 1), implying the potential for use of the HEB-25 wild barley alleles in breeding to improve yield and yield component traits in Australian environments. Two HEB-25 lines, HEB-16-121 and HEB-16-144, had significantly higher GPE (up to 10 grains/ear) while having TGW equivalent to Compass, the current benchmark variety for high TGW and plumpness in Australia. In addition, line HEB-22-118 had TGW and GPE greater than Compass in both years, and its tiller number was equal to that of Compass in the wetter year 2016 (Online Resource 4, Pham et al. 2020). The correlation between GPE and TGW was found to be very low (< 0.1) for both years in the field trials, which was similar to the findings of the salt stress study by Saade et al. (2016) and of Sharma et al. (2018). The independence of these traits in the HEB-25 population is advantageous, as beneficial alleles conditioning high GPE and high TGW can be combined into one genetic background without a trade-off between the two traits.

Detection of a QTL is merely a statistical result based on the association between genotype and phenotype. Therefore, the positive effect of identified beneficial alleles needs to be further validated in different genetic backgrounds using either biparental populations or a panel of diverse germplasm lines (Langridge et al. 2001; Pu-yang et al. 2022; Zhang et al. 2019). The beneficial wild alleles conditioning increased GPE and TGW in the above lines were introgressed into three Australian cultivars, Compass, LaTrobe and Granger, via backcrossing assisted by Kompetitive allele specific PCR (KASP) markers. We hypothesize that some transgressive segregants among the progeny of these crosses will surpass the yield of the recurrent elite parents Compass, LaTrobe, and Granger. Subsequent genomic and phenotypic selection will enable the identification of lines with high yield potential from these crosses. Ultimately, field trials need to be conducted to validate the effectiveness and drawbacks (e.g. linkage drag) of these beneficial wild alleles for yield and yield components.

Hotspots with beneficial wild alleles detected in this study and possible candidate genes

Eight hotspots were found to be common when the locations of our 18 hotspots were aligned to those of the 14 hotspots detected for grain size and weight when HEB-25 was evaluated in Germany and Scotland (Sharma et al. 2018). Among these common hotspots, a few co-localize with genes regulating flowering time/development, including Ppd-H2, HvELF3, HvCEN, and HvCO1, and the remainder are located close to barley orthologs of cereal genes, including SRS3, GSK2, qGL3/GL3.1, and GW2 (Xu et al. 2018).

In this study, only three of the 18 hotspots (2_1, 4_1, and 5_1) had wild alleles that positively affected the traits measured, while the others had a mixed or only negative effect. Wild alleles at these three hotspots increased GPE by up to 0.9 grains/ear (hotspot 4_1 in family no. 1, equivalent to a 4.1% increase) and TGW by up to 2 g (hotspot 2_1 in family no. 4, equivalent to a 5.5% increase) compared to the Barke alleles. At hotspot 5_1, four important yield component traits (GPE, TGW, GA and GW) were improved by the wild alleles from all families. It is noteworthy that the beneficial QTLs detected for GPE at hotspots 4_1 and 5_1 were not found in the European counterpart study. This highlights the value of this study, which reveals not only the core loci/genes underlying a trait across environments but also those that are useful yet highly environment-specific. Another advantage of these hotspots is that they reside near the ends of the chromosomes, where the recombination rate is higher, thus greatly facilitating recombination-based breeding.

Among these three hotspots, only one (4_1) has a potential candidate gene, LONELY GUY (LOG), which has a direct role in the activation of cytokinins (Kuroha et al. 2009; Tokunaga et al. 2012). Rice log mutants had abnormal branching, a reduced number of floral organs, and reduced inflorescence complexity (Kurakawa et al. 2007).

When markers linked with the QTLs within hotspots 2_1 and 5_1 were used to search for candidate genes within 4 cM upstream and downstream of these markers, 28 and 39 high-confidence genes were detected, respectively (Online Resource 5). For hotspot 2_1, it is difficult to pinpoint a potential candidate gene. In contrast, four genes, encoding sucrose-phosphatase 1 (HORVU5Hr1G000580), ethylene receptor 1 (HORVU5Hr1G000590), and MADS-box transcription factors (HORVU5Hr1G000370 or HORVU5Hr1G000480), are the most promising candidates for hotspot 5_1, as these have been shown to play a critical role in the grain filling process and grain weight (Jiang et al. 2011; Jiang et al. 2015; Luo et al. 2019; Paul et al. 2020; Radchuk et al. 2021; Wuriyanghan et al. 2009; Chen et al. 2016; Yang et al. 2012).
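The window search described above can be sketched as a simple positional filter. This is an illustration only: the `Gene` record, the gene identifiers, and all positions are hypothetical placeholders, not entries from Online Resource 5, and the actual search may have been run against a genome annotation in physical rather than genetic coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chromosome: str
    position_cm: float  # genetic position; hypothetical in this sketch


def genes_in_window(genes, chromosome, marker_cm, window_cm=4.0):
    """Return genes on `chromosome` within +/- window_cm of the marker."""
    return [
        g for g in genes
        if g.chromosome == chromosome
        and abs(g.position_cm - marker_cm) <= window_cm
    ]


# Placeholder annotation: identifiers and positions are made up.
genes = [
    Gene("gene_A", "5H", 0.5),
    Gene("gene_B", "5H", 1.2),
    Gene("gene_C", "5H", 9.0),   # outside the 4 cM window
    Gene("gene_D", "2H", 1.0),   # different chromosome
]

hits = genes_in_window(genes, "5H", marker_cm=1.0)
print([g.gene_id for g in hits])
```

The window half-width is exposed as a parameter because the number of genes returned depends strongly on it, especially near chromosome ends where the ratio of physical to genetic distance is small.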

Comparison to QTLs detected in other studies for GPE and TGW

For GPE: across the four global field trials with the HEB-25 population conducted in Germany, Scotland, United Arab Emirates, and Australia, QTLs located near five loci, namely HvCEN (2H-56 cM), Vrs1 (2H-80 cM), Vrs4/btr1/btr2 (3H-46 cM), LOG (4H-1 cM), and VRN-H3 (7H-34 cM), were always detected regardless of the environment (Sharma et al. 2018; Saade et al. 2016). When GPE was mapped using materials with a less complex genetic structure (association panels, doubled haploid populations, or introgression lines from wild barley), fewer QTLs linked to previously known genes controlling grain number were detected: loci near Vrs1 and VRN-H3 (Li et al. 2006; Ren et al. 2013), only Vrs1 (Wang et al. 2016), HvCEN and Vrs1 (Honsdorf et al. 2017; Xu et al. 2018), or Ppd-H1, VRN2 and VRN-H3 (Wang et al. 2010a). Vrs4 was reported to regulate Vrs1 to control spikelet determinacy and morphology and to indirectly control LOG-like gene expression (Koppolu et al. 2013). Furthermore, HvCEN was reported to interact with HvFT1 (VRN-H3) to regulate floral development and thus indirectly control GPE (Bi et al. 2019; Loscos et al. 2014). The consistent detection of these five loci for GPE in experiments across the globe demonstrates the power of the HEB-25 population, with its intrinsically wide genetic variation, to detect essential genes/loci regulating GPE.

In this study, the two QTLs at which wild alleles increased GPE the most were QGpe.HEB25-4H.3 (4H-43.5 cM, family no. 16) and QGpe.HEB25-6H.1 (6H-117 cM, families no. 17 and 25). The former QTL was not detected when HEB-25 was evaluated in the European environments, whereas the latter was detected only in Scotland, with a much lower effect. Both QTLs were detected for GPE in Dubai (UAE) by Saade et al. (2016), but in that study the former had the largest effect while the latter had a negative effect. In contrast, the QTL with the biggest GPE-promoting effect in the counterpart European study, on chromosome 5H at 165 cM (family no. 18), was not detected for GPE in the South Australian environment. Thus, this study provides another piece of the puzzle to help barley breeders better understand the location-specific effects of the wild barley alleles governing GPE in the HEB-25 population.

In this study, the markers used to map yield component traits in the HEB-25 population were six times more numerous than those used in the previous GWAS studies by Saade et al. (2016) and Sharma et al. (2018). The markers identified in this study are thus expected to be closer to the actual gene(s) than those identified in the previous studies. For example, the marker associated with GPE and located near Vrs1 in this study, JHI-Hv50k-2016-107351, was 0.35 cM (777 kb in physical distance) from the Vrs1 gene. In contrast, Sharma et al. (2018) identified one GPE-linked SNP each for Halle (Germany) and Dundee (Scotland), both of which reside 4 cM from Vrs1. Thus, the markers identified in this study should be much more useful and informative for subsequent fine mapping and marker-assisted selection to improve yield and yield components in barley.
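To put the two marker distances near Vrs1 on a common scale, the figures quoted above (0.35 cM and 777 kb for this study, 4 cM for the earlier markers) can be compared directly. The sketch below computes the fold difference in genetic distance and the local physical-to-genetic ratio. The final step, converting 4 cM to a physical distance, assumes the same local ratio holds over the wider interval. That is our simplifying assumption, not a result from either study, since recombination rates vary along the chromosome.

```python
# Distances from Vrs1 as quoted in the text.
this_study_cm = 0.35
this_study_kb = 777.0
previous_cm = 4.0

# Fold difference in genetic distance between the two marker sets.
fold_closer = previous_cm / this_study_cm

# Local physical-to-genetic ratio implied by the marker in this study.
kb_per_cm = this_study_kb / this_study_cm

# ASSUMPTION: the same ratio applies across the full 4 cM interval.
previous_mb_estimate = previous_cm * kb_per_cm / 1000.0

print(f"Marker is about {fold_closer:.1f}x closer in genetic distance")
print(f"Local ratio: {kb_per_cm:.0f} kb/cM")
print(f"Rough physical equivalent of 4 cM: {previous_mb_estimate:.2f} Mb")
```

Under that assumption the earlier markers would sit several megabases from Vrs1, compared with under 1 Mb for the marker reported here, which illustrates why the denser marker set is better suited to fine mapping.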

For TGW: the TGW QTLs detected in this study overlapped with 11, 11, and 22 QTLs reported for TGW when HEB-25 was evaluated at Halle (Germany) and Dundee (Scotland) by Sharma et al. (2018), and at Dubai (United Arab Emirates, UAE) by Saade et al. (2016), respectively. Notably, among the four locations where HEB-25 was tested for TGW globally, the TGW QTL at which the wild allele had the strongest positive effect was usually environment-specific. For example, it was 3H-96.3 cM for Halle (Germany), 3H-107.8 cM for Dundee (Scotland), 6H-110.9 cM for Charlick (South Australia), and 3H-51.5 cM for Dubai (UAE). Thus, barley breeders should take the target environment into account when selecting beneficial wild alleles from the HEB-25 population.

Among the ten common TGW QTLs shared across the four global field trials with HEB-25 in Germany, Scotland, UAE, and Australia, QTLs that reside close to genes regulating flowering time (HvELF3, HvAP2(Zeo), HvPRR95, HvPRR1/HvTOC1/HvCO5/HvCO7) and row-type phenotype (Vrs1/3/4/5) were found to govern the TGW trait at all sites (Online Resource 3). The only common QTL that increased TGW in all families at all four locations was QTgw.HEB25-5H.1 (1 cM). The wild alleles at this QTL increased TGW under both control and salt-stressed field conditions, and in 2 years with starkly contrasting precipitation in Australia. Therefore, it could serve as a novel marker to add to the breeders’ toolbox to improve TGW in barley, especially in water-limited or salt-stressed environments. This locus also merits further investigation, as no known gene has been aligned to it thus far.

Conclusions

In this study, the field trial of the HEB-25 NAM population conducted in Australia identified QTLs unique to the Australian environment and different from those found in the counterpart studies in the northern hemisphere. Our study showed that genes known to regulate flowering time and spike morphology in barley were pivotal in determining yield component traits such as seed size and weight.

There were 18 hotspots associated with multiple grain size/weight traits, one of which had wild alleles exerting a positive effect on both TGW and GPE. The wild alleles of genes/loci lying within this hotspot could serve as a valuable source for improving yield in Australia and worldwide.