Mining and predictive characterization of resistance to leaf rust (Puccinia hordei Otth) using two subsets of barley genetic resources

Sustainable barley (Hordeum vulgare L.) production will require access to diverse ex-situ conserved collections to develop high-yielding varieties capable of overcoming the challenges imposed by major abiotic and biotic stresses. This study aimed to find efficient approaches for identifying new sources of resistance to barley leaf rust (Puccinia hordei Otth). Two subsets were used: the Generation Challenge Program reference set (GCP) with 188 accessions, and a leaf rust subset (FIGS-LR) with 86 accessions constructed using the filtering approach of the Focused Identification of Germplasm Strategy (FIGS). They were evaluated for seedling resistance using two barley leaf rust (LR) isolates (ISO-SAT and ISO-MRC) and for adult plant resistance (APR) in four environments in Morocco. Both subsets yielded a high percentage of accessions with a moderately resistant (MR) reaction to the two LR isolates at the seedling stage. For APR, more than 50% of the accessions showed resistant reactions in SAT2018 and GCH2018, whereas this rate was less than 20% in SAT2017 and SAT2019. The chi-square test of independence revealed that the LR reaction depended on the subset at the seedling stage (ISO-MRC) and at the adult plant stage (SAT2017 and SAT2018). At the seedling stage, the goodness-of-fit test showed that the GCP subset yielded higher percentages of resistant accessions than FIGS-LR for the ISO-MRC isolate, whereas the two subsets did not differ for ISO-SAT. At the adult plant stage, the FIGS approach performed better than GCP, yielding higher percentages of accessions in SAT2017 and SAT2018. Although some of the tested machine learning models had moderate to high accuracies, none of them found a strong and significant relationship between the reaction to LR and the environmental conditions, showing the need for further fine-tuning of approaches for the efficient mining of genetic resources using machine learning.


Introduction
Cultivated barley (Hordeum vulgare subsp. vulgare L.) is the fourth most important cereal crop in the world after wheat, maize, and rice, with a production of 143.13 million metric tons on around 47.37 million hectares (FAOSTAT 2017). In Morocco, barley is grown on 2 million hectares in arid and semi-arid regions with an average grain yield of 1.23 t/ha, which is low compared to North America (3.67 t/ha) (FAOSTAT 2017). The lower national average grain yield is due to limited or no use of inputs and the prevalence of abiotic and biotic constraints. Foliar diseases such as powdery mildew (Blumeria graminis f. sp. hordei), net form net blotch (Pyrenophora teres f. teres), spot form net blotch (Pyrenophora teres f. maculata), spot blotch (Cochliobolus sativus), and leaf rust (Puccinia hordei) are important biotic constraints that limit grain and straw yields and their quality. Barley leaf rust, caused by Puccinia hordei Otth (Ph), is one of the most destructive and globally widespread barley diseases (Clifford 1985; Park et al. 2015). It is distributed throughout barley-growing regions and can cause serious yield losses in North Africa, Europe, New Zealand, Australia, the eastern and midwestern United States, and parts of Asia, where susceptible and late-maturing barley varieties are sown (Arnst et al. 1979; Clifford 1985; Chicaiza et al. 1996; Brunner et al. 2000; Niks et al. 2000). Losses of barley production due to LR can reach 30% on susceptible cultivars (Cotterill et al. 1992; Griffey et al. 1994).
Applying fungicides is an efficient strategy to control major foliar diseases, but it is not economical for barley grown on marginal lands under the low-input conditions of Morocco. Therefore, resistant varieties are the most effective, economical, and environmentally safe means of controlling barley LR. This can be achieved by transferring resistance genes identified in diverse genetic resources into elite barley germplasm (Hajjar and Hodgkin 2007; Rehman et al. 2020). To date, 23 Rph genes conferring hypersensitive resistance to barley leaf rust at the seedling stage (Rph1-Rph19, Rph21, Rph22, Rph25, and Rph28) and 3 APR genes (Rph20, Rph23, and Rph24) have been identified in Hordeum vulgare subsp. vulgare or transferred from H. vulgare subsp. spontaneum and H. bulbosum (Kavanagh et al. 2017; Yu et al. 2018; Rothwell et al. 2020; Mehnaz et al. 2021). However, LR resistance is challenged by a rapidly evolving pathogen: recombination and mutation give rise to new pathotypes that overcome deployed single major Rph genes within a short time span (McIntosh 1988; Figueroa et al. 2016). Most barley varieties released in Morocco are susceptible to LR, and few sources of resistance are available against the LR populations prevailing in northern Morocco. New sources of resistance therefore need to be continuously evaluated and identified in existing germplasm and genebank collections (Qualset 1975; Singh et al. 2015).
Genetic resources remain the most important source of parental germplasm for barley breeding programs aiming to develop new varieties with high yield, better end-use quality, tolerance to abiotic stresses, and resistance to major diseases and pests. However, the search for a given trait is hampered by the large number of accessions held in genebanks, and evaluating these large collections for some traits can be very expensive. Facilitating the screening and mining of genetic resources therefore requires intelligent sub-setting approaches that fit the available funding and facilities (ICARDA 2015). These approaches select subsets from the original collection that capture maximum diversity within a limited number of accessions (Gollin et al. 2000a, b). Frankel and Brown (1984) recommended the use of a core collection, comprising 5-10% of the original collection and representing maximum geographical or morphological diversity. However, because genebank collections are so large, even a core collection can be unmanageable for evaluating some traits, and other sub-setting approaches were proposed. Upadhyaya and Ortiz (2001) suggested minicore collections, which concentrate broad genetic diversity in smaller subsets of about 1% of the entire collection. The Generation Challenge Program (GCP) (https://www.generationcp.org) recommended developing a reference set comprising 10% of the core collection and capturing maximum diversity as measured with molecular markers.
From the ICARDA barley in-trust collection of more than 32,000 accessions, a barley core collection (composite set) of 3000 accessions of both cultivated barley and its wild progenitor (H. spontaneum) was selected based on climatological data of the collection sites. From this composite set, the Generation Challenge Program (GCP) developed a reference subset of 300 accessions based on the diversity of EST-derived and genomic SSR markers (https://www.croptrust.org/wp/wp-content/uploads/2014/12/Barley_Strategy_FINAL_27Oct08.pdf). However, many researchers have reported the limitations of core collections in capturing rare and adaptive alleles (Dwivedi et al. 2008; Xu 2010).
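As a quick consistency check, the subset sizes above can be expressed as fractions of the collection. Taking 32,000 as the collection size (the actual total is larger, so the ratios over 32,000 are upper bounds):

$$\frac{3000}{32\,000} \approx 9.4\%, \qquad \frac{300}{3000} = 10\%, \qquad \frac{300}{32\,000} \approx 0.94\%$$

The composite set thus falls within the 5-10% range recommended for core collections, the reference set is 10% of the composite set as recommended by GCP, and the reference set is of the same order of size as a 1% minicore of the whole collection.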
The Focused Identification of Germplasm Strategy (FIGS) was developed by ICARDA in collaboration with Australian and Russian partners as an alternative approach for the efficient mining of genetic resources that maximizes the likelihood of capturing specific adaptive traits in subsets of manageable size extracted from the original collection (Mackay 1990; Street et al. 2008). FIGS is based on the co-evolution of accessions with the environmental conditions in which they evolved (Mackay 1995; Gollin et al. 2000a, b; Mackay and Street 2004; Bari et al. 2012). The approach exploits the relationship between the sought trait and ecogeographical data, filtering germplasm collections according to the environmental selection pressures expected to favor the emergence of that trait. Once the relationship is confirmed, a manageable subset can be selected that includes accessions with a high probability of carrying the sought trait. FIGS subsets have enabled the first identification of sources of resistance to Sunn pest in wheat (El Bouhssini et al. 2009), resistance to net blotch in barley (Endresen et al. 2011), and drought tolerance in faba bean (Khazaei et al. 2021).
The present study aimed at: (i) identifying sources of resistance to LR in the FIGS_LR and GCP subsets; (ii) assessing the dependence of resistance on the sub-setting approach; and (iii) searching, using machine learning, for the best model describing the relationship between resistance to LR and environmental conditions.

Plant material
Two barley subsets were extracted from the ICARDA in-trust collection, drawing on material available from the regeneration efforts conducted in Morocco to reconstitute the active and base collections. A total of 188 accessions came from the reference set constructed within the Generation Challenge Program (GCP), which was extracted from the composite set of the barley collection held at ICARDA based on the diversity of EST-derived and genomic SSR markers (Supplementary Table S1). A second subset of 86 accessions (FIGS_LR) was selected using the filtering approach of the Focused Identification of Germplasm Strategy based on the following steps:
• Count the number of days with an average daily temperature between 8 and 15°C, from 10 days before the onset of the growing period up to 15% into the vegetative phase.
• Remove sites with a zero count in step 1.
• Sum the daily rainfall from 10 days before the onset of the growing period up to 10% into the vegetative phase.
• Normalize both variables (the temperature count and the rainfall sum) to a 0-1 range for each site.
• Add the two normalized variables to create index 1.
• Rank sites by index 1 and remove the bottom 25% of sites.
For the remaining sites, the following was done:
• Divide the period from 10% into the vegetative phase until the onset of grain filling into 3 separate sub-phases of equal length.
• For each sub-phase, count the number of days with an average daily temperature between 18 and 20°C.
• For each sub-phase, determine the amount of precipitation.
• Remove sites where any of these variables equals 0 (3 count variables and 3 precipitation variables).
• Normalize each variable to a 0-1 range.
• Sum these variables and add index 1 to create index 2.
• Rank sites by index 2 from largest to smallest.
• Because there are more sites than the desired set size, choose one accession at random from each site, starting at the top-ranked site, until the desired set size is reached. Alternatively, this step could be applied after each country represented in the candidate site list has contributed one candidate accession.
The climatic layers were extracted from GIS surfaces modeled from data at the collection sites, as described by De Pauw (2008). The FIGS_LR subset includes more accessions from Greece, Turkey, Ethiopia, and India (Supplementary Table S2).
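The two-stage filter above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used to build FIGS_LR: it assumes the per-site climate summaries have already been computed from the daily climate surfaces, that "normalize" means min-max scaling across the candidate sites, and that the bottom 25% of sites is rounded down. The `Site` class, its field names, the `figs_lr_filter` function, and the toy sites and accession labels are all hypothetical, and the country-first variant of the last step is not implemented.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Site:
    # Hypothetical per-site summaries, assumed precomputed from daily climate surfaces.
    name: str
    cool_days: int      # days with mean temperature 8-15°C in the early window
    early_rain: float   # rainfall sum in the early window
    warm_days: tuple    # days with mean temperature 18-20°C in each of 3 sub-phases
    phase_rain: tuple   # rainfall in each of 3 sub-phases
    accessions: list = field(default_factory=list)

def min_max(values):
    """Scale values to 0-1 across sites (assumed meaning of 'normalize')."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def figs_lr_filter(sites, set_size, seed=0):
    # Stage 1: drop sites with no cool days; index 1 = normalized cool days + rain.
    sites = [s for s in sites if s.cool_days > 0]
    if not sites:
        return []
    cool = min_max([s.cool_days for s in sites])
    rain = min_max([s.early_rain for s in sites])
    index1 = {s.name: c + r for s, c, r in zip(sites, cool, rain)}
    # Rank by index 1 and discard the bottom 25% of sites (rounded down).
    sites.sort(key=lambda s: index1[s.name], reverse=True)
    sites = sites[: len(sites) - len(sites) // 4]
    # Stage 2: drop sites where any of the 6 sub-phase variables is zero.
    sites = [s for s in sites if all(v > 0 for v in (*s.warm_days, *s.phase_rain))]
    if not sites:
        return []
    columns = [min_max([s.warm_days[k] for s in sites]) for k in range(3)]
    columns += [min_max([s.phase_rain[k] for s in sites]) for k in range(3)]
    # Index 2 = sum of the 6 normalized sub-phase variables + index 1.
    index2 = {s.name: sum(col[i] for col in columns) + index1[s.name]
              for i, s in enumerate(sites)}
    sites.sort(key=lambda s: index2[s.name], reverse=True)
    # One random accession per site, from the top-ranked site down.
    rng = random.Random(seed)
    chosen = []
    for s in sites:
        if len(chosen) == set_size:
            break
        if s.accessions:
            chosen.append(rng.choice(s.accessions))
    return chosen

# Toy sites with hypothetical accession labels, for illustration only.
demo_sites = [
    Site("A", 10, 50.0, (5, 6, 7), (20.0, 30.0, 40.0), ["IG-a1", "IG-a2"]),
    Site("B", 0, 80.0, (5, 6, 7), (20.0, 30.0, 40.0), ["IG-b1"]),  # no cool days
    Site("C", 8, 40.0, (4, 4, 4), (10.0, 10.0, 10.0), ["IG-c1"]),
    Site("D", 2, 5.0, (1, 1, 1), (1.0, 1.0, 1.0), ["IG-d1"]),      # lowest index 1
    Site("E", 6, 30.0, (3, 0, 2), (5.0, 5.0, 5.0), ["IG-e1"]),     # zero sub-phase count
]
print(figs_lr_filter(demo_sites, 2))
```

Because the ranking steps only reorder sites, the random draw in the final step is the only source of run-to-run variation; fixing `seed` makes the subset reproducible.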

Seedling screening of LR resistance
The seedling screening of the GCP and FIGS_LR subsets was conducted under controlled conditions in a growth chamber with two pure LR isolates (ISO-SAT and ISO-MRC). Single-urediniospore isolates were derived in 2017 from infected leaves collected at the experimental stations of Sidi Allal Tazi (ISO-SAT) and Marchouch (ISO-MRC) and multiplied on the susceptible barley cultivars Bowman and Aglou; the urediniospores were then collected, dried on silica gel, and stored at -80°C until use.
Barley plants were grown in sterilized peat moss (supplemented with 14-14-14 NPK) in plastic cones (14 cm long, 3.8 cm in diameter) held in a 14 × 7-unit tray (Stuewe & Sons, Inc., OR, United States). For each barley accession, 4-5 seeds were planted per cone in two replications. Each tray contained 96 test genotypes along with resistant (Philadelphia) and susceptible (Bowman) checks. Plants were raised in a growth chamber (Snijder Scientific, Tilburg, the Netherlands) with a photoperiod of 16 h light/8 h dark at 20 ± 1°C. Inoculation was carried out on 10-12-day-old seedlings when the first leaf was fully expanded. To prepare the LR inoculum, urediniospores were taken from the -80°C freezer and heat-shocked for 5 min at 40°C.
For each tray, 15 mg of urediniospores were suspended in 10 ml of light mineral oil (Novec 7100, Sigma Aldrich), and the spore suspension was sprayed onto the plants as a fine mist using an airbrush (Revell, Munchen, Germany). Inoculated plants were left to dry for 20 min at room temperature and were then placed in the growth chamber in the dark for 24 h at 18°C with ≈100% relative humidity. Plants were subsequently maintained in the growth chamber with a 16/8 h light/dark period at 20°C for symptom development. LR reactions were evaluated 12-14 days post-inoculation based on infection types (ITs) on the 0 to 4 scale developed by Stakman et al. (1962). Seedlings were classified as immune (0), resistant (0; and 1), moderately resistant (2), moderately susceptible (3), or susceptible (4).
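The mapping from infection types to reaction classes can be written as a small lookup table. The sketch below assumes ITs are recorded exactly as the codes listed above (no "+" or "-" modifiers); the function names are hypothetical. The percentage summary is the kind of tally shown in a frequency distribution of seedling reactions.

```python
from collections import Counter

# Infection type (IT) -> reaction class, following the grouping described above.
# Assumption: ITs are recorded exactly as "0", "0;", "1", "2", "3" or "4";
# modifiers such as "+" or "-" are not handled here.
IT_TO_CLASS = {
    "0": "I",    # immune
    "0;": "R",   # hypersensitive flecks, resistant
    "1": "R",
    "2": "MR",
    "3": "MS",
    "4": "S",
}

def classify_it(it):
    """Return the reaction class for one seedling infection type."""
    try:
        return IT_TO_CLASS[it.strip()]
    except KeyError:
        raise ValueError(f"unrecognized infection type: {it!r}") from None

def class_percentages(its):
    """Percentage of accessions falling in each reaction class."""
    counts = Counter(classify_it(it) for it in its)
    total = sum(counts.values())
    return {cls: 100 * counts[cls] / total for cls in ("I", "R", "MR", "MS", "S")}

# Hypothetical ITs for four accessions.
print(class_percentages(["0", "2", "2", "4"]))
```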
At Sidi Allal Tazi (SAT), the disease was established naturally, whereas at the Guich station (GCH) it was initiated by artificial inoculation. About 1 g of dried urediniospores was suspended in 200 ml of mineral oil and sprayed on the trial using an airbrush (Revell, Munchen, Germany). Disease establishment and spread were favored by covering the spreader rows with a plastic sheet overnight and by periodic sprinkler irrigation. LR resistance of the GCP and FIGS_LR subsets was assessed at growth stages 65-77 (Zadoks et al. 1974) using the modified Cobb scale (Peterson et al. 1948), which combines LR severity (0 to 100%) and host response: 0 (immune), no visible infection on plants; R (resistant), visible chlorosis or necrosis with no uredia present; MR (moderately resistant), small uredia surrounded by either chlorotic or necrotic areas; MS (moderately susceptible), medium-sized uredia possibly surrounded by chlorotic areas; S (susceptible), large uredia, generally with little or no chlorosis and no necrosis. The coefficient of infection (CI) was calculated by multiplying the host-response value (R = 0.2, MR = 0.4, MS = 0.8, S = 1) by the percent disease severity (0-100%) (Stubbs et al. 1986), and accessions were rated based on the average coefficient of infection (ACI), with values of 0-7, 8-16, 17-29, 30-50, and > 50 considered resistant, moderately resistant, moderately susceptible, susceptible, and highly susceptible, respectively.
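The coefficient of infection and the ACI rating can be sketched as follows. This is a minimal illustration, assuming field readings are written as severity followed by host response (e.g. "40MS"), that the ACI is the mean CI over an accession's readings within one environment, and that non-integer ACI values falling between the published bands (e.g. 7.5) go to the more resistant band. All function names are hypothetical.

```python
import re
from statistics import mean

# Host-response values used to weight severity (Stubbs et al. 1986, as cited above).
RESPONSE_VALUE = {"R": 0.2, "MR": 0.4, "MS": 0.8, "S": 1.0}

def coefficient_of_infection(reading):
    """CI for one modified-Cobb reading such as '40MS'; a bare '0' (immune) gives 0."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(R|MR|MS|S)?", reading.strip())
    if m is None:
        raise ValueError(f"cannot parse reading: {reading!r}")
    severity, response = float(m.group(1)), m.group(2)
    if response is None:
        if severity != 0:
            raise ValueError(f"missing host response in {reading!r}")
        return 0.0
    return severity * RESPONSE_VALUE[response]

def rate_aci(aci):
    """Map an ACI to a class; values between bands go to the lower (more resistant) band."""
    if aci < 8:
        return "R"
    if aci < 17:
        return "MR"
    if aci < 30:
        return "MS"
    if aci <= 50:
        return "S"
    return "HS"

def rate_accession(readings):
    """Average the CI over an accession's readings (assumed: replicate plots) and rate it."""
    aci = mean(coefficient_of_infection(r) for r in readings)
    return aci, rate_aci(aci)

print(coefficient_of_infection("40MS"), rate_aci(coefficient_of_infection("40MS")))
```

For example, a reading of 40MS gives CI = 40 × 0.8 = 32, which falls in the susceptible band (30-50).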

Comparing the reactions of GCP and FIGS subsets
The statistical analysis was performed using R software (R Core Team 2018). The association between the sub-setting approach and the reaction to LR was assessed using the χ² test of independence at a significance level of α = 0.05, with the statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected numbers of accessions of subset $i$ in reaction class $j$. The expected values for the test of independence were calculated as

$$E_{ij} = \frac{R_i \, C_j}{N}$$

where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $N$ is the total number of accessions. To identify differences between the FIGS and GCP subsets in their reaction to LR, the χ² goodness-of-fit test was used at α = 0.05, with the GCP subset treated as a random sample.
The expected values for the goodness-of-fit test were calculated as

$$E_i = n \, p_i$$

where $E_i$ is the expected value, $n$ is the total sample size, and $p_i$ is the hypothesized proportion of observations in level $i$.
Both tests were performed using different groupings of reactions, all classes (I, R
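Both χ² statistics and their P-values can be reproduced without a statistics library, as in the sketch below. It is an illustrative re-implementation, not the R code used in the study: it applies no continuity correction (R's chisq.test applies Yates' correction to 2 × 2 tables by default), it computes the upper-tail probability from the closed-form chi-square recurrence for integer degrees of freedom, and the counts in the usage lines are hypothetical, not the study's data.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1.

    Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def chi2_independence(table):
    """Test of independence; rows = subsets, columns = reaction classes."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n   # E_ij = R_i C_j / N
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)

def chi2_goodness_of_fit(observed, proportions):
    """Goodness of fit of observed class counts to hypothesized proportions p_i."""
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]   # E_i = n p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts, for illustration only.
print(chi2_independence([[10, 20], [20, 10]]))
print(chi2_goodness_of_fit([30, 50, 20], [0.25, 0.5, 0.25]))
```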

Modeling of the reaction to leaf rust disease
The second pathway of FIGS, based on machine learning, was investigated using the available reactions of the FIGS_LR and GCP accessions to find a function linking adaptive traits and environments (and the associated selection pressures) to genebank accessions. Environmental data from the WorldClim database were used as predictors. WorldClim is an open-access database providing global climatic layers that describe past climatic profiles of collection sites, intended for spatial modeling or mapping. It includes monthly averages of minimum and maximum temperature and precipitation, as well as bioclimatic variables (Fick and Hijmans 2017).
The following machine learning algorithms were used: K-nearest neighbors (KNN) (Kotsiantis 2007), support vector machine (SVM) (Hsu et al. 2010), random forest (RF) (Breiman 2001), neural networks (NNET) (Venables and Ripley 2002), and bagged CART (BCART) (Kołcz 2000). Each model was tuned to select its best tuning parameters on a training set (70% of the total set), and the best model was then selected among the algorithms based on several metrics, including accuracy, specificity, and Kappa, computed on the test set (30% of the total set). The R language and the caret package were used for the machine learning analysis (Kuhn 2008). Parameter optimization was carried out on the 70% training data using 10-fold cross-validation with 100 repetitions. At the seedling stage, separate models were built for each of the two isolates. For APR, models were built for the entire multi-location data set and for each location separately.
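The tuning protocol described above (a 70/30 split, then repeated 10-fold cross-validation on the training portion only) can be illustrated with a dependency-free Python sketch that uses KNN as the example learner. It is not caret's implementation: stratifying the split by class, Euclidean distance on predictors assumed to be already on comparable scales, and all function names are assumptions for illustration. Reusing the same seed for every candidate k means all candidates are compared on identical resamples.

```python
import math
import random
from collections import Counter

def stratified_split(labels, train_frac=0.7, seed=1):
    """Split row indices into ~70% training and ~30% test, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

def repeated_kfold(indices, k=10, repeats=100, seed=1):
    """Yield (fit, holdout) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        for f in range(k):
            holdout = shuffled[f::k]
            held = set(holdout)
            yield [i for i in shuffled if i not in held], holdout

def knn_predict(X, y, fit_idx, x, k):
    """Majority vote among the k fitting rows closest to x (Euclidean distance)."""
    nearest = sorted(fit_idx, key=lambda i: math.dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

def tune_knn(X, y, train_idx, candidate_ks, folds=10, repeats=100):
    """Pick k by mean cross-validated accuracy on the training rows only."""
    scores = {}
    for k in candidate_ks:
        hits = total = 0
        for fit, holdout in repeated_kfold(train_idx, folds, repeats):
            for i in holdout:
                hits += knn_predict(X, y, fit, X[i], k) == y[i]
                total += 1
        scores[k] = hits / total
    return max(candidate_ks, key=scores.get), scores
```

With the settings described above, a call would look like `tune_knn(X, y, train_idx, [1, 3, 5, 7])` (defaults `folds=10, repeats=100`), after which the selected k is scored once on the held-out 30%.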

Seedling resistance
In the seedling test, artificial inoculation was successful for both isolates, and diverse infection responses were recorded. The frequency distribution of infection responses of the GCP and FIGS_LR accessions at the seedling stage is presented in Fig. 1. Ten accessions in GCP (IG 143876, IG 143886, IG 143906, IG 143929, IG 143998, IG 143999, IG 144014, IG 144064, IG 144076, IG 144108) and one in FIGS_LR (IG 18957) were resistant to both isolates. Most of the resistant accessions originated from the USA, Turkey, Greece, and Morocco.

Adult plant resistance (APR)
Under field conditions, good natural LR infection occurred at Sidi Allal Tazi during the three cropping seasons, and good artificial infection was established at Guich in 2018. However, late and light artificial infection at Marchouch during the 2017 and 2018 seasons did not allow disease severity to be assessed. The uniformity of disease development was confirmed by the high susceptibility of the checks, Bowman and Aglou, at the adult stage at the Sidi Allal Tazi and Guich sites. Good LR development allowed efficient screening of the germplasm at the adult plant stage, as shown by the wide range of reactions observed (Fig. 2). ACI values across environments ranged from 0 to 85, with several accessions showing contrasting reactions in different environments. The FIGS_LR and GCP subsets showed different distributions of reaction classes, with near-normal distributions for SAT2017 and SAT2019 and positive skewness with a high percentage of R accessions (55.8 to 68.3%) for SAT2018 and GCH2018 (Fig. 2b and c). At SAT2017 and SAT2019, by contrast, the percentage of R accessions ranged from 5.59 to 19.64% (Fig. 2a and d) for both subsets. When considering MR reaction, additional Fig. 1b). The goodness-of-fit test showed that, for the ISO-MRC isolate, the GCP subset yielded a higher percentage of accessions with R reactions, whereas the FIGS_LR subset yielded a higher percentage with MR reactions; no significant differences were observed between the two subsets for the ISO-SAT isolate (Table 2).
Except for the grouping into two classes (I + R + MR, MS + S) in the goodness-of-fit test, the tests of independence and goodness of fit were significant for (Table 3).
For APR, the reaction to LR depended on the subset at Sidi Allal Tazi in 2017 (SAT2017) and 2018 (SAT2018), with χ² test P-values of 0.001 and 0.02, respectively. This dependence was not found for Guich 2018 (GCH2018) or Sidi Allal Tazi 2019 (SAT2019) (Table 4). The goodness-of-fit tests showed that FIGS_LR outperformed the GCP subset at Sidi Allal Tazi under heavy infection in 2017, with higher percentages of accessions with R and MR reactions, but the opposite was observed in the 2018 season at the same site. For the GCH2018 and SAT2019 environments, no significant differences were observed between the two subsets.
When different groupings of the reactions were used, both tests were highly significant for SAT2017 and SAT2018 but not for GCH2018 and SAT2019, except for the goodness-of-fit test for GCH2018 with the grouping (R + MR + MS; S + HS), which had a P-value of 0.04 (Table 5).
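Before either χ² test is rerun on a grouping such as (I + R + MR, MS + S) or (R + MR + MS; S + HS), the class counts of each subset are pooled. A small helper for this step might look like the sketch below; the function name and the counts in the example are hypothetical.

```python
def collapse_classes(counts, groups):
    """Pool reaction-class counts into groups, e.g. [("I", "R", "MR"), ("MS", "S")].

    Every class with a non-zero count must belong to exactly one group.
    """
    seen = set()
    for group in groups:
        overlap = seen.intersection(group)
        if overlap:
            raise ValueError(f"classes in more than one group: {sorted(overlap)}")
        seen.update(group)
    missing = sorted(c for c, n in counts.items() if n and c not in seen)
    if missing:
        raise ValueError(f"classes not assigned to any group: {missing}")
    return [sum(counts.get(c, 0) for c in group) for group in groups]

# Hypothetical seedling counts for one subset.
print(collapse_classes({"I": 2, "R": 10, "MR": 30, "MS": 25, "S": 19},
                       [("I", "R", "MR"), ("MS", "S")]))  # [42, 44]
```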

Predictive modeling of the reaction to leaf rust
At the seedling stage, the tested machine learning models performed differently for the two LR isolates. For ISO-MRC, all models yielded significant, medium to very high accuracies. The maximum accuracy (0.94) was reached with the BCART model, which was therefore chosen as the best model. The remaining modeling metrics indicated a strong mathematical relationship between the reaction to ISO-MRC and the environmental characteristics (Table 6). The opposite pattern was observed for ISO-SAT, for which none of the models was significantly accurate (Table 7): their accuracy was similar to the "No Information Rate", showing that the models were no better than the naïve model. Notably, specificity was much lower than sensitivity for all tested models.
For APR, no model performed significantly at either of the two locations (Table 8). Accuracy was high for all models; however, because the data were unbalanced, with many more resistant genotypes, the models did not perform better than the naïve model, as reflected by low specificity values and a high "No Information Rate". Among the tested models, RF was the best for all locations.
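The argument about the "No Information Rate" can be made concrete. The sketch below computes accuracy, sensitivity, specificity, Cohen's kappa, the NIR, and a one-sided binomial P-value for accuracy exceeding the NIR, the quantities that caret's confusionMatrix reports. It is an independent re-implementation for a two-class problem, not caret's code, and treating "R" as the positive class is an assumption. With hypothetical unbalanced data (20 resistant, 5 susceptible), a model that always predicts "R" reaches 80% accuracy yet has zero specificity, zero kappa, and accuracy no higher than the NIR.

```python
import math

def confusion_metrics(actual, predicted, positive="R"):
    """Two-class metrics; assumption: 'R' is the positive class, so specificity
    is the share of susceptible accessions that are correctly predicted."""
    pairs = list(zip(actual, predicted))
    n = len(pairs)
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = n - tp - tn - fp
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the actual and predicted marginals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - pe) / (1 - pe) if pe < 1 else 0.0
    # No Information Rate: accuracy of always predicting the majority class.
    nir = max(tp + fn, tn + fp) / n
    # One-sided binomial test of accuracy > NIR: P(X >= hits | n, NIR).
    hits = tp + tn
    p_value = sum(math.comb(n, j) * nir**j * (1 - nir) ** (n - j)
                  for j in range(hits, n + 1))
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "kappa": kappa,
        "nir": nir,
        "p_acc_gt_nir": p_value,
    }

# Hypothetical unbalanced test set and a naive "always R" model.
print(confusion_metrics(["R"] * 20 + ["S"] * 5, ["R"] * 25))
```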

Discussion
LR occurs annually with high incidence in the northern regions of Morocco, and Sidi Allal Tazi has been used as an LR hotspot for barley germplasm screening. Over three years, most barley varieties and advanced breeding lines showed high susceptibility to P. hordei at this site. The (Golegaonkar et al. 2009; Derevnina et al. 2013; Sandhu et al. 2014; Singh et al. 2015). Genetic resources conserved ex situ in genebanks are important sources of the traits sought by breeders, including resistance to major diseases, but they require efficient mining approaches. In this study, both FIGS_LR and b, Rph3.c, Rph4.d, Rph5.e, Rph6.f Rph5, Rph7.g, Rph8.h, Rph9.i, Rph10.o, Rph11.p, Rph9.z Rph12, Rph2.j, Rph2.y, Rph2.t, whereas the isolate ISO-SAT was virulent on NILs carrying Rph1.a, Rph3.c, Rph4.d, Rph8.h, Rph9. Furthermore, of the 19 differentials tested, 11 (58%) showed a differential interaction between the two isolates. Hence, the differential response of the two LR isolates to FIGS_LR and GCP in the seedling resistance test (SRT) can be attributed to their diverse virulence spectra (Fig. 1). In addition, among the accessions resistant to both isolates in the FIGS_LR and GCP subsets, only 7 and 16% of R accessions, respectively, were common, which further corroborates the difference in their virulence spectra. Of the 19 Bowman differential lines tested at the adult plant stage at Sidi Allal Tazi in the 2017 cropping season (SAT2017), only one, carrying Rph2 (Rph2.y), displayed a moderately resistant reaction to the LR field population (unpublished data). In contrast to the SRT, FIGS_LR performed better than GCP at APR. Except for SAT2018, higher percentages of R and MR barley accessions were observed in the FIGS subset than in GCP at SAT2017, SAT2019, and GCH2018 (Fig. 2). In the present study, four accessions from the GCP subset (IG 143945, IG 144000, IG 144064) and three accessions from the FIGS_LR subset (IG 28613, IG 28636, IG 33039) showed resistant (R) to moderately resistant (MR) responses at both the seedling and adult plant stages (Table 1).
Under Moroccan growing conditions, barley is planted in November, and LR is the last disease to affect barley, in March-April. APR is therefore particularly important, and the large number of R-MR accessions identified in the FIGS_LR and GCP subsets will be a useful resource for combating LR. The LR-resistant accessions identified in this study may possess new R genes, allelic variants of existing R genes, or a combination of both. High-density genotyping and genome-wide association studies seem a logical next step to dissect this resistance diversity. These putative R genes could then be either pyramided or used sequentially as part of a better R gene deployment strategy.
Seedling resistance is usually characterized by hypersensitivity and governed by single major genes. Such genes can easily be overcome by new LR races: their widespread use over large areas exerts selection pressure on the pathogen population, leading to the emergence of new races and the eventual breakdown of the resistance genes' effectiveness. Virulence has been detected for most known seedling Rph genes in barley-growing regions throughout the world. In Australia, only Rph3, Rph7, Rph11, Rph14, Rph15, and Rph18 among the characterized major genes remained effective against the prevailing pathotypes (Cotterill et al. 1995; Park 2003). However, pathotypes virulent on Rph3 were detected in New Zealand (Cromey and Viljanen-Rollinson 1995), and virulence for Rph7 has been identified in Israel (Golan et al. 1978), Morocco (Parlevliet et al. 1981), and North America (Steffenson et al. 1993). Virulence for Rph11 and Rph14 has also been found frequently in many parts of the world (Fetch et al. 1998), and virulence for Rph15 was reported by Sun (2007). Therefore, an accession with LR resistance at the seedling stage alone might not provide durable and effective resistance (Singh 1992; Park 2008; Singh et al. 2015).
APR against rusts is a key component of durable resistance in wheat (Singh et al. 2001). Similarly, APR to barley LR is a good strategy for effective disease control, and identifying and characterizing such sources could facilitate their use in breeding programs. Because, under heavy rust epidemics, several accessions showed MR and MS reactions at the adult plant stage or slow disease progression based on the area under the disease progress curve (data not presented), partial resistance and slow-rusting mechanisms could be exploited to obtain race non-specific and more durable resistance. Several studies have promoted partial, non-race-specific resistance to rusts and powdery mildew in barley and wheat, as this type of resistance is available in some commercial varieties (Parlevliet and Kuiper 1977; Andres and Wilcoxon 1986; Niks et al. 2000; Stuthman et al. 2007). Several APR genes have been well characterized and deployed in wheat to control rust diseases (Park and McIntosh 1994). In barley, three genes governing APR to LR (Rph20, Rph23, and Rph24) have been identified and used (Hickey et al. 2011; Singh et al. 2015; Ziems et al. 2017). Although there are no reports of virulence for Rph20, Rph23, or Rph24, identifying new APR genes for LR is essential for diversifying resistance and enabling gene pyramiding to increase resistance levels. Marker-assisted selection (MAS) gives breeders an opportunity to pyramid APR genes in barley.
FIGS has proven efficient in identifying novel sources of resistance to powdery mildew, yellow and stem rusts, Sunn pest, and Russian wheat aphid in wheat (Bhullar et al. 2009; El Bouhssini et al. 2009; Bari et al. 2012, 2014) and to net blotch in barley (Endresen et al. 2011). This study is the first attempt to compare FIGS with another subset, the reference set of the Generation Challenge Program (GCP), selected from the global barley core collection based on diversity at SSR markers. The FIGS filtering approach identified higher percentages of accessions with combined R and MR reactions than the GCP subset in field tests (except SAT2018). The reduced sample size and the imbalance between the two classes (resistant and susceptible) could explain the low predictive ability of the machine learning models. Modeling outcomes depended on the isolates or predominant field pathogen populations and on the environments. The results show the need for further fine-tuning of the FIGS approach, using larger subsets and accounting for the virulence diversity of the pathogen populations. Overall, FIGS remains more relevant because it focuses on the traits needed by users, uses available evaluation data, and allows subsets to be selected from entire collections, whereas core and mini-core collections focus only on the overall genetic diversity captured in 10% and 1% of the whole collection. It would also be interesting to compare both sub-setting methods for their ability to yield new effective genes. This could be investigated using molecular markers or by screening the identified sources of resistance against a larger number of isolates with different virulence spectra.

Conclusion
This study suggests that the trait-mining approach can be an efficient alternative to the core collection method. The resistant and moderately resistant accessions identified at the seedling and adult plant stages are valuable sources of P. hordei resistance and, combined with appropriate gene deployment strategies, can lead to effective and durable resistance against P. hordei. Evaluating larger subsamples in different environments and against different pathotypes will allow the FIGS sub-setting approach to be fine-tuned using machine learning.
Acknowledgements The authors would like to thank Dr. Kenneth Street for his help in developing FIGS subset and Mr. Amer El-Omrani, the technician at Sidi Allal Tazi experimental station for his help with field experiments.
Funding This study was supported financially through GIZ-attributed funding to the ICARDA genebank and the CAIGE-GRDC-ICA00010 project.
Data availability The data that support the findings of this study are available in the ICARDA genebank database and can be obtained upon request from the corresponding author.

Declarations
Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that may have had an impact on the work presented in this article.
Human and animal rights This work does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.