Genome-wide association studies reveal novel loci for resistance to groundnut rosette disease in the African core groundnut collection

Key message
We identified markers associated with GRD resistance after screening an Africa-wide core collection across three seasons in Uganda.

Abstract
Groundnut is cultivated in several African countries where it is a major source of food, feed and income. One of the major constraints to groundnut production in Africa is groundnut rosette disease (GRD), which is caused by a complex of three agents: groundnut rosette assistor luteovirus, groundnut rosette umbravirus and its satellite RNA. Despite several years of breeding for GRD resistance, the genetics of the disease is not fully understood. The objective of the current study was to use the African core collection to establish the level of genetic variation in response to GRD, and to map genomic regions responsible for the observed resistance. The African groundnut core genotypes were screened across two GRD hotspot locations in Uganda (Nakabango and Serere) for three seasons. The Area Under Disease Progress Curve values and 7523 high-quality SNPs were analyzed together to establish marker-trait associations (MTAs). Genome-Wide Association Studies based on the Enriched Compressed Mixed Linear Model detected 32 MTAs at Nakabango: 21 on chromosome A04, 10 on B04 and 1 on B08. Two of the significant markers were localised on the exons of a putative TIR-NBS-LRR disease resistance gene on chromosome A04. Our results suggest the likely involvement of major genes in the resistance to GRD but will need to be further validated with more comprehensive phenotypic and genotypic datasets. The markers identified in the current study will be developed into routine assays and validated for future genomics-assisted selection for GRD resistance in groundnut.

Supplementary Information
The online version contains supplementary material available at 10.1007/s00122-023-04259-4.

Despite being the second most important legume crop after common bean (Phaseolus vulgaris) in many sub-Saharan African (SSA) countries, groundnut has extremely low productivity, owing to various biotic and abiotic challenges. One of the most important foliar diseases in SSA is Groundnut Rosette Disease (GRD), which is endemic to SSA and was first reported in Tanzania in 1907. GRD has since spread to several countries in SSA and its offshore islands, leading to losses of up to 100% in pod yield, especially if the symptoms occur before flowering (Okello et al. 2010). GRD is caused by a complex of three agents that function synergistically: groundnut rosette assistor luteovirus (GRAV), groundnut rosette umbravirus (GRV) and its satellite RNA (satRNA) (Deom et al. 2000). The satellite RNA depends on GRV for its replication and on GRAV for its encapsidation (Taliansky et al. 2000). Aphids (Aphis craccivora Koch) are the principal transmission vectors for the GRD agents (Lynch 1990).
The presence of all three disease agents results in severely stunted and bushy plants with reduced leaf size and shortened internodes (Waliyar et al. 2007; Nigam et al. 2012). Infection by GRAV or GRV alone results in either no symptoms or a mild transient mottle or yellowing of groundnut foliage (Waliyar et al. 2007). The main cause of GRD damage is the GRV-satRNA (Murant and Kumar 1990; Taliansky et al. 2000), which is responsible for symptoms ranging from green (Okello et al. 2017; Mabele et al. 2021) and chlorotic/yellow (Okello et al. 2017; Mabele et al. 2021) to mosaic rosette (Waliyar et al. 2007; Mukoye et al. 2020). Although there is evidence suggesting that different forms of satRNA from different regions of the world may be responsible for different symptoms (Murant and Kumar 1990; Mukoye et al. 2020), the studies are not comprehensive enough to be conclusive on the specific satRNA forms causing yellow, green or mosaic symptoms. Disease scoring has, therefore, been done according to the number of plants showing at least one of the GRD symptoms rather than by the types of symptoms observed (Reddy 1991; Waliyar et al. 2007; Mugisa et al. 2016; Mukoye et al. 2020).
The most effective and practical solution for groundnut farmers is to grow GRD-resistant varieties (Nigam et al. 2012). However, the complexity of the viral agents and the involvement of a transmission vector have made breeding for complete resistance difficult. Previous reports suggest that resistance could be specific to the various agents (Waliyar et al. 2007) or to the vector (Minja et al. 1999). Furthermore, the genetics of resistance to the disease agents or the vector are not clearly understood (Olorunju 1992; Herselman et al. 2004; Usman et al. 2015; Athanas 2015; Nalugo et al. 2016). Although efficient development of resistant varieties in other crops with similarly complex diseases has been possible through the use of molecular markers (Awata et al. 2021), groundnut breeding programs in Africa are largely conventional. The few reported molecular studies include Amplified Fragment-Length Polymorphisms (AFLPs) for resistance to aphids (Herselman et al. 2004) and Simple Sequence Repeat (SSR) markers for GRD resistance (Pandey et al. 2014; Athanas 2015). However, the reported associated markers were not validated and, therefore, are not used routinely in any of the breeding programs in SSA.
Recent developments in groundnut genomics (Bertioli et al. 2016, 2019; Pandey et al. 2017; Clevenger et al. 2018; Korani et al. 2019) provide great opportunities for enhanced utilization of state-of-the-art molecular markers in breeding programs in SSA and elsewhere. Linkage disequilibrium (LD) or association mapping has rapidly become a useful method for elucidating the molecular basis underlying phenotypic variation (Alqudah et al. 2020). Genome Wide Association Studies (GWAS) have been used to identify molecular markers and Quantitative Trait Loci (QTLs) associated with economically important traits in groundnut (Wang et al. 2019; Zhang et al. 2020, 2021; Otyama et al. 2022). The only reported GWAS that involved GRD resistance in groundnut (Pandey et al. 2014) used germplasm from the 'reference set', the majority of which is not part of the SSA breeding programs. In this study, we performed a GWAS for GRD resistance using 213 genotypes selected from the African core collection. Our aim was to exploit the natural variation present in this representative set of genotypes to identify novel sources of resistance to GRD, associated molecular markers and putative genes.

Plant material
Two-hundred and thirteen (213) breeding lines from nine African countries that were part of the African core collection were used in this study (Supplementary Table 1; Fig. 1). The African core collection was constructed from a nucleus of 116 non-redundant breeders-preferred genotypes and expanded to 300 genotypes using genotyping data and the core hunter software (De Beukelaer et al. 2018). The 213 genotypes of the subspecies fastigiata (32 "hybrid" (combinations between botanical types), 97 Spanish, 10 Valencia) and the hypogaea subspecies (74 Virginia) used in this study were selected based on availability of seed for multi-location trials. Each trial contained a maximum of 200 genotypes per season depending on seed availability.

Field screening and evaluation for disease resistance
Field evaluation was done in Eastern Uganda at two GRD hotspot locations, Serere and Nakabango (Okello et al. 2010). Serere is located at 33°26′43.943″ E and 1°31′58.580″ N, at 1126 m above sea level, while Nakabango is located at 33°12′47.588″ E and 0°31′26.762″ N, at 1169 m above sea level. The 200 lines were planted in two 1-m row plots at a spacing of 15 cm within rows and 45 cm between rows in a 10 × 20 lattice design. The trial was planted in two replicates across the two locations in three seasons (2020A, 2020B and 2021B). Genotypes Ug-43_Oug-RED_BEAUTY_UG and Gh2-54_GhaII-NUMEX_03 were used as susceptible checks, while Ug-41_Oug-DOK_1_RED_UG and Ug-194_Oug-ICGV_90099 were used as resistant checks (Supplementary Table 1).
GRD incidence was recorded based on the intensity and presence of any one of the symptoms recorded in literature (Waliyar et al. 2007; Okello et al. 2014, 2017; Mukoye et al. 2020; Mabele et al. 2021). Percentage GRD incidence (Waliyar et al. 2007) was recorded at 30, 60 and 90 days after planting (DAP). Percentage disease incidence (PDI) was calculated as the number of plants showing rosette symptoms, divided by the plant stand count at a given crop stage, multiplied by 100. The PDI data at 30, 60 and 90 days were used to calculate the Area Under Disease Progress Curve (AUDPC) using the formula:

$$\text{AUDPC} = \sum_{i=1}^{n-1} \frac{y_i + y_{i+1}}{2}\,(t_{i+1} - t_i)$$

where $y_i$ is the PDI at the ith observation; $t_i$ is time (in days) at the ith observation and $n$ is the total number of observations (Simko and Piepho 2012).
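As a minimal sketch of the scoring described above, the Python snippet below computes PDI for one plot and integrates it over the three scoring dates with the trapezoidal rule, the usual form of AUDPC (Simko and Piepho 2012). The function names and the plant counts are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def percent_disease_incidence(symptomatic: int, stand_count: int) -> float:
    """PDI (%) = plants showing rosette symptoms / plant stand count * 100."""
    if stand_count <= 0:
        raise ValueError("stand_count must be positive")
    return 100.0 * symptomatic / stand_count


def audpc(pdi: Sequence[float], days: Sequence[float]) -> float:
    """Trapezoidal area under the disease progress curve."""
    if len(pdi) != len(days) or len(pdi) < 2:
        raise ValueError("need at least two matched observations")
    total = 0.0
    for i in range(len(pdi) - 1):
        # Mean PDI of two consecutive scores times the interval between them.
        total += (pdi[i] + pdi[i + 1]) / 2.0 * (days[i + 1] - days[i])
    return total


# Hypothetical plot: 20 plants, with 2, 6 and 10 symptomatic at 30, 60 and 90 DAP.
scores = [percent_disease_incidence(k, 20) for k in (2, 6, 10)]
plot_audpc = audpc(scores, [30, 60, 90])
```

For this invented plot the PDI values are 10%, 30% and 50%, giving an AUDPC of 1800 %-days.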

Statistical analysis of phenotypic data
Best linear unbiased predictions (BLUPs) and, thereafter, variance components within environments were estimated with the lme4 package (Bates et al. 2015) in R (R Core Team 2021) using the restricted maximum likelihood (REML) method and the model:

$$Y_{ijk} = \mu + G_i + R_j + R/B_{jk} + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the kth observation for the ith genotype; $\mu$ is the overall mean; $G_i$ is the genotype effect; $R_j$ is the replication effect; $R/B_{jk}$ is the effect of blocks nested in replicates; and $\varepsilon_{ijk}$ is the error term associated with $Y_{ijk}$.
Variance components estimated within environments were used to calculate broad-sense heritability ($H^2_{bs}$) for GRD using the formula:

$$H^2_{bs} = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e / nr}$$

where $\sigma^2_g$ is the genetic variance component, $\sigma^2_e$ is the residual (error) variance component and $nr$ is the number of replications.
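The heritability calculation can be illustrated with a short Python sketch. It assumes the entry-mean form, genetic variance divided by the sum of genetic variance and residual variance over the number of replications, which matches the symbols defined above. The function name and the variance values are invented for illustration.

```python
def broad_sense_heritability(var_g: float, var_e: float, n_reps: int) -> float:
    """H2 = var_g / (var_g + var_e / n_reps), on an entry-mean basis."""
    if n_reps <= 0:
        raise ValueError("n_reps must be positive")
    denominator = var_g + var_e / n_reps
    if denominator <= 0:
        raise ValueError("total variance must be positive")
    return var_g / denominator


# Invented variance components for a trial with two replicates.
h2 = broad_sense_heritability(var_g=10.0, var_e=20.0, n_reps=2)
```

With these placeholder values the estimate is 0.5, i.e. 50%.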

$$\text{GRD PDI (\%)} = \frac{\text{Number of plants showing rosette symptoms}}{\text{Plant stand count at a given crop stage}} \times 100$$

BLUPs were further used to generate frequency distribution curves and in GWAS.

DNA isolation, genotyping and SNP calling
Three seeds per genotype were planted per pot in the screen house at the Regional Center for Drought Adaptation Improvement (CERAAS) in Senegal, West Africa. Thinning was done to retain one plant per genotype. Twenty mg of oven-dried young leaves from a single plant were collected 21 days after planting. DNA was isolated using the MATAB protocol (Gawel and Jarret 1991) and purified using the Zymo DNA purification Kit (ZYMO Research USA). A final concentration of 100 ng/µl was obtained for genotyping.
Genotyping was done using the Thermofisher SNP array Axiom Arachis2 with 48 K SNPs (Korani et al. 2019). SNP data were extracted from raw files and filtered using Axiom Analysis Suite Version 4.0.3 from Thermofisher Scientific (https://www.thermofisher.com/fr/fr/home/life-science/microarray-analysis/microarray-analysis-instruments-software-services/microarray-analysis-software/axiom-analysis-suite.html). The raw SNPs were filtered at a call rate > 0.95 and minor allele frequency > 0.05. The distribution of the final filtered high-quality SNPs was plotted across the chromosomes using CMplot (Yin et al. 2021).
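The two filtering thresholds can be made concrete with a small Python sketch. This is an illustration of the criteria only, not the vendor software's implementation. It assumes biallelic calls coded as 0, 1 or 2 copies of the alternate allele, with `None` for a missing call; the coding, the function names and the example markers are all hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 alternate-allele copies; None = no call


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing call."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among non-missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(calls: Sequence[Genotype],
                   min_call_rate: float = 0.95,
                   min_maf: float = 0.05) -> bool:
    """Keep a SNP only if call rate > 0.95 and MAF > 0.05."""
    return (call_rate(calls) > min_call_rate
            and minor_allele_frequency(calls) > min_maf)


# Three invented markers scored on 40 samples.
snp_a = [0] * 30 + [2] * 10              # complete calls, MAF 0.25
snp_b = [0] * 39 + [2]                   # complete calls, MAF 0.025
snp_c = [0] * 20 + [2] * 15 + [None] * 5  # call rate 0.875
kept = [passes_filters(s) for s in (snp_a, snp_b, snp_c)]
```

Only the first invented marker passes: the second fails on minor allele frequency and the third on call rate.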

Genetic diversity, population structure and linkage disequilibrium (LD)
Filtered SNPs were used to draw a Neighbor-Joining dendrogram in TASSEL 5.2.67 (Bradbury et al. 2007). Principal Component Analysis (PCA) was done in the SNP & Variation Suite (SVS version 8.9.0). Ten principal components (PCs) and the additive model were used to generate eigenvalues. The first three principal components of the variation were plotted and visualized in R using the scatterplot3d 0.3-41 package (Ligges and Machler 2003). The Discriminant Analysis of Principal Components (DAPC) was done using the adegenet v. 2.1.5 package in R by retaining fifty principal components and clustering the genotypes into four groups (Jombart 2008). LD decay was estimated using the software PopLDdecay v.3.41 with the parameter "-MaxDist 500". The script Plot_OnePop.pl in the package was then used to plot the estimated r² values over 10 kb bins. The r² threshold was set to 0.2.
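To illustrate how a decay distance can be read from binned r² values, the Python sketch below averages pairwise r² in 10 kb bins and reports the start of the first bin whose mean falls below 0.2. It is a simplified stand-in under stated assumptions, not the algorithm used by the LD software named above; the function name and the marker pairs are invented.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def ld_decay_distance(pairs: Iterable[Tuple[int, float]],
                      bin_size: int = 10_000,
                      threshold: float = 0.2) -> Optional[int]:
    """Start (bp) of the first distance bin whose mean r2 is below threshold.

    `pairs` holds (distance in bp, r2) for marker pairs. Returns None when
    no bin drops below the threshold.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance_bp, r2 in pairs:
        b = distance_bp // bin_size  # index of the 10 kb bin
        sums[b] += r2
        counts[b] += 1
    for b in sorted(sums):
        if sums[b] / counts[b] < threshold:
            return b * bin_size
    return None


# Invented marker pairs: LD falls with physical distance.
example_pairs = [
    (2_000, 0.8), (8_000, 0.6),      # bin 0, mean 0.70
    (12_000, 0.4), (18_000, 0.3),    # bin 1, mean 0.35
    (25_000, 0.15), (27_000, 0.10),  # bin 2, mean 0.125
]
decay_bp = ld_decay_distance(example_pairs)
```

In this toy example the mean r² first drops below 0.2 in the bin starting at 20,000 bp.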

GWAS analysis
Marker-trait associations (MTAs) were calculated by combining the filtered SNP dataset of the genotypes with the corresponding BLUPs in R using the Genome Association and Prediction Integrated Tool (GAPIT) version 2 package (Tang et al. 2016). The Enriched Compressed Mixed Linear Model (ECMLM) method, which builds on the compressed mixed linear model by grouping individuals into clusters and specifying the relationship among groups to correct for population structure (Li et al. 2014), was used as below:

$$y = X\beta + Zu + e$$

where $y$ is a vector of the phenotype (disease levels); $\beta$ represents unknown fixed effects, including population structure and marker effects; $u$ is a vector of size $s$ (number of groups) of unknown random polygenic effects following a distribution with mean zero and covariance matrix $G = 2K\sigma^2_a$, where $K$ is the group kinship matrix with element $K_{ij}$ ($i, j = 1, 2, \ldots, s$) representing the relationship between groups $i$ and $j$, and $\sigma^2_a$ is an unknown genetic variance. $X$ and $Z$ are the design matrices for $\beta$ and $u$, while $e$ is a vector of random residual effects that are normally distributed with zero mean and covariance $R = I\sigma^2_e$, where $I$ is the identity matrix and $\sigma^2_e$ is the unknown residual variance.
The resulting associations were displayed as Manhattan plots alongside quantile-quantile (Q-Q) plots to demonstrate the model fitness using qqman package in R (Turner 2018).
The P values for each marker were adjusted for false discovery rate (FDR) (Benjamini and Hochberg 1995), and the adjusted values were used to select significant associations (P < 0.05). Candidate genes were identified within 250 kbp of each significant marker using the Arachis duranensis and Arachis ipaensis reference genomes. Information on the location of the genes and their annotations was obtained from the A. ipaensis and A. duranensis annotation files (https://peanutbase.org/).
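The two selection steps above, FDR adjustment and the 250 kbp candidate-gene window, can be sketched in Python as follows. The first function is a plain implementation of the Benjamini-Hochberg step-up adjustment; the second is one reasonable reading of "within 250 kbp", assumed here to mean that a gene interval overlaps the window around the marker. All names, P values and coordinates are invented placeholders.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


Gene = Tuple[str, int, int, str]  # (chromosome, start bp, end bp, label)


def genes_near(chrom: str, pos: int, genes: Sequence[Gene],
               window: int = 250_000) -> List[str]:
    """Labels of genes overlapping [pos - window, pos + window] on chrom."""
    lo, hi = pos - window, pos + window
    return [name for c, start, end, name in genes
            if c == chrom and start <= hi and end >= lo]


# Invented P values for five markers.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adj_p]

# Invented gene models and marker position (coordinates are placeholders).
gene_models = [
    ("A04", 1_000_000, 1_004_000, "gene_a"),
    ("A04", 1_600_000, 1_610_000, "gene_b"),
    ("B04", 1_000_000, 1_004_000, "gene_c"),
]
hits = genes_near("A04", 1_100_000, gene_models)
```

In this toy input, two markers remain significant after adjustment even though four have raw P values below 0.05, and only `gene_a` falls inside the window.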

Identification of haplotypes
Stable markers within identified significant QTL regions were used as references for building the haplotype blocks. All markers that were within the LD decay distance of 250 kbp made up a haplotype block. Individuals with ambiguous nucleotide calls were excluded from analysis. Phenotypic data were categorized based on the identified haplotypes and used to test for association. One-way ANOVA with Duncan's test as a post hoc test was used to identify significant associations and measure specific differences between pairs of means in R using the package DescTools ( et al. 2021). Only haplotypes that were present in at least five genotypes were considered for the statistical analysis. Further, Haploview v4.2 (Barrett et al. 2005) was used to visualize the presence of LD between the SNP markers within the significant haplotypes. We used combined dataset analysis to identify genotypes harboring unique haplotypes and further established the extent of diversity among the resistant genotypes in comparison to the African core collection.

Results
Table 1 provides descriptive statistics for the response of groundnut germplasm to GRD. The most common symptoms observed were green and yellow rosette (Fig. 2). Highly significant differences (P < 0.001) were observed among the genotypes for AUDPC across all the Nakabango trials (Table 1). At Serere, the data revealed significant differences (P < 0.05) among the genotypes in seasons 2020A and 2020B. There were no significant differences observed for Serere 2021B. The broad-sense heritability was low (0-30%) for environments Serere 2020B and Serere 2021B; moderate (31-60%) for environments Serere 2020A, Nakabango 2020A and Nakabango 2020B; and high (> 60%) for Nakabango 2021B. The frequency distribution graphs for AUDPC showed a near-normal distribution for environments Serere 2020A and Nakabango 2020A, while for Serere 2020B and 2021B, AUDPC values were skewed to the right (Fig. 3). Environments Nakabango 2020B and 2021B were normally distributed (Fig. 3).
LD decayed more rapidly in the B sub-genome (177 kbp) in comparison to the A sub-genome (388 kbp) (Supplementary Fig. 1). The Neighbor-Joining dendrogram, PCA and DAPC all grouped the groundnut genotypes according to market class and not by country of origin (Fig. 5A-C). The Virginia and Spanish group clusters were the most distinct with minimal contamination within the major clusters (Fig. 5). Clusters 2 and 3 within the DAPC analysis were composed of a mixture of Spanish and Virginia (cluster 2), and Valencia and Hybrid (cluster 3) (Fig. 5C). Although a few Virginia genotypes clustered with the Spanish, there were no Spanish genotypes that clustered with the Virginia genotypes. The first three PCs explained a total of 67.3% (48.6%, 11.8% and 6.9%) of the genetic variation across the genotypes, indicating the superior quality of SNPs used in the analysis (Fig. 5B).

Genomic regions associated with GRD resistance
Because low disease pressure in Serere resulted in a lack of significant genetic variation in the response of genotypes to GRD for season 2021B, this dataset was not included in the GWAS analysis. Both the genotypic and phenotypic datasets that were used for GWAS have been made available at this link (https://figshare.com/s/ebf602b52ea2c5507f26). GWAS for Serere 2020A and 2020B yielded no significant markers (Supplementary Fig. 2), and for that reason these datasets were not used for any further analysis or data interpretation. All the results presented below are for Nakabango.

Haplotype-based association analysis
We identified five haplotypes from the respective QTL regions that were associated with resistance to GRD (Table 3; Fig. 7). All the haplotypes were located on chromosome A04 except one that was located on chromosome B08. Tests of significance for all the possible allelic combinations at each haplotype block are given in Supplementary Table 3. Box plots drawn using each season and combined data confirmed the differences in performance between the favorable haplotypes and the alternative allelic combinations (Supplementary Fig. 3). One of the haplotype blocks (TGAA) was just 1 Mbp away from a major disease resistance gene (TIR-NBS-LRR) (Supplementary Table 4).
Using combined datasets, we identified 39, 25, 13, 9 and 9 genotypes that harbored favorable haplotypes 1, 2, 3, 4 and 5, respectively (Table 3). There were 46 non-redundant genotypes from the combined dataset that harbored at least one favorable haplotype (Supplementary Fig. 4). Two genotypes (Ug-5_Oug-SERENUT_9T_UG and Ug-164_Oug-ICGV_SM_06518) harbored all the favorable haplotypes (Supplementary Fig. 4). Most of the 46 genotypes were Virginia types (39), with only 5 being Spanish and one each being Valencia and Hybrid types. A majority of the 46 genotypes were from Uganda (Table 3) and were genetically similar, forming a cluster within the predominantly Virginia market class group (Fig. 8). There were hardly any GRD-resistant genotypes among the Spanish market class cluster (Fig. 8).

Identification of candidate genes
We identified a non-redundant set of 383 genes within 250 kbp of all significant SNPs, of which 253 were from the A sub-genome (Aradu) while the remaining 130 genes were from the B sub-genome (Araip) (Supplementary Table 4). Of the 383 candidate genes identified, 62 (43 from sub-genome A and 19 from sub-genome B) were unknown proteins while an additional 37 (31 from sub-genome A and 6 from sub-genome B) were uncharacterized (Supplementary Table 4). A total of 17 markers were localized within genes, 10 on the A sub-genome and 7 on the B sub-genome (Supplementary Table 4). Two markers from sub-genome A, AX-147219924 and AX-147219925, were localized within a disease resistance protein (TIR-NBS-LRR; 39,354,055-39,358,311 bp) as shown in Fig. 9. There was a cluster of 9 "Disease resistance response proteins" on chromosome B04 that spanned from 15,5743, 372 bp to 15,709,800 bp. The other candidate genes onto which markers were localized included pentatricopeptide (PPR) repeat-containing protein, peroxisome biogenesis protein 1-like isoform, protein root hair defective 3 homolog 2-like, vesicle-associated membrane protein 725, exocyst complex component 84b, Myosin heavy chain-related protein, Poly(rc)-binding protein 3-like protein, Ser/thr-rich protein t10 in dgcr region-like protein, argonaute family protein, Zip zinc/iron transport family protein and Phosphate transporter 1 (Supplementary Table 4).

Stability of GRD resistance across seasons
Most of the 20 most resistant groundnut genotypes were stable across all three seasons: 14 genotypes showed top levels of resistance in all three seasons, while an additional five were stable across two seasons (Table 4). As expected, Ugandan accessions dominated the top 20 performers, with 9 genotypes stable across all seasons and another 3 genotypes stable across two seasons.

Discussion
We sought to shed light on the genetic basis of Groundnut Rosette Disease (GRD) resistance using a diverse set of cultivated groundnut germplasm that had been carefully selected using genotypic data and breeders' preferences. We identified significant Marker Trait Associations (MTAs), favourable haplotypes and candidate genes tightly linked to significant markers. Our key findings reveal important aspects on choice of germplasm, environments, molecular markers, the identification of candidate genes and their implications on future genetic and genomic studies for GRD resistance in groundnut.

The choice of genotypes and environments
We have demonstrated that the African core collection was suitable for the identification of GRD resistance loci. We also captured the predominant groundnut market classes used across Africa. Collections with useful diversity such as core and mini core sets have been recommended as more

Despite this careful selection of germplasm, our results also revealed that there were few GRD-resistant sources in the Spanish market class. Thirty-nine out of 46 (84.8%) of the GRD-resistant genotypes were from the Virginia class compared with 5 out of 46 (10.9%) from the Spanish class, despite the Spanish class having the highest number of genotypes (97) in the evaluation set. This finding was inconsistent with earlier reports (Subrahmanyam et al. 1998) but may reflect the region where the experiment was undertaken and the pathogen isolates present there. The experiment was done in Uganda, where the majority of the breeding lines are Virginia types that have been bred for resistance to the pathogen isolates specific to the region. The Ugandan breeding lines were therefore more likely to be adapted to the local pathogen isolates, hence their better performance. It is not surprising, therefore, that a majority of the GRD-resistant and stable genotypes were from Uganda. Future studies will need to screen the same set of germplasm across different African ecologies to fully confirm their stability across locations that might have different GRD pathogen isolates. Deom et al. (2000) reported region-specific clustering of GRV and satRNA isolates in comparison to GRAV, further indicating that a thorough characterisation of the GRD agents will be necessary for future gene and marker discovery studies. Future studies will need to decipher what forms of satRNA are responsible for the different symptoms observed.
While the current study reported mainly the yellow and green rosette, all three major GRD symptoms have been previously reported in Eastern Africa, ranging from green (Okello et al. 2017; Mabele et al. 2021) and chlorotic/yellow (Okello et al. 2017; Mabele et al. 2021) to mosaic rosette (Mukoye et al. 2020). It will be extremely important to partition the various symptoms and fully understand the corresponding agents or forms of satRNA responsible for the specific symptoms. The focus of this initial study was to determine, broadly, resistance versus susceptibility under Ugandan GRD hotspots. The Area Under Disease Progress Curve (AUDPC) was useful in providing a quantitative summary of GRD intensity over time for each of the genotypes observed. No doubt, GRD will not be fully understood unless the specific agents and forms of satRNA are fully characterised along with the corresponding reactions that they elicit from different genotypes.
The lack of consistent disease pressure in Serere resulted in no significant marker-trait associations. Accurate phenotypic datasets are critical for successful GWAS results (Gage et al. 2018). Earlier studies in GRD also reported a significant and positive correlation between broad-sense heritability ($H^2_{bs}$) and increased disease pressure (Van der . Our results are a strong indication that future GRD studies should include artificial inoculation to enhance disease pressure and its uniformity across trials. Lack of sufficient disease pressure would result in poor detection of causal alleles, especially those with minor effects (Davis et al. 1990; Zheng et al. 2018). Studies in other crops have also reported the need for enhanced disease pressure for accurate QTL identification (Gowda et al. 2018; Sitonik et al. 2019). While we were not able to detect any QTLs for Serere, it is also not clear whether there were causal alleles with minor effects that we may have missed at Nakabango as well. Previous investigations in GRD enhanced disease pressure by growing plants in the glasshouse and inoculating with viruliferous aphids that had been reared on GRAV-infected groundnut (Naidu and Kimmins 2007) or by using the field-based infector-row technique (Bock and Nigam 1988). Nevertheless, the consistency of QTLs detected in Nakabango across three seasons under natural inoculation will remain significant and a strong foundation for future studies in GRD.

The suitability of the markers used in population structure analysis and GWAS
We used SNP markers based on the two diploid reference genomes of A. duranensis (sub-genome A) and A. ipaensis (sub-genome B) (Bertioli et al. 2016). Though fewer than expected, these SNP markers were fairly evenly distributed across the chromosomes, worked extremely well and were highly informative for establishing the population structure, and subsequently for the GWAS. There was a consistent absence of markers at the top of chromosomes A05 and B05 that we attributed to tetrasomic recombination, which is quite frequent in groundnut (Leal-Bertioli et al. 2015). The choice of the best markers for genotyping groundnut is always difficult given the ploidy and the large genome size (2.7 Gb). Although SNP markers called from reduced genome representation libraries (Gupta et al. 2015; Zhao et al. 2016; Han et al. 2018) or from transcriptome sequencing (Chopra et al. 2015) have been used previously in groundnut, they tend to result in homoeologous SNP calls (Zhao et al. 2016; Peng et al. 2020) unless the very expensive option of whole genome resequencing (WGRS) at high coverage is applied (Agarwal et al. 2018) to improve accuracy of calls. The SNP markers used in the current study had earlier been validated using the HAPLOSWEEP pipeline, which applies a haplotype-based method to retain allelic polymorphisms between genotypes. A recent study comparing different SNP development pipelines recommended the use of the Axiom Arachis2 48 K SNP array followed by HAPLOSWEEP as the most accurate pipeline resulting in informative homozygous SNP calls (Peng et al. 2020). According to Korte and Farlow (2013), the power of GWAS to detect significant MTAs is dependent on the phenotypic variation explained (PVE) by the SNPs. The minimum PVE recorded in our current study was 25%, which suggests that the SNP markers used were informative and sufficiently captured the existing phenotypic variation.
However, the overall low density of the SNPs significantly reduced the power to identify relevant genomic regions with higher resolutions. Disease resistance is generally known to be controlled by both qualitative and quantitative genes (Jiquel et al. 2021). Although the GWAS results point to qualitative resistance based on the two major peaks identified, it will be difficult to rule out the involvement of quantitative resistance, which was also strongly supported by the frequency distribution graphs. While the QTLs identified in the current study will form an important basis for further understanding of the genetics of GRD resistance, future studies will need to include higher marker densities that would ensure that each causal genomic region is adequately captured. The lower numbers of SNPs in the current study may have reduced the power to identify rare variants, especially those with small effects (Gibson 2012).
Linkage disequilibrium (LD) is the non-random association between alleles at different loci in a breeding population and is the result of interplay of several factors including linkage, population structure, relatedness, selection and genetic drift (Flint-Garcia et al. 2003;Bush and Moore 2012).
Understanding the LD pattern is crucial in genomic analysis as it determines the resolution and power of association analysis for a given population. Our study observed an overall LD decay of 250 kbp, suggesting that at least 10,384 markers would have been required to adequately scan the genome (2.7 Gb) for the population that we studied. Although we used fewer markers than this, they were extremely informative across the population and provided consistent QTL peaks across seasons, especially at Nakabango. The large LD blocks reported in our study are common in self-pollinating crops and are much smaller than those reported in other groundnut studies (Pandey et al. 2014; Otyama et al. 2019; Zhang et al. 2020; Zhou et al. 2021). Future studies will need to use significantly more markers with a higher number of diverse genotypes to enhance the resolution of association analysis. Additional markers will also be needed for the A sub-genome, where the markers were fewer and LD decay much slower than in the B sub-genome, consistent with earlier reports (Zhao et al. 2017).

Genetics of GRD resistance, candidate gene identification and haplotype analysis
Moderate (51%, 58%) and high (68%) broad sense heritability (BSH) estimates reported for Nakabango location in our study was lower than in other studies that had fewer germplasm, but also with enhanced disease pressure either using artificial inoculation (Amoah et al. 2016) or infector-rows (Kayondo et al. 2014;Nalugo et al. 2016). Although our frequency distribution curves suggested that GRD resistance in groundnut is a quantitative trait, the clear peaks identified from GWAS indicate the likely involvement of a few major genes. In addition, two of the significant markers were located on a disease resistance gene (TIR-NBS-LRR) on chromosome A04, and one of the markers was located on an argonaute family protein on chromosome B04. We identified four significant haplotypes on chromosome A04. Preceding studies conducted via conventional breeding approaches proposed that GRD resistance is simply inherited and controlled by a single dominant gene (Olorunju et al. 1992;Athanas 2015) or two independent recessive genes (Nigam and Bock 1990;Olorunju et al. 1992). More studies will still be required to conclude the genetic control of GRD in groundnut. Resistance (R) genes, which encode mostly nucleotidebinding site and leucine-rich repeat (NBS-LRR) proteins (Dangl and Jones 2001;Yuksel et al. 2005;Mchale et al. 2006) facilitate the ability of plants to fight pathogens through an antiviral mechanism known as Effector-triggered immunity (ETI). NBS-LRR proteins do this by recognizing effectors released by pathogens which result in activation of downstream signaling pathways consequently triggering plant defense reaction toward various pathogens (Bao et al. 2018;Li et al. 2017;Dubey and Kunal 2018). 
In the NBS-LRR cluster of proteins, the Toll/interleukin-1 receptor (TIR) associated with GRD resistance in this study is the most common and has been reported to play a role in the detection of Avr proteins such as in the tobacco mosaic virus (TMV) (Dubey and Kunal 2018), in Pseudomonas syringae in Arabidopsis thaliana (Kim et al. 2009) and in a downy mildew-resistant genotype in grapevine (Vitis vinifera L.) .
Argonaute family proteins have been implicated in RNA interference (RNAi), a gene silencing mechanism deployed by plants to fight viral infections by hindering the expression of genes at the transcriptional and post-transcriptional levels (Muhammad et al. 2019). The involvement of argonaute proteins in the specific translational control of viral transcripts has been proposed as an essential factor in the resistance against viruses mediated by NBS-LRR proteins (Marone et al. 2013). Future studies will need not only to validate the QTLs identified in the current study using bi-parental mapping populations, but also to characterise all the candidate genes within these QTLs. The functional markers identified in the current study will be developed into easy-to-use marker assays and validated for future routine genotyping and early-generation selection for GRD resistance.
Haplotype analysis has been used in past groundnut studies to distinguish botanical varieties (Zheng et al. 2022) and to characterize different traits of interest (Wang et al. 2018; Liu et al. 2022; Zou et al. 2022). The five favourable haplotypes identified in the current study provide an immediate resource for marker development and functional gene identification. Developing marker assays that target candidate genes within the haplotype blocks will be a more precise approach for identifying putative functional markers for routine selection for GRD resistance, but these markers will still need to be validated using bi-parental populations. Haplotype-based markers, once validated, will distinguish any new recombination blocks of interest on the chromosome that produce favourable or unfavourable phenotypes (Bhat et al. 2021). The two genotypes (Ug-5_Oug-SERENUT_9T_UG and Ug-164_Oug-ICGV_SM_06518) that harboured all the favourable haplotypes will be useful both as donor parents for introgressing GRD resistance and as resources for better understanding the genetics and evolution of GRD resistance alleles.

Conclusion
Our results open a new chapter for GRD resistance studies and breeding in groundnut in Africa. Our findings, which include the identification of novel genomic regions, associated haplotype blocks and putative candidate genes that affect GRD resistance, will pave the way for marker-assisted breeding for GRD resistance. Bi-parental mapping populations and routine marker assays will need to be developed to validate the genomic regions identified, so that selection for GRD resistance becomes more efficient in the future. Given the complexity of the disease, future studies should be planned more carefully to enable a full understanding of the genetics of resistance to the various agents as well as the vector. While single-location experiments will enhance our understanding of the genetics of resistance to individual isolates, the search for more durable resistance in farmer-preferred varieties should be undertaken across several locations and seasons under high disease pressure. The current collaboration, which involves several African countries, will form a solid backbone for future successful characterisation of the host, the vector and the various pathogen agents. Several advanced breeding tools, including Next Generation Sequencing (NGS), Rapid Generation Advance (RGA), digital data capture, precision phenotyping and gene editing, should be deployed appropriately to speed up the varietal development process and enhance our understanding of this disease.
and JFR selected the diverse germplasm set. JPC, POA did the genotyping and analysed SNP data. PB undertook the haplotype analysis. DAH, RE and DKO acquired funding for the study. PW and DKO supervised the overall study. All authors read and approved of the final manuscript.

Funding
This study was made possible by the generous support of the American people through the United States Agency for International Development (USAID) through Cooperative Agreement No. 7200AA18CA00003 to the University of Georgia as management entity for U.S. Feed the Future Innovation Lab for Peanut (2018-2023). The contents are the responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. Additional support was provided by International Fund for Agricultural Development (IFAD) through Grant No. 2000001621 to the CGIAR Africa Rice Center (AfricaRice) and the Integrated Breeding Platform (IBP) as management entity for the EBCA (Enhancing institutional breeding capacity in Ghana, Senegal and Uganda to develop climate resilient crops for African smallholder farmers). Author EA was supported by Makerere University Regional Center for Crop Improvement, an East and Southern African Centre of Excellence, funded by the World Bank through the International Development Association Report No. PAD 1436, Project ID P151847.

Declarations
Conflict of interest All the authors declare that there is no conflict of interest.
Availability of data and material All data generated or analysed during this study are included in this published article.

Code availability N/A.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.