Introduction

Groundnut is cultivated on 32.72 million ha, with an annual production of 53.93 million tons worldwide (FAOSTAT 2023). It is cultivated primarily as a rain-fed crop in the semi-arid tropics and sub-tropical regions where recurrent drought is widespread. The potential production and productivity of groundnut (Arachis hypogaea L.; 2n = 4x = 40) are limited by recurrent and severe drought stress associated with climate change. Developing and deploying groundnut varieties with drought tolerance and desirable product profiles is vital for food security and global trade. Groundnuts significantly improve the nutritional status of humankind. Groundnut seeds are rich sources of carbohydrates, protein, lipids, vitamins, minerals and fiber. The seeds contain all the essential amino acids, making them a critical component of the human diet, especially in communities where animal-derived protein sources are not readily available (Mupunga et al. 2017).

Groundnut is relatively more tolerant to drought stress than other traditional leguminous species (Wan et al. 2014). Nevertheless, the quantity and quality of grain, seed oil and fodder of groundnut are affected by drought stress (Abady et al. 2021). Severe drought stress occurring during the reproductive growth stage can lead to a yield loss of up to 33% (Pereira et al. 2016; Carvalho et al. 2017). Drought during grain filling and maturity reduced groundnut's total oil and linoleic acid contents (Dwivedi et al. 1996). Additionally, drought affects the inherent symbiotic nitrogen fixation capacity of crops, limiting grain yield and quality and affecting the feed quality of the haulm, such as its nitrogen content, digestibility and metabolizable energy (Blümmel et al. 2012). Therefore, there is a need to develop and deploy drought-tolerant and locally adapted groundnut varieties to mitigate the effects of drought on groundnut yield and product profiles.

Reportedly, groundnut has marked genetic variability for drought tolerance and component traits for genetic improvement programs (Azevedo et al. 2010; Abady et al. 2021). Several candidate groundnut varieties with improved drought tolerance and agronomic traits have been developed through conventional breeding by the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) and through national breeding programs (Janila et al. 2016). However, the global pace of new variety design, release and adoption of drought-tolerant groundnut is slow for several reasons. Drought tolerance is a polygenic trait conditioned by multiple genes with minor genetic effects and is subject to genotype-by-environment interaction, which slows selection gains (Ravi et al. 2011). Furthermore, the genetic base of the cultivated groundnut is narrow, and the introgression of genes from the wild species has had limited success because ploidy differences lead to unpredictable progeny segregation and selection response (Foncéka et al. 2009; Janila et al. 2016). Drought-tolerant and high-yielding varieties have yet to be developed and deployed globally. This depends on identifying agronomic traits associated with drought tolerance and transferring the genes underlying the target traits to locally adapted genotypes (Edae et al. 2014).

Advanced breeding and genetic innovations such as marker-assisted selection (MAS), genomic selection (GS) and targeted gene editing would accelerate groundnut variety design and commercialization (Hasan et al. 2021; Pandey et al. 2014). These tools have been valuable in breeding programs to facilitate the identification of drought-resistance genes from germplasm, including landraces and wild relatives. The candidate genes can be transferred, pyramided, and fast-tracked in advanced breeding lines (Salgotra and Stewart 2020). Genetic and genomic tools are valuable resources for precision and speed breeding to release drought-resilient and market-preferred varieties.

Identification of genetic markers associated with drought tolerance and economic traits, including high kernel yield, oil and oleic acid contents, haulm yield and quality attributes, and disease and insect resistance, is crucial for the development of new varieties with essential traits (Shaibu et al. 2020; Devate et al. 2022). To date, few genes for drought tolerance have been reported from groundnut genome-wide association studies (Bertioli et al. 2016). Zhou et al. (2021) identified SNP markers significantly associated with pod weight and hundred seed weight in groundnut. Shaibu et al. (2020) identified SNP markers for drought surrogate traits using the soil plant analysis development (SPAD) chlorophyll meter reading and leaf area index in groundnut. Zou et al. (2022) identified five SNP markers significantly associated with chlorophyll content in groundnut. Pandey et al. (2014) identified one SSR marker associated with rust resistance and higher yield in groundnut. There is limited knowledge on the number of genetic markers and marker-trait associations for drought tolerance and economic traits based on a diverse genetic pool of groundnut. Knowledge of marker-trait associations is crucial for marker-assisted selection, trait integration and precision breeding in groundnut. Thus, the objective of the current study was to identify genomic regions and candidate genes associated with drought tolerance and component traits for gene introgression, and to guide marker-assisted breeding of drought-tolerant groundnut varieties.

Material and methods

Plant material

Ninety-nine genetically diverse groundnut genotypes acquired from ICRISAT in Patancheru, India were used for the study. The genotypes were selected based on desirable traits, including drought tolerance, resistance to foliar diseases such as late leaf spot and rust, high oil and oleic acid contents, and early-to-medium maturation. This study used a high-yielding groundnut cultivar, ICGV98412, released in India, Ghana and Ethiopia, as a comparative control. The details of the genotypes are described in Supplementary Table 1. The genotypes were evaluated under drought-stressed (DS) and non-stressed (NS) conditions at ICRISAT (latitude 17.51°N, longitude 78.27°E, altitude 545 m) during the 2018/2019 and 2019/2020 post-rainy cropping seasons using a 10 × 10 alpha lattice design with two replications. The plants were phenotyped in five environments: four field experiments [drought-stressed and non-stressed conditions in two seasons (2018/19 and 2019/20)] and one experiment on the LeasyScan platform under non-stressed conditions with four replications.

Phenotypic evaluation and data analysis

Phenotypic data were collected on days to 50% flowering (DF), SPAD chlorophyll meter reading (SCMR), specific leaf area (SLA, expressed in cm2 g−1), leaf relative water content (LRWC), plant height (PH, expressed in cm), number of primary branches (PB), pod yield per plant (PY, expressed in g plant−1), shelling percentage (SHP, expressed in %), seed yield per plant (SY, expressed in g plant−1), total biomass per plant (TBM, expressed in g plant−1) and harvest index (HI, expressed in %). From the LeasyScan experiment, data were collected on leaf area (LA), projected leaf area (PLA), leaf area index (LAI), light penetration depth (LPD), digital plant height (DPH) and digital biomass (DBM). The phenotypic data were subjected to analysis of variance. The homogeneity of error variances was tested using the Bartlett test before pooled analysis of variance. The means of the treatments were separated using the least significant difference (LSD) procedure at the 5% significance level.

Genotyping

The 99 groundnut genotypes were grown under field conditions at ICRISAT, Hyderabad, India. Genomic DNA was extracted from the leaves of three-week-old seedlings at the Center of Excellence in Genomics and Systems Biology at ICRISAT. The DNA was extracted using the modified cetyl trimethyl ammonium bromide (CTAB) method (Mace et al. 2003). DNA was mixed with a loading dye and quantified by loading 1 μl DNA on a 0.8% (w/v) agarose gel containing 10 μl ethidium bromide (10 mg/ml) and run at 80 V for 30–45 min. Subsequently, the DNA was visualized under a UV transilluminator (Bio-Rad Universal Hood II Gel Doc System). DNA quality and concentration were estimated using NanoDrop Spectrometry (UV 160 A, Japan). A DNA sample of 47 ng/µl per genotype was submitted for genotyping. The DNA samples were genotyped with a 48 K Affymetrix SNP array ('Axiom_Arachis') (Wankhade et al. 2023).

SNP data were analyzed using the Axiom analysis suite (Thermo 2018). SNP markers with more than 20% of missing data and minor allele frequencies lower than 0.05 were eliminated. This resulted in 15,575 SNP markers, which were used for further analysis. Ninety-nine genotypes were used after the data imputation. The genotype data filtering was performed using TASSEL version 5.2.86 software.
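The two marker filters described above can be illustrated with a minimal sketch. It assumes a hypothetical encoding in which each marker is a list of allele dosages (0, 1 or 2 copies of the alternate allele) with None for a missing call; treating array calls as diploid-style dosages is a simplifying assumption, and the function names are illustrative rather than part of the Axiom Analysis Suite or TASSEL.

```python
from typing import List, Optional

# Hypothetical encoding: 0, 1 or 2 copies of the alternate allele; None = missing call
Call = Optional[int]


def missing_rate(calls: List[Call]) -> float:
    """Fraction of genotypes with no call at this marker."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Call]) -> float:
    """Minor allele frequency computed from the non-missing dosage calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def keep_marker(calls: List[Call], max_missing: float = 0.20, min_maf: float = 0.05) -> bool:
    """Apply the thresholds stated in the text: discard markers with more than
    20% missing data or with a minor allele frequency below 0.05."""
    return (missing_rate(calls) <= max_missing
            and minor_allele_frequency(calls) >= min_maf)
```

Applying keep_marker to every marker on the array and counting the markers that pass corresponds to the filtering step that yielded the retained SNP set.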

Population structure and principal component analysis

The population structure pattern and admixture detection were inferred using a Bayesian model-based clustering algorithm implemented in STRUCTURE version 2.3.4 (Pritchard et al. 2000). The length of the burn-in period and Markov Chain Monte Carlo (MCMC) were set at 10,000 iterations (Evanno et al. 2005). The K value was set between 1 and 10 to generate the number of subpopulations in the genotypes. Twenty runs were performed for each K-value to accurately estimate the number of populations. Delta K values were calculated, and the appropriate K value was determined by the Evanno et al. (2005) method using the STRUCTURE Harvester program (Earl et al. 2012). SNP marker-based PCA and kinship analysis were subsequently conducted with GAPIT (Lipka et al. 2012).
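The Delta K statistic used to choose K can be sketched as follows. This is a simplified illustration, not the STRUCTURE Harvester code: it assumes the input is a mapping from each K to the Ln P(D) values of its replicate runs, and it computes Delta K as the absolute second-order difference of the mean log-likelihoods divided by the standard deviation across runs at that K. The function name and data layout are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp maps each K to the Ln P(D) values of the replicate runs at that K.
    Each K needs at least two runs with non-identical likelihoods,
    otherwise the standard deviation is undefined or zero.
    """
    avg = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in avg and k + 1 in avg:
            # |L(K+1) - 2 L(K) + L(K-1)| on the run-averaged log-likelihoods
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_diff / stdev(lnp[k])
    return result


def best_k(lnp: Dict[int, List[float]]) -> int:
    """K with the largest Delta K."""
    scores = delta_k(lnp)
    return max(scores, key=scores.get)
```

Because Delta K needs both neighbours, it cannot be evaluated at the smallest or largest K tested, which is one reason to scan a range such as 1 to 10 that is wider than the expected number of subpopulations.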

Genome-wide association analysis (GWAS)

GWAS was performed with TASSEL 5.2.86. Six models were evaluated in the marker-trait association analysis: the naïve, Q, K, PCA, PCA + K and Q + K models. Association signals were observed with the PCA + K and Q + K models using a mixed linear model (MLM). Quantile–quantile (QQ) plots of the observed −log10(P) of each SNP against the expected values, and Manhattan plots, were generated using TASSEL 5.2.86. Marker-trait associations with a phenotypic variance explained (PVE) of 20% or more were considered major associations. Candidate genes located within 50 kb upstream or downstream of the peak SNPs were retrieved using the PeanutBase website (https://www.peanutbase.org).
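The 50 kb candidate-gene window can be illustrated with a short sketch. It assumes gene coordinates are available as simple records (name, chromosome, start, end in bp); the Gene type, the function name and the example coordinates in the test data are hypothetical and do not reproduce the PeanutBase tool.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive


def genes_near_snp(chrom: str, pos: int, genes: List[Gene],
                   window: int = 50_000) -> List[Tuple[str, int]]:
    """Return (gene name, distance in bp) for genes on the same chromosome
    that overlap the interval [pos - window, pos + window], nearest first.
    The distance is 0 when the SNP lies inside the gene."""
    hits = []
    for gene in genes:
        if gene.chrom != chrom:
            continue
        if gene.end < pos - window or gene.start > pos + window:
            continue
        if gene.start <= pos <= gene.end:
            distance = 0
        else:
            distance = min(abs(pos - gene.start), abs(pos - gene.end))
        hits.append((gene.name, distance))
    return sorted(hits, key=lambda hit: hit[1])
```

Sorting by distance puts genes that contain the SNP first, followed by flanking genes, which mirrors how candidate genes are described later as either containing a marker or lying within a stated distance of it.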

Linkage disequilibrium (LD) and decay

The LD between polymorphic SNPs retained after filtering at MAF cutoffs of 0.05, 0.1 and 0.2 was calculated as r2 using TASSEL 5.2.86. LD decay plots were generated using the R script written by Remington et al. (2001) in RStudio (version 2021.09.0, Build 351; RStudio, PBC).
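As an illustration of the r2 statistic, the sketch below computes it for one pair of SNPs as the squared Pearson correlation of allele dosages. This genotype-correlation form is a common approximation for unphased data and is an assumption here; it is not claimed to be the exact estimator implemented in TASSEL. Dosages are coded 0, 1 or 2, with None for missing calls, and the function name is hypothetical.

```python
from typing import List, Optional


def ld_r2(x: List[Optional[int]], y: List[Optional[int]]) -> float:
    """Squared Pearson correlation between the dosages of two SNPs,
    using only genotypes with calls at both markers."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two genotypes called at both SNPs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (sxy * sxy) / (sxx * syy)
```

Pairwise values computed this way would then be grouped by physical distance between the SNPs to draw a decay curve.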

Results

Phenotypic variation

Significant genetic variation was recorded for yield and yield components among the tested groundnut genotypes evaluated under drought-stressed and non-stressed conditions (Abady et al. 2021). Analysis of variance for canopy-related traits phenotyped using the LeasyScan platform showed highly significant (P < 0.001) genotype differences (Table 1). Mean performance of the groundnut genotypes for 13 phenotypic traits under drought-stressed and non-stressed conditions, and for six canopy-related traits under non-stressed conditions, is presented in Supplementary Tables 2, 3, and 4, in that order. Wide phenotypic variation existed for all the assessed traits.

Table 1 Descriptive statistics, mean squares and significant tests for six phenotypic traits among 99 groundnut genotypes evaluated under LeasyScan non-stressed conditions during the 2019 rainy season at ICRISAT, Patancheru, India

Population structure, principal component analysis (PCA) and linkage disequilibrium (LD)

Population structure analysis of the 99 groundnut genotypes resolved three sub-populations with 32% admixture genotypes (Abady et al. 2021). Allocation into clusters was done at 70% ancestry. Twenty-four, 22 and 21% of the genotypes were assigned to sub-populations 1, 2, and 3, respectively. The PCA based on SNP marker data also confirmed the presence of three subgroups, corresponding with the population structure results (Fig. 1). The first three principal components accounted for 32% of the total variation (Fig. 1a) and revealed three distinct clusters in the population (Fig. 1b).
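The 70% ancestry rule used to allocate genotypes to clusters can be sketched as follows. The sketch assumes each genotype has a list of membership coefficients (one per sub-population, summing to about 1) as produced by a model-based clustering run; the function name, the labels and the treatment of a coefficient of exactly 0.70 as assigned are illustrative assumptions.

```python
from typing import List


def assign_subpopulation(q: List[float], threshold: float = 0.70) -> str:
    """Assign a genotype to the sub-population with the largest membership
    coefficient if that coefficient reaches the threshold; otherwise label
    the genotype as admixed."""
    best = max(range(len(q)), key=lambda i: q[i])
    if q[best] >= threshold:
        return f"SP{best + 1}"
    return "admixed"


def summarize(memberships: List[List[float]]) -> dict:
    """Count how many genotypes fall in each class."""
    counts: dict = {}
    for q in memberships:
        label = assign_subpopulation(q)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Dividing each count by the number of genotypes gives the percentages per sub-population and the share of admixed genotypes.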

Fig. 1 Principal component analysis of the 99 groundnut genotypes based on 15,575 high-quality SNPs with MAF > 0.05 using the first three principal components. The first three principal components accounted for 32% of the variation, as shown on the scree plot (a). The genotypes were grouped into three distinct clusters (b)

Three MAF cutoff thresholds, i.e., 0.05, 0.1 and 0.2, were used to explore the effect of minor alleles on the nature and decay of genome-wide LD; the results are presented in Fig. 2a, b and c, in that order. LD decreased with increasing bin distance. At MAF thresholds of 0.05, 0.1 and 0.2, LD declined to half of its original value at 3.98, 6.33 and 14.48 Mb, respectively.
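One way to read a half-decay distance from binned LD values is sketched below. It assumes the input is a list of bin distances in ascending order with the mean r2 of each bin, takes the first bin as the "original value", and interpolates linearly between the two bins that bracket half of that value. These choices and the function name are assumptions for illustration; the fitted decay curve used for Fig. 2 may differ.

```python
from typing import List, Optional


def half_decay_distance(distances: List[float], r2_values: List[float]) -> Optional[float]:
    """Distance at which mean r2 first falls to half of its value in the
    first bin, found by linear interpolation between adjacent bins.
    Returns None if r2 never drops that far within the observed range."""
    target = r2_values[0] / 2
    for i in range(1, len(distances)):
        previous, current = r2_values[i - 1], r2_values[i]
        if previous > target >= current:
            fraction = (previous - target) / (previous - current)
            return distances[i - 1] + fraction * (distances[i] - distances[i - 1])
    return None
```

Running the same calculation on marker sets filtered at different MAF thresholds allows the half-decay distances to be compared across thresholds.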

Fig. 2 Effect of three MAFs, 0.05 (a), 0.1 (b) and 0.2 (c), on the nature of LDs and their decay in advanced breeding lines

Marker trait association

Forty-seven and 13 SNP markers significantly associated with digital leaf area (DLA), LAI, SLA, LRWC, PB and hundred seed weight (HSW) were identified using the PCA + K and Q + K models, respectively (Table 2). Among the significantly associated SNP markers, nine were identified by both models. Forty-five SNP markers were significantly associated with LRWC, and seven SNP markers were associated with one or two traits. Thus, in this study, 50 SNP markers were identified (Table 2 and Fig. 3). The significant SNPs identified for the assessed traits are depicted in Manhattan plots along with QQ plots (Fig. 3). The QQ plots showed that the deviation between observed and expected P values was very small, suggesting true positive associations between the SNPs and the traits.

Fig. 3 Manhattan and QQ plots showing SNP markers associated with different agronomic traits among 99 groundnut genotypes based on the PCA + K and Q + K models. Note: a and b denote digital leaf area under non-stressed (NS) conditions; c and d digital leaf area index under NS conditions; e and f specific leaf area under NS conditions; g and h leaf relative water content under drought-stressed (DS) conditions; i and j number of primary branches under DS conditions; k and l number of primary branches under NS conditions; m and n hundred seed weight under DS conditions

Traits

The GWAS identified one SNP marker significantly associated with both DLA and LAI under non-stressed (NS) conditions using the Q + K model (Fig. 3a and c). The phenotypic variance of these traits explained by the marker was 21 and 20%, respectively. The PCA + K and Q + K models detected one SNP marker significantly associated with SLA under drought-stressed (DS) conditions (Fig. 3e and f). The phenotypic variance of SLA explained by the significant marker ranged from 22 to 23%. Similarly, the PCA + K and Q + K models detected two SNP markers significantly associated with PB under DS (Fig. 3i and j) and NS (Fig. 3k and l) conditions. The phenotypic variance of the trait explained by the markers ranged from 20 to 23%. The study identified 43 SNPs significantly associated with LRWC under DS conditions through the PCA + K model, the Q + K model, or both (Table 2, Fig. 3g and h). The phenotypic variance of the trait explained by the significant SNPs ranged from 20 to 31%. Further, the GWAS detected one SNP significantly associated with HSW under DS conditions. This marker was identified by both the PCA + K and Q + K models (Fig. 3m and n). The phenotypic variance of the trait explained by the significant SNP marker ranged from 28 to 31%.

Table 2 Association mapping with PCA + K and Q + K models identified marker-trait associations and predicted genes within 50 kb of marker positions

Discussion

Phenotypic variability

Drought is the leading abiotic stress limiting groundnut production and productivity globally. Significant progress has been reported on groundnut pre-breeding for drought tolerance through conventional breeding methods (Janila et al. 2016). However, the pace of drought tolerance breeding and variety release has been slow due to the complex nature of gene action and the genotype by environment by management interaction effect (Ravi et al. 2011). Deploying drought surrogate traits is critical for effective drought tolerance breeding in crop genetic resources, including groundnut. Furthermore, understanding the genetic base of physiological and yield-related traits in groundnut could provide an opportunity to develop drought-tolerant cultivars (Pereira et al. 2016). Wankhade et al. (2022) proposed an integrated phenotyping approach for screening groundnut genotypes for drought tolerance. The authors reported early generation selection gains using the LeasyScan method with complementary drought stress indices under a managed stress environment.

The present study revealed wide genetic variability for the assessed physiological and yield-related traits among the tested groundnut genotypes, which were evaluated under drought-stressed and non-stressed conditions (Abady et al. 2021). The analysis of variance revealed highly significant genotypic differences for canopy-related traits, including digital leaf area (LA), digital leaf area index (LAI), specific leaf area (SLA), leaf relative water content (LRWC), digital plant height (PH) and digital biomass (DBM) (Table 1). Traits related to canopy development are tightly associated with plant water use (Vadez et al. 2015). Leaf area influences the rate of transpiration: the wider the leaf area, the greater the rate of transpiration, because broad leaves tend to have more stomata (Maylani et al. 2020). Thus, selecting genotypes with small leaf area could enhance groundnut productivity under DS conditions. Diffuse light penetrates deeper into a plant canopy and increases photosynthesis and crop production (Zhange et al. 2022). Reportedly, there is a strong positive association between biomass production and transpiration efficiency under drought-stressed (DS) conditions, owing to the ability of the genotypes' root systems to mobilize water from the soil for stem elongation and biomass accumulation (Vadez et al. 2016). Previous findings indicated a positive association between reduced SLA and increased leaf thickness under DS conditions. This correlation results in thicker cell walls, which help to prevent water loss by evaporation and achieve higher water use efficiency (Zhou et al. 2020). LRWC is the most useful parameter to measure plant water status in terms of the physiological consequence of cellular water deficit (Barr and Weatherley 1962). This parameter represents the balance between the water supply to the leaf tissue and the transpiration rate (Lugojan and Ciulca 2011). Thus, maintenance of higher LRWC under drought-stressed conditions could be a good indication of drought tolerance. The observed genetic variability in this study could be utilized in groundnut breeding programs to develop drought-tolerant and high-yielding varieties.

Population structure and PCA

In the present study, the population structure of the 99 groundnut genotypes revealed the presence of three sub-populations (Abady et al. 2021). Similarly, the PCA results displayed the presence of three sub-groups (Fig. 1b). The low number of sub-groups indicates low genetic differentiation, given that most genotypes were Indian collections. Combining information generated from the genetic population structure and PCA is useful for the selection of diverse parents in breeding programs and the mapping of marker-trait associations.

Linkage disequilibrium in groundnut

LD is the non-random co-occurrence of two or more alleles (Lewontin and Kojima 1960). Determination of LD and its decay with genetic distance helps to assess the resolution of association mapping and the desirable number of SNPs on arrays (Vos et al. 2017). LD decay depends on cultivation patterns, breeding methods, breeding history, and evolutionary history (Devate et al. 2022). Higher LD decay was observed in the present study than in previous findings (Pandey et al. 2014; Otyama et al. 2019). This could be attributed to possible intercrosses among the advanced breeding lines. This suggests that the use of more SNP markers and a larger population size could enhance the power and efficiency of detecting marker-trait associations (MTAs) in groundnut breeding programs.

Candidate genes associated with SNPs

Identifying genomic regions using genome-wide SNP markers is a vital approach for developing climate-resilient varieties. In this study, a pleiotropic gene effect was detected between digital leaf area and leaf area index using the marker AX-177643135, indicating collinearity between leaf canopy traits (Table 2, Fig. 3a and c).

Significant SNPs for LRWC were identified on chromosomes A02, A03, A05, A10, B03, B08 and B09. In addition, the following five SNP markers with major effect were identified: AX-176804539, AX-176794990, AX-177641299, AX-176795390 and AX-176822255, located on chromosome A03 at 49.5 kb (PVE = 20%), B03 at 45 kb (PVE = 31%), B09 at 45 kb (PVE = 21%), A03 at 36.5 kb (PVE = 20%) and B03 at 36.5 kb (PVE = 20%), in that order (Table 2, Fig. 3g and h). These markers showed strong association signals for LRWC under drought-stressed conditions.

For leaf relative water content (LRWC) under drought-stressed conditions, the SNP AX-176794990 [chromosome B03; −log10(P value) of 5.68 to 5.79] was located within 21.68 kb of the Araip.9NG64 gene, which encodes an RNA-binding protein (RBP) 24-like protein family member involved in RNA processing, export and stability. Muthuswamy et al. (2021) and Yan et al. (2022) reviewed the role of RBPs in abiotic stress response and proposed that the proteins regulate stress response through RNA metabolism. Based on BLAST analysis of the gene sequence of Araip.9NG64, a match was found with UBP1-associated protein 2C (UBP2c), with two RNA recognition motifs of 85 amino acids each, reportedly playing a crucial role in leaf senescence (Na et al. 2015). Li et al. (2002) reported that ABA Activated Protein Kinase (AAPK), which is present in guard cells, interacts with the AAPK Interacting protein (AAPKIP 1), which is an RBP that interacts with the mRNA of dehydrin, a protein implicated in drought stress.

The formation of cuticular layers with increased wax and cutin content on leaf surfaces is closely related to drought tolerance. Identification of drought tolerance-associated wax components and cutin monomers and the genes responsible for their biosynthesis is essential for understanding the physiological and genetic mechanisms underlying drought tolerance and improving crop drought resistance (Yang et al. 2022). SNP AX-147235264, located on chromosome B10 [−log10(P value) of 3.88], accounted for 20% of the variance in LRWC under drought-stressed conditions (ST_LRWC). It was present near the Aradu.PKW10 gene, which encodes a CD2 antigen cytoplasmic tail-binding-like protein. CD2 cytoplasmic tail binding protein 2 is a component of the U5 snRNP complex involved in RNA splicing. The PSTPIP1 gene encodes CD2 antigen-binding protein 1 (CD2BP1), also known as proline/serine/threonine phosphatase-interacting protein 1 (PSTPIP1). SNP AX-147235264 was reported by Otyama et al. (2022) to be responsible for linolenic acid accumulation and wax formation under drought-stress conditions (Yang et al. 2022).

The current association analyses identified a SNP, AX-176816874 [chromosome B09; −log10(P value) of 4.29], affecting ST_LRWC. It was present in the Araip.57P4D gene, which encodes a chitinase (Class V). Some chitinases are expressed in response to abiotic stress (Hamid 2013; Zhou et al. 2020). Lv et al. (2022) found that drought stress treatment induces significant upregulation of Class V chitinases.

SNP AX-147244306 [chromosome B03; −log10(P value) of 4.30] was associated with ST_LRWC. It was present in the Araip.4J8RL gene, which encodes a polynucleotide phosphatase/kinase-type protein with intracellular protein interaction domains and a role in abiotic stress tolerance (Dasuni and Nailwal 2020). It could assist in the selection for drought tolerance in groundnut.

SNP AX-147244415, specific to ST_LRWC [chromosome B03; −log10(P value) of 4.29], was present within the Araip.SVH5H gene, which encodes a G family ATP-binding cassette (ABC) transporter protein of the half-size transporters. These transporters are expressed in the vascular tissues and are mainly involved in the translocation of abscisic acid (ABA) across the plasma membrane and tonoplast (Jarzyniak and Jasiński 2014). Kuromori et al. (2011) reported that a mutant version of this gene results in increased transpiration losses and drought susceptibility.

SNP AX-176798839 [chromosome A05; −log10(P value) of 3.94], associated with ST_LRWC, was present in the Aradu.VIU0I gene, which encodes a zinc finger MYM-type 1-like protein. It is responsible for signalling and regulation under abiotic stress. Zinc finger proteins enhance plant drought resistance by increasing the levels of osmotic adjustment substances (Han et al. 2020).

SNP AX-176817979 [chromosome A02; −log10(P value) of 4.85 and 3.98] was associated with the number of primary branches under non-stressed conditions (Table 2, Fig. 3i and j). It was present within 4.0 kb of the Aradu.ML3P3 gene, which encodes a P-type ATPase of the Arabidopsis 2 protein. P-type ATPase type 2 belongs to the haloacid dehalogenase (HAD) superfamily and is split into four groups, i.e., Na+, K+, H+, Ca2+, Mg2+ and phospholipids (Thever and Saier 2009). Animals and fungi have Na+/K+-ATPases (P2C ATPases) and Na+-ATPases (P2D ATPases), respectively, that carry out Na+ exclusion (Axelsen and Palmgren 2001). The Na+/K+-ATPase helps to maintain low Na+ and high K+ concentrations within the cells.

In addition, SNP AX-177639302, located on chromosome B09 at 4.05 kb (PVE = 28%), showed a strong association signal for seed weight under drought-stressed conditions (Table 2, Fig. 3m and n). Similarly, Gangurde et al. (2020 and 2022) reported a seed weight-associated genomic region on chromosome B09 in groundnut. This marker could provide an opportunity for seed size improvement in groundnut.

Conclusions

The study identified SNP-trait associations through association mapping in Arachis hypogaea. Forty-eight significantly associated regions were detected for important physiological and yield-related traits using the PCA + K and Q + K models. Forty-seven SNPs significantly associated with leaf area, leaf area index, specific leaf area, leaf relative water content, number of primary branches and hundred seed weight under drought-stressed conditions were identified. The MTAs and candidate genes identified in this study could be used to understand the genetic basis of important physiological and yield-related traits and to accelerate the development of drought-tolerant and high-yielding groundnut cultivars. Furthermore, the markers could be validated and deployed in groundnut breeding programs for gene pyramiding and trait integration.