Development and evaluation of the utility of GenoBaits Peanut 40K for a peanut MAGIC population

Population and genotype data are essential for genetic mapping. The multi-parent advanced generation intercross (MAGIC) population is a permanent mapping population used for precisely mapping quantitative trait loci. Moreover, genotyping-by-target sequencing (GBTS) is a robust high-throughput genotyping technology characterized by its low cost, flexibility, and limited requirements for information management and support. In this study, an 8-way MAGIC population was constructed using eight elite founder lines. In addition, GenoBaits Peanut 40K was developed and utilized for the constructed MAGIC population. A subset (297 lines) of the MAGIC population at the S2 stage was genotyped using GenoBaits Peanut 40K. Furthermore, these lines and the eight parents were analyzed in terms of pod length, width, area, and perimeter. A total of 27 single nucleotide polymorphisms (SNPs) were revealed to be significantly associated with peanut pod size-related traits according to a genome-wide association study. The GenoBaits Peanut 40K provided herein and the constructed MAGIC population will be applicable for future research to identify the key genes responsible for important peanut traits. Supplementary Information The online version contains supplementary material available at 10.1007/s11032-023-01417-w.


Introduction
Peanut (Arachis hypogaea L.) is one of the most important oil crops worldwide (Lu et al. 2019). Developing high-quality and disease-resistant varieties with high yields has been a major goal of peanut breeding programs. Molecular marker-assisted selection (MAS) is one of the most effective plant breeding methods (Hasan et al. 2021). Elucidating the genetic basis of important peanut traits will help to improve specific characteristics of peanut cultivars through MAS. The development of next-generation sequencing technology and the availability of reference genomes for the cultivated groundnut and its ancestral species (Bertioli et al. 2016; Bertioli et al. 2019) have facilitated the mapping of genes mediating important peanut traits (Liu et al. 2020; Sun et al. 2022; Qi et al. 2022).
Ziqi Sun and Zheng Zheng contributed equally to this work.
Single nucleotide polymorphism (SNP) arrays are robust high-throughput genotyping tools that are less expensive than next-generation sequencing platforms (Liu et al. 2022). Pandey et al. (2017) developed a high-density "Axiom_Arachis" genotyping array with 58K SNPs, which greatly promoted the mapping of genes related to key peanut characteristics. For example, this array was used to identify the genes mediating the resistance to late leaf spot (Moretzsohn et al. 2023). It has also been used along with a recombinant inbred line (RIL) population to reveal the genomic regions and candidate genes associated with the seed weight and shelling percentage of groundnut (Gangurde et al. 2023) as well as with the African core groundnut collection to detect novel loci for the resistance to groundnut rosette disease on the basis of a genome-wide association study (GWAS) (Achola et al. 2023). Additionally, the 58K SNP array was used to analyze the genetic diversity of Korean peanut germplasm (Nabi et al. 2021).
However, the targeted SNPs cannot be adjusted after the SNP probes are fixed in the routine chip SNP array (Liu et al. 2022). Therefore, Xu et al. (2020) developed genotyping-by-target sequencing (GBTS), which involves liquid chip technology and is characterized by its low cost, limited demands on facilities, highly flexible marker types, sharable and accumulative marker data, and limited requirements for information management and support. Moreover, this technology is widely applicable to areas including germplasm evaluation, the construction of high-density genetic linkage maps, genetic mapping, and the protection of intellectual property rights associated with crop varieties (Xu et al. 2020). To date, several GenoBaits marker panels have been developed for animals and plants, including GenoBaits Maize 20K (Guo et al. 2019), GenoBaits Rice 10K (Hussain et al. 2022), GenoBaits Soy40K (Liu et al. 2022), GenoBaits Wheat 16K (Huang et al. 2022), and GenoBaits Porcine SNP 50K (Wang et al. 2022).
Conventionally, populations used for quantitative trait locus (QTL) mapping have included RIL populations, doubled haploid backcrossed populations, or F2 populations derived from two parents. Such populations can only be used to analyze two alleles, which limits the genetic recombination and the resolution for detecting QTLs (Bandillo et al. 2013). To overcome the limitations of bi-parental populations, a multi-parent advanced generation intercross (MAGIC) strategy was initially proposed for crops by Mackay and Powell (2007). This strategy can be used to analyze multiple alleles and to increase recombination rates and mapping resolutions (Cavanagh et al. 2008). Several MAGIC populations are available for diverse crops, including rice (Meng et al. 2016), maize (Dell'Acqua et al. 2015), and wheat (Stadlmeier et al. 2018). These populations have been used for the high-resolution dissection of the QTLs and genes responsible for complex agronomic traits.
To identify the genes associated with important peanut traits and develop useful markers for breeding, a MAGIC population was constructed and GenoBaits Peanut 40K was developed in this study. The objectives of this study were to (1) construct an 8-way MAGIC population using eight elite founder lines; (2) develop a liquid chip array GenoBaits Peanut 40K for peanut; (3) conduct a genetic analysis for the eight founder lines and the MAGIC population at the S2 stage; and (4) perform a GWAS for four traits related to peanut pod size using the MAGIC population at the S2 stage and its parents.
Fuhuasheng and Silihong (F, G) are landraces that originated in Shandong and Liaoning provinces in China, respectively (Table 1). Fuhuasheng is one of the most prominent parental varieties and is included in the pedigrees of most peanut varieties in China. Silihong, which produces pods that typically contain three or four seeds with red seed coats, is widely cultivated in northeastern China. NC94022 (H) is a late-maturing breeding line with a prostrate growth habit that originated in the USA (Shrestha et al. 2013).

SNP selection and array design for GenoBaits Peanut 40K
A diverse set comprising 353 peanut germplasms that underwent a whole-genome re-sequencing (20×) analysis was used to select SNPs. Approximately 0.93 million high-quality SNPs and insertions/deletions (Arachis hypogaea cv. Tifrunner version 1) were identified after quality control and filtering, in which sites were removed if they had a missing rate > 0.05 (any alleles with fewer than five supporting reads were marked as missing), a minor allele frequency (MAF) < 0.01, or a number of heterozygous alleles > 10 (Zheng et al. 2022). The SNP sites were selected according to the following criteria: (1) unique for each of the eight founder lines used as the parents of the MAGIC population (e.g., the genotype of one parent was A:A, whereas the genotype of the other seven parents was G:G); (2) evenly distributed across 20 chromosomes (as much as possible). The selected SNP sites were evaluated by MolBreeding Biotechnology Co., Ltd. (Shijiazhuang, China). Probes that were designed on the basis of the flanking sequences and targeted capture sequencing technology were subsequently synthesized. The effects of the selected SNPs on genes were predicted using SnpEff v5.0 (Cingolani et al. 2012).
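The first selection criterion can be illustrated with a minimal sketch. It assumes that every founder carries a homozygous, non-missing genotype string such as "A:A"; the function name `founder_pattern` and its return labels are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def founder_pattern(genotypes):
    """Classify a site by how the eight founder genotypes split.

    `genotypes` is a list of eight genotype strings such as "A:A"
    (assumed homozygous and non-missing for this sketch).
    Returns "1:7" when exactly one founder differs from the rest,
    "2:6" when exactly two founders share the rarer genotype,
    and None for any other split.
    """
    if len(genotypes) != 8:
        raise ValueError("expected genotypes for exactly eight founders")
    counts = Counter(genotypes)
    if len(counts) != 2:  # monomorphic, or more than two genotype classes
        return None
    minor = min(counts.values())
    if minor == 1:
        return "1:7"
    if minor == 2:
        return "2:6"
    return None


# One founder is A:A and the other seven are G:G, as in the example in the text.
print(founder_pattern(["A:A"] + ["G:G"] * 7))  # prints: 1:7
```

A site returning "1:7" corresponds to the founder-unique sites described in criterion (1).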

Plant materials and phenotypes
The 297 S2 plants derived from one 8-way cross (A/E//D/G///B/C//F/H) and a further two generations of single seed descent and the eight founder lines were used to evaluate GenoBaits Peanut 40K and

DNA isolation and genotyping with GenoBaits Peanut 40K
Genomic DNA was extracted from young unfolded leaves using the Plant Genomic DNA Extraction Kit (Tiangen Biotech, Beijing, China). The purity and integrity of the extracted DNA were evaluated by 1% agarose gel electrophoresis, whereas the DNA concentration was precisely determined using Qubit.
The high-quality DNA samples were sequenced using GenoBaits Peanut 40K by MolBreeding Biotechnology Co., Ltd. (Shijiazhuang, China). The raw data were filtered for quality using the fastp software (Chen et al. 2018) and then aligned to the peanut reference genome (Arachis hypogaea cv. Tifrunner version 1) using the BWA software (Li and Durbin 2009). The standard pipeline of the GATK software (Poplin et al. 2018) was used to detect SNPs for genotyping. Finally, the SNP set was filtered according to the following parameters: missing rate < 0.3 and MAF > 0.05.
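The final filtering step can be sketched as follows. This is an illustration only, not the authors' pipeline: the function name `passes_filter` and the genotype coding (0, 1, or 2 copies of the alternate allele, with `None` for a missing call) are assumptions made for this example.

```python
def passes_filter(calls, max_missing=0.3, min_maf=0.05):
    """Return True if a SNP meets missing rate < max_missing and MAF > min_maf.

    `calls` holds one entry per sample: 0, 1, or 2 copies of the
    alternate allele, or None when the genotype is missing.
    """
    if not calls:
        return False
    observed = [c for c in calls if c is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed or missing_rate >= max_missing:
        return False
    # Each diploid sample contributes two alleles.
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf


# Ten samples, none missing, alternate allele frequency 0.2: the site is kept.
print(passes_filter([0] * 8 + [2] * 2))  # prints: True
```

The MAF is computed from non-missing calls only, which is one common convention; other tools may differ in how they treat missing data.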

Diversity and population structure analyses and GWAS
The diversity of the 297 MAGIC lines and eight founder lines was analyzed using the UPGMA algorithm implemented in the TASSEL v5.0 software (Bradbury et al. 2007). The phylogenetic tree was drawn using the online program iTOL v6.7.3 (Letunic and Bork 2021). The population structure was deduced using ADMIXTURE v1.30 (K = 1-20) (Alexander and Lange 2011). The mixed linear model (MLM) implemented in TASSEL v5.0 (Bradbury et al. 2007) was used for the association analysis, and the GWAS significance threshold was set to 0.05/n, where n is the number of markers.
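The 0.05/n threshold is a Bonferroni-style correction. A small sketch of the arithmetic is given below; the function name `bonferroni_threshold` is hypothetical, and the marker count in the example is arbitrary rather than taken from the study.

```python
import math


def bonferroni_threshold(n_markers, alpha=0.05):
    """Return the p-value cutoff alpha/n and its -log10 value."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    p_cutoff = alpha / n_markers
    return p_cutoff, -math.log10(p_cutoff)


# Arbitrary example with 1000 markers.
p_cutoff, score = bonferroni_threshold(1000)
print(p_cutoff, round(score, 2))  # prints: 5e-05 4.3
```

A marker is declared significant when its −log10(p) value exceeds the second returned value.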

Construction of a MAGIC population for peanut
A population was obtained from an 8-way cross involving the eight elite founder lines (Table 1). According to the method described by Bandillo et al. (2013), a half-diallel mating system was used for the three stages required for the construction of the MAGIC population. At the first stage, 28 biparental crosses were conducted by inter-mating the eight founder lines. To obtain enough hybrids for the subsequent crosses, 30 seeds from each parent were sown. The resulting 28 F1 lines were inter-crossed for the 4-way cross (i.e., all 210 of the possible crosses).
The combinations were set so that no parent was represented more than once in the 4-way cross. The 210 4-way F1 lines were inter-crossed for the 8-way cross (i.e., all 315 possible crosses were completed in the same manner).
For the 8-way cross, 4-36 confirmed hybrids were obtained from each of the crosses and advanced by selfing, with an average of approximately 250 seeds harvested per cross at the S2 stage. A subset of the 8-way cross consisting of 35 crosses with a population size of approximately 200 (or 500 for one cross) was selected and used for advancing generations via single seed descent. Thus, the target population comprised 7000 lines (i.e., 35 × 200). The subset was selected in such a manner that only one of the nine possible crosses was chosen (e.g., one of A/B//C/D///E/F//G/H, A/B//C/D///E/G//F/H, A/B//C/D///E/H//F/G, A/C//B/D///E/F//G/H, A/C//B/D///E/G//F/H, A/C//B/D///E/H//F/G, A/D//B/C///E/F//G/H, A/D//B/C///E/G//F/H, and A/D//B/C///E/H//F/G). The other crosses were stored at −20 °C for later use.
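The cross counts quoted above (28, 210, and 315 crosses, and 35 groups of nine 8-way crosses) follow from simple combinatorics. The sketch below enumerates them, treating crosses as unordered so that reciprocal crosses are not distinguished; that simplification is an assumption of this example, and the variable names are illustrative.

```python
from itertools import combinations

founders = "ABCDEFGH"

# Stage 1: every unordered pair of founders.
two_way = [frozenset(pair) for pair in combinations(founders, 2)]

# Stage 2: pairs of two-way crosses that share no founder.
four_way = [frozenset((x, y)) for x, y in combinations(two_way, 2) if not (x & y)]


def parents(cross):
    """Flatten a four-way cross (two founder pairs) into its four founders."""
    return frozenset().union(*cross)


# Stage 3: pairs of four-way crosses that together use each founder once.
eight_way = [frozenset((x, y)) for x, y in combinations(four_way, 2)
             if not (parents(x) & parents(y))]

# Group the eight-way crosses by the underlying 4 + 4 split of founders.
groups = {}
for cross in eight_way:
    key = frozenset(parents(fw) for fw in cross)
    groups.setdefault(key, []).append(cross)

print(len(two_way), len(four_way), len(eight_way), len(groups))
# prints: 28 210 315 35
```

Each of the 35 groups contains nine crosses (three possible pairings on each side of the split), which matches the selection of one cross out of nine for each of the 35 subsets.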
To more precisely genotype the MAGIC population, 30,082 sites that were specific to one of the eight founder lines were designated as 1:7 (i.e., the genotype of one parent differed from that of the other seven parents), whereas 9918 sites were designated as 2:6 (i.e., the genotype of two parents differed from that of the other six parents) to ensure the 40,000 SNPs were evenly distributed on the 20 chromosomes (Supplementary Table S2). The founder line with the most unique sites was N741, followed by N745.
The founder lines with the fewest unique sites were N734 and N743 (Supplementary Table S2). The number of polymorphic SNPs between each pair of the eight founder lines (28 combinations) ranged from 1815 (between N739 and N744) to 18,458 (between N741 and N745) (Supplementary Table S3). Probes were designed for each SNP in both the forward and reverse directions, but three primers were designed for nine SNP sites (Supplementary Table S4).

Accuracy of the GBTS technology
The accuracy of the GBTS technology was evaluated by comparing the genotypes of the eight founder lines revealed by GenoBaits Peanut 40K with the previously reported genotypes determined on the basis of whole-genome resequencing technology (Zheng et al. 2022). More specifically, the number of consistent SNPs between the two technologies was divided by the total number of SNPs (i.e., 40,000). The accuracy ranged from 96.57 to 99.33% for N709, N730, N734, N743, N744, and N745, whereas that for the other two founder lines (N739 and N741) was only about 84% (Table 3). The lower accuracy for N739 and N741 was likely due to excessive heterozygous and missing sites, respectively (Table 3), which may be related to the differences between the genomes of these two lines and the reference genome.
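The accuracy calculation can be sketched as a simple concordance ratio. The sketch assumes that a missing panel call (coded here as `None`) counts as inconsistent, because the denominator is the full number of sites; the function name `concordance` and the genotype strings are illustrative only.

```python
def concordance(panel_calls, reference_calls):
    """Fraction of sites at which the panel genotype matches the reference.

    The denominator is the total number of sites, so missing panel
    calls (None) lower the accuracy in the same way as mismatches.
    """
    if len(panel_calls) != len(reference_calls):
        raise ValueError("both call lists must cover the same sites")
    if not panel_calls:
        raise ValueError("no sites to compare")
    consistent = sum(
        1 for panel, ref in zip(panel_calls, reference_calls)
        if panel is not None and panel == ref
    )
    return consistent / len(panel_calls)


# Toy example: two matches, one missing call, one heterozygous mismatch.
panel = ["A:A", "G:G", None, "A:G"]
reference = ["A:A", "G:G", "C:C", "A:A"]
print(concordance(panel, reference))  # prints: 0.5
```

Under this convention, both missing and unexpectedly heterozygous sites reduce the reported accuracy.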

Genetic analysis of the MAGIC population at the S2 stage
The 297 lines of the 8-way cross were genotyped at the S2 stage using GenoBaits Peanut 40K. A total of 18,816 filtered SNPs with a missing rate < 0.3 and MAF > 0.05 were used for the genetic analysis. A phylogenetic tree with the 297 lines and eight founder lines was constructed. The 305 lines were roughly divided into five clusters, which were differentiated by color in the phylogenetic tree (brown, red, blue, green, and purple) (Fig. 2). The parent N741 and line 216 were clustered into clade 1 and were far away from the other seven parents (clade 5) (Fig. 2), likely because N741 is a landrace from ssp. fastigiata var. fastigiata and differs greatly from the other seven parents genetically (i.e., 13,655 unique sites in Supplementary Table S2). Among the seven parents in clade 5, three parents from ssp. fastigiata (N709, N730, and N745) were grouped together and then clustered with the parents from ssp. hypogaea (N734, N743, N744, and N739) (Fig. 2). With the exception of line 216, the S2 lines did not cluster with the eight founder lines, possibly because the S2 population is still highly heterozygous, whereas the parents are homozygous.
The population structure was analyzed using ADMIXTURE v1.30, with K = 1-20. The 305 lines were grouped into nine clusters because the cross-validation (CV) error was lowest at K = 9 (Fig. 3A-B). The eight founder lines were grouped into five clusters, with N709, N730, and N741 in separate clusters, N734, N739, and N745 in the same cluster, and N743 and N744 in another cluster.

Genome-wide association study for pod size-related traits
The 18,816 filtered SNPs were screened for SNPs significantly associated with the peanut pod area, perimeter, length, and width according to the MLM model. The Q file for K = 9 generated during the population structure analysis was used as the covariate (Q) in the MLM model. Kinship (K) was calculated using TASSEL v5.0. A total of 27 SNPs significantly associated with at least two of the four pod size-related traits were identified at the threshold of 5.50 [−log10(0.05/18,816)] (Table 4, Fig. 4, and Supplementary Table S5). Of these SNPs, 10 were on chromosome 7, 16 were on chromosome 12, and one was on chromosome 17 (Table 4 and Supplementary Table S5). The significant SNPs on chromosome 7 were linked to one another, as were those on chromosome 12. The site Arahy17:625720  (2.61 Mb) covered by the significant SNPs on chromosome 12. Some of the key genes in this region encoded a C6HC-type zinc finger RING/U-box protein (Arahy.UZLY68), a homeobox-leucine zipper protein (Arahy.V0IP08), and a MYB transcription factor (Arahy.7ML2J7) (Supplementary Table S6). The significant SNP on chromosome 17 was located in the exon of Arahy.5QP4QH, which encodes a rho GDP-dissociation inhibitor 1-like protein (Supplementary Table S6).

Applicability of GenoBaits Peanut 40K
The greatest advantage of liquid chip technology-based marker panels over the alternatives is that the number of markers (e.g., 10, 20, and 40K) can vary depending on how the marker panels are being used.Moreover, they are useful for genotyping regardless of the number of samples (i.e., unlimited sample size) (Xu et al. 2020).
Although GenoBaits Peanut 40K was designed for the MAGIC population, it has many other uses. For example, it is applicable for evaluating germplasm diversity, performing a linkage analysis of bi-parental populations, and conducting GWAS. The number of polymorphic SNPs exceeded 3000 for almost every pair of parents (approximately 9900 on average), which supports, to some extent, its wide applicability (Supplementary Table S3).
The GenoBaits Peanut 40K panel may be used to analyze most peanut germplasms, although there are a few exceptions. The accuracy of the genotyping of N741 and N739 was relatively low (i.e., 84%) because of the number of missing or heterozygous sites in N741 (6076 missing sites) and N739 (4825 missing sites and 1232 heterozygous sites) (Table 3). This may be associated with the fact that N741 belongs to ssp. fastigiata var. fastigiata and one parent of N739 belongs to ssp. hypogaea var. hirsuta (Shrestha et al. 2013), the genomes of which may differ substantially from the reference genome used in the present study. Therefore, an increase in sequencing depth may be required to capture the missing sites.

Benefits of the MAGIC population
Constructing MAGIC populations is a new approach for exploiting the diversity in plant genetic resources (Arrones et al. 2020). These populations are very useful for dissecting complex traits, selecting elite lines for breeding, and constructing genomic prediction models (Arrones et al. 2020; Puglisi et al. 2021). The identified region on chromosome 7 was consistent with the QTL reported by Alyr et al. (2020). The candidate gene identified on chromosome 17 was mapped to the significant region of chromosome 7 using the updated reference genome. To the best of our knowledge, the identified region on chromosome 12 has not been reported. The significant SNPs on chromosome 7 may have resulted from the difference between two parents (N734 and N739) and the other six parents, whereas the sites identified on chromosome 12 may be due to the difference between N741 and the other seven parents. In addition to pod size, the constructed MAGIC population may also be used to investigate other important peanut traits because the progenies of the population vary in terms of growth habit, seed coat color, pod shell type, oil content, and other characteristics.

Functions of the candidate genes
A total of 79 candidate genes influencing pod size were detected in the three significant regions (Supplementary Table S6). The fasciclin-like arabinogalactan family protein gene (Arahy.P7DY53) has reportedly been related to fundamental aspects of embryogenesis and seed development across angiosperms (Costa et al. 2019). A previous study showed that this gene is involved in the regulation of the Brassica napus L. silique length (Wang et al. 2019). The transcriptional regulator STERILE APETALA-like protein and the F-box domain encoded by Arahy.5EZV1I were reported to regulate the peanut pod and seed sizes (Alyr et al. 2020). Furthermore, Arahy.UZLY68 encodes a C6HC-type zinc finger RING/U-box protein that may modulate the peanut pod size via ubiquitination according to a recent report on rice (Yang et al. 2021). The homeobox-leucine zipper protein gene (Arahy.V0IP08) affects maize kernel size and weight (Sun et al. 2022). The MYB transcription factor gene (Arahy.7ML2J7) controls the size of Arabidopsis thaliana seeds (Zhang et al. 2013). Additionally, the possibility that the gene encoding a rho GDP-dissociation inhibitor 1-like protein (Arahy.5QP4QH) may affect peanut pod size is supported by the findings of an earlier study, which revealed that the Rho-family GTPase-encoding gene OsRac1 controls rice grain size and yield by regulating cell division (Zhang et al. 2019).

Fig. 1 Distribution of the 40K SNPs on 20 chromosomes (A) and genomic positions of selected SNPs (B)

Fig. 2 Phylogenetic tree comprising the 297 lines of the MAGIC population at the S2 stage and eight founder lines

Fig. 5 Differences in the peanut pod area, perimeter, length, and width between lines with different genotypes (G:G or T:T) at Arahy07:292285, but the same genotype (C:C) at Arahy12:9486492

Table 1 Characteristics of the eight founder lines used for developing the MAGIC population

Table 2 Number of variants on 20 chromosomes