An SNP based GWAS analysis of seed longevity in wheat

Worldwide, ex situ genebanks are tasked with storing seeds to prevent the extinction of plant genetic resources. Regular monitoring of germination capacity is central to any genebank, and a drop below a certain threshold determines the regeneration cycle. Seed longevity varies among species and is a quantitative trait. New molecular marker data covering hitherto empty genomic regions may provide new insights into the inheritance of this trait. Using SNP data from two wheat panels, a total of 72 marker-trait associations were discovered, which could be confined to 24 quantitative trait loci (QTLs) based on marker proximity to each other. Among them, 13 QTLs are potentially novel. We also determined that pyramiding favorable alleles could increase seed longevity by 12.8%.


Introduction
Worldwide, ex situ genebanks are tasked with storing seeds (and to some extent other plant material) to prevent the extinction of plant genetic resources (Linington and Pritchard 2001). Today, genebanks store more than 7.4 million accessions (FAO 2010), of which about 45% are cereal species (Börner et al. 2014). Regular monitoring of germination capacity is central to any genebank, and a drop below a certain threshold determines the regeneration interval. Seed longevity is defined as the maximum time period over which seeds maintain germination viability (Sano et al. 2016).
Seed longevity varies among species and can be influenced by several environmental factors during seed formation, harvest and storage.
The deterioration in viability could be due to damage to membranes and DNA and to the action of a variety of enzymes and other proteins (Coolbear 1995; McDonald 1999). Among the agents responsible for seed ageing identified to date, lipid peroxidation seems the most potent (Davies 2005; Wiebach et al. 2020), in addition to damage to DNA and proteins (Rao et al. 1987; Bailly et al. 2008).
Although seed longevity is a quantitative trait, certain major loci seem to exist. For example, mutations within the genes DOG1 (DELAY OF GERMINATION1) and SNL1/2 (SWI-INDEPENDENT3-LIKE) in the model plant Arabidopsis are associated with seed longevity (Bentsink et al. 2006; Wang et al. 2013). In tobacco, overexpression of Heat Shock Factor A9 has been shown to enhance seed longevity by increasing the amount of heat shock proteins (Prieto-Dapena et al. 2006; Kotak et al. 2007).
Genetic analysis of seed longevity in crop plants was first initiated in rice (Miura et al. 2002), followed by soybean (Singh et al. 2008), barley (Nagel et al. 2009) and maize (Revilla et al. 2009). In bread wheat (Triticum aestivum), genetic research on seed longevity started with the use of microsatellite loci in a set of common wheat lines carrying D genome introgression segments of the wild ancestor Aegilops tauschii by Landjeva et al. (2010), followed by Rehman Arif et al. (2012a), who used bi-parental (RFLP, SSR markers) and association mapping (DArT markers) approaches to elucidate genetic loci for longevity in wheat. Furthermore, Rehman Arif et al. (2017) mapped a range of loci in germplasm (183 accessions) selected from the Gatersleben genebank using DArT markers. More recently, using a population of 246 recombinant inbred lines (RILs), Zuo et al. (2018) identified 96 loci for seed vigor-related traits under artificial aging. A further 23 longevity loci were uncovered in 166 RILs by Zuo et al. (2019). The last two studies were conducted using single nucleotide polymorphism (SNP) markers. Loci linked to longevity were also identified in durum wheat (Triticum durum) (Rehman Arif and Börner 2019). Nevertheless, genetic studies of seed longevity in wheat are still at an early stage, and new molecular marker data covering regions left empty in previous studies may provide new insights into this trait.
Here, we report a re-analysis of two association mapping panels (a winter wheat and a spring wheat collection investigated by Rehman Arif et al. (2012a, 2017), respectively) using the phenotypic data already available but newly created SNP marker data to look for potential novel loci linked to longevity, to search for possible candidate genes and to obtain a better understanding of the mechanisms of seed deterioration in wheat.

Materials
The first reassessed germplasm set is composed of 96 winter wheat advanced lines (WW), which have been extensively investigated for agronomic traits, longevity, dormancy and pre-harvest sprouting (Neumann et al. 2011; Rehman Arif et al. 2012a, b). The second reassessed germplasm set is composed of 111 spring wheat accessions (SW) (Table S1) selected from the panel of 183 accessions reported by Rehman Arif et al. (2017). All 207 accessions of WW and SW were analyzed using a 15 K Infinium SNP array, which is an optimized and reduced version of the 90 K iSELECT SNP-chip described by Wang et al. (2014). Data for 11,139 and 9804 SNPs from the public domain of IPK (http://dx.doi.org/10.5447/IPK/2017/4) were used to identify markers linked to seed longevity in WW and SW, respectively.

Methods
Phenotypic data from Rehman Arif et al. (2012a, 2017) for WW and SW, respectively, were used. Briefly, to assess longevity, three replicates of 100 seeds each from both WW and SW were subjected to accelerated ageing (AA) and controlled deterioration (CDT) tests. For AA, seeds were exposed to 43 ± 1 °C for 3 days at 100% relative humidity, followed by the standard International Seed Testing Association (ISTA) germination test, in which three replicates of 100 seeds were placed between two layers of wet filter paper, formed into rolls and stood on a Jacobsen apparatus at 25 ± 1 °C during the day and 23 ± 1 °C during the night. For CDT, the moisture content of the seed lots was brought to 18%, after which they were sealed in an aluminum foil bag and exposed to 43 ± 1 °C for 3 days. Germination percentages were recorded after 7 days. Initial germination (IG; control), germination after AA (GAA) and germination after CDT (GCD) were determined. Relative values (RAA and RCD) were calculated by dividing GAA and GCD, respectively, by IG and multiplying by 100 for both WW and SW. RAA and RCD were used in the association mapping analysis to determine longevity loci.
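The relative values described above can be illustrated with a minimal Python sketch. The germination percentages are hypothetical, and averaging the three replicates before taking the ratio is an assumption; the original reports should be consulted for the exact procedure.

```python
def relative_germination(germ_after_stress, initial_germ):
    """Germination after stress expressed as a percentage of initial germination."""
    if initial_germ <= 0:
        raise ValueError("initial germination must be positive")
    return germ_after_stress / initial_germ * 100.0


def mean(values):
    return sum(values) / len(values)


# Hypothetical germination percentages for one accession
# (three replicates of 100 seeds each; not data from the study).
ig_reps = [98, 96, 97]   # initial germination (IG)
gaa_reps = [80, 76, 78]  # germination after accelerated ageing (GAA)
gcd_reps = [60, 66, 63]  # germination after controlled deterioration (GCD)

# Assumption: replicates are averaged before the ratio is taken.
ig = mean(ig_reps)
raa = relative_germination(mean(gaa_reps), ig)
rcd = relative_germination(mean(gcd_reps), ig)

print(f"RAA = {raa:.2f}%, RCD = {rcd:.2f}%")
```

Expressing germination relative to IG removes differences in initial seed quality between accessions, so that RAA and RCD reflect the loss caused by the ageing treatment itself.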
Genotypic data of both WW and SW were subjected to population structure analysis prior to association mapping. This was done using a subset of 241 and 229 evenly spaced SNPs for WW and SW, respectively. STRUCTURE v.2.3.4 software (Pritchard et al. 2000) was used, applying the admixture model with a burn-in of 100,000 iterations and 100,000 MCMC iterations to test K values in the range 1-15. The results were submitted to Structure Harvester (Earl 2012) to obtain a clear picture of the sub-populations in both germplasm sets.
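How the evenly spaced subset was drawn is not described here. One plausible approach, sketched below in Python purely as an illustration, is greedy thinning by a minimum genetic distance along each chromosome. The function name, marker names and positions are hypothetical; the 15 cM gap is chosen only because the reported map lengths divided by the subset sizes (3639.8 cM / 241 and 3624.71 cM / 229) imply an average spacing of roughly 15 to 16 cM.

```python
def thin_markers(positions_cm, min_gap_cm):
    """Greedily keep markers that lie at least min_gap_cm apart on one chromosome.

    positions_cm maps marker name -> genetic position in cM.
    Returns the kept marker names in map order.
    """
    kept = []
    last_pos = None
    for name, pos in sorted(positions_cm.items(), key=lambda item: item[1]):
        if last_pos is None or pos - last_pos >= min_gap_cm:
            kept.append(name)
            last_pos = pos
    return kept


# Hypothetical markers on a single chromosome (not data from the study).
chrom_1a = {
    "snp_a": 0.0,
    "snp_b": 3.2,
    "snp_c": 14.9,
    "snp_d": 15.4,
    "snp_e": 31.0,
    "snp_f": 44.8,
}

subset = thin_markers(chrom_1a, min_gap_cm=15.0)
print(subset)
```

Thinning of this kind reduces linkage between the markers passed to STRUCTURE, whose model assumes loosely linked loci.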
Association mapping was carried out using the program TASSEL 5.2.43 (Bradbury et al. 2007), employing a mixed linear model (MLM) (Yu et al. 2006) that takes into account population structure (calculated with STRUCTURE v.2.3.4) and kinship (calculated with TASSEL 5.2.43). Significance thresholds were calculated as the reciprocal of the number of markers in each set. Thus, p-values of 8.97 × 10⁻⁵ and 1.019 × 10⁻⁴ were used as the thresholds for declaring an association significant in WW and SW, respectively. The flanking sequences of SNPs associated with longevity were obtained from the Wheat 90 K SNP array database (Wang et al. 2014). Gene ontology (GO) was assessed using BLAST2GO v.3 software (https://www.blast2go.com/).
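The threshold rule can be reproduced from the marker counts given in Materials. The short Python sketch below is illustrative only: the variable and function names are ours, and treating a p-value equal to the threshold as significant is an assumption. The computed reciprocals (about 8.977 × 10⁻⁵ and 1.01999 × 10⁻⁴) agree with the reported thresholds if those were truncated rather than rounded.

```python
# Number of SNP markers per panel, as reported in Materials.
marker_counts = {"WW": 11139, "SW": 9804}

# Threshold = reciprocal of the number of markers (tests) per panel.
thresholds = {panel: 1.0 / n for panel, n in marker_counts.items()}


def is_significant(p_value, panel):
    """True if p_value meets the panel threshold (<= is an assumption)."""
    return p_value <= thresholds[panel]


for panel, threshold in thresholds.items():
    print(f"{panel}: threshold = {threshold:.4e}")

# Hypothetical p-values, for illustration only.
print(is_significant(5.0e-5, "WW"))
print(is_significant(1.0e-4, "WW"))
print(is_significant(1.0e-4, "SW"))
```

A threshold of 1/n corresponds to expecting about one false positive per panel under the null hypothesis, which is less stringent than a Bonferroni correction at α = 0.05 (0.05/n) but far stricter than a nominal 0.05 or 0.01 cut-off.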

Genotypic characterization
The distribution of SNPs was similar in both collections. A total of 11,139 SNPs were mapped in WW, covering a distance of 3639.8 cM (3.06 SNPs/cM). Likewise, 9804 SNPs were mapped in SW, covering a distance of 3624.71 cM (2.70 SNPs/cM) (Table S2). Marker density was not uniform: the B genome carried the highest number of SNPs (5479 in WW and 4831 in SW), followed by the A genome (4313 in WW and 3843 in SW), whereas the D genome was sparsely covered (1347 in WW and 1130 in SW) (Fig. S1).
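The figures above can be cross-checked with a few lines of Python. All numbers are taken from the text; only the variable names are ours. The sketch confirms that the per-genome counts sum to the panel totals and reproduces the reported densities.

```python
# SNP counts per genome and map length (cM) per panel, as reported in the text.
panels = {
    "WW": {"map_cm": 3639.8, "snps": {"A": 4313, "B": 5479, "D": 1347}},
    "SW": {"map_cm": 3624.71, "snps": {"A": 3843, "B": 4831, "D": 1130}},
}

summary = {}
for name, panel in panels.items():
    total = sum(panel["snps"].values())
    density = total / panel["map_cm"]  # SNPs per cM
    # Share of all markers that fall on each genome, in percent.
    shares = {g: n / total * 100.0 for g, n in panel["snps"].items()}
    summary[name] = {"total": total, "density": density, "shares": shares}
    print(f"{name}: {total} SNPs, {density:.2f} SNPs/cM, "
          f"D genome share = {shares['D']:.1f}%")
```

In both panels the D genome carries only about 12% of the markers, which quantifies the sparse coverage noted above.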

Discussion
Genetic markers are very useful for identifying regions and genes involved in seed longevity, as they can inform germplasm curators and plant breeders when it is time to regenerate the seeds of a given accession of any species. Moreover, some candidate genes have been identified that influence the trait (Debeaujon et al. 2000; Clerkx et al. 2004; Sattler et al. 2004; Xu et al. 2004; Bentsink et al. 2006; Prieto-Dapena et al. 2006; Devaiah et al. 2007; Ogé et al. 2008; Rajjou et al. 2008; Almoguera et al. 2009).
In wheat, some studies investigating biparental mapping populations have linked longevity with SNPs (Zuo et al. 2018, 2019). In this study, we used SNPs in two association mapping wheat collections. Fewer marker-trait associations were detected using the SNP data of WW and SW than were reported by Rehman Arif et al. (2012a, 2017), because both of those studies used a default criterion of a p-value of 0.05 or 0.01. In this study, however, we used p-value thresholds of 8.97 × 10⁻⁵ (for WW) and 1.019 × 10⁻⁴ (for SW), calculated as the reciprocal of the number of tests (markers) performed per panel, to help us find true associations.
The associations discovered could be confined to 24 QTLs based on marker proximity to each other (Fig. 3). Among them, 4 QTLs were observed in WW, 18 QTLs in SW, and 2 QTLs were common to both WW and SW (Fig. 3). The loci were distributed on chromosomes 1A (2 QTLs), 1B, […]
The new QTLs in this study highlight the importance of proper genome coverage to identify most of the related loci influencing the trait of interest. BLAST analysis of the 55 marker sequences associated with longevity, belonging to 16 QTLs, revealed a total of 37 genes probably involved in seed longevity (Table S3). Using the deletion bin confinement of DArT markers, Rehman Arif et al. (2012a, 2017) have reported a number of probable candidate genes for longevity. In this report, we confined the probable candidate genes to 37, which can potentially be targeted for advanced molecular research on seed longevity in wheat. Zuo et al. (2018) reported five candidate genes from the analysis of 96 QTLs in RILs. The candidate genes common to Zuo et al. (2018) and this study are stem rust resistance protein Rpg1 and NBS-LRR resistance-like protein. Likewise, Zuo et al. (2019) reported three candidate genes for longevity in wheat from 23 QTLs in 166 RILs. The candidate gene common to Zuo et al. (2019) and this study is FAR1-related sequence 6-like protein, which is expressed in hypocotyls, rosette and cauline leaves, inflorescence stems and flowers, and is linked to positive regulation of circadian rhythm and transcription (Lin and Wang 2004). Moreover, it is also reported to be involved in ABA signal transduction and abiotic stress response pathways (Ma and Li 2018).
In SW, we divided the accessions into two groups of 35 accessions each. The first group (a) carried between 3 and 11 favorable alleles and the second group (b) carried between 14 and 17 favorable alleles for the 20 QTLs reported in SW. Mean RAA and RCD in group (a) were 69.25 ± 18.90 and 56.85 ± 26.37, respectively. Likewise, mean RAA and RCD in group (b) were 84.25 ± 15.94 and 77.86 ± 12.83, respectively. Thus, with the pyramiding of favorable alleles, increases of 5.47% and 12.79% were observed in RAA and RCD, respectively (Table 1). This confirms that seed longevity is a polygenic trait, with each locus imparting some improvement in an additive manner (Zuo et al. 2019) and with the accessions carrying more favorable alleles showing higher longevity.

Conclusion
Our analysis discovered 13 potentially novel loci for seed longevity using SNP whole genome mapping in two different association mapping populations in wheat. These novel loci went unnoticed in previous reports. This highlights the importance of dense genetic maps covering otherwise uncovered parts of the genome for detecting novel loci for seed longevity. Moreover, since more and more populations are being characterized with SNPs, the results of this investigation will help genebank curators and plant breeders decide when to regenerate their germplasm.
Author contributions MARA and AB conceived the idea. MARA performed the analysis and wrote the manuscript. AB reviewed the manuscript.
Funding Open Access funding enabled and organized by Projekt DEAL.

Compliance with ethical standards
Conflict of interest Both authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.