Genome-wide association analysis and validation with KASP markers for nut and shell traits in almond (Prunus dulcis [Mill.] D.A.Webb)

Almond shell hardness is an economically important quality trait that protects the integrity of the kernel. It is directly related with shelling or crack-out percent. In the U.S. and Australian markets, soft to semi-hard shells varieties have been used in the industry, while in the European market, hard shells are dominantly sold and distributed. The origin of a hard shell is from wild almond species. Soft shells in the USA and Australia have been selected for a higher crack-out percent or larger kernel and ease of processing. Here, we examined breeding populations totaling 264 trees that were genotyped with genotyping-by-sequencing (GBS) where shell hardness was measured with a texture analyzer, and shell weight, in-shell weights and shell thickness were measured and crack-out percent was calculated. In a genome-wide association, seven genetic loci were found to be associated with shell hardness, crack-out percent, shell thickness and shell weight. Of these seven, one was shared between multiple traits. Kompetitive allele-specific PCR (KASP) assays were developed and tested in unrelated breeding lines from Australia, Spain, France and Italy. One KASP assay was found to be highly predictive of shell hardness in this diverse panel and can be applied for use in marker-assisted breeding.


Introduction
Almond [Prunus dulcis (Mill.) D.A.Webb] is an economically important crop cultivated in more than 50 countries, with 95% of production in California, Australia, and the Mediterranean Basin. In California, almonds are the most produced tree crop and the second highest agricultural commodity with a value of $6.1 billion per year covering 1.2 million acres (CA Dept Food Ag 2019). The UC Davis almond breeding program has historically used phenotypic selection, which has several limitations, most prominently the amount of time required to observe and select for advantageous and accurate phenotypes, and has also developed several molecular markers for pedigree validation. More recently, identification, development and deployment of additional molecular markers has been initiated for several critical traits, enabling more robust and speedier selection. With advances in both genomic and phenotyping technologies, accuracy in selection can be made more expediently.
Molecular markers have been used for decades and are widely used in genetic based studies due to their capability to distinguish different genotypes at a locus.

Communicated by D. Chagné
Genotyping-by-sequencing (GBS) is a next-generation technology with reduced representation of a genome where restriction enzymes are used, DNA fragments are ligated with barcoded adapters and subsequently sequenced (Elshire et al. 2011). The result is that large numbers of single nucleotide polymorphisms (SNPs) can be discovered for evaluating genetic diversity, QTL mapping, linkage mapping and association studies. Kompetitive allele-specific PCR (KASP) technology (LGC Limited) can be used for SNP genotyping based on dual fluorescent resonance energy transfer (FRET) cassettes. The KASP chemistry combines the use of a highly specific 5′-3′ exonuclease Taq DNA polymerase with two competitive, allele specific, tailed forward primers and one common reverse primer. It has been used for selection of traits and development of KASP assays to validate marker studies for evaluating root-knot nematode resistance in almond rootstock (Duval et al. 2018), fruit ring rot disease in apple (Shen et al. 2019) and shell hardness in almond (Goonetilleke et al. 2018).
Almond shells account for around 35-75% of the total fruit weight (Li et al. 2018).
The shell is made up of cellulose, hemicellulose and lignin. The content of cellulose in almond shells is 38%, while lignin content of almond is 44% (Li et al. 2018). Depending upon the industry, in a corresponding region there is a preference for either a soft or a hard shelled almond. For example, in the USA, the California industry prefers a softer shell, as evident by the dominant cultivar 'Nonpareil', which is considered a paper-thin shell, as rubber rollers are used for processing. In the Mediterranean regions such as Spain and Italy, hard shell varieties such as 'Tuono' and 'Penta' dominate the industry. These regions are usually dryland farmed and the hard shells provide protections from insect, birds, rodents and even rancidity (Verdu et al. 2017). In general, the harder the shell, the smaller the crack-out percent and relative nut size is, but this is not always the case; there can be hard, thinner shells with high crack-out percent, and even large kernels/crack-out percent in hard shells. A softer shell can have an increased susceptibility to insect damage such as by navel orange worm. In the U.S., insecticides are generally applied to the almond tree canopy twice (10-14 days apart) at the onset of hullsplit (1-5%), and once again a week prior to harvest to prevent insect damage to kernel.
Shell hardness has been measured by shell thickness, shell weight, shelling percentage and with a texture analyzer. A texture analyzer is an instrument that measures resistance to force of something and has been used to evaluate shell hardness in pistachio (Nikzadeh and Sedaghat 2008), walnut (Sideli et al. 2020) and almond (Fornes Comas et al. 2019).
The objective of this research was to perform genomewide association for seven almond families, common cultivars and peach accessions; collect phenotypic measurements across two years, develop KASP molecular markers, and test and validate those markers on an unrelated set of trees. The results from this study aim to aid in marker-assisted breeding efforts for a higher crack-out percent.

Plant material
A panel consisting of 264 almond trees (12 cultivars, 10 selections and 236 seedlings from crosses made in the UC Davis almond breeding program) and six peach trees (Online Resource 1) were used. The almond trees which were grown at Wolfskill Experimental Orchards in Winters, CA, ranged in age from 7 to 10 years. Trees were grown on their own rootstocks and watered with micro sprinklers. Nutrient management was the same for each rows of trees. There was one experimental tree per genotype. Nuts were collected from varying levels of the canopy and sides of the tree. Nuts were hand harvested after hullsplit (August -September) in 2019 and 2020, dried in an almond dryer and stored inshell in a 0° C ± 0.4 °C fridge with 82.8% ± 2.2% humidity until use. Average temperature and total rainfall for 2020 was higher than in 2019 (Online Resource 2) (University of California Cooperative Extension).
Additional materials from the USA, Australia, Spain, Italy and France (Online Resource 3) were used for evaluation of KASP markers.

Genotyping
Genotyping-by-sequencing (GBS) was performed with DNA samples from the GWAS panel using ApeKI as a restriction enzyme. A custom python script was used to run gbstrim (https:// github. com/ kdm9/ gbstr im) in batches to trim reads to a maximum length of 93. Stacks 2 (Rochette et al. 2019) was used to clean data by truncating reads to obtain expected amplicon lengths. Bowtie2 v2.4.0 (Langmead et al. 2018) was used to index genome and align reads to peach genome V2.1 'Lovell' available at Phytozome (URL: phytozome.jgi. doe.gov). SAMtools 1.12 (Li et al. 2009) was used to convert sam files to sorted bam files. Freebayes 1.3.6 (Garrison and Marth 2012) was used for SNP variant calling. After filtering for mapping quality (minimum 90%), allele balance, mean read depth (maximum 30), minor allele count (minimum 3), minor allele frequency (MAF) (minimum 0.05), missing SNPs (maximum (0.75 per genotype), read mapping quality score (minimum 20) and read depth differences between forward and reverse strands (maximum 100-fold), removing indels, selecting only bi-allelic SNPs and removing trees with more than 50% missing data, there were 12,406 SNPs across 264 trees. Imputation was performed in Tassel 5 (Bradbury et al. 2007) with the LinkImpute LD-KNNI algorithm based on k-nearest neighbor genotype imputation method (Money et al. 2015).

Phenotypic measurements
From among the nuts harvested from each tree, 10 nuts were sampled at random. Each of these nuts was weighed to obtain its in-shell weight, then subjected to texture analysis to determine the force and time needed to rupture the shell. Shell thickness was measured, and the kernel was weighed. Shell weight was determined by difference (in-shell weight -kernel weight) and crack-out percentage was calculated ((kernel weight/in-shell weight) × 100%). In-shell and kernel weights were measured with an Ohaus Scout SPX balance (Parsippany, NJ) using a USB device interface SPDC V2.03 for automatic capture. Shell hardness was measured using texture analyzers (models TA-XT2 and TA-HDPlus; Texture Technologies; Surry, England) and Exponent 6.1.8.0 software (Stable Micro System; London, UK). The application of force (N) and time (seconds) to rupture the almond shell were automatically recorded with a custom macro to capture the maximum force and the integral (the area under the curve of time and force). The TA-XT2 texture analyzer was set as follows: pre-test speed (1.0 mm/s), test speed (2.00 mm/s), post-test speed (10.0 mm/s), trigger force (0.005 kg), return to start distance (4.0 mm for hard shells and strain 20% for softer shells) and calibrated with 2.0 kg weight. For almond shells that were too hard for the TA-XT2 texture analyzer, which has a maximum load cell capacity of 50 kg, a TA-HDPlus texture analyzer with a load cell of 750 kg was used. Shell thickness was measured using Mahr 16ER calipers and Marcom Professional 5.2 -6.2/13.06.2018 software for automatic capture.
The R package 'emmeans' was used to calculate the adjusted means of two years, with family and genotype as fixed effects and genotype and year set as an interaction. The model was: where Y ijkl stands for the lth observation of the family j in the year i, μ is the constant overall mean, yg ik is the year and genotype interaction and ε ijkl is the random error term with mean of 0 and a variance σ 2 ε . Maximum force and integral measurements were log transformed.

Variance component estimates and estimated breeding values
The ASReml-R 4 software tool was used to estimate variance components from the phenotypic data of 264 trees. A historical pedigree was used for the a-matrix. The linear mixed model was: where Y denotes the (n × 1) vector of observations, τ is the (p × 1) vector of fixed treatment effects (year and family), X is the (n × p) design matrix, u is the (q × 1) vector of random effects (pedigree), Z is the (n × q) design matrix and e is the (n × 1) vector of residual error. The pedigree used was a historical pedigree from breeding records. A conditional Wald F-test was used to test the significance of fixed effects in the model. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to choose the best model. Narrow-sense heritability (h 2 ) was estimated using data from 264 trees across 7 families by dividing the estimated additive genetic variance by the total phenotypic variance. Estimated breeding values were calculated in ASReml-R 4 using the predict function on the linear mixed model.

Genome-wide association analysis
Genome-wide association (GWAS) analysis was applied to estimated breeding values using a multi-locus mixed linear model (MLMM) algorithm in the R package GAPIT 3 (Wang and Zhang 2020). The MLMM is a multi-linear model (MLM) where both Q (population structure) + K (kinship matrix) are fitted to the model as random effects, reducing type I errors due to spurious associations from relatedness and population structure. Q-Q plots and Manhattan plots were inspected for evidence of inflation. The maximum number of PCs to include as covariates in the multivariate model was determined by examining the initial PCA and scree-plot, then the number of PCs to include was determined using the function model selection implemented in GAPIT. SNPs with MAF below 0.05 were excluded. SNP data file used as an input for GWAS is available in (Online Resource 4). PLINK (Purcell et al. 2007) was used to calculate the linkage disequilibrium between significant SNPs on the same chromosome. 'LD heatmap' (Shin et al. 2006) package in R was used to visualize regions of LD around significant SNPs.
The gene annotation Prunus persica V2.1 was used to assess annotated genes in the regions around SNPs within 100 kb on each side of SNP, as these regions had R 2 > 0.6 and putative genes were surmised.

KASP assay design and genotyping
For each SNP that was significantly associated with log maximum force, shell thickness and/or crack-out percent, SAMtools (Li et al. 2009) function faidx was used to retrieve a 100-bp sequence containing the SNP. That sequence was analyzed using Kraken software (LGC Limited) to design primers for a KASP assay. Each primer set was named according to the chromosome and position of the SNP for which it was designed.
The newly designed KASP assays were applied to DNA samples from 332 trees (216 from the USA, 14 from Australia, 87 from Spain, 8 from France, 4 from Italy and 3 of unknown origin). In addition, two KASP assays designed by Goonetilleke et al. (2018) were applied to DNA samples from 216 UC Davis varieties/selections that had phenotypic data from this study.
The KASP genotyping was conducted on SNPLine (LGC Limited) platforms according to the manufacturer's instructions. Results were analyzed using Kraken (LGC Limited) software.

Genotyping-by-sequencing and principal component analysis
In sequences generated by GBS, only 30% could be aligned to 'Texas' almond genome sequence assembly, but 83% of them were aligned to version 2.1 of the peach 'Lovell' sequence assembly. Genotyping-by-sequencing of a panel of 280 trees generated a total of 2,162,690 total variants and after filtering, there were 12,406 SNPs across 264 trees. Mean sequence depth was 10. Imputation accuracy was found to be 89.97%. With LD pruning, a subset of 4,137 SNPs was selected for PCA. Application of PCA to this subset clearly separated all almond trees from the six peach trees (Fig. 1). Among the almond progeny, those derived from 'Ferragnès' crosses, 'Nonpareil' × 02,1-271, 'Tardy Nonpareil' × 95,1-26 and 'Winters' × 00,2-3 mostly formed distinct clusters, except that some progeny of 'Winters' × 00,2-3 overlapped with progeny of 'Nonpareil' × 02,1-271 and progeny of 'Tardy Nonpareil' × 95,1-26 clusters and

Shell traits
The distributions of residuals for maximum force and for integral deviated significantly (p < 0.001) from normality (Shapiro Wilk (W) = 0.894 and 0.906, respectively). Log transformation of these two variables reduced skewness, but did not completely eliminate the deviation from normality (W = 0.981, 0.972, p < 0.001 for both). The range, mean and median for shell thickness, log maximum force, log integral, kernel weight, crack-out percent and shell weight are listed in (Online Resource 5).
Estimates of narrow-sense heritability (h 2 ) were high for shell weight (0.88 ± 0.01), shell thickness (0.88 ± 0.02), maximum force (0.77 ± 0.03), integral (0.69 ± 0.04) and crack-out percent (0.81 ± 0.65). Estimated breeding values (EBVs) for shell thickness, log maximum force, log integral and crack-out percent were distributed normally, while EBVs for shell weight were strongly skewed towards low values (Fig. 2). Estimated breeding values for log maximum force were distributed normally. Estimated breeding values for maximum force and log interval were positively correlated with each other (r = 0.95) and with shell thickness (r = 0.66 and 0.71, respectively) and shell weight (r = 0.59 and 0.60, respectively).
Families with 'Ferragnès' as one parent had the largest breeding values for maximum force (up to 1,249.00 N) (Online Resource 6). Families with 'Nonpareil' as one of the parents had lower breeding values for maximum force (77.58 N) and high breeding value for crack-out percent.

Genome-wide association analysis
Significant marker-trait associations were detected on chromosomes 2, 5, 7 and 8 (Table 1 and Fig. 3), involving a total of seven SNPs. Each of these SNPs was within a predicted gene on the peach genome. For shell thickness, the associated SNP on chromosome 5, SPP05_11616743, was found to be within a predicted gene region at position 11,691,415-11,694,929 bp for galactosyltransferase family protein (Table S5)   assembly) was associated with four traits: maximum force, shell thickness, shell weight and crack-out percentage. Almonds from trees with the A:G genotype at this position SPP02_23368051 required greater maximum force and had thicker shells, higher shell weight and lower crack-out percent ( Fig. 4a-d) than those from G:G trees. Both genotypes were observed within the four families of 'Ferragnès' progeny and within the 'Nonpareil' × 02,1-271 family, but only the G:G genotype was observed within the 'Tardy Nonpareil' × 95,1-26 and 'Winters' × 00,2-3 families. Based on examination of genotypic and phenotypic data, it was clear that the significant associations observed for this SNP were largely due to segregation and marker-trait associations within the 'Ferragnès' and 'Nonpareil' families ( Fig. 5 a, b). Two other SNPs, both on chromosome 7, were associated with a requirement for higher maximum force (Table 1 and Fig. 3). Almonds from trees with the T:T genotype at SPP07_14887902 required greater maximum force than those from C:T trees (Fig. 4f). Almonds from trees with the G:G genotype at SNP SPP07_16579940 required greater maximum force than those from A:G trees. These two SNPs were in moderate LD (r 2 = 0.393).
On chromosome 5, two A:G SNPs (SPP05_8708350 and SPP05_11616743) were significantly associated with shell thickness. At each of these SNPs, almonds from A:A trees had thicker shells than those from A:G trees (Fig. 4e). These two SNPs were in moderate LD (r 2 = 0.564).
Two other SNPs were significantly associated with shell weight: one on chromosome 5 and one on chromosome 8. Almonds from trees with the A:G genotype at SPP05_3719741 had higher shell weight (Fig. 4h) than those from A:A trees. Almonds from trees with the T:C genotype at SPP08_22165513 had higher shell weight than those from C:C trees (Fig. 4g). For shell weight, there were additional significant SNPs, one on chromosome 2 and one on chromosome 8 (Fig. 5C), however they were not in LD (r 2 = 0.009, 0.15 respectively) with the most significant SNPs (SPP02_23368051, SPP08_22165513).

Application of KASP assays
All five of the KASP assays designed for SNPs associated with log maximum force, shell thickness and/or crack-out percent (Online Resource 8) successfully discriminated among genotypes within the panel on which they were tested (Online Resources 9 and 10). For members of that panel that had also been analysed by GBS, the KASP genotype calls agreed with those from GBS.
Of the five SNPs for which KASP assays were developed and applied, only SP02_23368051 exhibited the expected association with shell hardness (Table 2, Online Resource 11).
Among 205 trees that were genotyped for WriPdK50 (Goonetilleke et al. 2018) and that could be classified as having soft shells (greater than 40% crack-out) or hard shells (less than 40% crack-out) the proportion of soft-shell types was much higher (0.85) among trees genotyped as C:C than among trees genotyped as C:A (0.07) (Online Resource 12). Among 207 trees that were genotyped for WriPdK50 and that could be classified as having soft or hard shells, the proportion of soft-shell types was higher (0.81) among trees genotyped as G:G than among trees genotyped as T:G (0.56).

Discussion
Almond varieties and germplasm can exhibit considerable variation in allocation of biomass between kernels and shells. Differences in this property are often assessed as crack-out percent (kernel weight as a percentage of nut weight), which in turn is used to classify nuts into hard (lowest crack-out percent), semi-hard, semi-soft, soft and paper-shell types (highest crack-out percent), with different types preferred in different markets. The actual hardness of almond shells can be more directly measured with texture analyzers, which provide estimates of the force required to rupture the shells, while shell thickness can also be measured. Given sufficient variation, it is expected that the force required to rupture shells will be positively associated with shell thickness and negatively associated with crack-out percentage. Consistent with this, and with some previously reported results (Ledbetter (2008) and Sánchez-Pérez et al. (2007)), crack-out percent was negatively correlated with both shell thickness and the force required to rupture almond shells. In contrast, Fornes Comas et al. (2019) did not find a strong correlation between shelling load and kernel percentage, probably because of limited variation within their panel of hard-shelled cultivars.
Using genotypic data obtained by GBS, we conducted GWAS for maximum force, shell weight, shell thickness and crack-out percent, all of which had high heritability. Consistent with strong correlations among traits, one SNP on chromosome 2 (SPP_023368051) was significantly associated with all four traits. QTLs for related traits have previously been reported on chromosome 2 in the biparental crosses 'Ferragnès' × 'Tuono' (Arús et al. 1999), 'R1000' × 'Desmayo Largueta' (Sánchez-Pérez et al. 2007) 'Nonpareil' × 'Lauranne' (Goonetilleke et al. 2018. Six other SNPs (three on chromosome 5, two on chromosome 7 and one on chromosome 8) were also significantly associated with individual traits. Although marker-trait associations had previously been reported on chromosome 5 (Goonetilleke et al. (2018)) and chromosome 8 (Arús et al. (1999);Goonetilleke et al. (2018), the SNPs identified here were over 3 Mbp (chromosome 5) and were over 9.4 Mbp (chromosome 8) from the previously reported loci.   Of six shell-hardness associated SNPs for which that Goonetilleke et al. (2018) designed KASP assays, two are in the same region of chromosome 2 as a SNP (SPP02_23368051) for which trait associations were detected here. The KASP assays WriPdk50 (which assays a SNP at 108 kb from SPP02_2336805) and WriPdK264 (which assays a SNP at 562 kb from SPP02_23368051) respectively, both revealed polymorphism within our GWAS panel, with homozygous genotypes associated with softer shells, consistent with what was reported by Goonetilleke et al. (2018). However, they were not in LD with SPP02_23368051 (Online Resource 12).
A KASP assay developed for SPP02_23358051 which gave results that were highly predictive of shell hardness across a diverse panel of materials, indicating that this marker could be broadly useful in almond breeding. Among KASP assays developed for four other SNPs two (for SPP05_11616743 and SPP05_8708350) were moderately predictive of shell hardness, and two (for SPP07_14887902 and SPP07_16579940) were not predictive of shell hardness.
In this study utilizing a GWAS panel we confirmed major marker-trait associations on chromosome 2, 5 and 8, and found new associations on chromosome 7. The detection of associations on chromosome 7 may be due to the inclusion of many soft-shell selections in our panel, in contrast to the predominance of hard-shell materials in most previous studies. The SPP_023358051 marker had a clear segregation in the 'Ferragnès' and 'Nonpareil' families while the other families, 'Tardy Nonpareil'× 95,1-26 and 'Winters'× 00,2-3 did not segregate for the shell traits because the parents were too similar in their genotype and there was not enough phenotypic variation to determine a significant difference. The families surveyed carry only the heterozygous or one type of homozygous genotypes, or only one homozygous genotype.
One of the trait-associated SNPs identified in this study (SPP05_11616743, which is associated with shell thickness) is located 74,672 bp from a predicted gene (Prupe.5G113400) that may be involved in cellulose production. This is of interest given that cellulose is a major component contributing to the thickness of the almond shell (Li et al. 2018). Prupe.5G113400/ XM_007220948.2 has 74% sequence similarity to the Arabidopsis thaliana gene AT5G53340.1, has been predated to encode galactosyltranferase protein. In A. thaliana, galactosyltransferases GALT7 and GALT8 were found to participate in cellulose biosynthesis, with mutants showing primary cell wall defects with impaired growth and cell expansion in seedlings, and secondary cell wall defects with collapsed xylem wall thickness (Nibbering et al. 2022).
KASP assays are particularly useful for applications, such as marker-assisted selection, that require genotyping of large populations for relatively few markers. Here, we developed KASP assays from SNP markers that were discovered with GBS and found to be associated with traits of interest. By applying the KASP assays to a diverse panel of materials we found that that SPP_023358051 is highly predictive of shell hardness, indicating that this marker could be broadly useful in almond breeding. This is based on its strong associations with phenotypic traits measured here, and on its validation using a KASP assay applied to a different panel and materials. Other KASP assays that were developed and/or applied here were effective in discriminating among SNP genotypes, but did not exhibit any associations with traits.
Among the results reported here, the association of seven SNPs with kernel and nut traits provides a basis for further exploration of the genetic control of these traits in almond, and a KASP marker developed for SPP_023358051 may be immediately useful in almond breeding.