Construction of an SNP-based high-density genetic map for Japanese plum in a Chinese population using specific length fragment sequencing

The Japanese plum (Prunus salicina Lindl.) is one of the most important stone fruit crops in China. High-density linkage maps are valuable resources that support functional genomics and genetic breeding studies. Several Japanese plum linkage maps have been reported using different kinds of molecular markers; however, their marker numbers and chromosome coverage are limited. Recently, a newly developed strategy that sequences specific-locus amplified fragments (SLAF) has proven powerful for rapid genotyping of genome-wide markers and for high-density genetic map construction. In this study, SLAF sequencing was used to genotype 114 F1 seedlings from the '09-16' × 'Fortune' cross. Suitable SLAF markers (160,344, from 343,436,902 paired-end reads) were chosen for genetic map construction, 42.43% of which were polymorphic. The integrated map contained 3,341 high-quality SLAFs at 720 loci, grouped into eight genetic linkage groups with a total length of 869.9 cM and an average distance of 1.21 cM between adjacent loci; only four gaps with a genetic distance > 5 cM between adjacent markers occurred, in linkage group (LG) 3 and LG6. The length of each LG ranged from 82.3 cM (LG3) to 138.3 cM (LG1). Aligning the map against the peach (Prunus persica L.) reference genome sequence indicated a highly collinear relationship between the LGs and the peach genome, demonstrating that the markers on our LGs were well ordered. Overall, our study identified a large number of genetic markers and constructed a high-density linkage map for Japanese plum, which will provide a solid foundation for marker-assisted selection and sequence assembly of the Japanese plum reference genome.


Introduction
The Japanese plum (Prunus salicina Lindl.) has a diploid genome (2n = 2x = 16) and belongs to the subgenus [...] poor appearance quality and shelf life. In this century, a number of cultivars introduced from abroad (mainly the USA) quickly became the main cultivars planted in commercial production because of their good appearance quality and suitable storage performance, although they have a narrow gene pool and poor flavour and adaptability. For over a hundred years, most Japanese plum breeding in the USA was based on a few cultivars derived from crosses between P. salicina and several plum species of the same subgenus. Therefore, it is necessary to diversify the range of cultivars and to accelerate breeding programs using Chinese native cultivars.
Marker-assisted selection (MAS) is an important biotechnological tool used in breeding programs to improve the efficiency of genotype selection (Tavassolian et al. 2010). Genetic map construction is the basis for quantitative trait locus (QTL) mapping and MAS breeding, and it can also be used for genome-assisted assembly and evolutionary comparisons between closely related species. In the case of the Japanese plum, however, only a few studies have focused on genetic linkage map construction. A genetic linkage-mapping program was initiated by Vieira et al. (2005), in which the genetic linkage maps of the two parents, built with amplified fragment length polymorphism (AFLP) markers, contained only 56 and 84 markers. Because of the limited number of markers, these linkage maps were sparse, with total lengths of 905 cM (Chatard) and 1349.6 cM (Santa Rosa), respectively. Salazar et al. (2017) developed two genetic maps, for '98-99' (479 SNPs) and 'Angeleno' (502 SNPs), using SNP markers obtained through genotyping by sequencing and covering genetic distances of 688.8 and 647.03 cM, respectively. The most saturated map is the 'Angeleno' × 'Aurora' map reported by Carrasco et al. (2018). This consensus map was built using 732 SNPs and spanned 617 cM with an average of 0.96 cM between adjacent markers; however, it contained more large gaps (> 10 cM). All the parents described above are commercial varieties selected by crossbreeding. Local varieties of Chinese plums may carry more heterozygous markers, so that more markers could be mapped. To date, no genetic linkage map for local varieties of Chinese plums has been reported, possibly owing to the limited number of markers and of suitable genetic populations.
High-density genetic linkage maps require not only appropriate segregating populations but also high-throughput SNP detection technology, as they represent whole-genome information on molecular markers. In recent years, SNP markers obtained with high-throughput sequencing techniques have gained popularity in genetic mapping because of their stability, heritability, low cost, high-throughput efficiency and abundant genetic variation (Guo et al. 2015). Specific-locus amplified fragment sequencing (SLAF-seq) is a high-throughput technique that has been widely adopted in different species for multiple purposes, including the analysis of genetic diversity and population structure in sweet potato (Su et al. 2017) and genome-wide association studies in rapeseed (Zhou et al. 2017). In the last 5 years, SLAF-seq has been used successfully for the construction of high-density genetic maps in many crops, including tea (Ma et al. 2015), tree peony (Cai et al. 2015), sorghum (Ji et al. 2017), sweet osmanthus (He et al. 2017) and poplars (Fang et al. 2018). Applying this approach to scan the whole Japanese plum genome is of great importance for plum breeding through high-density marker development and gene mining.
Over the last two decades, we have carried out hybridization breeding research on the Japanese plum, which has established a large number of segregating populations. In this study, we constructed a high-density genetic linkage map of P. salicina from a cross between a native cultivar derivative and a commercial cultivar using the SLAF-seq method. This genetic map will support QTL analysis and MAS strategies in breeding programs and facilitate future de novo chromosome assembly for the Japanese plum.

Plant materials
An F1 population of 114 progeny was generated from the cross between the '09-16' and 'Fortune' cultivars in 2015. The diploid seed parent '09-16' was a hybrid seedling of 'Wanshu Xiangjiaoli' (a famous native cultivar) and 'Akihime' (a Japanese cultivar). The diploid pollen parent was 'Fortune', which is an important breeding cultivar in commercial production. The seedlings of the F1 progeny were planted in the Chinese National Germplasm Repository for Plums and Apricots located at the Liaoning Institute of Pomology (Liaoning, China) in 2015.
Young leaf samples were harvested from each individual F1 plant and the two parents. These samples were immediately frozen in liquid nitrogen and kept at −80 °C. Genomic DNA was extracted using the CTAB method (Doyle and Doyle 1990). DNA samples were checked for quality and concentration and used for further experiments.

SLAF library construction and high-throughput sequencing
An improved SLAF-seq strategy was utilized in our experiment. Genomic DNA from each F1 individual and each parent was digested with the A1-2+A1-6 according to the Prunus persica genome (https://www.rosaceae.org/species/prunus_persica/genome_v2.0.a1) (Arús et al. 2012). The library was constructed using the SLAF-seq strategy with some modifications, and enzyme fragments of 264-364 bp were defined as the SLAF labels. The details of the SLAF-seq strategy have been described by Zhang et al. (2015). Paired-end sequencing (125 bp on each end) was performed on an Illumina HiSeq 2500 system (Illumina, Inc.; San Diego, CA, USA) according to the manufacturer's recommendations.

Sequence data mapping and genotyping
The high-quality Illumina sequences were used for SLAF marker identification and genotyping with the procedures described by Sun et al. (2013). The quality of the raw 125-bp paired-end data was evaluated using FastQC with default parameters. Adaptor sequences, low-quality reads, and bases with a Phred score ≤ 20 were then removed with the Cutadapt program (version 1.10). Sequences with more than 90% similarity were grouped and identified as one SLAF marker (locus). The alleles of each SLAF were defined according to the parents, and individuals were genotyped by sequence similarity to their parents. The marker codes of the polymorphic SLAFs were analysed according to the population type CP, which consisted of segregation types 'aa×bb'. Single nucleotide polymorphism (SNP) loci of each SLAF locus were then detected between the parents, and SLAFs with three or more SNPs were filtered out first. In a cross between diploid parents, one SLAF locus can contain at most 4 alleles, so SLAF loci with more than 4 alleles were defined as repetitive SLAFs and subsequently discarded. Only SLAFs with 2 to 4 alleles were identified as polymorphic and considered potential markers.
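The allele-count and SNP-count rules above can be illustrated with a minimal sketch. This is not the pipeline of Sun et al. (2013); the function names and the `snp_cutoff` parameter are assumptions introduced only for illustration.

```python
def classify_slaf(n_alleles: int) -> str:
    """Classify one SLAF locus by the number of distinct alleles observed."""
    if n_alleles < 1:
        raise ValueError("a locus needs at least one allele")
    if n_alleles == 1:
        return "non-polymorphic"
    if n_alleles <= 4:
        # Two diploid parents can contribute at most four alleles.
        return "polymorphic"
    return "repetitive"


def is_candidate_marker(n_alleles: int, n_snps: int, snp_cutoff: int = 3) -> bool:
    """Keep polymorphic loci whose SNP count is below the cutoff.

    snp_cutoff=3 mirrors the text (loci with three or more SNPs are removed);
    it is exposed as a parameter because it is an assumption of this sketch.
    """
    return classify_slaf(n_alleles) == "polymorphic" and n_snps < snp_cutoff


if __name__ == "__main__":
    for alleles, snps in [(1, 0), (2, 1), (4, 3), (6, 2)]:
        print(alleles, snps, classify_slaf(alleles), is_candidate_marker(alleles, snps))
```

Keeping the classification and the SNP filter in separate functions makes it easy to count polymorphic, non-polymorphic and repetitive loci before any marker is discarded.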
Genotype scoring was then performed using a Bayesian approach to further ensure genotyping quality. First, a posteriori (conditional) probability was calculated using the coverage of each allele and the number of single nucleotide polymorphisms. Then, the genotyping quality score translated from this probability was used to select qualified markers for subsequent analysis, as described by Sun et al. (2013). Low-quality genotypes were counted for each marker and each individual, and the worst marker or individual was deleted in a dynamic process. When the average genotype quality score of all SLAF markers reached the cutoff value, the process was stopped.
Three strict criteria were used to filter individuals and SLAF markers in order to obtain a high-quality genetic map: (1) average sequence depth should be > 10-fold in each progeny and > 20-fold in the parents; (2) markers with more than 11 missing genotypes were filtered out; and (3) markers with significant segregation distortion (departure from the expected 1:2:1 or 1:1 ratio; chi-square test, P < 0.05) were initially excluded from map construction.
We first defined a bin as a locus within which two or more markers cannot be ordered (Vision et al. 2000). Because a bin consists of multiple markers with no recombination breakpoints among them in any individual of a given set of progeny, adding more markers within a bin does not increase map density.
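The segregation test in criterion (3) can be sketched as a goodness-of-fit chi-square test against the expected 1:1 or 1:2:1 ratio. The sketch below is an illustration under stated assumptions, not the software used in this study: the function name and the example counts are hypothetical, and the P value is computed in closed form for 1 and 2 degrees of freedom only.

```python
import math


def chi_square_test(observed, expected_ratio):
    """Goodness-of-fit chi-square test of genotype counts against a ratio.

    Returns (statistic, p_value). Closed-form P values are used:
    df = 1 -> erfc(sqrt(x / 2)), df = 2 -> exp(-x / 2).
    """
    if len(observed) != len(expected_ratio):
        raise ValueError("observed and expected_ratio must have equal length")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles 1:1 and 1:2:1 style tests")
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts for 114 progeny.
    print(chi_square_test([75, 39], (1, 1)))         # lm x ll type, 1:1 expected
    print(chi_square_test([28, 58, 28], (1, 2, 1)))  # hk x hk type, 1:2:1 expected
```

A marker would be flagged as distorted when the returned P value falls below 0.05, matching the threshold stated in the text.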
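The bin definition can be illustrated with a short sketch that merges adjacent markers whose genotype vectors are identical across all progeny, that is, markers with no recombination breakpoint between them. This is a simplified illustration that ignores missing genotypes; the data layout and function name are assumptions.

```python
def group_into_bins(ordered_markers):
    """Merge adjacent markers with identical genotype vectors into bins.

    ordered_markers: list of (marker_name, genotypes) in map order, where
    genotypes is a tuple with one genotype code per progeny.
    Returns a list of bins, each a list of marker names.
    """
    bins = []
    previous = None
    for name, genotypes in ordered_markers:
        if bins and genotypes == previous:
            bins[-1].append(name)   # no breakpoint: same bin
        else:
            bins.append([name])     # a breakpoint starts a new bin
        previous = genotypes
    return bins


if __name__ == "__main__":
    # Hypothetical markers scored in four progeny.
    markers = [
        ("Marker1", ("lm", "ll", "lm", "ll")),
        ("Marker2", ("lm", "ll", "lm", "ll")),
        ("Marker3", ("lm", "lm", "lm", "ll")),
    ]
    print(group_into_bins(markers))
```

In this representation a singleton is simply a bin that holds one marker, so the number of loci on a map equals the number of bins returned.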

Genetic linkage map analysis
Loci were separated into three types: (1) those segregating in the female parent only (heterozygous in the female), (2) those segregating in the male parent only (heterozygous in the male), and (3) those heterozygous in both parents. Markers heterozygous in one parent but not the other, plus markers heterozygous in both parents, were used to construct separate genetic linkage maps for the female and male parents using the two-way pseudo-testcross strategy. Two approaches were used for map construction. In the first, markers that showed segregation distortion, departing from the Mendelian ratio at the 0.05 level, were added to the maps after the markers that did not depart from the Mendelian ratio had been mapped. The second approach included both types of markers in the analysis. Marker loci were partitioned primarily into LGs by modified logarithm of odds (MLOD) scores > 5.
Construction of the high-density, high-quality map used the HighMap strategy for map calculation (Li et al. 2008). Map distances in centimorgans were estimated using the Kosambi mapping function. Skewed markers were then added to this map by applying a multipoint maximum likelihood method. The SMOOTH error correction strategy was then applied according to the parental contribution of genotypes, and a k-nearest neighbour algorithm was applied to impute missing genotypes (Van et al. 2005). The double-recombination criterion is based on a window defined by the flanking loci; the window covers 15 loci before and 15 loci after the locus in question. The observed marker score at a locus is calculated as a weighted value according to Van et al. (2005). The threshold value is set at 0.9, and if the threshold is exceeded, a double recombination is judged to exist. Haplotype maps and heat maps were used to evaluate the quality of the map following the description by West et al. (2006).
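The assignment of loci to parental maps under the pseudo-testcross strategy can be sketched from the CP segregation codes. The sketch assumes that the first genotype in a code belongs to the female parent and the second to the male parent; this ordering and the function name are assumptions made for illustration.

```python
def parental_maps(code: str) -> set:
    """Return the parental maps ("female", "male") a CP code can contribute to.

    A parent contributes segregation information only when it is
    heterozygous, i.e. its two allele letters differ.
    """
    left, right = code.replace("×", "x").split("x")
    if len(left) != 2 or len(right) != 2:
        raise ValueError(f"unexpected segregation code: {code!r}")
    maps = set()
    if left[0] != left[1]:
        maps.add("female")
    if right[0] != right[1]:
        maps.add("male")
    return maps


if __name__ == "__main__":
    for code in ["lm×ll", "nn×np", "hk×hk", "ef×eg", "ab×cd", "aa×bb"]:
        print(code, sorted(parental_maps(code)))
```

Codes that return an empty set, such as aa×bb, do not segregate in an F1 population and therefore cannot be placed on either parental map.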
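For reference, the Kosambi mapping function converts a recombination fraction r into a map distance d = 25 ln[(1 + 2r)/(1 − 2r)] cM. A minimal sketch of the function and its inverse follows; the function names are illustrative and the code is independent of HighMap.

```python
import math


def kosambi_cm(r: float) -> float:
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM (inverse function)."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


if __name__ == "__main__":
    for r in (0.01, 0.05, 0.10, 0.25):
        print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance in cM is close to 100r, and the two diverge as r approaches 0.5 because the function accounts for interference between crossovers.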

Collinearity analysis
In order to assess the correspondence of each LG in P. salicina to a chromosome in P. persica, the tag sequences used for mapping were extracted and aligned with the peach genome_v2.0 scaffolds (Arús et al. 2012) using BLAST. The Pearson correlation between the linkage groups and the corresponding chromosomes was calculated using an R script (Fang et al. 2018), and the consensus of their positions was represented using a dot-plot diagram.
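The correlation between genetic and physical positions can be sketched as below. This is not the R script of Fang et al. (2018); it is a small Python illustration that computes both the Pearson coefficient and a rank-based (Spearman) coefficient for one linkage group. The marker positions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    genetic_cm = [0.0, 1.2, 3.5, 7.0, 9.1]     # hypothetical map positions
    physical_mb = [0.1, 0.9, 2.0, 6.5, 30.0]   # hypothetical peach positions
    print(pearson(genetic_cm, physical_mb), spearman(genetic_cm, physical_mb))
```

The rank-based coefficient depends only on marker order, so it stays high when recombination rate varies along a chromosome, whereas the Pearson coefficient also penalises departures from a straight line.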

SLAF sequencing
We generated 65.36 Gb of high-quality clean data containing 343,436,902 paired-end reads from the SLAF library of the 114 progeny and the two parents using the Illumina HiSeq 2500 system. The average amounts of high-quality data in the parents and the individuals were 2.33 and 0.53 Gb, respectively. After eliminating the index sequences, both ends of each read were about 30 bp in length. The average ratio of high-quality bases (Q score > 20) was 97.96%, and the average guanine-cytosine (GC) content was 38.52% (Table S1). Of these high-quality data, 1.97 Gb were from the female parent '09-16' with 9,859,875 reads and 2.70 Gb were from the male parent 'Fortune' with 2,698,346 reads. We obtained 327.20 million reads from the libraries for the 114 individuals, where the number of reads per individual ranged from 1.54 to 6.67 M with an average of 2.67 M reads (Fig. 1).

SLAF markers discovery and genotype definition
Based on sequence similarity, all the reads were clustered into SLAFs. After eliminating the low-depth and repeat-suspicious SLAFs, 46,086,386 high-quality reads were assigned to 160,344 SLAF loci, of which 107,678 (67.15%) and 115,319 (71.92%) were detected in the female and male parents, respectively. The numbers of reads for SLAFs detected in the female and male parents were 6,053,916 and 8,136,595, respectively, indicating an average read depth of ~56.2-fold and ~70.56-fold for each SLAF. Across the 114 F1 individuals, the number of SLAFs per individual ranged from 71,714 to 105,567 with an average of 88,893, while the read depth ranged from ~11-fold to ~31-fold with an average of ~17.3-fold at the population level. On linked markers, the average sequencing depths were 104.8-fold in the parents and 21.54-fold in the offspring. The integrity and depth of the markers were sufficient to guarantee the accuracy of the genetic map construction.
Among the 160,344 SLAFs that were defined, 68,032 (42.43%) were polymorphic, 91,781 (57.24%) were non-polymorphic, and only 531 were labeled as repetitive sequences. Based on parental genotypes, the loci with missing information were filtered out. The remaining 30,168 polymorphic SLAFs were classified into eight segregation patterns according to the criteria in the "Materials and methods". As shown in Fig. 2, only 4,015 SLAF markers (13.31%) were homozygous in the two parents, with genotype aa or bb, and did not segregate in the offspring. Most markers (86.69%) conformed to the CP population segregation codes, including ab×cd, ef×eg, hk×hk, lm×ll, nn×np, ab×cc and cc×ab. Stringent filtering criteria were applied to identify and remove markers with low sequence depth (less than 10-fold), SLAF tags with ≥ 2 SNPs, a high level of missing data or Mendelian errors. After the chi-square test, 5,134 SLAFs (7.55%) were used to construct the final genetic map. The marker codes lm×ll and nn×np represent markers with one parent heterozygous (providing 1:1 segregation ratios), whereas hk×hk and ef×eg represent markers with both parents heterozygous (providing 1:2:1 segregation ratios).
Table 1 footnotes: # Marker pairs with zero recombination on each linkage group were considered to belong to the same genetic bin. § Percentages of locus intervals where the distance between adjacent loci was smaller than 5 cM.
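The parental shares and read depths follow directly from the counts reported above and can be recomputed with a few lines. The sketch below only restates that arithmetic; the variable and function names are illustrative.

```python
TOTAL_SLAFS = 160_344

# Counts reported in the text for SLAFs detected in each parent.
PARENTS = {
    "female": {"slafs": 107_678, "reads": 6_053_916},
    "male": {"slafs": 115_319, "reads": 8_136_595},
}


def parent_summary(slafs: int, reads: int, total: int = TOTAL_SLAFS) -> dict:
    """Share of all SLAF loci detected in a parent and mean reads per SLAF."""
    return {"share_pct": 100.0 * slafs / total, "depth": reads / slafs}


summaries = {name: parent_summary(**counts) for name, counts in PARENTS.items()}

if __name__ == "__main__":
    for name, s in summaries.items():
        print(f"{name}: {s['share_pct']:.2f}% of SLAFs, {s['depth']:.2f}-fold depth")
```

Depth here is simply reads divided by loci, so it is an average over SLAFs and says nothing about how evenly coverage is distributed among them.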

Linkage map construction
Using HighMap to construct the genetic map, 3,341 SNP markers were assigned to eight linkage groups (LG1-LG8) for the integrated map of the Japanese plum based on the '09-16' × 'Fortune' progeny, with a grouping LOD value of 5 (Figure S1 and Table S2). The genetic map spanned 869.9 cM across the whole genome, implying an average distance between markers of 0.26 cM. The number of mapped SNP markers per locus varied from 1 to 45, with an average of 4.64. After linkage analysis, 3,166 (94.76%) mapped markers were grouped into 545 bins and 175 markers were singletons, giving a total of 720 loci on the map with an average interval between two adjacent loci of 1.21 cM (Table 1 and Fig. 3). In the integrated map, the longest linkage group was LG1 with 138.3 cM, whereas the shortest was LG3 with 82.3 cM.
A high-quality integrated linkage map should have short distances between adjacent loci and evenly distributed markers. The average interval between two adjacent mapped loci ranged from 1.09 cM (LG4) to 1.62 cM (LG6) for this integrated map (Table 1). Our results show that most of the SLAF markers were evenly mapped in the eight groups. Among the intervals, 417 (58.57%) were smaller than 1 cM, and only 4 (0.56%) were larger than 5 cM. The largest gap without marker coverage on this map was located in LG3, with a length of 7.5 cM. These results indicate that this genetic map is of high quality and high resolution.
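The summary statistics of the integrated map can be recomputed from the reported totals. The following sketch restates that arithmetic only; the variable names are illustrative.

```python
MAP_LENGTH_CM = 869.9
N_MARKERS = 3_341
N_BINNED_MARKERS = 3_166   # markers that fall into multi-marker bins
N_BINS = 545
N_SINGLETONS = 175

n_loci = N_BINS + N_SINGLETONS

summary = {
    "loci": n_loci,
    "binned_pct": 100.0 * N_BINNED_MARKERS / N_MARKERS,
    "markers_per_locus": N_MARKERS / n_loci,
    "cm_per_marker": MAP_LENGTH_CM / N_MARKERS,
    "cm_per_locus": MAP_LENGTH_CM / n_loci,
}

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

The distinction between distance per marker and distance per locus matters here: markers in the same bin share one map position, so the per-locus value is the one that reflects map resolution.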
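The interval statistics can be reproduced from locus positions with a short sketch. With 720 loci on eight linkage groups there are 720 − 8 = 712 intervals between adjacent loci, which is the denominator consistent with the reported percentages. The function name and the example positions are hypothetical.

```python
def interval_summary(positions_cm, small=1.0, large=5.0):
    """Summarise distances between adjacent loci on one linkage group."""
    ordered = sorted(positions_cm)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two loci")
    return {
        "n_intervals": len(gaps),
        "pct_small": 100.0 * sum(g < small for g in gaps) / len(gaps),
        "n_large": sum(g > large for g in gaps),
        "max_gap": max(gaps),
    }


# Check of the reported shares, using the counts given in the text.
N_INTERVALS = 720 - 8
PCT_BELOW_1CM = 100.0 * 417 / N_INTERVALS
PCT_ABOVE_5CM = 100.0 * 4 / N_INTERVALS

if __name__ == "__main__":
    print(interval_summary([0.0, 0.4, 1.0, 3.0, 10.5]))  # hypothetical loci
    print(f"{PCT_BELOW_1CM:.2f}% < 1 cM, {PCT_ABOVE_5CM:.2f}% > 5 cM")
```

Intervals are computed within a linkage group only, so each group must be passed to the function separately.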

Map evaluation
We developed an integrated genetic map of the Japanese plum with 3,341 SLAF markers, which shows a high depth of coverage in the parents (mean of ~106-fold) and the F1 progeny (mean of ~22-fold), as well as high genotype integrity in the mapping population, with an average value of 99.93% (Table S3). The results of the χ² test indicated that 245 (7.33%) of the 3,341 markers on the integrated linkage map showed significant segregation distortion (P < 0.05) (Figure S1 and Table S2). The distorted markers were distributed unevenly on seven of the eight LGs, with none on LG4. A total of 84, 56 and 45 distorted markers were clustered on LG2, LG6 and LG3, respectively. The majority (94.29%) of the mapped distorted markers were located together on particular LGs.
From the haplotype map (Figure S2), the source of each individual was consistent across large segments of each linkage group, showing that there is almost no double recombination in individual offspring. As can be seen from the heat map, the markers in each linkage group were well ordered (Figure S3), indicating the high quality of the genetic map.
Fig. 3 Integrated linkage map of the Japanese plum for '09-16' × 'Fortune' composed of 720 loci. A locus name in red indicates that the locus has at least one segregation distortion marker. An asterisk in the locus name indicates that only segregation distortion markers are present at the locus.

Collinearity analysis
The Japanese plum (P. salicina) and peach (P. persica) both belong to the genus Prunus and have a high level of genetic similarity. The publication of the peach genome v2.0.a1 provides an excellent opportunity for genetic and genomic studies of the Japanese plum. In order to assess the quality of this genetic map, the tags used for mapping were compared with the peach reference genome. Eight hundred and eighty-six of the SLAF markers in the integrated map were mapped to physical chromosomes of the peach genome v2.0.a1 using BLAST (Table S4). A dot-plot diagram was used to compare the physical positions of the mapped SLAF tags on the peach genome sequence with their genetic positions on the Japanese plum genetic map (Fig. 4). The Spearman correlation coefficient (r² value) of the LGs ranged from 0.714 (LG1) to 0.980 (LG8), showing that all linkage groups have good linear agreement between the physical and genetic maps (Table 2). However, some regions showed lower conformity, such as those on LG1 (0.714) and LG2 (0.808). The reason might be genome sequence differences between the two species, Japanese plum and peach.

Discussion
Owing to their abundant genetic variation and heritability, SNP markers have become increasingly popular in genetic mapping, evolutionary relationship analysis, population structure analysis and association analysis. With the development of next-generation sequencing (NGS) technologies, large numbers of SNPs have been successfully discovered in Japanese plum genomes (Salazar et al. 2017; Marti et al. 2018; Carrasco et al. 2018). The SLAF-seq approach is one of the most important SNP detection technologies, providing an accurate, effective and low-cost method for developing molecular markers. In this study, SLAF sequencing was successfully employed for large-scale SNP discovery and genotyping in an F1 population of Japanese plum. We constructed a SLAF library of the Japanese plum and obtained 65.36 Gb of data containing 343 M paired-end reads from this library; 160,344 SLAF tags were detected and 68,032 polymorphic markers were identified, so that the SLAF markers in this study were far more abundant than the SNP markers obtained through GBS technology (Salazar et al. 2017; Carrasco et al. 2018). This result indicates that the markers from the SLAF library would be a useful tool for genetic linkage mapping, genetic diversity analysis and association analysis.
Table 2 (values recovered for LG2 to LG8)
Linkage group    Correlation coefficient
LG2              −0.8081
LG3              −0.8219
LG4              −0.8765
LG5              −0.9484
LG6              −0.8255
LG7              −0.9691
LG8              −0.9798
Tree Genetics & Genomes (2020) 16:18
Fig. 4 Collinear analysis of the consensus between the integrated Japanese plum genetic map and the peach genome (panels LG1 to LG8). The x-axis indicates the genetic position of each SLAF marker; the y-axis indicates the physical position of each SLAF marker.
[...] partial separation, and these markers were distributed unevenly on eight groups, except LG4.
The genus Prunus has shown conserved interspecific and intergeneric collinearity in the Rosaceae; therefore, we attempted a collinearity analysis between the peach and the Japanese plum by comparing the mapped marker sequences with the peach genome_v2.0 scaffolds (Arús et al. 2012). The results show that the correlation for all eight groups was greater than 0.7. These results indicate a high level of collinearity between the Japanese plum and peach genomes, in agreement with the results of Carrasco et al. (2018). Most of the markers on LG5, LG7 and LG8 showed good collinear agreement with the peach genome. Some markers on LG1 and LG2 were not collinear with the peach genome, possibly because of genome structure differences between the two species. The collinear markers on this map will provide a wealth of genomic information, which may aid de novo genome assembly and comparative linkage mapping in plum.
In conclusion, we have constructed a refined high-density linkage map for Japanese plum using the SLAF-seq method. The map spans 869.9 cM and is divided into eight linkage groups, corresponding to the number of P. salicina chromosomes, with an average distance of 1.21 cM among 720 SLAF loci. Our results further demonstrate that SLAF-seq is a very effective method for developing markers and constructing high-density linkage maps. Most marker loci on the pseudochromosomes showed good collinear agreement with the physical map of Prunus persica. Our study provides a valuable genetic resource for QTL analysis, MAS, map-based gene cloning and the sequence assembly of the Japanese plum reference genome.