Genome size, ploidy levels, and development of novel SSR primer to evaluate genetic diversity of Corylopsis Siebold & Zucc. germplasm collections

Corylopsis Siebold & Zucc. is a genus of flowering shrubs with few cultivars and little information on genome size, ploidy levels, and genetic relationships among taxa. The United States National Arboretum and collaborating gardens in North America maintain a diverse germplasm collection of Corylopsis including cultivated and wild-collected taxa. The objectives of this study were to determine genome size, ploidy, and genetic diversity in Corylopsis. Flow cytometry revealed a ploidy series of diploid, tetraploid, pentaploid, and hexaploid taxa in Corylopsis with 2C genome sizes ranging from 1.9 pg to 5.7 pg. Representative taxa were selected for root tip cytology, revealing diploid (C. platypetala Rehder & E.H. Wilson), tetraploid (C. gotoana Makino), pentaploid (unnamed hybrid selection), and hexaploid (C. willmottiae Rehder & E.H. Wilson) taxa. Seventeen genomic simple sequence repeat markers were developed and used to determine the genetic diversity of Corylopsis. These markers had an average of 7.4 alleles per locus with a range of three to 17 alleles. A dendrogram was generated by unweighted pair group method with arithmetic mean (UPGMA) cluster analysis using the Jaccard similarity coefficient to visualize genetic relationships and ploidy series within taxa. Information gained from this study will further breeding efforts and aid management of germplasm collections of Corylopsis.

fragrance is also variable across genotypes, ranging from unpleasant to a strong, sweet primrose fragrance (Bean and Aniśko 2014). Foliage color offers further interest for Corylopsis breeding, as selections with blue-green or golden leaves have been reported (Bean and Aniśko 2014). Corylopsis are also appreciated in the landscape for fall color and spring flush, providing several layers of seasonal interest (Bean and Aniśko 2014; Hop 2011; Morley 1977).
Polyploidy is rare within the Hamamelidaceae, with only one other genus, Fothergilla Murr., reported to have a ploidy series (Ranney et al. 2007; Qi et al. 2015). Some of the variable morphology observed in Corylopsis may be due to the range of ploidy documented throughout the genus, which includes diploid, tetraploid, pentaploid, and hexaploid taxa (Anderson and Sax 1935; Kim et al. 2008; Kono et al. 2008; Mehra and Khosla 1969; Oginuma 1991; Oginuma and Tobe 1991; Santamour 1965). In addition to ploidy, the position of the centromere has been examined to determine phylogenetic relationships in Corylopsis (Kono et al. 2008), and several molecular studies have provided further insight into relationships within Corylopsis (Li et al. 1997; Li 2008; Roh et al. 2007; Yamanaka et al. 2008); however, to date, microsatellites, or simple sequence repeats (SSRs), have not been developed specifically for Corylopsis.
The objectives of this study were to (1) determine genome size and ploidy among Corylopsis taxa and (2) develop novel microsatellite markers to assess genetic diversity of living collections to assist with collection management and facilitate breeding.

Plant materials
Fresh expanded leaves were sampled from living collections at botanical gardens and arboreta in the United States (Table 1). Samples represent material that originally came from plant hunting expeditions, botanic gardens, and Index Semina.

Flow cytometry
Samples of approximately 2 cm² of fresh leaf tissue were chopped with a razor blade in a polystyrene petri dish containing 500 μl of nuclei extraction buffer (Johnston et al. 1999). The Corylopsis-Pisum nuclei suspension was passed through a 30-μm gauze filter (Sysmex Partec), and 2 ml of staining solution was added to the filtrate (CyStain PI Absolute P, Sysmex Partec). The stained nuclei suspension was processed using a flow cytometer (Cyflow Ploidy Analyzer, Sysmex Partec) to a minimum of 3000 nuclei per sample. For each accession, three samples were combined to represent a random sample of nuclei from each plant. Terminology describing genome size and ploidy levels follows Greilhuber et al. (2005). Relative 2C genome size was calculated as (DNA content of size standard) × (mean fluorescence value of sample)/(mean fluorescence value of standard). The monoploid (1Cx) value was calculated as the 2C value divided by the ploidy.
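The genome size calculation described above can be sketched in a few lines of Python. This is an illustration only: the function names are ours, and the standard's 2C value (9.0 pg) and the fluorescence means are hypothetical placeholders, because the DNA content of the internal standard is not given in the text.

```python
def relative_2c_genome_size(standard_2c_pg, sample_mean_fl, standard_mean_fl):
    """Relative 2C genome size (pg) of a sample measured against an internal standard."""
    if standard_mean_fl <= 0:
        raise ValueError("standard fluorescence must be positive")
    return standard_2c_pg * sample_mean_fl / standard_mean_fl


def monoploid_genome_size(two_c_pg, ploidy):
    """Monoploid value: the 2C value divided by the ploidy level."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return two_c_pg / ploidy


# Hypothetical inputs, NOT values from the study.
ASSUMED_STANDARD_2C_PG = 9.0      # placeholder DNA content of the size standard
sample_peak = 100.0               # mean fluorescence of the sample G1 peak
standard_peak = 450.0             # mean fluorescence of the standard G1 peak

two_c = relative_2c_genome_size(ASSUMED_STANDARD_2C_PG, sample_peak, standard_peak)
one_cx = monoploid_genome_size(two_c, ploidy=2)
print(f"2C = {two_c:.2f} pg, monoploid value = {one_cx:.2f} pg")
```

With these placeholder inputs the sample peak sits at 100/450 of the standard peak, so the estimate scales the standard's DNA content by the same ratio.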

Cytology
Early and late season cuttings were taken from current year's growth and rooted using 1000 ppm K-IBA (Sigma-Aldrich, St. Louis, MO) to produce root tips for cytology. Based on initial flow cytometry results, four taxa were chosen to represent each genome size group, corresponding to putative ploidy levels of diploid, tetraploid, pentaploid, and hexaploid. Cuttings were maintained under mist in a polyhouse in Beltsville, MD. Samples included C. platypetala (NA66131), C. gotoana var. coreana (NA56375-02C3), C. willmottiae (NA69317), and the F1 interspecific hybrid C. gotoana 'March Jewel' × C. spicata (NA6-10EH). Root tips were digested and chromosomes were stained and spread following protocols of Lattier et al. (2017). Chromosomes were counted using a light microscope (Nikon Eclipse 50i; Tokyo, Japan). A minimum of 15 highly resolved cells were captured per taxon. Photomicrographs were prepared by focus stacking and blending multiple images to produce high-resolution chromosome images (Photoshop CC 2014; Adobe, San Jose, CA).

Microsatellite identification and primer design
Genomic DNA was extracted from C. himalayana (NA78641) using the methods described in the next section, and was subjected to shotgun genome sequencing on a Roche 454 high-throughput sequencer at the Ohio State University Plant-Microbe Genomics Facility. The resulting 65,000 reads were then analyzed using QDD-2 software (Meglécz et al. 2010) following the approach of Takayama et al. (2011) to identify potential microsatellites, build consensus sequences, and design optimal primer pairs. All software defaults were maintained. From the more than 1000 optimal primer pairs identified by QDD, we selected 100 for testing on Corylopsis, focusing especially on loci with repeat motifs of three or more nucleotides, as these have been shown to produce better-separated peaks and less ambiguous scoring in an automated sequencing environment than dinucleotide repeat motifs. PCR primers were manufactured by Integrated DNA Technologies (Coralville, IA). The 17 primer pairs used for genetic diversity analysis are listed in Table 2.
[Table 2, fragment recovered from extraction: P1 | AGCT | FAM-TCC TTA GCT TGC ACG TAC CC | 57 | 89-103 | 3]
Allele scores were converted to a binary matrix with "0" for absence of an allele and "1" for presence of an allele. The R 'pvclust' package, version 1.3-2 (Suzuki et al. 2017), was used to construct the dendrogram, with genetic distance based on the Jaccard similarity coefficient ("binary") and clustering using the 'unweighted pair group method with arithmetic mean' (UPGMA) algorithm. Confidence levels for branches of the dendrogram were calculated using approximately unbiased (AU) p-values with multiscale bootstrap resampling based on 20,000 replications, as implemented in the R 'pvclust' package (Suzuki et al. 2017). Primer statistics (PIC, Ho, and He) were calculated according to standard algorithms (Botstein et al. 1980; Nei 1987) coded in the R language.
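The scoring and clustering steps can be illustrated with a minimal Python sketch. It is not the R 'pvclust' workflow used in the study: it substitutes SciPy's `pdist` (Jaccard distance) and `linkage` (average linkage, i.e. UPGMA), omits the AU bootstrap step entirely, and uses hypothetical accession names and fragment sizes.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical allele calls: accession -> {locus: set of fragment sizes (bp)}.
allele_calls = {
    "acc_A": {"P1": {89, 93}, "P2": {150}},
    "acc_B": {"P1": {89, 93}, "P2": {150, 154}},
    "acc_C": {"P1": {97}, "P2": {160, 164}},
    "acc_D": {"P1": {97, 101}, "P2": {160, 164}},
}


def to_binary_matrix(calls):
    """Return accessions, (locus, allele) columns, and a presence/absence matrix."""
    accessions = sorted(calls)
    columns = sorted(
        {(locus, allele)
         for loci in calls.values()
         for locus, alleles in loci.items()
         for allele in alleles}
    )
    matrix = np.zeros((len(accessions), len(columns)), dtype=bool)
    for i, accession in enumerate(accessions):
        for j, (locus, allele) in enumerate(columns):
            matrix[i, j] = allele in calls[accession].get(locus, set())
    return accessions, columns, matrix


accessions, columns, binary = to_binary_matrix(allele_calls)

# Jaccard distance = 1 - (shared alleles / alleles present in either accession).
jaccard = pdist(binary, metric="jaccard")

# Average linkage on the condensed distance vector is the UPGMA algorithm.
tree = linkage(jaccard, method="average")

# Cut the tree into two groups just to inspect the result.
groups = fcluster(tree, t=2, criterion="maxclust")
for accession, group in zip(accessions, groups):
    print(accession, group)
print(np.round(squareform(jaccard), 3))
```

Each column of the binary matrix is one allele at one locus, so accessions of any ploidy can be compared without assuming allele dosage.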

Cytology
Root tip cytology of a single sample of C. platypetala (NA66131) confirmed that it was diploid with 24 chromosomes (Fig. 1A) contributing to its 2C genome size of 1.9 pg (Table 1). Similarly, root tip cytology of a sample of C. gotoana var. coreana (NA56375-02C3) confirmed that it was tetraploid with 48 chromosomes (Fig. 1B) contributing to its 2C genome size of 3.5 pg. An interploid hybrid (NA6-10EH) from the USNA breeding program was included in the study, representing a hybrid between the tetraploid C. gotoana 'March Jewel' and hexaploid C. spicata. Root tip chromosome counts confirmed that this selection is indeed pentaploid (2n = 5x = 60), as predicted based on the interploid cross (Fig. 1C). Root tip chromosome counts also confirmed a hexaploid sample, C. willmottiae (NA69317), with 2n = 6x = 72 (Fig. 1D).
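The chromosome counts above are consistent with a base number of x = 12 (implied by 2n = 2x = 24), and the pentaploid prediction follows from simple gamete arithmetic. The short Python sketch below works through that reasoning; it assumes each parent contributes a normally reduced gamete, which is a simplification, and the function names are illustrative.

```python
BASE_CHROMOSOME_NUMBER = 12  # x, implied by the diploid count 2n = 2x = 24


def somatic_count(ploidy, x=BASE_CHROMOSOME_NUMBER):
    """Expected somatic chromosome number (2n) for a given ploidy level."""
    return ploidy * x


def hybrid_ploidy(seed_parent_ploidy, pollen_parent_ploidy):
    """Ploidy of a hybrid, assuming both parents contribute reduced gametes."""
    if seed_parent_ploidy % 2 or pollen_parent_ploidy % 2:
        raise ValueError("reduced gametes assumed; parental ploidy must be even")
    return seed_parent_ploidy // 2 + pollen_parent_ploidy // 2


# Tetraploid x hexaploid cross, as in the interploid hybrid described above.
offspring = hybrid_ploidy(4, 6)
print(offspring, somatic_count(offspring))

# Expected counts for the four ploidy levels examined by cytology.
for ploidy in (2, 4, 5, 6):
    print(f"{ploidy}x -> 2n = {somatic_count(ploidy)}")
```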

Microsatellite analysis
After we screened microsatellite primers developed for other genera in the Hamamelidaceae (Hatmaker et al. 2015) and found inconsistent results in Corylopsis, we developed primers specifically for Corylopsis. Of the original 100 primer pairs tested, we chose 17 that produced consistent, polymorphic, easily scorable amplification products across all species tested. These loci amplified an average of 7.3 alleles (range 3-17) across all species tested (Table 2). These 17 microsatellite primer pairs generated 120 scorable fragments ranging in size from 89 to 226 bp across the 106 Corylopsis accessions tested. Polymorphic Information Content (PIC) ranged from 0.1969 for P54 to 0.8530 for P2; observed heterozygosity (Ho) ranged from 0.0660 for P54 to 0.9340 for P62; and expected heterozygosity (He) ranged from 0.217 for P1 to 0.866 for P2 (Table 2).
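For readers unfamiliar with these statistics, the following Python sketch shows one common way to compute them for a single locus: PIC after Botstein et al. (1980), He as 1 minus the sum of squared allele frequencies (without a sample-size correction), and Ho as the fraction of individuals carrying more than one allele. It is not the R code used in the study. It assumes diploid genotypes with known allele counts, which does not hold for the polyploid accessions, and the genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from a list of genotypes (tuples of allele sizes)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), uncorrected for sample size."""
    return 1.0 - sum(p * p for p in freqs.values())


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * a * a * b * b for a, b in combinations(p, 2))
    return 1.0 - homozygosity - correction


def observed_heterozygosity(genotypes):
    """Ho = fraction of individuals with more than one distinct allele."""
    heterozygotes = sum(1 for genotype in genotypes if len(set(genotype)) > 1)
    return heterozygotes / len(genotypes)


# Hypothetical diploid genotypes at one locus (fragment sizes in bp).
genotypes = [(89, 93), (89, 89), (93, 93), (89, 93)]
freqs = allele_frequencies(genotypes)
he = expected_heterozygosity(freqs)
pic = polymorphic_information_content(freqs)
ho = observed_heterozygosity(genotypes)
print(f"He = {he:.3f}, PIC = {pic:.3f}, Ho = {ho:.3f}")
```

PIC is always at most He because it subtracts an extra term for matings that are uninformative despite heterozygosity.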
UPGMA cluster analysis revealed high levels of genetic diversity among accessions, even among some accessions collected from the same area (Fig. 2). Clustering of accessions roughly followed species designations, although exceptions were common, and likely represented misidentification or possibly undetected hybridizations. While some clusters were supported by high bootstrap values (> 80%, Fig. 2), some of the lower-order clusters were not robust.

Ploidy
Our finding of ploidy series within and between species of Corylopsis supports prior studies, but also represents previously unreported ploidy levels in some species. In C. glabrescens, we found four diploid samples, including the cultivar 'Longwood Chimes' (Table 1), consistent with previous reports of diploids in this species by Kono et al. (2008), who investigated somatic chromosomes in meristematic cells of young leaves. Two samples of C. glabrescens (MS1987-116*B and MS1998-185*A) were identified as tetraploids, consistent with a previous report by Santamour (1965), based on cytological studies. Pentaploid and hexaploid C. glabrescens were also revealed by our study, representing the first report of pentaploids and hexaploids in this species.
Two samples of C. gotoana were diploid with a genome size of 1.9 pg, representing the first report of a diploid in this species. One sample was pentaploid (JC060155), representing the first report of a pentaploid. Most samples were tetraploids. This is consistent with reports of tetraploids in C. gotoana found in Japan and Korea (Kim et al. 2008; Kono et al. 2008). Likewise, most samples of C. gotoana var. coreana were also tetraploid, except for one hexaploid accession (NA56375-005). This is the first reported assessment of ploidy in C. gotoana var. coreana.
One sample of C. glandulifera (AA898-81*A) was found to be diploid with a genome size of 1.9 pg, while all other samples were hexaploid (Table 1). This is the first reported assessment of ploidy in C. glandulifera. The single sample of C. himalayana included in our study was diploid with a genome size of 2.1 pg, consistent with previous cytological studies (Mehra and Khosla 1969). The two samples of C. multiflora were both hexaploid, representing the first assessment of ploidy for this species.
All samples of C. pauciflora were diploids, ranging from 1.9 to 2.1 pg, confirming previous reports based on cytology (Anderson and Sax 1935; Oginuma and Tobe 1991; Santamour 1965). Corylopsis × 'Winterthur' was also diploid, which does not support a previous report on its origin as an interploid hybrid of C. spicata (6x) and C. pauciflora (2x) (Bean and Aniśko 2014).
We found one diploid C. platypetala, confirming a cytology-based report by Santamour (1965). However, two samples of C. platypetala (LWL-2769*A and LW1957-0994*B) were tetraploids, which represents the first evidence of polyploidy within this species.
Two samples of C. sinensis were found to be diploid including one cultivar ('Spring Purple'). These results confirm a previous report of diploid individuals of C. sinensis (Santamour 1965). However, except for one pentaploid accession (JC150435-17), and two tetraploids in C. sinensis var. calvescens, all the other C. sinensis accessions we tested were hexaploid (Table 1). This represents the first report of polyploids in C. sinensis.
Two samples of C. willmottiae (LW2016-2002*A and NA85065) were diploid with a genome size of 2.0 pg, conflicting with a previous report of tetraploid C. willmottiae based on cytology (Santamour 1965). This represents the first report of diploid individuals in C. willmottiae. One sample of C. willmottiae (NA67192-H) was found to be tetraploid, and four samples were hexaploid (Table 1).
All samples of C. veitchiana and C. spicata that we tested were hexaploid. Similar reports have come from studies of the collections of the Arnold Arboretum (Anderson and Sax 1935; Oginuma 1991) and Morris Arboretum (Santamour 1965). Hexaploid C. spicata have also been confirmed in cultivated material from Kōchi, Shikoku, Japan (Oginuma 1991).
Although several studies exist for chromosome counts in Corylopsis, only one other study has reported on genome size. Using propidium iodide-based flow cytometry, Garcia et al. (2010) reported a genome size for C. coreana of 2C = 3.93 pg (using Pisum L. as an internal standard) and for C. sinensis of 2C = 4.62 pg (using Petunia Juss. as an internal standard). Although genome size and ploidy level were not calibrated using cytology (chromosome counts), it appears that these 2C genome sizes are larger than those calculated in our study. In prior research on other ornamental taxa, a combination of genome size estimates (from flow cytometry) and chromosome counts has revealed substantial species-level variation in 1Cx values (Gillooly and Ranney 2015; Parris et al. 2010; Ranney et al. 2018; Shearer and Ranney 2013). Regardless of the absolute genome size, it is evident from our study that the range of ploidy levels in Corylopsis is greater than previously reported. The ploidy series within species is especially noteworthy as it presents heretofore untapped opportunities for interploid breeding.

Genetic diversity
The primer statistics (PIC, Ho, and He) can be useful in determining the genetic diversity in a population for the alleles at a given locus. Because we looked at individual accessions, and not plants from one or more populations in Hardy-Weinberg equilibrium, the primer statistics have little meaning from a population genetics standpoint. However, these properties of the markers may be useful to future population studies in Corylopsis, especially for examining diversity or population structure within a species.
The SSR-based dendrogram (Fig. 2) provides insight into relationships between species, as well as highlights possible misidentification of accessions, both of which are valuable for collections management and breeding. While some accessions of a species grouped together as expected, many grouped in different clusters, particularly those with different ploidy. It is clear that ploidy played at least as large a role in clustering as species and geographic origin (Fig. 3). While it is possible that some accessions were mislabeled, which would lead to inconsistent clustering, it is also possible that some of the seedling material or accessions from nurseries or gardens may actually be interspecific hybrids resulting from outcrossing with other cultivated accessions. Wild-collected accessions (highlighted in bold in Table 1 and Fig. 2) are often deemed the most "authentic" representatives of the species. However, based on our results, it is clear that there is substantial genetic and ploidy variability within species that cannot be explained by outcrossing in a garden setting. The SSR markers developed in this study may be useful for confirming some cases of potential interspecific hybrids in Corylopsis; otherwise, it may be necessary to use SNPs or direct sequencing to determine outcrossing and hybridization in the genus.
Fig. 2 UPGMA dendrogram showing clustering of 106 Corylopsis accessions based on Jaccard's distance. Bootstrap values greater than 80% are shown above major nodes. Major clusters are marked with letters A-J and are described in the text. Taxa in this figure are identified either by their species names, or for simplicity and for ease of reading, by their variety name if applicable. For example, C. gotoana var. coreana is designated in Fig. 2 as "coreana", followed by the unique accession information from Table 1. Accessions in bold type were wild-collected (see Table 1).
Among the accessions we studied, the most distantly related taxa based on the UPGMA algorithm occur in Cluster A (Fig. 2), which contains the only sample of C. himalayana, as well as two diploid and one tetraploid accession of C. willmottiae. Additional accessions of hexaploid C. willmottiae grouped in clusters I and J. The clustering of these species does not appear to be closely related to geographical origin of these accessions, which all originated in the westernmost areas of collection (Fig. 3). The remaining accessions are grouped in Cluster B as a well-supported and distinct group that is resolved into two additional clusters, C and D, both with bootstrap values of 98%. Cluster C contains diploid accessions of C. gotoana, C. glabrescens, C. pauciflora, and C. glandulifera. Interestingly, accessions of tetraploid C. gotoana and hexaploid C. glabrescens did not cluster with the diploid accessions. This could be due to additional allelic diversity in the higher ploidy taxa, or possibly interspecific hybridization, or an allopolyploid origin. Previous karyomorphological studies of the centromere location (Kono et al. 2008) indicate possible divergence between Japanese diploids C. pauciflora and C. glabrescens, although our study did not indicate this.
Cluster D is further resolved into Clusters E and F. Cluster E contains only tetraploid C. gotoana var. coreana with high bootstrap support (96%). Previous work by Roh et al. (2007) using nrRNA-ITS sequence showed a tighter relationship among C. gotoana var. coreana, C. glabrescens, C. gotoana, and C. pauciflora than we saw in our study. This is likely due to different accessions and different types of markers used. Cluster F contains the remaining accessions, further divided into Clusters G and H. Cluster G also has high bootstrap support, and contains primarily tetraploid accessions of C. glandulifera, C. sinensis var. calvescens, C. platypetala, and C. gotoana. Previous molecular studies placed C. sinensis var. calvescens in a group by itself (Roh et al. 2007) and grouped C. gotoana and C. spicata more closely than our data indicate (Yamanaka et al. 2008). Also in contrast to our study, studies by Kono et al. (2008) revealed similar karyotypes in tetraploid C. gotoana and hexaploid C. spicata, implying possible shared ancestries.
Cluster H is well-resolved (100%), and contains the final clusters, I and J. These clusters contain most of the hexaploid taxa, and are not as well-supported by bootstrap values, at least at the higher clustering levels. All accessions of C. spicata grouped in Cluster F, and most accessions of C. sinensis, hexaploid C. glabrescens, and hexaploid C. veitchiana grouped in Cluster E. A study performed on Hamamelidaceae by Li (2008) using rDNA ITS sequence data grouped C. sinensis (and subspecies) and C. spicata together. While our study placed these two species in distinct clusters, the difference likely reflects the higher order taxa that were used in Li's study, rather than contrasting results.
We used UPGMA to analyze and visualize the data instead of other methods such as STRUCTURE or Principal Coordinate Analysis (PCoA) because we were looking at individual plants (cultivars or unrelated accessions) and not at populations under selection. The microsatellite markers can be considered random or neutral in terms of selection or fitness, so a phenetic approach (UPGMA) is appropriate. Our primary objective was to examine relationships among accessions from a collections management perspective, rather than to ascertain evolutionary or phylogenetic histories. Indeed, upon visualizing the data in the UPGMA phenogram, we immediately found several accessions that were mislabeled in our collection or that led us to seek additional information on seed source or provenance. It is likely that different clustering algorithms, different plant materials, and different markers would result in different clusters of many accessions and taxa. For example, the study by Roh et al. (2007) based on RAPD markers or ITS sequence data yielded different dendrograms from each other and from the dendrogram presented here. We also performed PCoA on the data using the Jaccard distance coefficient, but found that this method was not useful for explaining variation: it took eleven dimensions to account for 50% of the explainable variation (data not shown). We are therefore confident that the methods we used to analyze the data are appropriate for our markers, plant material, and objective of assessing genetic diversity of living collections to assist with collection management and facilitate breeding.
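The "dimensions needed to reach 50% of the variation" criterion can be illustrated with a classical PCoA in NumPy. This is a generic sketch rather than the analysis performed in the study: the distance matrix is a small hypothetical example, and negative eigenvalues (which can arise because Jaccard distances are not guaranteed to be Euclidean) are simply discarded, which is one of several possible conventions.

```python
import numpy as np


def pcoa_explained_variation(distance_matrix, tolerance=1e-9):
    """Proportion of variation explained by each PCoA axis (positive eigenvalues only)."""
    d = np.asarray(distance_matrix, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances (Gower's transformation).
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigenvalues = np.linalg.eigvalsh(gram)[::-1]  # sorted largest first
    positive = eigenvalues[eigenvalues > tolerance]
    return positive / positive.sum()


def axes_needed(explained, threshold=0.5):
    """Smallest number of leading axes whose cumulative share reaches the threshold."""
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, threshold) + 1)


# Hypothetical Jaccard distances among four accessions forming two tight pairs.
distances = np.array([
    [0.00, 0.25, 1.00, 1.00],
    [0.25, 0.00, 1.00, 1.00],
    [1.00, 1.00, 0.00, 0.25],
    [1.00, 1.00, 0.25, 0.00],
])

explained = pcoa_explained_variation(distances)
print(np.round(explained, 3), axes_needed(explained))
```

In this toy case a single axis separates the two pairs and captures most of the variation; when variation is spread thinly across many axes, as reported above, a two-dimensional ordination plot conveys little.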

Conclusion
This study determined ploidy and molecular genetic relationships among 106 Corylopsis accessions representing 14 taxa from botanic garden collections in the U.S. In addition to providing valuable ploidy data that were previously missing for many of these accessions, this study also developed a set of robust Corylopsis-specific SSR markers that will be useful across the genus to make decisions about species identity, collection management, or breeding targets.