Meiotic crossovers are associated with open chromatin and enriched with Stowaway transposons in potato
Meiotic recombination is the foundation for genetic variation in natural and artificial populations of eukaryotes. Although genetic maps have been developed for numerous plant species since the late 1980s, few of these maps have provided the necessary resolution needed to investigate the genomic and epigenomic features underlying meiotic crossovers.
Using a whole genome sequencing-based approach, we developed two high-density reference-based haplotype maps using diploid potato clones as parents. The vast majority (81%) of meiotic crossovers were mapped to less than 5 kb. The fine-scale accuracy of crossover detection was validated by Sanger sequencing for a subset of ten crossover events. We demonstrate that crossovers reside in genomic regions of “open chromatin”, which were identified based on hypersensitivity to DNase I digestion and association with H3K4me3-modified nucleosomes. The genomic regions spanning crossovers were significantly enriched with the Stowaway family of miniature inverted-repeat transposable elements (MITEs). The occupancy of Stowaway elements in gene promoters is concomitant with an increase in recombination rate. A generalized linear model identified the presence of Stowaway elements as the third most important genomic or chromatin feature behind genes and open chromatin for predicting crossover formation over 10-kb windows.
Collectively, our results suggest that meiotic crossovers in potato are largely determined by the local chromatin status, marked by accessible chromatin, H3K4me3-modified nucleosomes, and the presence of Stowaway transposons.
Meiosis is a precisely coordinated process in which homologous chromosomes undergo pairing and reciprocal exchange of genetic material, ultimately leading to genetically unique haploid gametes. The formation of double strand breaks (DSBs) during the beginning of prophase I marks the initiation of meiotic recombination. Meiotic DSBs are resolved as either crossover (CO) or non-crossover (NCO) events, with the latter occurring at a higher frequency. NCOs typically result in the original parental configuration through the synthesis-dependent strand-annealing pathway. COs, on the other hand, result in the reciprocal exchange of large chromosomal segments between non-sister chromatids and thus contribute to the formation of unique haplotypes and overall genetic diversity for populations of sexually reproducing organisms. COs are also essential for proper chromosomal segregation by providing physical linkages between homologous chromosomes via chiasmata. In most eukaryotes, COs tend to occur in short, 1–2-kb regions, termed crossover hotspots, where crossover rates can be several orders of magnitude greater than in the surrounding regions [4, 5, 6]. Interestingly, the distribution and strength of crossover hotspots display substantial variation among genera, within species, and between sexes [7, 8, 9, 10, 11].
The majority of our knowledge of recombination originates from experiments conducted in a few model organisms, including yeast, Drosophila melanogaster, and humans [12, 13, 14]. However, several key findings have revealed marked differences in the meiotic process for various organisms. In mammalian species, crossover hotspots are established by the presence of a DNA-binding motif of the PR domain zinc finger protein 9 (PRDM9), which specifically tri-methylates histone H3 on lysine 4 (H3K4me3) and directs nucleosome re-organization at these crossover hotspots during early prophase I [15, 16, 17]. In contrast, yeast does not contain a PRDM9 homolog [18, 19]. Hotspots in yeast are associated with H3K4me3, occur in regions of low nucleosome density near gene promoters, and are not sequence-dependent. Drosophila also lacks PRDM9 and is anomalous due to the absence of crossover hotspots . Meiosis is sex-specific in Drosophila since only female meiosis yields crossovers . The absence of unifying characteristics for crossover hotspots among these species suggests that variation in the determinants of meiotic DSB localization is likely a species-specific phenomenon.
Past studies of recombination in plant genomes relied heavily on detecting crossover events from well-defined pedigrees and more recently from population linkage disequilibrium analysis [22, 23, 24]. Coalescent-based estimates of recombination rates from linkage disequilibrium studies are available for Arabidopsis thaliana, which allowed for the identification of several thousand crossover hotspots . Additionally, it was shown that hotspots in Arabidopsis are controlled by the presence of H2A.Z and H3K4me3 and the absence of DNA methylation in all three contexts (CG, CHG, and CHH, where H stands for any nucleotide except guanine) and preferentially occur in gene promoters [22, 25, 26]. However, coalescent-based approaches infer sex-averaged historical recombination rates from heterogeneous populations and thus cannot reveal information pertaining to crossover events resulting from a single sex-specific meiosis. Pedigree analysis has been used to examine differences in male and female meiosis in animal species [27, 28]. The advent of next-generation sequencing technologies has enabled the discovery of millions (M) of sequence polymorphisms, facilitating gains in crossover map resolution. Here, we report the construction of two high-resolution crossover maps in potato using whole-genome re-sequencing. The fine scale nature of our crossover data sets afforded a unique opportunity to investigate the genomic and chromatin features associated with meiotic crossovers in a plant genome.
Haplotype map construction
A sliding window approach was implemented to phase SNVs from both populations independently, utilizing estimates of linkage disequilibrium (LD) (Fig. 1b). Alternative alleles were well distributed between the haplotypes of RH and US-W4 (Additional file 1: Figure S1). To overcome sequencing errors, low coverage allele bias, and missing data, we conducted haplotyping using sliding windows of 50 SNV increments and Bayesian inference (Additional file 1: Figure S2). The resulting W4M6 and DMRH maps contained 782 and 155 crossovers, with total map distances of 869 cM and 775 cM, respectively, consistent with previous potato mapping reports that varied from 751 to 965 cM (Fig. 1c) [31, 32].
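The reported map length for W4M6 can be reproduced from the crossover count under a simple assumption: each F1 individual samples one informative meiosis of the heterozygous parent, so the map length in cM is 100 times the mean number of crossovers per individual. The sketch below is only an arithmetic check under that assumption. The function name is illustrative, and the population size of 90 is the W4M6 F1 size given in the Methods.

```python
def map_length_cm(n_crossovers: int, n_meioses: int) -> float:
    """Genetic map length in centimorgans.

    Assumes one informative meiosis per individual, so that
    1 crossover per meiosis corresponds to 100 cM.
    """
    if n_meioses <= 0:
        raise ValueError("n_meioses must be positive")
    return 100.0 * n_crossovers / n_meioses


# W4M6: 782 crossovers detected across 90 F1 individuals.
print(round(map_length_cm(782, 90)))  # 869
```

The same relation applies to DMRH, but the size of that population is not stated in this section, so it is not computed here.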
Identification and validation of high-resolution crossovers
The high marker density of our data sets enabled identification of crossovers at high resolution (Additional file 1: Figure S3), yielding median crossover intervals of 880 and 826 bp for W4M6 and DMRH, respectively (Additional file 1: Figure S4). Crossover counts in non-overlapping 1-Mb windows were significantly correlated between W4M6 and DMRH, even though the DMRH population had substantially fewer crossovers (Spearman’s rank correlation rho = 0.36, P < 2.2e–16). Subsets of 630 W4M6 and 126 DMRH crossovers were resolved to less than 5 kb and are referred to as fine-resolution crossovers. Of these, 20 and 17 crossovers from the W4M6 and DMRH populations, respectively, had overlapping genomic coordinates. The overlap of W4M6 and DMRH crossovers was significantly greater than expected by chance (Fisher’s exact test, P < 0.026), suggesting that the positions of crossovers are similarly controlled during maternal US-W4 and paternal RH meiosis. Finally, fine-resolution crossovers from both populations were merged and are collectively denoted as the fine-resolution data set (n = 756).
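The window-based comparison between the two populations can be sketched as follows: crossover positions are binned into non-overlapping windows, and the two count vectors are compared by Spearman's rank correlation, computed as the Pearson correlation of tie-averaged ranks. This is a minimal standard-library illustration, not the authors' code. The function names and the toy count vectors are hypothetical, and positions are assumed to be 0-based and smaller than the chromosome length.

```python
import math


def window_counts(positions, window, chrom_len):
    """Count positions falling in each non-overlapping window."""
    n_windows = -(-chrom_len // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-window crossover counts for two populations (hypothetical values).
pop_a = [0, 2, 1, 5, 0, 3]
pop_b = [0, 1, 1, 2, 0, 1]
print(spearman(pop_a, pop_b))
```

Rank-based correlation is a natural choice here because window counts are sparse and heavily tied, particularly for the smaller population.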
To assess the accuracy of our fine-resolution data set, ten crossovers with a resolution of less than 1 kb were randomly selected for Sanger sequencing (Additional file 2: Table S3). All ten crossovers predicted by whole-genome re-sequencing were confirmed via Sanger sequencing (Fig. 2b). Additionally, greater than 99% (105/106) of SNVs from all Sanger sequenced crossovers were identical to SNV calls from Illumina reads, indicating that our SNV calling procedure is accurate. These results suggest a robust methodology for accurately calling haplotypes and identifying fine-scale crossovers in testcross populations.
Chromosome scale features of meiotic crossovers
Fine-scale genomic characteristics of meiotic crossovers
The fine-resolution of our crossover data set (n = 756) provides a unique opportunity to investigate potential associations of crossovers with various genomic features. We explored the overlap between fine-resolution crossovers and 5′ UTRs, 3′ UTRs, exons, introns, promoters (defined as regions 1 kb upstream of transcription start sites (TSSs)), regions 1 kb downstream from transcription termination sites (TTSs), class I and II transposable elements (TEs), and intergenic regions (regions at least 2 kb away from gene TSSs, TTSs, and excluding TEs). We found that crossovers overlapped 5′ UTRs at a rate of 35.45% (268/756) based on the reference genome DM 1-3 516 R44 (DM) annotated gene set . To test whether this overlap is greater than by chance, we performed a Monte Carlo (MC) simulation (10,000×) by permuting a random set of 756 sequences matched by length to the fine-resolution crossover data set from the DM v4.04 reference, and measured the overlap rate of these random sequences with all 5′ UTRs. The mean overlap rate of the permuted regions with 5′ UTRs was 22.19% with a standard deviation of 1.49%, indicating that the overlap rate of the crossover data set with 5′ UTRs is significantly greater than those of random data sets (empirical, P < 1e–4). Using this methodology for other genomic features, we found that crossovers were significantly associated with 3′ UTRs, exons, introns, promoters, 1 kb regions downstream from TTSs, and DNA transposons (class II TEs), but were negatively associated with RNA transposons (class I TEs) and intergenic regions (Fig. 4c; Additional file 2: Table S5). To make comparisons between crossovers and neighboring regions without crossovers, we constructed a set of 756 genomic locations composed of similar sequence, SNV density, and length as the crossover data set, which were between 10–1000 kb away from a crossover (denoted “cold regions”). 
In comparison with neighboring “cold regions”, crossovers demonstrated a higher proportion of overlap with 5′ UTRs, 3′ UTRs, exons, introns, promoters, 1-kb downstream regions of TTSs, and a lower proportion of overlap with RNA transposons and intergenic regions, similar to the comparisons with random regions (Fig. 4c; Additional file 2: Table S3). To assess the enrichment of crossovers relative to all genes, we plotted the aggregate crossover density across all genes and their 10-kb surrounding regions (Fig. 4d). Crossovers mainly occurred near TSSs compared to flanking regions, in agreement with the simulation and cold region comparisons.
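The Monte Carlo overlap test described above can be sketched for a single chromosome as follows. Each crossover interval is replaced by a randomly placed interval of the same length, the overlap rate with the feature set is recomputed, and the empirical P value is the share of permutations with an overlap rate at least as high as the observed one. This is a simplified illustration under stated assumptions: one chromosome, half-open coordinates, non-overlapping features, and a conservative (count + 1)/(n + 1) estimator. The function names and toy coordinates are hypothetical.

```python
import random
from bisect import bisect_left


def overlap_rate(intervals, feat_starts, feat_ends):
    """Fraction of half-open [start, end) intervals overlapping any feature.

    Features must be sorted and non-overlapping, so their ends are sorted too.
    """
    hits = 0
    for start, end in intervals:
        i = bisect_left(feat_starts, end) - 1  # last feature starting before `end`
        if i >= 0 and feat_ends[i] > start:
            hits += 1
    return hits / len(intervals)


def permutation_test(crossovers, features, chrom_len, n_perm=10_000, seed=1):
    """Return (observed overlap rate, empirical one-sided P value)."""
    features = sorted(features)
    feat_starts = [s for s, _ in features]
    feat_ends = [e for _, e in features]
    rng = random.Random(seed)
    observed = overlap_rate(crossovers, feat_starts, feat_ends)
    at_least = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in crossovers:
            length = end - start  # keep the length of each interval
            new_start = rng.randrange(chrom_len - length + 1)
            shuffled.append((new_start, new_start + length))
        if overlap_rate(shuffled, feat_starts, feat_ends) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Toy data: two short features and four intervals that all overlap them.
features = [(1000, 1100), (50000, 50100)]
crossovers = [(1050, 1060), (50010, 50020), (1000, 1010), (50090, 50095)]
obs, p = permutation_test(crossovers, features, chrom_len=100_000, n_perm=1000)
print(obs, p)
```

With 10,000 permutations and no permuted set reaching the observed rate, this estimator gives roughly 1e-4, which matches the resolution of the empirical P values reported in the text.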
Next, we investigated the functional annotations of genes or their promoters that overlapped with crossovers. Crossovers were associated with genes related to regulation of biological processes such as “regulation of transcription”, “regulation of a cellular process”, and “transcription factor activity” (Fig. 4e). In contrast, random regions and “cold regions” were not associated with any gene ontology terms (Additional file 2: Table S6). Our results indicate an association between crossovers and genes that play a role in transcriptional regulation.
Crossovers reside in open chromatin
The enrichment of crossovers near genes prompted us to investigate potential chromatin features associated with crossovers. A genomic region that is hypersensitive to cleavage by DNase I is referred to as a DNase I hypersensitive site (DHS), and is a classic mark of open chromatin. DHSs can be identified by partial DNase I digestion followed by high-throughput sequencing. Another mark suggestive of open chromatin, H3K4me3, was recently shown to be enriched within crossover hotspots in A. thaliana, while DNA methylation was notably absent at crossovers. To test for an association between open chromatin features and crossing over, we utilized our recently developed genome-wide DHS data set from DM potato (Zeng ZX, Zhang WL, Marand AP, Buell CR, Jiang JM: Distinct patterns of open chromatin dynamics associated with tissue specificity and response to cold stress, submitted). We used DHSs derived from somatic tissues, as it is not currently feasible to isolate meiocytes for DHS identification in plants. We reason that DHSs consistent across leaf and tuber tissues are likely conserved in most other cell types. A total of 39,205 DHSs was conserved between leaf and tuber tissues (Zeng ZX, Zhang WL, Marand AP, Buell CR, Jiang JM: Distinct patterns of open chromatin dynamics associated with tissue specificity and response to cold stress, submitted). Similarly, we constructed chromatin immunoprecipitation followed by sequencing (ChIP-seq) libraries for H3K4me3 (Zeng ZX, Zhang WL, Marand AP, Buell CR, Jiang JM: Distinct patterns of open chromatin dynamics associated with tissue specificity and response to cold stress, submitted). We identified 14,968 H3K4me3 peaks that were consistent between tuber and leaf tissues.
Crossover events preferentially occurred within euchromatin, where the distributions of crossovers and open chromatin appear similar on the chromosome level. To address whether crossovers exhibit distinct chromatin architecture on a fine scale, we performed MC simulations (10,000×) comparing the mean DNase-seq and H3K4me3 ChIP-seq read counts overlapping crossovers with randomly permuted nearby cold regions (matched regions 10–1000 kb away, with similar length, gene, GC, and SNV density as crossover regions). Interestingly, crossovers were significantly associated with elevated levels of DNase I hypersensitivity (empirical, P < 1e–4) and H3K4me3 (empirical, P < 1e–4) relative to the surrounding euchromatin (Fig. 5b; Additional file 1: Figure S8). Regions of open chromatin are typically regarded as nucleosome-free and are delimited by adjacent nucleosomes. The observation of simultaneously elevated DNase I sensitivity and H3K4me3 in crossovers can be partially explained by taking read count averages across many crossovers coupled with the uncertainty about the precise crossover location within the less than 5-kb intervals. Our data suggest that crossovers preferentially occur in regions of open chromatin and are putatively flanked by neighboring nucleosomes harboring H3K4me3.
DNase I displays a well-known correlation with gene transcription . Similarly, enrichment of H3K4me3 at gene TSSs corresponds to higher gene expression and is frequently associated with active transcription . Of the 756 high-resolution crossovers, 59% (445/756) overlapped a total of 496 genic regions (genes and 1-kb surrounding regions). Crossovers may preferentially occur near or within genes with relatively more accessible chromatin states as a result of active transcription. To test this, we compared the mean DNase-seq and H3K4me3 read counts of crossover-associated genic features (promoters, exons, introns, 5′ UTRs, 3′ UTRs, and 1-kb downstream TTSs) against a similar number of nearby genic features (between 10 and 1000 kb from a crossover) using MC simulations. Interestingly, promoters, 1 kb downstream TTSs, exons, and introns associated with crossovers were significantly enriched with DNase-seq reads compared to simulations (Fig. 5c; Additional file 2: Table S8). H3K4me3 modifications were enriched for regions 1 kb downstream of TTSs, exons, and introns overlapping crossovers, compared to nearby simulated regions (Fig. 5c; Additional file 2: Table S8). Aggregate plots of DNase-seq and H3K4me3 normalized read counts across genic regions associated with crossovers highlight the observation of overall increased chromatin accessibility for genes associated with crossovers (Fig. 5d).
A substantial proportion of crossovers were located near or within genic features. Active genes are generally defined by an overall increase in chromatin accessibility, a feature that may underlie crossover formation. To test whether the association of crossovers with open chromatin is due to genome organization, or an inherent characteristic of crossovers, we surveyed the chromatin state of crossovers specifically residing within intergenic regions. Approximately 77% (583/756) of crossovers overlapped intergenic regions (defined as regions > 2 kb from genes, and excluding TEs). Remarkably, intergenic crossovers were significantly enriched with DNase-seq reads (empirical, P < 9e–3) and associated with higher levels of H3K4me3 ChIP-seq (empirical, P < 1e–4) reads compared to permutations of random intergenic regions (within 10–1000 kb), suggesting a preference of crossovers to occur within open chromatin states regardless of genic context (Additional file 1: Figure S9). Furthermore, intergenic crossovers were distinctly associated with elevated levels of open chromatin relative to surrounding regions (Additional file 1: Figure S10). However, while H3K4me3 was statistically elevated in intergenic crossovers compared to intergenic controls 10–1000 kb away, we did not observe a distinct peak over intergenic crossovers relative to directly flanking regions (Additional file 1: Figure S10). This analysis provides evidence that open chromatin is an intrinsic feature of crossovers and the association of crossovers with genic features may be a byproduct of accessible chromatin configurations typically occurring near genes.
Crossovers are enriched with Stowaway DNA transposons
We then examined the distribution patterns of Stowaway elements genome-wide. A total of 20,247 intact (median length = 224 bp, mean length = 209 bp) Stowaway elements was identified using RepeatMasker . Stowaway transposons were highly enriched upstream of TSSs and highly depleted within gene bodies (Fig. 6b). Counts of Stowaway transposons in non-overlapping 100-kb windows and recombination rate were significantly correlated (Spearman correlation, rho = 0.39, P < 2.20e–16) (Fig. 6c; Additional file 1: Figure S11). While H3K4me3 (Spearman correlation, rho = 0.43, P < 2.20e–16) is the most correlated chromatin feature with recombination rate, the correlation between recombination rate and Stowaway elements was stronger than recombination rate with DNase-seq (Spearman correlation, rho = 0.34, P < 2.20e–16), CHH methylation (Spearman correlation, rho = 0.26, P < 2.20e–16), CHG methylation (Spearman correlation, rho = −0.38, P < 2.20e–16), and CG methylation (Spearman correlation, rho = −0.38, P < 2.20e–16) on a genome-wide survey.
To further investigate broad scale associations of chromatin marks and Stowaway elements with crossovers, a generalized linear model was fit for crossovers per 100 kb using gene and Stowaway counts, DNase-seq reads, H3K4me3 ChIP-seq reads, CG, CHG, and CHH methylation levels in 100-kb windows. Analysis of variance revealed strong significant effects for genes (P < 2.2e–16) and Stowaway elements (P < 2.2e–16) with weaker, yet still significant, effects from DNase I sensitivity, H3K4me3, CHG, and CHH methylation (Additional file 2: Table S10). Fine-scale analysis using the same variables was also fit using 10-kb windows, revealing that at 10 kb resolution, genes, DNase I sensitivity, and Stowaway elements are strong contributors to crossovers, and while still significant, CHH and H3K4me3 had weaker effects on predicting crossover formation (Additional file 2: Table S11).
To determine whether the presence of Stowaways in promoters is quantitatively associated with recombination rate, promoters were grouped by whether they overlapped the top or the bottom quartile for recombination rate at 100 kb resolution. The counts of Stowaway elements within promoters from the two groups were compared, revealing that Stowaway elements are twofold more abundant within promoters from regions in the top quartile of recombination rate than within promoters from regions with low levels of recombination (Wilcoxon rank sum test; P < 2.2e–16; Fig. 6d). Reciprocally, gene promoters were grouped into two sets, promoters containing at least one Stowaway element (n = 3030) and nearby promoters (within 500 kb) lacking Stowaways (n = 3030), and their overlying 100-kb-scaled recombination rates were compared using MC simulations. Promoters carrying Stowaways had significantly elevated recombination rates compared to nearby promoters lacking Stowaways (empirical; P < 1.0e–6; Fig. 6e).
Recent studies in several model organisms have revealed an association between regions with low nucleosome density and meiotic crossovers [17, 22, 23, 24, 44]. Historical crossovers in A. thaliana are enriched in TSSs containing elevated levels of H3K4me3 and the non-canonical histone variant H2A.Z, while being depleted of canonical nucleosomes and DNA methylation . Crossovers in our populations occurred frequently near gene TSSs, consistent with the genome-wide correlation of recombination rate with genes within 100-kb windows (Spearman’s correlation, rho = 0.39, P < 2.2e–16). We additionally demonstrate an association between crossovers and regions of DNase I hypersensitivity, consistent with previous findings . DHSs mark regions depleted of bulk nucleosomes , which would be favorable to the landing of the recombination machinery during prophase I of meiosis. Crossovers occurred near genes that play a role in transcriptional regulation, possibly implicating specific gene sets that may be actively transcribed during the early stages of meiosis. By partitioning crossovers into groups that overlap genic and intergenic regions, we revealed that chromatin accessibility underlies the establishment of crossovers regardless of the proximity to genes. Additionally, the occurrence of high levels of open chromatin marks is consistent with the presence of cis-regulatory elements, which we speculate may play a role in meiotic crossover determination. H3K4me3 was enriched at all crossovers, genic and intergenic, but lacked a distinct signal within intergenic regions relative to flanking sites. This may be due to widespread H3K4me3 marks over larger genomic regions, comprising an overall activating chromatin state that is generally favorable to crossover formation. 
It is important to consider that the DNase-seq and H3K4me3 ChIP-seq data sets were derived from somatic tissues, and although we only utilize conserved reads between tissue types, our data may not reflect the chromatin architecture of meiotic cells.
Repetitive DNA elements guide recombination events in a sequence-specific manner in humans [41, 42]. In mammalian species, recombination is directed by PRDM9, a C2H2 zinc finger protein which specifically tri-methylates H3K4 during prophase I of meiosis [15, 17]. PRDM9 binds to the CCNCCNTNNCCNC degenerate 13-mer motif, which originated from THE1A/B retrovirus-like retrotransposons, providing some of the first evidence implicating repetitive DNA elements in crossover site determination [15, 16, 42]. We detected a significantly enriched MITE within our crossover data set, an association observed in two independent populations and validated by Sanger sequencing. We demonstrate that the presence of Stowaway transposons positively influences recombination rate in gene promoters, although it is critical to note that our estimates of recombination rate are derived from interpolations on 100-kb windows, and thus the low resolution may generalize our results. Genomic regions associated with the top quartile of recombination rate were associated with nearly twice as many Stowaway elements per gene promoter compared to promoters underlying regions from the bottom quartile. It is known that Stowaway preferentially inserts within gene promoters and forms stable secondary structures and may still be active in potato. The terminal inverted repeats of Stowaway elements share an 11-bp consensus sequence, CTCCCTCCGTT, which bears a striking similarity to the crossover-associated motifs CTT-repeat and CCN-repeat, identified in A. thaliana [22, 24], as well as the human CCNCCNTNNCCNC 13-mer recombination hotspot motif. We found both populations enriched with Stowaway elements in crossover intervals. This result may reflect the preference of crossovers and Stowaway elements to occur within open chromatin given the preference of Stowaway elements to insert in TA-rich sequences depleted of nucleosomes.
Furthermore, Stowaway transposons leave TA target site duplications, sequence content known to exclude nucleosome binding, leaving chromatin more accessible to meiotic regulatory factors.
A recent experiment revealed that hypomethylated transposons have the potential to contribute functional de novo cis-regulatory elements to nearby genes in the form of enhancers, insulators, or repressors . Such events have the possibility to re-wire transcriptional networks in a development and/or environment-specific fashion. Establishment of novel cis-elements in transposons is accompanied by increases in regulatory epigenetic marks to the local chromatin. Stowaway elements have been previously shown to contain cis-regulatory sequences in potato and tomato, including an embryogenesis nuclear factor binding site in carrot . The positional preference of Stowaway transposons upstream of TSSs suggests that they may be associated with open chromatin. However, we were unable to assess chromatin accessibility at Stowaway transposons due to the short read length (20 bp) of DNase-seq reads, coupled with the high copy and repetitiveness of this TE family. Furthermore, using 20-nucleotide simulated reads derived from the reference genome, we found that less than 46% of Stowaway elements had even a single read align, and less than 0.5% of Stowaway elements contained the coverage expected. This suggests that our current data set is not suitable for TE analysis, possibly resulting in misleading conclusions. However, with longer ChIP-seq reads (150 nucleotides), we did find an enrichment of H3K4me3 at crossover-associated Stowaway elements compared to MC simulations of non-crossover Stowaway elements (empirical, P < 3.0e–2), but the overall levels of H3K4me3 in these groups were substantially lower than random and flanking regions (Additional file 1: Figure S12a). This highlights the well-documented absence of this transcriptional chromatin mark within transposons and may reflect a general depletion of nucleosomes within this TE family. 
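The read-simulation check described above can be illustrated with a simple k-mer uniqueness test: an element is treated as assessable only if at least one k-mer drawn from it occurs exactly once in the reference. This is a rough stand-in for aligning simulated reads, not the procedure used in the study. It assumes exact matches on the forward strand only, and the function names and the toy sequence (with k = 4 instead of 20) are hypothetical.

```python
from collections import Counter


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every k-mer on the forward strand of the reference."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def fraction_mappable(genome: str, elements, k: int = 20) -> float:
    """Fraction of [start, end) elements containing at least one unique k-mer."""
    counts = kmer_counts(genome, k)
    mappable = 0
    for start, end in elements:
        if any(counts[genome[i:i + k]] == 1 for i in range(start, end - k + 1)):
            mappable += 1
    return mappable / len(elements)


# Toy reference: a repetitive A-run on both sides of a unique CGTC core.
genome = "A" * 10 + "CGTC" + "A" * 10
elements = [(0, 8), (8, 16)]  # the first is fully repetitive, the second is not
print(fraction_mappable(genome, elements, k=4))  # 0.5
```

Short reads from a high-copy family rarely contain a unique k-mer, which is why longer reads recover signal over elements that 20-nucleotide reads cannot.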
Although we cannot assess chromatin accessibility within Stowaway elements, we can, however, examine open chromatin configurations flanking Stowaway transposons. Chromatin accessibility was greater for Stowaway elements that comprise the top quartile of recombination rate compared to Stowaway elements from the bottom quartile (Additional file 1: Figure S12a). Additionally, genes which contain a Stowaway element within their promoter have overall elevated chromatin accessibility upstream of TTSs and higher H3K4me3 levels within the gene bodies compared to genes lacking Stowaway elements (Additional file 1: Figure S12b). Genes associated with Stowaway transposons, with their increased chromatin accessibility, may therefore be more suitable targets for crossovers.
These results raise several questions as to the roles cis-regulatory elements and active transcriptional epigenetic marks play prior to DSB formation in meiosis. Stowaway elements share a striking number of fine-scale chromatin and genomic features with crossovers, such as preference for nucleosome-depleted regions, AT-rich sequence, and proximity to gene TSSs and H3K4me3-modified nucleosomes. Although Stowaway elements are more correlated with recombination rate than several established meiotic chromatin marks, further experimentation will be necessary to establish a mechanistic role for Stowaway elements in crossover site determination.
Meiotic crossovers are largely determined by open chromatin, marked by DNase I sensitivity and the presence of H3K4me3 in the potato genome. Stowaway MITEs were also significantly enriched within meiotic crossovers from two independent populations, and associated with increased recombination rate. Crossovers and Stowaway transposons share a remarkable number of fine-scale chromatin and genomic characteristics. Further investigation into the functional aspects of the chromatin landscape, Stowaway elements, and their relationships will be necessary to determine the precise mechanism of crossover site determination in a plant genome.
Plant materials, genomic DNA isolation, and library preparation
A population of 90 F1 individuals was developed from an interspecific cross between US-W4 (2n = 2x = 24), a heterozygous S. tuberosum dihaploid derived from a tetraploid Minnesota breeding clone, and M6, a S. chacoense clone produced by seven generations of self-pollination. Genomic DNA was isolated from young, emerging leaves of greenhouse grown plants using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA).
Genomic DNA was sheared to 300 bp using a Covaris ultrasonicator. Fragmented DNA was then end repaired, A-tailed, ligated to Illumina compatible adaptors, and PCR amplified for eight cycles. Equal amounts of each library were pooled for gel extraction; the 350–450-bp region was excised from the gel and purified using the QIAquick Gel Extraction Kit (Qiagen). The individual genomic libraries from the F1 population were pooled and sequenced in paired-end (PE) mode generating 150-nucleotide reads on the Illumina HiSeq platform. We obtained approximately four to ten million (~6.3 million average) read pairs per individual. Genomic libraries for US-W4 and M6 were prepared as described for the W4M6 F1 population, and sequenced with 100- and 150-nucleotide PE reads, respectively. DMRH raw reads were downloaded from NCBI BioProject number PRJNA335820.
Read processing and alignment
The quality of the raw Illumina sequencing reads was assessed with FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were trimmed and filtered using CutAdapt (v1.8.3), requiring a minimum base quality of 10 and a minimum read length of 75 nucleotides. All reads were aligned to the potato reference sequence genome, DM v4.04, an updated version of the S. tuberosum Gp. Phureja DM 1-3 516 R44 whole genome assembly. Reads were aligned using BWA-MEM (v0.7.10) on a per-sample basis using the default parameters and processed with SAMtools (v0.1.19) to extract unique, properly paired reads with mapping quality (MQ) greater than 40. Duplicate reads were marked and removed with PicardTools (v1.119; http://broadinstitute.github.io/picard/). Local realignment of reads around insertion/deletions (indels) was performed using GATK IndelRealigner (v3.4.0).
SNV calling and filtering
Raw SNVs in both populations were called with FreeBayes (v1.0.2-16-gd466dde), with default parameters. The processing of variants is identical in both populations unless noted otherwise. SNVs overlapping repetitive regions identified by RepeatMasker (v4.0.5; Viridiplantae clade database) were removed. Variants with significant segregation distortion identified by a chi-square test with Benjamini–Hochberg false discovery rate (FDR) set to 0.05 were removed from further analysis. Alleles in a bi-parental population should be highly linked. Therefore, for each variant, we randomly sampled 25 polymorphisms between 1 and 10,000 kb away and estimated linkage disequilibrium in the form of r². Variants with median r² values less than 0.2 were removed. Additionally, since the average coverage for each individual was low (~2× for the W4M6 population), it is likely that only one allele is represented at any given locus. Thus, we set homozygous reference genotypes to missing if the read depth was less than 2 and 5, for the W4M6 and DMRH populations, respectively. The threshold of missing genotype calls allowed per variant site was set to 40 and 5% in the W4M6 and DMRH populations, respectively, owing to the smaller sample size and higher coverage of the DMRH population. All homozygous alternative allele calls (1/1) were converted to heterozygous genotypes (0/1). The rate of homozygous alternative allele calls to the total number of calls was used as the error rate (Et).
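The linkage-based filter can be sketched as follows: for a given variant, up to 25 other variants located between 1 kb and 10,000 kb away are sampled at random, r² is computed against each, and the variant is kept only if the median r² is at least 0.2. This is a minimal illustration, not the authors' implementation. It assumes genotypes are coded per individual as 0 (homozygous reference), 1 (heterozygous), or None (missing), takes r² as the squared Pearson correlation of those codes, and uses hypothetical function names.

```python
import random
import statistics


def r_squared(a, b):
    """Squared Pearson correlation of two genotype vectors, skipping missing calls."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no linkage information
    return sxy * sxy / (sxx * syy)


def passes_ld_filter(idx, positions, genotypes, rng, n_sample=25,
                     min_dist=1_000, max_dist=10_000_000, min_median_r2=0.2):
    """Keep variant `idx` if its median r2 to sampled neighbors reaches the cutoff."""
    candidates = [j for j, pos in enumerate(positions)
                  if j != idx and min_dist <= abs(pos - positions[idx]) <= max_dist]
    if not candidates:
        return False
    chosen = rng.sample(candidates, min(n_sample, len(candidates)))
    median_r2 = statistics.median(
        r_squared(genotypes[idx], genotypes[j]) for j in chosen)
    return median_r2 >= min_median_r2


# Toy data: three concordant markers and one marker with an unrelated pattern.
positions = [0, 2000, 4000, 6000]
linked = [0, 1, 0, 1, 1, 0]
noise = [0, 0, 1, 1, 0, 1]
genotypes = [linked, linked, linked, noise]
rng = random.Random(0)
print([passes_ld_filter(i, positions, genotypes, rng) for i in range(4)])
```

Using the median rather than the mean keeps a single spurious neighbor from rescuing or condemning a variant.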
Haplotype phase reconstruction
A sliding window was implemented to phase SNVs based on pairwise patterns of linkage disequilibrium. Windows of 100 SNVs with a 1-SNV shift were used to estimate pairwise linkage disequilibrium (r²) and so determine associated alleles. The first pair of markers was used to arbitrarily set the haplotypes, by taking the most likely haplotypes as the allelic pairs with the greatest frequencies. Subsequent comparisons were made using a variant that had already been assigned a haplotype. Therefore, subsequent haplotype assignments rely on identifying the two haplotypes with the greatest frequencies and setting the haplotypes of the new marker alleles to the haplotypes of the linked marker alleles. This allows all variant haplotypes to be aligned within a window. When the window slides, the new window overlaps the previous window by 99 SNVs and thus allows the continuation of the phasing algorithm until the end of the chromosome. This results in haplotype assignments for each individual at all SNV positions.
Adjacent windows with identical segregation patterns were merged, leaving uniquely segregating haplotype skeleton bins. More details on the algorithm, as well as the source code, can be found at https://github.com/plantformatics/phaseLD.
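The core idea of the phasing step, propagating phase from an already phased marker to a new marker through their pairwise association, can be illustrated with a simplified sketch. This is not the phaseLD implementation: it omits the 100-SNV window, uses the sign of the genotype correlation as a stand-in for picking the two most frequent two-marker haplotypes, and the names `signed_r`, `phase_markers`, and the `min_r2` threshold are hypothetical.

```python
import math


def signed_r(a, b):
    """Pearson correlation of two 0/1 genotype vectors; None marks a missing call."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def phase_markers(genotypes, min_r2=0.2):
    """Assign each marker +1 (coupling with the first marker), -1 (repulsion),
    or 0 (unphased). The first marker arbitrarily defines the haplotype."""
    phases = [1]
    for i in range(1, len(genotypes)):
        # Compare against the nearest marker that already carries a phase.
        ref = next(j for j in range(i - 1, -1, -1) if phases[j] != 0)
        r = signed_r(genotypes[i], genotypes[ref])
        if r * r < min_r2:
            phases.append(0)
        else:
            phases.append(phases[ref] if r > 0 else -phases[ref])
    return phases


if __name__ == "__main__":
    m0 = [0, 0, 1, 1, 0, 1, 0, 1]
    m1 = list(m0)                      # coupling with m0
    m2 = [1 - g for g in m0]           # repulsion with m0
    print(phase_markers([m0, m1, m2]))
```

Once every marker carries a phase, an individual's haplotype call at a marker is its genotype for coupling markers and the flipped genotype for repulsion markers.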
Determination of crossover breakpoints and recombination rate
Putative crossovers were identified by selecting the haplotype skeleton bins flanking a crossover (adjacent windows with different haplotype assignments). Due to chromosome rearrangements and/or misplaced scaffolds, particularly on chromosome 1, we ordered bins by their genetic positions using MSTMap. Since we are only concerned with crossovers at a fine scale, we ignored putative crossovers that resulted from flanking haplotype bins that did not physically overlap given their coordinates on the potato pseudomolecules. To determine the precise crossover interval between two SNVs, SNV haplotype calls contained within overlapping adjacent haplotype bins were extracted from the recombinant individual, with SNVs containing missing haplotypes excluded from analysis. Logistic regression was implemented to assign crossover probabilities, with each SNV genomic coordinate as the independent variable and the binary haplotype call (0 or 1) as the dependent variable (Additional file 1: Figure S1). Since we can only define crossovers as occurring between adjacent markers, we estimate the probability of a crossover more precisely as the absolute difference in probabilities assigned to adjacent SNVs. Starting from the SNV pair with the largest crossover probability, we extend outwards to flanking SNVs until the probability of a crossover is greater than 0.95. The extension is accomplished one SNV at a time, always selecting the SNV that most improves the crossover probability. We take as the crossover interval the pair of SNVs that has a crossover probability greater than 0.95.
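The breakpoint refinement described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the published code: the logistic model is fitted by gradient descent with a small ridge penalty (added here only so that perfectly separated haplotype calls give finite coefficients), positions are assumed sorted, and the names `fit_logistic` and `crossover_interval` and all tuning values other than the 0.95 threshold are hypothetical. Because of the penalty, the sketch can return a wider interval than an unpenalized fit would.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(positions, haplotypes, ridge=0.01, lr=0.5, n_iter=5000):
    """Fit haplotype ~ position and return the fitted probability per SNV."""
    n = len(positions)
    mean = sum(positions) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in positions) / n) or 1.0
    z = [(x - mean) / sd for x in positions]  # standardized coordinates
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p = [_sigmoid(b0 + b1 * zi) for zi in z]
        g0 = sum(pi - yi for pi, yi in zip(p, haplotypes)) / n
        g1 = sum((pi - yi) * zi for pi, yi, zi in zip(p, haplotypes, z)) / n
        b0 -= lr * g0
        b1 -= lr * (g1 + ridge * b1)
    return [_sigmoid(b0 + b1 * zi) for zi in z]


def crossover_interval(positions, haplotypes, threshold=0.95):
    """Return (left_pos, right_pos, probability) for the crossover interval."""
    p = fit_logistic(positions, haplotypes)
    # Crossover probability between adjacent SNVs.
    diffs = [abs(p[i + 1] - p[i]) for i in range(len(p) - 1)]
    left = max(range(len(diffs)), key=diffs.__getitem__)
    right = left + 1
    prob = diffs[left]
    # Extend one SNV at a time toward the side that adds the most probability.
    while prob <= threshold and (left > 0 or right < len(p) - 1):
        gain_left = diffs[left - 1] if left > 0 else -1.0
        gain_right = diffs[right] if right < len(p) - 1 else -1.0
        if gain_left >= gain_right:
            left -= 1
            prob += gain_left
        else:
            right += 1
            prob += gain_right
    return positions[left], positions[right], prob


if __name__ == "__main__":
    pos = list(range(100, 1100, 100))
    hap = [0] * 5 + [1] * 5
    print(crossover_interval(pos, hap))
```

If the threshold cannot be reached, the sketch returns the full span of SNVs together with the probability it did reach, so callers should check the returned value.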
A traditional genetic map was constructed using each 50-SNV haplotype window as a genetic marker, and estimating genetic linkage using MSTMap. The centers of each window were used to create a Marey map with the respective position in centimorgans. Recombination rate was then interpolated onto 100-kb windows using the cubic spline function from the R package MareyMap.
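A Marey map pairs each marker's physical position with its genetic position, and the local recombination rate is the slope of that curve in cM/Mb. The sketch below computes the slope between consecutive markers by simple finite differences; this is a deliberately simpler technique than the cubic spline interpolation used with MareyMap, and the function name `marey_slopes` is hypothetical.

```python
def marey_slopes(phys_bp, genetic_cm):
    """Recombination rate (cM/Mb) between consecutive markers of a Marey map.

    phys_bp    -- physical positions in bp, strictly increasing
    genetic_cm -- genetic positions in cM, same order and length
    """
    if len(phys_bp) != len(genetic_cm):
        raise ValueError("phys_bp and genetic_cm must have the same length")
    rates = []
    for i in range(len(phys_bp) - 1):
        span_mb = (phys_bp[i + 1] - phys_bp[i]) / 1_000_000
        if span_mb <= 0:
            raise ValueError("physical positions must be strictly increasing")
        rates.append((genetic_cm[i + 1] - genetic_cm[i]) / span_mb)
    return rates


if __name__ == "__main__":
    # Three window centers: 2 cM over the first Mb, then 1 cM over 2 Mb.
    print(marey_slopes([0, 1_000_000, 3_000_000], [0.0, 2.0, 3.0]))
```

A spline fit smooths these piecewise slopes, which matters when marker spacing is uneven or genetic positions are noisy.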
Map validation by Sanger sequencing and PCR analysis
PCR primers were designed to surround ten crossovers identified collectively from the W4M6 and DMRH populations (Additional file 2: Table S3). PCR was performed for 36 cycles of heat denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 30 s after an initial heat denaturation at 95 °C for 3 min. The PCR mix (20 μl) consisted of 1× Ex Taq Buffer (Mg2+ plus), 0.2 mM dNTP mixture, 0.5 μM primers, 1 U of TaKaRa Ex Taq polymerase (Clontech, Mountain View, CA, USA), and 250 ng of genomic DNA. PCR products were visualized using gel electrophoresis to confirm the presence of the target band. PCR products were then cloned into the pCR4 plasmid using TOPO TA cloning (Invitrogen). Plasmids with the correct insert size identified using M13 primers (Invitrogen) were subjected to BigDye (Thermo Fisher Scientific) sequencing reactions consisting of an initial 95 °C for 1 min, followed by 44 cycles of heat denaturation at 95 °C for 10 s and 58 °C for 4 min. Reaction clean-up, capillary gel electrophoresis, and laser detection were performed by the Biotechnology Center at the University of Wisconsin-Madison. The predicted haplotype calls and associated nucleotides based on Illumina reads were compared to the aligned Sanger sequencing product.
Heterozygous deletions on chromosome 1 of US-W4 were identified using the software DELLY. PCR primers were designed surrounding 13 heterozygous deletions (Additional file 2: Table S1). Genomic DNA was collected from a subset of individuals (n = 56) of the F1 population and from the two parents (US-W4 and M6) using the DNeasy Plant Mini Kit (Qiagen) following the manufacturer's instructions. We initially screened these markers in the parents, and found that ~54% (7/13) of these deletions were heterozygous in US-W4 and homozygous in M6. DNA samples were amplified by PCR using these seven primer pairs (Additional file 2: Table S2). PCR was performed for 36 cycles of heat denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 30 s after an initial heat denaturation at 95 °C for 3 min. The PCR mix (20 μl) consisted of 1× Ex Taq Buffer (Mg2+ plus), 0.2 mM dNTP mixture, 0.5 μM primers, 1 U of TaKaRa Ex Taq polymerase (Clontech, Mountain View, CA, USA), and 250 ng of genomic DNA.
Assessing the association of crossovers with genomic features
We used a 1-bp minimum overlap between crossovers and various genomic features, and measured the overlapping rate of crossovers with 5′ UTRs, 3′ UTRs, exons, introns, promoters, 1 kb downstream of genes, class I and II TEs, and intergenic regions. We then randomly permuted genomic regions using BEDtools shuffle (v2.25.0) and assessed the overlap of these random regions with each feature. This simulation was conducted 10,000 times.
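The permutation scheme can be sketched without external tools. The example below is a pure Python stand-in for the shuffle-and-intersect procedure on a single chromosome, not a call to BEDtools: intervals are half-open `[start, end)`, each permutation relocates every crossover interval uniformly at random while preserving its length, and the names `overlaps_any` and `permutation_test`, the seed, and the add-one empirical P value are assumptions of the sketch.

```python
import random


def overlaps_any(start, end, features):
    """True if half-open [start, end) shares at least 1 bp with any feature."""
    return any(start < f_end and f_start < end for f_start, f_end in features)


def permutation_test(intervals, features, chrom_len, n_perm=1000, seed=0):
    """Return (observed overlap count, empirical one-sided P value)."""
    rng = random.Random(seed)
    observed = sum(overlaps_any(s, e, features) for s, e in intervals)
    at_least = 0
    for _ in range(n_perm):
        count = 0
        for s, e in intervals:
            length = e - s
            new_start = rng.randint(0, chrom_len - length)  # keep length fixed
            count += overlaps_any(new_start, new_start + length, features)
        if count >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    promoters = [(1000, 2000)]
    crossovers = [(1500, 1600), (1900, 2100), (1999, 2050)]
    print(permutation_test(crossovers, promoters, chrom_len=1_000_000))
```

With 10,000 permutations, as used above, the smallest P value this estimator can report is 1/10,001.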
To compare the fine resolution crossover data set (n = 756) with a non-crossover control, we developed a set of random, recombination cold regions (n = 756) which satisfied the following criteria: (i) GC content within 10% of that of the matched crossover interval; (ii) within 10–1000 kb of a crossover; (iii) the same fragment length as the matched crossover interval; (iv) SNV density within 10% of that of the matched crossover interval; and (v) on the same chromosome as the matched crossover interval. The crossover and recombination cold region data sets were assessed for their overlap with 5′ UTRs, 3′ UTRs, exons, introns, 1 kb upstream of TSSs, 1 kb downstream of TTSs, class I and II TEs, and intergenic regions. If a fragment overlapped more than one feature, it was counted toward all overlapped features. Evaluation of these intersections was performed using BEDtools intersect. TEs were annotated using RepeatMasker (v4.0.5) and the Viridiplantae clade repeat annotation from the RepBase database (http://www.girinst.org).
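The five matching criteria translate directly into a predicate on a candidate region. The sketch below is illustrative only: the `Region` record and function names are hypothetical, "within 10%" is interpreted as a relative difference with respect to the crossover interval's value, and the distance is measured as the gap between the two intervals; both interpretations are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int          # 0-based, half-open interval [start, end)
    end: int
    gc: float           # GC fraction of the interval
    snv_density: float  # SNVs per kb within the interval


def within_tolerance(value, reference, tol):
    """True if value differs from reference by at most tol (relative)."""
    return abs(value - reference) <= tol * abs(reference)


def is_matched_cold_region(crossover, candidate, tol=0.10,
                           min_dist=10_000, max_dist=1_000_000):
    """Check criteria (i)-(v) for a candidate recombination cold region."""
    if candidate.chrom != crossover.chrom:                                   # (v)
        return False
    if candidate.end - candidate.start != crossover.end - crossover.start:   # (iii)
        return False
    # Gap between the intervals; negative when they overlap.
    gap = max(candidate.start - crossover.end, crossover.start - candidate.end)
    if not min_dist <= gap <= max_dist:                                      # (ii)
        return False
    return (within_tolerance(candidate.gc, crossover.gc, tol)                # (i)
            and within_tolerance(candidate.snv_density,
                                 crossover.snv_density, tol))                # (iv)


if __name__ == "__main__":
    co = Region("chr01", 100_000, 102_000, 0.36, 5.0)
    cand = Region("chr01", 150_000, 152_000, 0.38, 5.2)
    print(is_matched_cold_region(co, cand))
```

A sampler would draw candidate positions at random near each crossover and keep the first candidate for which the predicate holds and which does not itself contain a crossover.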
Gene ontology enrichment
Genes or their promoters (n = 937) that overlapped crossovers were screened for enriched gene ontology terms using agriGO. A total of 937 random genes were used as the randomized control for gene ontology enrichment. Genes or promoters overlapping the matched nearby cold regions (n = 937) were screened for gene ontology enrichment in the same way as for the crossover data set. Enrichment analysis was performed using Fisher’s exact test with Benjamini–Hochberg FDR adjustment of P values. Background terms were set to all annotated potato genes for each enrichment test.
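For a single gene ontology term, the one-sided Fisher's exact test for enrichment is the upper tail of a hypergeometric distribution. The sketch below computes that tail with exact integer arithmetic; it illustrates the statistic only and is not the agriGO implementation. The function name and argument names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test P value for enrichment.

    k -- query genes annotated with the term
    n -- genes in the query set (e.g. genes overlapping crossovers)
    K -- background genes annotated with the term
    N -- genes in the background (all annotated genes)

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # comb(a, b) is 0 when b > a, so impossible tables contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # All 5 query genes carry a term held by 5 of 10 background genes.
    print(fisher_enrichment_p(k=5, n=5, K=5, N=10))
```

The resulting P values, one per term, would then be adjusted across terms with the Benjamini–Hochberg procedure.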
Chromatin state analysis
DNase-seq and H3K4me3 ChIP-seq reads were aligned to the DM v4.04 potato reference genome using Bowtie with default parameters. Only reads aligning uniquely were retained. DNase-seq and H3K4me3 reads that overlapped between tuber and leaf data sets were kept for further analysis. For genome-wide correlation analysis, read counts in 100-kb non-overlapping bins were summed for DNase-seq and H3K4me3, and averaged for DNA methylation, and compared via Spearman’s correlation.
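The genome-wide comparison amounts to binning each data set into 100-kb windows and rank-correlating the per-bin values. A minimal sketch in plain Python follows; the function names are hypothetical, read positions are assumed to be 0-based coordinates on a single chromosome, and ties receive average ranks.

```python
import math


def bin_counts(positions, chrom_len, bin_size=100_000):
    """Count read positions in non-overlapping bins along one chromosome."""
    n_bins = -(-chrom_len // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def rank(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    dnase = bin_counts([5, 150_000, 199_999, 250_000], chrom_len=300_000)
    h3k4me3 = bin_counts([10, 120_000, 130_000, 140_000, 260_000], chrom_len=300_000)
    print(dnase, h3k4me3)
```

For DNA methylation the per-bin value would be a mean methylation level rather than a read count, but the rank correlation step is unchanged.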
For aggregate plots, epigenetic mark data were averaged in 10-bp non-overlapping bins flanking up to 5 kb either side of each crossover interval. Since each center feature was variable in length, we divided each feature into 50 windows and normalized the epigenetic mark data on a per-nucleotide basis. The final plots represent normalized averages across noted regions for the given epigenetic mark. The shaded regions surrounding random or cold values demarcate two standard deviations, calculated from the empirical distribution of 100 permutations.
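Rescaling variable-length intervals to a fixed number of windows is what allows them to be averaged in a single aggregate profile. The sketch below shows one way to do this for the center feature, assuming a per-base signal stored as a list indexed by genomic position; the function names and the handling of features shorter than 50 bp are assumptions of the sketch.

```python
def scaled_profile(signal, start, end, n_windows=50):
    """Mean per-nucleotide signal in n_windows slices of [start, end)."""
    length = end - start
    if length <= 0:
        raise ValueError("feature must have positive length")
    profile = []
    for w in range(n_windows):
        lo = start + (w * length) // n_windows
        hi = start + ((w + 1) * length) // n_windows
        if hi == lo:
            hi = lo + 1  # feature shorter than n_windows bp: reuse one base
        window = signal[lo:hi]
        profile.append(sum(window) / len(window))
    return profile


def average_profiles(profiles):
    """Element-wise mean across equally long profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


if __name__ == "__main__":
    signal = [float(i) for i in range(100)]
    first = scaled_profile(signal, 0, 100)
    second = scaled_profile(signal, 20, 45)
    print(average_profiles([first, second])[:3])
```

The flanking regions need no rescaling because they have a fixed width and are summarized in fixed 10-bp bins.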
Monte Carlo simulations were used to evaluate the statistical association of DNase-seq and H3K4me3 reads over genic features (UTRs, promoter, exons, introns, and 1 kb downstream of the TTS) that overlapped crossovers. The mean normalized read counts for crossover-associated features were compared to distributions of the same features stemming from permuted regions 10–1000 kb away. A similar approach was used to compare the mean read counts of DNase-seq and H3K4me3 reads overlapping all crossovers, gene-associated crossovers, and intergenic crossovers with those of control regions 10–1000 kb away.
We thank David Douches and Joseph Coombs for sharing the SNP array data of US-W4. We thank Ning Jiang for her valuable comments on the manuscript and Katelyn Lohr for technical assistance.
This research was supported by the National Science Foundation (NSF) grant IOS-1237969 to CRB, REV, and JJ and Hatch funds to JJ.
Availability of data and materials
All sequencing reads from the W4M6 population are available from NCBI Sequence Read Archive (SRA) under accession number PRJNA356643. Sequence reads from the DMRH population can be found under accession number PRJNA335820. Sanger sequences from the W4M6 and DMRH crossovers have been deposited to NCBI GenBank under the accession numbers MF598836–MF598847.
Haplotype map construction and crossover site identification scripts can be found at https://github.com/plantformatics/phaseLD.
SHJ and JJ designed the research; APM, XZ, ZZ, EC, LN, and AJH performed experiments; APM, HZ, CPL, REV, CRB, and JJ analyzed data; APM and JJ wrote the article. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 30. Hardigan MA, Crisovan E, Hamilton JP, Kim J, Laimbeer P, Leisner CP, Manrique-Carpintero NC, Newton L, Pham GM, Vaillancourt B, et al. Genome reduction uncovers a large dispensable genome and adaptive role for copy number variation in asexually propagated Solanum tuberosum. Plant Cell. 2016;28:388–405.
- 32. van Os H, Andrzejewski S, Bakker E, Barrena I, Bryan GJ, Caromel B, Ghareeb B, Isidore E, de Jong W, van Koert P, et al. Construction of a 10,000-marker ultradense genetic recombination map of potato: Providing a framework for accelerated gene isolation and a genomewide physical map. Genetics. 2006;173:1075–87.
- 38. Giraut L, Falque M, Drouaud J, Pereira L, Martin OC, et al. Genome-Wide Crossover Distribution in Arabidopsis thaliana Meiosis Reveals Sex-Specific Patterns along Chromosomes. PLOS Genetics. 2011;7(11):e1002354.
- 43. RepeatMasker Open-4.0. http://www.repeatmasker.org/.
- 50. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10–12. http://journal.embnet.org/index.php/embnetjournal/article/view/200.
- 51. Potato Genome Sequencing Consortium. Genome sequence and analysis of the tuber crop potato. Nature. 2011:189–94.
- 55. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio.GN]; 2012. https://arxiv.org/abs/1207.3907.
- 62. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.