Introduction

Cultivated rye (Secale cereale L.) is a cross-pollinated, diploid (2n = 14) cereal with seven pairs of chromosomes. Due to its exceptional tolerance to low temperatures in winter and minimal soil requirements, rye can be cultivated in regions with severe climates or in those with light, sandy, low pH, and infertile soils (Bushuk 2001).

Rye breeding efforts are focused on improving grain yield as the most crucial objective. At present, hybrid rye varieties give 10–20% higher yields than open-pollinated varieties grown under the same agrotechnical conditions. One of the most exploited hybrid breeding systems in rye relies on Pampa (P) sterilizing cytoplasm, which originated from Iranian primitive rye and Argentinian landraces (Geiger and Schnell 1970; Geiger and Miedaner 1996). Other types of CMS, exploiting Vavilovii cytoplasm and designated as C, G, and R, are also studied extensively (Melz and Adolf 1991; Kobyljanskij, 1969; Börner et al. 1998; Łapiński and Stojałowski 2003; Stojałowski 2005; Milczarski et al. 2016). However, except for the G cytoplasm (Melz et al. 2001), these resources are not employed in commercial breeding programs.

Restoration of pollen production in sterile plants, including rye hybrids, requires a male parent (pollinator) that carries effective nuclear pollen fertility restoration (Rf) genes. Unfortunately, male fertility restoration genes for the Pampa cytoplasm appear in less than 5% of European rye materials, with effectiveness in the 2–74% range (Miedaner et al. 2005). Moreover, the activities of these genes may depend on the environmental conditions present before and during flowering (Geiger and Miedaner 1996). Lack of pollen and the resulting presence of young, unfertilized ovaries facilitate infection by the ergot fungus Claviceps purpurea, which replaces the seeds with dark mycelial masses (sclerotia) (Miedaner and Geiger 2015). Successful breeding of rye hybrids depends on the identification of the major nuclear restorer QTLs. So far, the QTLs with the highest restoration ability for CMS Pampa (Rf) have been mapped to chromosome 1RS in European rye resources (line L18) and to chromosome 4RL in non-adapted rye accessions from Iran (IRAN IX) (Rfp1) and Argentina (Pico Gentario) (Rfp2) (Miedaner et al. 2000). The relevant gene on 1RS explained 54% of the phenotypic variation, and the gene on 4RL explained 68% and 59% of phenotypic variation in the IRAN IX and Pico Gentario populations, respectively. Another source of superior pollen fertility restoration is the Rfp3 QTL derived from the Iranian primitive rye “Altevogt 14160,” which mapped within a 2.5 cM segment on the 4RL chromosome (Hackauf et al. 2017). Three minor QTLs from European line L18 were mapped to chromosomes 3RL, 4RL, and 5R and explained 17%, 9%, and 11% of the phenotypic variation, respectively (Miedaner et al. 2000). Additionally, in the Pico Gentario population, a gene that significantly enhanced the expression of the major restorer gene Rfp2 was found on the 6R chromosome (Miedaner et al. 2000).

The Rfp1 and Rfp2 QTLs have been extensively transferred to pollinator elite inbred lines using the backcross breeding method, raising pollen fertility to 55–90% (Miedaner et al. 2005). The marker-assisted selection (MAS) breeding process was streamlined by the development of sequence-characterized amplified region (SCAR) markers based on RAPD and AFLP markers tightly linked to the critical QTLs (Stracke et al. 2003). However, long intervals between the markers flanking the Rfp1 (2.9 cM) and Rfp2 (5.2 cM) QTLs are problematic because the region surrounding the introgressed allele may carry undesirable genes (Frisch and Melchinger, 2000; Miedaner et al. 2017). Recent development of accurate conserved ortholog set (COS) markers (TC300731 and TC256739) delimiting Rfp1 within a 0.7 cM interval (Hackauf et al. 2012) and an EST-derived CAPS marker, c28385, which co-segregates with Rfp3 (Hackauf et al. 2017), was also beneficial. These markers allowed analysis of linkage drag effects between the restorer gene and undesirable gene(s)/QTLs and facilitated investigation of orthologs of the Rf QTLs originating from different genetic rye resources, as well as from barley (Hackauf et al. 2017).

High-density genetic maps are vital to increase the precision of QTL mapping and marker development for efficient MAS programs. Several maps have been constructed for rye. Early maps used low-throughput RFLPs (Korzun et al. 1998; Ma et al. 2001). Other maps relied on AFLP and RAPD markers (Masojć et al. 2001; Saal and Wricke, 2002; Bednarek et al. 2003; Milczarski et al. 2007). The average saturation of these rye maps varies, with a single marker per 3.0–4.0 cM (Korzun et al. 1998; Masojć et al. 2001; Ma et al. 2001; Bednarek et al. 2003). All of the maps have gaps that extend over more than 20 cM (Korzun et al. 1998; Ma et al. 2001; Saal and Wricke 2002; Bednarek et al. 2003). The introduction of SSR markers allowed the development of more saturated maps (Saal and Wricke 1999; Khlestkina et al. 2004; Hackauf and Wehling 2003; Milczarski et al. 2007). Subsequently, the introduction of DArT and later DArTseq technologies further improved map saturation and reduced gap size to approximately 1.1 cM (Milczarski et al. 2016). The lengths of the maps were 1245 cM (Bauer et al. 2017), 1593.0 cM (Milczarski et al. 2011), and 3144.6 cM (Bolibok-Brągoszewska et al. 2009). The DArT-based genetic map enriched with GBS markers was also devoted to the localization of the Rfc1 gene that restores male fertility in rye with the C source of sterility-inducing cytoplasm (Milczarski et al. 2016). Recently, Bauer et al. (2017) used a whole-genome shotgun (WGS) sequencing strategy to create a high-density genetic map of rye inbred line Lo7 with an average distance of 0.6 cM between loci.

Despite numerous studies devoted to the pollen fertility restoration trait in rye (Miedaner et al. 2000, Stracke et al. 2003, Hackauf et al. 2012, Miedaner et al. 2017), little is known regarding the genes participating in the phenomenon and their roles in the trait. Based on data from other species, Rf genes may belong to the pentatricopeptide repeat (PPR)-containing gene family (Wang et al. 2006, Klein et al. 2006, Kazama and Toriyama 2003, Brown et al. 2003, Desloire et al. 2003) that encodes proteins required for many posttranscriptional processes in organelles (reviewed in Hammani and Giegé, 2014). In most cases, Rf-PPR proteins prevent the accumulation of the CMS-gene products (Kazama et al. 2008, Uyttewaal et al. 2008, Hu et al. 2012). The other candidate gene implicated in fertility restoration belongs to the mitochondrial transcription termination factor family (mTERF) and was identified in rye (Hackauf et al. 2017) and barley (Bernhard et al. 2019). Fine-mapping of the 4R chromosome region carrying Rfp1, Rfp2, and Rfp3 loci demonstrated its orthology to sub-genomic regions in rice and Brachypodium that contained mTERF and PPR-encoding genes (Hackauf et al. 2012 and 2017).

The study aimed to identify QTLs responsible for male fertility restoration in rye with CMS Pampa using an advanced recombinant inbred line (RIL) population, with high-throughput DArTseq marker technology used for genetic map construction and for the evaluation of sequence-specific markers linked to the pollen fertility restoration trait. Furthermore, we were interested in identifying plausible genes participating in pollen fertility restoration in rye with CMS Pampa.

Materials and methods

Plant materials

The RIL F7 mapping population S64/04/01, encompassing 175 individuals, was developed in DANKO Plant Breeding Ltd. (Breeding Department Choryń, Poland) by a single seed descent method from a biparental cross with female parent S305N/00 (maintainer) on non-sterilizing cytoplasm and male parent SO37R/05 on CMS Pampa (restorer line). The F2 seeds obtained from bag-isolated F1 plants were ground-planted and used to develop the RIL mapping population up to the F7 generation.

Phenotyping

Pollen fertility restoration of the RILs was evaluated indirectly via phenotyping of BC1F1 materials derived via backcrossing of maternal line S305P/00 (on CMS Pampa) and RIL F7 lines of mapping population S64/04/01. Individual BC1F1 progeny plants were grown in 2 m long single rows with 25 cm spacing during the 2016/17 vegetation season in a field belonging to DANKO Plant Breeding Ltd., Breeding Department Choryń, Poland. Male fertility was evaluated in five plants via visual scoring of three spikes per plant at the flowering stage, according to the 1–9 bonitation scale developed by Geiger and Morgenstern (1975). Fully male sterile plants were scored as 1–3; these scores refer to non-dehiscent, empty anthers with decreasing levels of degeneration. Partly sterile plants were scored as 4, 5, or 6 and differed in their percentages of male fertile anthers (< 10%, 11–50%, and > 50%, respectively). Plants with fully pollen-shedding anthers of increasing anther size were scored as 7–9.

The normal distribution of the fertility trait was tested using the Kolmogorov-Smirnov test implemented in XlStat software (XlStat 2019). The χ2 goodness-of-fit test was conducted to verify a trait segregation ratio of 1:1 using MapQTL 5 (Van Ooijen, 2004). In this calculation, the partially fertile plants were included in the fertile class.
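
The scoring classes and the two-class grouping used for the segregation test can be summarized in a short sketch. The function names are illustrative and are not part of the original analysis; the only rules encoded are the score ranges given above and the inclusion of partially fertile plants in the fertile class.

```python
def fertility_class(score: int) -> str:
    """Map a 1-9 anther score to the phenotypic class described in the text."""
    if not 1 <= score <= 9:
        raise ValueError("score must be between 1 and 9")
    if score <= 3:
        return "sterile"          # non-dehiscent, empty anthers
    if score <= 6:
        return "partly fertile"   # a fraction of anthers is male fertile
    return "fertile"              # fully pollen-shedding anthers


def binary_class(score: int) -> str:
    """Collapse to two classes for the 1:1 test; partly fertile counts as fertile."""
    return "sterile" if fertility_class(score) == "sterile" else "fertile"
```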

DNA isolation

Total genomic DNA was extracted from approximately 100 mg fresh leaf tissue from each line of the 175 RIL: S64/04/01 population using a DNeasy Plant Mini Kit 250 according to the manufacturer’s instructions. DNA integrity and purity were assessed via electrophoresis on 1% agarose gels stained with EtBr (0.5 μg/ml) in TBE buffer. DNA was quantified spectrophotometrically using a NanoDrop (ND-1000) instrument.

Genotyping

The DArTseq platform developed by Diversity Arrays Technology Pty Ltd. (Canberra, Australia) was employed for genotyping. The platform detects SNP (single nucleotide polymorphisms) and silicoDArT markers using PstI and TaqI digestions for the reduction of genome complexity followed by next-generation sequencing (NGS) of short fragments with a HiSeq 2000 sequencing system (Illumina Inc., San Diego, USA) (Sánchez-Sevilla et al. 2015). The resulting marker sequences were filtered for quality, with a cutoff value at 90% confidence. The SNP and silicoDArT markers were coded as “0” or “1,” according to their absence or presence, respectively.

Linkage map construction

The genetic map was constructed using MultiPoint Ultra-Dense software (Ronin et al. 2015). Markers exhibiting > 15% missing data were excluded. All SNP and silicoDArT loci that showed no or minimal deviation from the expected 1:1 segregation ratio (χ2 ≤ 19.2) were employed in the analysis.
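
The two filtering rules can be illustrated with a minimal sketch. It assumes that each marker is a list of calls coded 0, 1, or None for missing data; this representation and the function names are assumptions made for illustration, and the filtering actually performed in the mapping software may differ in detail.

```python
def segregation_chi2(calls):
    """Chi-square statistic for a 1:1 ratio of 0 and 1 calls (missing calls ignored)."""
    n0 = sum(1 for c in calls if c == 0)
    n1 = sum(1 for c in calls if c == 1)
    n = n0 + n1
    if n == 0:
        raise ValueError("no scored calls")
    expected = n / 2
    return ((n0 - expected) ** 2 + (n1 - expected) ** 2) / expected


def keep_marker(calls, max_missing=0.15, max_chi2=19.2):
    """Apply the thresholds from the text: at most 15% missing data and chi2 <= 19.2."""
    if not calls:
        return False
    missing = sum(1 for c in calls if c is None) / len(calls)
    if missing > max_missing:
        return False
    return segregation_chi2(calls) <= max_chi2
```

For example, a marker scored 35:65 in 100 lines gives χ2 = 9 and is retained, whereas a 10:90 marker gives χ2 = 64 and is excluded.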

Genetic map construction consisted of the following steps: (1) Markers with zero distance were grouped, and a “delegate” was selected from each group. Only markers with at least the same number of twins as the predefined threshold were selected as delegates and were defined as “skeleton.” Markers exhibiting segregation patterns identical to those of the delegate/skeleton markers were assumed redundant. (2) All remaining markers, except for candidate twins, were removed to the heap. (3) Delegate markers (most representative skeletons and their redundant markers) were clustered, and the resultant linkage groups (LGs) were ordered. (4) Gaps were filled, and LG ends were extended using markers from the heap (the heap contains markers that were initially removed from the mapping procedure because of, e.g., segregation problems or missing data). (5) Markers violating map stability and monotonic growth of distance from a marker and its subsequent neighbors were removed.
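
Steps (1) and (2) can be sketched as follows. This is a simplified illustration and not the MultiPoint implementation: markers are treated as twins only when their call patterns are exactly identical, missing data are not handled, and the delegate is simply the first marker name in sorted order. All names are hypothetical.

```python
from collections import defaultdict


def group_twins(genotypes, min_twins=1):
    """Group markers with identical segregation patterns (zero distance).

    genotypes: dict mapping marker name -> sequence of calls.
    Returns (skeleton, heap): skeleton maps each delegate to its redundant
    twins; heap lists markers whose group is below the twin threshold.
    """
    groups = defaultdict(list)
    for marker, calls in genotypes.items():
        groups[tuple(calls)].append(marker)

    skeleton, heap = {}, []
    for members in groups.values():
        members = sorted(members)
        delegate, twins = members[0], members[1:]
        if len(twins) >= min_twins:
            skeleton[delegate] = twins   # delegate plus redundant markers
        else:
            heap.extend(members)         # set aside for later gap filling
    return skeleton, heap
```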

Assigning linkage groups to the rye chromosomes

The LGs were assigned to rye chromosomes based on known chromosomal locations of SNP and silicoDArT markers provided by Diversity Arrays Technology Pty Ltd. The S-L orientation of the LGs on rye chromosomes and the alignment of the LGs to the rye genome were verified using a high-density genetic map of rye inbred line Lo7 presented by Bauer et al. (2017). Similarities between SNP and silicoDArT marker sequences and the sequences of WGS contigs placed on the map were identified for this purpose. The order of common (homologous) markers was tested by Pearson correlation in XlStat software (XlStat, 2019). The map was visualized in MapChart (Voorrips, 2002).
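
The collinearity check amounts to correlating the positions of shared markers on the two maps. A minimal sketch follows; the positions in the usage note are invented for illustration and are not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long position lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("positions must not be constant")
    return sxy / math.sqrt(sxx * syy)
```

With hypothetical positions such as `pearson([0.0, 12.5, 40.1, 88.0], [1.2, 10.9, 35.0, 95.3])`, a coefficient close to 1 indicates the same marker order on both maps, and a coefficient close to -1 indicates that the linkage group is inverted relative to the reference.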

Quantitative trait loci (QTL) analysis

Relationships between the segregation of molecular markers and studied traits were analyzed with a nonparametric Kruskal-Wallis K* test (Lehmann, 1975) using the MapQTL package, version 5.0 (Van Ooijen 2004). Genomic regions were considered to contain QTLs if the significance of molecular markers was p ≤ 0.005. Verification of QTL mapping was performed using the composite interval mapping (CIM) method with Windows QTLCartographer software, version 2.5 (Wang et al. 2007). QTL significance was evaluated using a 1000-permutation test with α = 0.05 significance level. The R2 value (phenotypic variance explained (PVE) by a QTL) was calculated as the proportion of each QTL’s phenotypic variation. A backward regression method with a window size of 3 cM, walk speed of 1 cM, and number of control markers equal to five was used for CIM.
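
For a single marker, the Kruskal-Wallis statistic compares the ranks of fertility scores between genotype classes. The sketch below computes the standard H statistic with tie correction in plain Python; it is meant only to illustrate the test and is not the MapQTL code, whose K* output may differ in implementation details.

```python
def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic with tie correction.

    groups: list of lists of scores, one list per marker genotype class.
    """
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    values = sorted(v for g in groups for v in g)
    n = len(values)

    # Assign midranks to tied values and accumulate the tie term.
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i
        rank_of[values[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction
```

In use, the scores of lines carrying marker allele "0" and those carrying allele "1" would form the two groups; a large H indicates that the marker genotype separates sterile from fertile lines.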

Population structure and association mapping

Population structure was investigated using principal components analysis (PCA) in PAST software (Hammer et al. 2001). Association mapping was performed in TASSEL (Bradbury et al. 2007) using all SNP and silicoDArT markers. The General Linear Model (GLM) was tested to evaluate markers associated with pollen fertility restoration genes. Significant associations were indicated by the Bonferroni test with p < 0.01 (0.01/number of markers). The degree of association was represented by the determination coefficient (R2).
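
The Bonferroni criterion divides the nominal significance level by the number of markers tested. A minimal sketch, with hypothetical marker names and p values:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_markers(p_values, alpha=0.01):
    """Return the names of markers whose p value is below alpha / number of markers."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p < threshold]
```

For instance, with 10,000 tested markers and α = 0.01, only markers with p < 1E-06 would be reported as associated.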

Marker sequence homology

The DNA sequences of 435 SNPs and silicoDArTs linked and/or associated with the trait were searched using BLASTn against GenBank in the National Center for Biotechnology Information (NCBI) database. Skeleton, redundant, and added (approximated/regressed on map) markers linked to the QTLs of the trait, and mapped on the 4R and 5R chromosomes, were searched. A similar analysis was performed for markers associated with the trait and their redundant counterparts based on segregation. Classification of the query sequences was based on (1) identity (I, percentage of similarity between the subject and query sequences over the length of the coverage area), (2) query cover (QC, percentage of the query sequence that overlaps the subject sequence), and (3) E value (probability value) criteria. The taxonomic category selected during searches was the Poaceae family.

Sequence similarity between markers and rye contigs (Bauer et al. 2017) was searched using the BLASTn program and Lo7 WGS contigs v2 database (https://webblast.ipk-gatersleben.de/ryeselect/).

Conversion of SNP and silicoDArT markers to PCR-based assays

Twenty-seven SNP and silicoDArT markers linked to and/or associated with fertility restoration were converted into PCR-based assays (Suppl. File S1). DNA sequences of the markers were blasted against the sequences of rye Lo7 WGS contigs (Bauer et al. 2017, https://webblast.ipk-gatersleben.de/ryeselect/). The 69 bp marker sequences were contained within contigs (ca. 1000–7000 bp in size), which made it possible to design primers. For two markers (3358169 and 5037479), the primers were designed based directly on the sequence of the gene encoding keratin-associated protein (KAP) 5–4-like.

SNPs and silicoDArTs with a 99–100% match to the sequence of rye Lo7 WGS contigs (sequence identities = 68/69 or 69/69) were analyzed in Primer3web software version 4.1.0 (http://bioinfo.ut.ee/primer3/) to identify primer pairs for their amplification. The primer design criteria were as follows: GC content of 40–60%; minimum annealing temperature, 50 °C; no or negligible secondary structures; and product size ≥ 400 bp.
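
Two of these criteria, GC content and product size, are simple enough to check directly, as in the sketch below. The primer sequences in the usage note are invented for illustration; annealing temperature and secondary-structure checks are left to Primer3 and are not reproduced here.

```python
def gc_content(seq):
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_criteria(forward, reverse, product_size, gc_range=(40, 60), min_product=400):
    """Check the GC-content and product-size criteria listed in the text."""
    low, high = gc_range
    gc_ok = all(low <= gc_content(p) <= high for p in (forward, reverse))
    return gc_ok and product_size >= min_product
```

A hypothetical pair such as `passes_criteria("ATGCATGCATGCATGCATGC", "GGCCAATTGGCCAATTGGCC", 450)` passes both checks, whereas the same pair with a 300 bp product would be rejected.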

Converted markers were tested using DNA from fertile and sterile parents (SO37R/05 and S305P/00). The optimal annealing temperature was inferred using a gradient PCR with temperatures set between 51.0 and 65.0 °C (Labcycler Gradient, SensoQuest GmbH). Reaction mixtures consisted of 10 ng of total genomic DNA, 50 μmols each of PCR primer, 2.5 mM dNTPs, 2.5 mM MgCl2, 1 × reaction buffer, and 0.25 U of Gold HotStart DNA Polymerase (Syngen Biotech Ltd.) in a final volume of 10 μl. Amplification was performed using the following profile: [95 °C-15′] [95 °C-30″; X°C-45″; 72 °C-45″] × 35 [72 °C-10′] [5 °C ∞], where “X” reflects the annealing temperature selected from PCR gradient profile reactions (Suppl. File S1). The PCR products were separated on 1.2% agarose gels in TBE buffer at 5 V/cm for 1 h. Segregation of the PCR-based markers was tested on DNA samples of 48 lines (23 fertile, 3 partly fertile, and 22 sterile) from the S64/04/01 mapping population. PCR-based markers with segregation consistent with their SNP or silicoDArT counterparts were considered suitable for selection purposes. Markers that were successfully converted to sequence-specific PCR-based assays were appended with “c” at the end of their original counterpart names (e.g., 3358169 vs. 3358169c).

Results

Phenotyping

Sterile, partly fertile, and fertile genotypes of the BC1F1: S305P/00 × [RIL F7(S64/04/01): S305N/00 × SO37R/05] population were represented by 80, 6, and 59 RILs, respectively. Phenotypic data were not available for 30 of the 175 lines.

Based on phenotypic data, the fertility trait did not follow a normal distribution, as indicated by the Kolmogorov-Smirnov test (D = 0.841; p < 0.0001; α = 0.05).

The whole population was divided into two phenotypic classes: sterile (lines with a score of 1–3) and fertile (the remainder of the population). The sterile-to-fertile ratio was 80:65. The chi-square goodness-of-fit test revealed that the population deviated significantly from the expected 1:1 segregation ratio (χ2 = 10.337, p < 0.01 at α = 0.05).

Genetic map

Genotyping of the RIL S64/04/01 mapping population produced approximately 36,000 SNP and 128,000 silicoDArT markers. The constructed map consisted of seven LGs (Suppl. File S2, Fig. 1) with 15,516 markers: 643 skeletons, 2418 redundant, and 12,455 added (Table 1). The longest linkage group was constructed for chromosome 2R, which spanned 182.8 cM and contained 103 markers with one marker per 1.77 cM on average. The shortest LGs were for the 1R (147.1 cM) and 7R (144.7 cM) chromosomes, with 97 and 93 markers, respectively. In total, the map spanned 1070.5 cM with one skeleton marker per 1.66 cM, on average. Despite the high saturation of the map, some gaps between skeleton markers remained (Fig. 1). The most substantial gap, spanning over 28 cM, was identified on the 4R chromosome. The linkage group S-L orientation was based on the known position of markers on the winter rye inbred line Lo7 genetic map (Bauer et al. 2017). Collinearity assessment of RIL S64/04/01 and the map published by Bauer et al. (2017) showed that correlation indices (p < 0.0001) were in the range 0.822–0.993 (Table 1), with the poorest and best correlations for the 2R and 6R chromosomes, respectively.

Fig. 1

Genetic map of rye based on the RIL S64/04/01 mapping population. Red boxes indicate genomic regions harboring QTLs for pollen fertility restoration. The markers closely linked to the pollen fertility restoration loci are indicated in red

Table 1 Characteristics of the RIL S64/04/01 genetic linkage map based on silicoDArT and SNP markers

Detection of QTLs for pollen fertility restoration

Nonparametric Kruskal-Wallis (K-W) analysis of the RIL S64/04/01 population detected 71 skeleton markers significantly (p ≤ 0.05) associated with the fertility trait. An important QTL region controlling fertility restoration was detected on the distal part of the long arm of the 4R chromosome. Markers 3731389, 3730937, 3730803, 3600544, 3890856, 5215639, 3602675, and 3346064, located on the 4R map between 149.0 cM and 160 cM, were evaluated as the most significant (p ≤ 0.0001) (Table 2). The association values (K*) for these markers were greater than 38.66. Eleven added markers with p values ≤ 0.005 and ≤ 0.001 were also found on 4R, between 129.8 cM and 143.6 cM. Three additional loci were identified on the 3R, 5R, and 7R chromosomes (Table 2). The remaining markers, with a p value ≤ 0.01, were dispersed throughout all chromosomes.

Table 2 Kruskal-Wallis test showing the association between markers and fertility restoration in rye with CMS Pampa. Only markers with significance values p ≤ 0.01 are shown

Composite interval mapping (CIM) was generally congruent with K-W analysis and confirmed the presence of two QTLs conferring fertility restoration in rye with CMS Pampa. A highly significant QTL (QRft-4R) with a logarithm of the odds (LOD) score of 30.3 (1000 permutations; LOD threshold = 3.1) was mapped to the distal part of the long arm of the 4R chromosome (Figs. 1 and 2, Table 3) and spanned over 8 cM. QRft-4R exhibited additive effects (A = 3.19) and explained up to 60.0% of the phenotypic variance for pollen fertility restoration.

Fig. 2

Composite interval mapping demonstrating the position of the QTLs identified on the 4R (a) and 5R (b) chromosomes based on the RIL S64/04/01 mapping population

Table 3 Characteristics of pollen fertility restoration QTLs identified by CIM based on the RIL S64/04/01 mapping population. LOD: the logarithm of odds; R2 (%): the percentage of phenotypic variance explained by the given QTL; A is the value of the additive effect of the SO37R/05 allele; RecL and RecR indicate the recombination value of the markers in the nearest vicinity of the QTL LOD function maximum

The silicoDArT markers flanking QRft-4R were located 0.49 cM (3602675) and 3.0 cM (3346064) from the LOD maximum (Table 4). The other two closely linked markers were mapped at distances of 1.75 cM (5215639) and 2.71 cM (3890856). In addition, the QTL region was saturated with six redundant (Table 4) and 202 added markers.

Table 4 Arrangements of skeleton markers closely linked to QRft-4R and QRft-5R in the RIL S64/04/01 mapping population

The second QTL (QRft-5R) was mapped to the 5R chromosome, had a maximum LOD value of 4.5, and spanned about 5 cM (Table 3). QRft-5R passed the permutation test (LOD threshold 3.1). The maximum LOD value for QRft-5R occurred at position 49.0 cM of the map. The 3601104 (0.63 cM), 5036750 (0.72 cM), and 4498362 (1.91 cM) markers were the closest silicoDArTs to the QTL LOD function maximum (Table 4). The QTL region was represented by ten redundant and 52 added markers. QRft-5R was characterized by an additive effect of A = 1.31 (Table 3).

Association mapping

Principal components analysis (PCA) did not reveal any signs of population structure (not shown). A General Linear Model (GLM) approach allowed the identification of 176 SNP and silicoDArT markers associated with the pollen fertility trait at p < 0.001. Thirty-seven of those markers had 650 redundant counterparts, increasing the total number of associated markers to 826. The association coefficients (R2) of the markers that passed the Bonferroni test (p = 6.14E-07) varied from 0.162 to 0.583. These markers mapped to chromosome 4R. Of the 30 markers with the highest association values, 21 mapped within the QRft-4R region (Table 5). The 5605675 marker, with the highest value of R2 (0.583, p = 2.21E-28), was 0.49 cM from the LOD function maximum representing QRft-4R. None of the associated markers were assigned to the QRft-5R region.

Table 5 Association mapping results for 30 markers with the highest association values

Marker sequence homology

Of the 435 DArTseq/silicoDArT marker sequences, 77.5% had no match among the database sequences. The remaining marker sequences showed similarity to genomic sequences from Triticum aestivum, Hordeum vulgare, Aegilops tauschii, Oryza sativa, and Brachypodium distachyon (Suppl. Tab. S1). Four of the marker sequences (3887543, 16404848, 3599981, 5500712) approximated to the 4R linkage group in proximity to the 5215639 and 3890856 markers closely linked to QRft-4R (1.75 and 2.71 cM, respectively) and exhibited sequence similarity to Rfm1 (I% = 84, 89, 80, and 90.5, respectively), which is responsible for pollen fertility restoration in barley. Three of the markers (3887543, 16404848, and 5500712) were associated with the fertility trait (R2 = 0.578, 0.473, and 0.443, respectively). The SNP 3362765_16: T > A marker associated with pollen fertility restoration (R2 = 0.459) also shared similarity with the Rfm1 locus (I% = 100; Suppl. Tab. S1). Marker 7500216 mapped in the vicinity of the 3602675 marker, which was located 0.49 cM from the QRft-4R LOD function maximum; it was associated with the fertility trait (R2 = 0.460) and had low sequence similarity (I% = 87.9) to the DNA sequence of the fertility restoration gene Rf1 from Aegilops tauschii.

E values for the DNA sequence similarities between markers associated with or linked to the trait and Rfm1 sequences ranged from 1.00E-04 (16404848) to 1.00E-17 (3362765_16:T > A) (Suppl. Tab. S1). Sequence identity between markers and one of the three rye contigs, Lo7_v2_contig_237103 (1127 bp), Lo7_v2_contig_267616 (4543 bp), and Lo7_v2_contig_80366 (3760 bp), was close to 100% (Bauer et al. 2017). These contigs matched three Rfm1 gene regions (5177–8797 bp, 452,246–452,967 bp, and 76,142–76,384 bp) with high probability (E value = 0.0, 177, and 47, respectively).

Four of the markers (3351619, 3357230, 3358064, 3885888) exhibited similarity to the sequences of the mitochondrial transcription termination factor family (mTERF) gene from Aegilops tauschii and also matched one of the three rye Lo7 contigs (Lo7_v2_contig_378957, Lo7_v2_contig_1373077, Lo7_v2_contig_149174) (Bauer et al. 2017). The contigs completely matched the mTERF15 gene sequence (E value = 0.0).

Seven markers (7468019, 3358169, 3590786, 3746061, 5037479, 3348274, 4096992) exhibited similarity to the DNA sequences of keratin-associated proteins (KAP) 5–4-like and 5–5-like from Aegilops tauschii. Marker similarities to rye contigs were 73–100%. Only Lo7_v2_contig_419182 exhibited significant (E value = 2E-103) similarity to the 411 bp sequence of keratin-associated 5–5-like protein.

The markers that exhibited similarity to mTERF15 and KAP5–5 were mapped in the vicinity of QRft-4R (Suppl. Tab. S1). These markers also showed high association values with fertility restoration, equal to 0.583 and 0.425 for markers with sequence similarities to mTERFs and KAPs, respectively.

Additional annotated genes corresponding to marker sequences flanking both QRft-4R and QRft-5R are listed in Supplementary Table S1. Among these, the most significant E values for sequence similarities were obtained for 3599685 (4R), 5212120_43: A > G (5R), 3342019 (5R), and 4092866 (5R) with probable methyltransferase PMT17 (3.00E-25), probable aldo-keto reductase 2 (1.00E-24), putative disease resistance RPP13-like protein 3 (1.00E-24), and serine/threonine-protein phosphatase 2A 57 kDa regulatory subunit B′ (8.00E-26), respectively.

Marker conversion assays

Of the 35 markers initially selected for conversion, 27 marker sequences were extended based on marker homology to DNA sequences deposited in DNA databases. The extended sequences allowed the development of ten PCR-based markers with polymorphic signals when tested on RILs carrying sterile and fertile QTLs. Of these, five markers localized in the QRft-4R region (3602675c, 3575914c, 4099883c, 3358169c, 5500712c) exhibited segregation patterns identical to those of their unconverted counterparts (Suppl. File S1). Minor differences in marker segregation reflected missing data in the case of some RILs genotyped with DArTseq markers. Marker 3602675 and its redundant counterparts (3575914, 4099883) mapped at 156.51 cM on the 4R chromosome. These markers are 0.49 cM from the QRft-4R LOD maximum value. Markers 3358169 and 5500712 revealed sequence similarities with (KAP) 5–4-like and Rfm1, respectively. Marker 3358169 approximated to the 4R map at the 152.72 cM position, and marker 5500712 was at 155.25 cM. Of the 23 fertile RILs of the S64/04/01 mapping population, the markers were present in 19 lines. None of the markers were identified in any of the sterile plants. Of the three partly fertile RILs, two segregated as fertile and one as sterile.

Discussion

Phenotypic assessment of individuals of the BC1F1 population divided progeny into two main phenotypic classes: male sterile and male fertile. Partially fertile plants were also present in a low number (6 plants: 5–6 score according to the bonitation scale) and, due to the presence of pollen, were included in the fertile class in the segregation analysis. A similar segregation of male fertile and male sterile plants was reported for rye populations with C and Pampa cytoplasms, where the presence of a major Rf gene on the 4R chromosome was documented (Stojałowski et al. 2011; Stracke et al. 2003). The ratio of the phenotypic classes in the studied hybrids deviated significantly from the 1:1 segregation ratio typical of the monogenic model of inheritance in a RIL population. The observed data may be explained in several ways: (1) the distortion is due to the lack of phenotypic data for the 30 missing genotypes; (2) phenotyping was performed in a single environment without repeats; and (3) the analyzed population has several QTLs conferring pollen fertility restoration in rye. Although 30 of the 175 cases were not phenotyped, the dataset available from the remaining 145 lines is reasonably large. Thus, missing cases should not significantly affect the segregation ratio. Pollen fertility restoration in rye with CMS Pampa is usually only minimally affected by environmental conditions, at least for the main QTLs (Geiger et al. 1995). As the vast majority of the BC1F1 plants were sterile or fertile, with only a few partially fertile plants, further assessment of phenotype under different environments would be unlikely to affect segregation and was therefore not performed (Geiger et al. 1995). Furthermore, the trait was clearly expressed, suggesting that at least one major QTL was represented in the RIL7 population and that other QTLs were either of minor importance or were modifying genes that were previously reported in rye (Miedaner et al. 2000).

Mapping of agronomically essential traits requires genetic maps with a high density of markers (Cockram and Mackay, 2018). To date, all the mapping populations dedicated to studies of pollen fertility restoration in rye utilized F2 progeny (Miedaner et al., 2000; Stracke et al. 2003; Hackauf et al. 2012; Stojałowski et al. 2017). However, the frequencies of polymorphic DArT, silicoDArT, and SNP markers in a rye F2 population (544 × Ot0–20 BC5F2) were 9.3, 8, and 4.6%, respectively, whereas frequencies in a RIL-S population (generation F5) were 19.6, 58.7, and 29.7% (Milczarski et al. 2016). Moreover, RILs better support map resolution because recombination events accumulate during each generative cycle (Xu et al. 2017), and they are immortal populations (Cockram and Mackay, 2018). Thus, the employment of recombinant inbred lines is preferred over F2 populations (Cockram and Mackay, 2018). However, the development of RILs is time-consuming and can be challenging in some crops, like rye, due to inbreeding depression (Singh and Singh, 2015).

In this study, a specially designed RIL-based mapping population consisting of 175 lines on non-sterilizing cytoplasm, but carrying pollen fertility restoration QTLs that originated from contrasting parental lines, was evaluated and exploited for genetic map construction. The final map length was 1070.5 cM, which was 174 cM and 533 cM shorter than the map of rye inbred line Lo7 (Bauer et al. 2017) and the consensus map of five RIL-based mapping populations (Milczarski et al. 2011), respectively. Chromosome lengths ranged from 139.9 cM (7R) to 214.5 cM (5R) and, in the case of chromosomes 1R, 2R and 7R, were similar to the lengths of those constructed for Lo7 (Bauer et al. 2017). However, on average, chromosome lengths differed by 25% for 3R, 4R, and 5R. The results presented by Milczarski et al. (2011) showed that the origin of the population strongly influenced the length of individual chromosomes, which differed by up to 220 cM (5R) when the same DArT technology was used for genotyping of five RIL populations originating from nine parental lines. The average map density of RIL S64/04/01 was 1.66 cM, within the 1.1–2.75 cM range described for DArTs in the case of other rye RIL-based populations (Bolibok-Brągoszewska et al. 2009; Milczarski et al. 2011). Somewhat higher map density (0.47 cM) was reported by Milczarski et al. (2016), who succeeded in mapping as many as 2448 silicoDArT and SNP unique loci and 928 DArT markers using 92 individuals of the RIL-S (F5) population. The difference in map density results from the fact that the map density of the RIL7 map was estimated based on highly “stable” (minimum missing data and best segregation ratio) skeleton markers (without redundant and added markers). To our knowledge, the RIL7-based genetic map presented here is the first to be dedicated to studies of pollen fertility restoration in rye with CMS Pampa.

As the analyzed fertility trait was not normally distributed, nonparametric Kruskal-Wallis (K-W) analysis was employed for the detection of QTLs (Myśków et al. 2014; Stojałowski et al. 2017). Four genomic regions, mapped to chromosomes 4R, 3R, 5R, and 7R, were detected. Composite interval mapping confirmed the presence of a single highly significant QTL on the long arm of the 4R chromosome and a minor QTL on the 5R chromosome. The major QTL on 4R explained 60% of the variance of fertility restoration, comparable to the values obtained for the IRAN IX (68%) and Pico Gentario (59%) based materials (Miedaner et al. 2000).
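The single-marker K-W test used for QTL detection can be sketched as follows. This is a minimal pure-Python illustration of the Kruskal-Wallis H statistic with tie correction, not the software used in the study; the fertility scores and genotype classes are invented for the example.

```python
# Kruskal-Wallis H statistic for one marker: phenotype scores are grouped by
# marker genotype class and compared on pooled ranks. Data below are invented.

def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H for a list of score lists."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                             # size of this tie block
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sum_sq = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_sum_sq - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical fertility scores of RILs homozygous for allele A or allele B.
scores_allele_a = [1, 2, 2, 3, 4]
scores_allele_b = [6, 7, 8, 8, 9]
h_stat = kruskal_wallis_h([scores_allele_a, scores_allele_b])
print(f"H = {h_stat:.3f} (compare with chi-square, df = number of classes - 1)")
```

With two genotype classes, H is compared against a chi-square distribution with one degree of freedom (3.841 at the 0.05 level); genome-wide scans usually apply a stricter threshold to account for multiple testing.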

It is rare for European breeding materials to carry such a strong QTL on chromosome 4R (Miedaner et al. 2000). The QTL was probably introduced to a pollen donor (SO37R/05) of the RIL S64/04/01 population from Iranian or Argentinian sources. The identified region is congruent with earlier reports evaluating Iranian primitive rye populations IRAN IX and Altevogt 14160, Argentinian landrace Pico Gentario, and European line L18 (Miedaner et al. 2000; Hackauf et al. 2017). Interestingly, the major QTL location is also congruent with studies on pollen fertility restorers in the case of CMS C (Stojałowski et al. 2005) and G (Börner et al. 1998) in rye. However, it is not clear whether the same gene is responsible for pollen fertility restoration in all types of sterilizing cytoplasms.

A second QTL of minor importance was identified on chromosome 5R and explained 5.5% of the variance, which could account for the lack of monogenic segregation of the trait indicated by the phenotypic data. Similar results concerning a QTL on the 5R chromosome were described previously (Miedaner et al. 2000), where a minor locus explained 11% of the phenotypic variation of fertility restoration in the L18 line. Unfortunately, the two QTLs on chromosome 5R cannot be easily compared because different marker systems were used in the two studies.

The association mapping analysis, which was used to identify markers associated with the trait but not necessarily present on the map, was congruent with the QTL analysis only in the case of the major QTL. In total, four markers tightly linked to the QTL and 176 markers associated with pollen fertility restoration were identified on the long arm of the 4R chromosome.

Marker DNA sequences were compared against sequences stored in various online databases at NCBI (Suppl. Tab. S1) to identify their functional annotations. Five markers mapped to the 4R QTL and/or associated with fertility restoration exhibited similarity to the sequence of the Rfm1 gene, which maps to chromosome 6H in barley (Matsui et al. 2001; Murakami et al. 2005; Rizzolatti et al. 2017). Moreover, the 69 nucleotide-long markers nearly perfectly (99–100% identity) matched the rye Lo7 contigs (Bauer et al. 2017), which exhibited high similarities to the Rfm1 gene sequence. Synteny-based studies showed that the distal region of chromosome 6HS (Martis et al. 2013) carrying the restorer Rfm1 gene (Matsui et al. 2001; Murakami et al. 2005) was homologous to the rye 4RL where the Rfp1 and Rfp3 genes were mapped (Hackauf et al. 2012, 2017). Analysis of homology between these chromosomal regions and 3S in Brachypodium, 4S in sorghum, and 2S in rice revealed that collinearity was maintained among these grass species (Hackauf et al. 2012; Ui et al. 2015). Thus, the rye analog of the Rfm1 gene is a reasonable candidate for the pollen fertility restoration gene in rye with CMS Pampa. Nevertheless, due to the perfect collinearity observed at the genetic map level between the Rfm1 locus in barley and Brachypodium (Ui et al. 2015), and a small number of rearrangements between the Rfp3 genomic region in rye and Brachypodium, Hackauf et al. (2017) concluded that Rfp3 and Rfm1 might represent independent fertility restorer genes. Thus, it is likely that the markers identified in RIL S64/04/01 are associated with the Rfp1 or Rfp2 gene sequences. The nucleotide sequences of barley Rfm1, rye Rfp1 and Rfp2, and a segment of Bd3 in Brachypodium that was mapped proximal to Rfp3 (Hackauf et al. 2017) indicate that the locus carries a tandem repeat of a gene encoding a PLS-DYW-class pentatricopeptide repeat (PPR) protein. A major function of PLS PPR proteins possessing C-terminal domains (E or DYW) is C-to-U RNA editing in plant organelles (Hammani and Giegé 2014; Small et al. 2019), suggesting a potential role for RNA editing in pollen fertility restoration in rye.
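To make the 99–100% identity figure for 69-nucleotide markers concrete, the sketch below computes percent identity for an ungapped, full-length comparison. It is a simplification: BLAST reports identity over the aligned region and allows gaps, and the sequences used here are synthetic placeholders, not actual marker or contig sequences.

```python
# Percent identity between a short marker and an equally long contig segment.
# Ungapped comparison only; sequences below are synthetic placeholders.

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# A synthetic 69-nt "marker" and a "contig segment" differing at one position.
marker = "ACGT" * 17 + "A"
contig_segment = marker[:30] + "C" + marker[31:]   # position 31: G replaced by C

print(f"Marker length: {len(marker)} nt")
print(f"Identity with one mismatch: {percent_identity(marker, contig_segment):.2f}%")
print(f"Identity with no mismatch: {percent_identity(marker, marker):.2f}%")
```

For a 69-nt sequence, a single mismatch gives 68/69 matching positions, which rounds to 99%, so the reported range corresponds to at most one mismatch over the full marker length under this simplified measure.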

BLAST searches of marker sequences (3351619, 3357230, 3358064, 3885888) against DNA databases indicated that a mitochondrial transcription termination factor family (mTERF) gene identified in a rye segment carrying Rfp1 and Rfp3 might also participate in pollen fertility restoration in rye (Hackauf et al. 2012; Hackauf et al. 2017). Recently, a novel restorer locus, Rfm3, was found to be closely linked to mTERF in barley (Bernhard et al. 2019). The CMS-unstable mother plants, which were homozygous at the Rfm3 locus, had significantly higher grain setting under elevated temperature until ripening. The results are comparable to those in maize (Zhao et al. 2014) and suggest that mTERF genes are up- and down-regulated depending on environmental conditions. Thus, in barley, Rfm3 may be responsible for undesired fertility restoration in CMS mother lines in the absence of the functional Rfm1 restorer gene (Bernhard et al. 2019). The putative roles of mTERF proteins in the context of fertility restoration in rye have not yet been determined.

Further studies of marker sequence similarities showed that seven markers identified in the study pointed to a third gene (keratin-associated protein (KAP) 5–4-like and 5–5-like), which belongs to the KAP type 5 family (Jenkins and Powell 1994). The functions of KAP and KAP-homologous genes in plants are poorly understood. Zhou et al. (2009) showed that qPE9–1, a putative homolog of human KAP 5–4, regulated rice panicle erectness and played pleiotropic roles in an array of plant architecture and yield traits. The functional resemblance of the protein encoded by the KAP5–4 gene to the wali6 protein may suggest an involvement in drought resistance (Yang et al. 2012). The presence of the linked markers within the QRft-4R region and strong marker associations with the trait may suggest that keratin could be essential for pollen fertility restoration. However, the similarity between the DArTseq marker sequences and KAP DNA sequences might be due to a common domain in the structure of the RF2 and keratin proteins. The KAP5 family shows extensive amino acid sequence conservation, and all the proteins are composed almost entirely of cysteine-rich and glycine-rich repeats (Jenkins and Powell 1994). Map-based cloning demonstrated that Rf2 in rice encodes a protein comprising a glycine-rich region (GRP) (Itabashi et al. 2011) that is probably responsible for direct interaction with the CMS-causing protein, or which may cooperate with other proteins via the glycine-rich region to form a multi-molecular complex participating in fertility restoration (Itabashi et al. 2011). A further study in rice (Hu et al. 2012) showed that the Rf5 gene encodes a PPR protein that interacts with a glycine-rich domain protein, GRP162, to bind to atp6-orfH79 and form the restoration of fertility complex (RFC) in Hong-Lian CMS lines. Thus, it is possible that the QRft-4R region detected in rye contains several relevant genes, including those encoding PPR, mTERF, and GRP proteins.

The identified markers that were linked to or associated with fertility restoration in rye and exhibited similarities to putative pollen fertility genes were converted to sequence-specific PCR assays to facilitate their use in marker-assisted programs. Conversion efficiency can depend on the type of marker. For example, conversion of RAPD markers is relatively inefficient due to the lack of sequence uniformity of bands forming a marker and the involvement of many practical steps, including cloning and sequencing (Mikolajczyk et al., 2008). An added complexity is that not all primers designed for amplification are capable of amplifying expected polymorphisms (Xie et al. 2008; Lee et al. 2010). This is somewhat alleviated when marker sequences derived via NGS are available (Macko-Podgórni et al. 2014; Fiust et al. 2015; Niedziela et al., 2015). However, as only relatively short sequences are generated, their direct conversion (i.e., into sequence-specific ligation amplification markers) is not practical (Milczarski et al., 2016). Analysis of DArTseq/silicoDArT marker sequence similarities allows sequences to be extended, and these longer sequences can be utilized for the development of PCR-based markers for MAS purposes. The efficiency of such conversion can reach 100%, and usually 50–60% of the converted markers are polymorphic (Niedziela et al., 2015; Fiust et al., 2015). In this study, ten of 27 markers were successfully converted. However, only five markers (3602675, 3575914, 4099883, 3358169, 5500712) present within the QRfp-4R region resulted in amplifications that followed the expected segregation based on 48 RILs chosen from the S64/04/01 mapping population. One of the tested markers (5500712) revealed sequence similarities to Rfm1, which was identified previously in Hordeum vulgare (Matsui et al. 2001). Although the markers were located in different positions within the QRft-4R region (3602675, 3575914, 4099883: 156.51 cM; 3358169: 152.72 cM; 5500712: 155.25 cM), their segregation patterns were identical. For MAS purposes, the markers will be tested on a differentiated pool of rye genotypes.
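One common way to check whether a converted marker follows the expected segregation is a chi-square goodness-of-fit test, sketched below. The text does not state the expected ratio or the test that was applied; the 1:1 ratio of the two homozygous classes is an assumption based on what is typically expected in RILs, and the allele counts are invented for a panel of 48 lines.

```python
# Chi-square goodness-of-fit test against a 1:1 segregation ratio.
# ASSUMPTION: 1:1 is the expected ratio in RILs; the counts are hypothetical.

CHI2_CRITICAL_DF1_P05 = 3.841   # chi-square critical value, 1 df, alpha = 0.05


def chi_square_1to1(count_a: int, count_b: int) -> float:
    """Chi-square statistic for observed counts versus a 1:1 expectation."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no observations")
    expected = total / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def fits_1to1(count_a: int, count_b: int) -> bool:
    """True when the deviation from 1:1 is not significant at the 0.05 level."""
    return chi_square_1to1(count_a, count_b) < CHI2_CRITICAL_DF1_P05


# Two hypothetical markers scored on 48 RILs.
for name, a, b in [("marker_x", 26, 22), ("marker_y", 34, 14)]:
    stat = chi_square_1to1(a, b)
    verdict = "fits 1:1" if fits_1to1(a, b) else "distorted"
    print(f"{name}: A={a}, B={b}, chi2={stat:.3f} -> {verdict}")
```

Markers with identical segregation patterns across all tested lines, as reported for the five converted markers, would give the same statistic and could not be ordered relative to one another within such a panel.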

In this study, a QTL located on the 4R chromosome was confirmed as responsible for efficient fertility restoration in rye with CMS Pampa cytoplasm. A set of silicoDArT and SNP markers linked with the QRfp-4R region was identified for the first time. The presence of Rfp and mTERF genes within QRfp-4R was inferred using a sequence homology approach. Five novel markers with practical utility were obtained by conversion of silicoDArTs to single-marker assay formats. Moreover, a QTL with a minor effect on fertility was identified on chromosome 5R.