Introduction

Cultivated rye (Secale cereale) is a cross-pollinated, diploid plant species known for its high tolerance to low winter temperatures and better withstanding adverse soil conditions than other cereals. Its cultivation area is focused mainly in the north-eastern and central parts of Europe, Poland, Germany, and the Russian Federation, which deliver over 58% of world grain annually.

The rye breeding programs focus on improving the high seed yield achieved via hybrid breeding based on heterosis. The exploitation of the heterosis effect results in a 20–25% higher grain yield than population breeding using the same agriculture (Geiger and Miedaner 1996; Hansen et al. 2004; Laidig et al. 2017). Hybrids' success depends on selecting inbred parental lines with the superior combining ability and exploiting the cytoplasmic male sterility (CMS) phenomenon. Cytoplasmic pollen sterility relies on the mitochondrial genome dysfunction that prevents the maturation of the male sex organs (stamens), resulting in defective pollen or its lack. Therefore, the mating system in rye hybrid breeding requires three components, a maternal line carrying the CMS allele, a non-restorer germplasm for maintaining the CMS and a paternal line carrying restorer-of-fertility (Rf) nuclear genes which are able to reverse CMS effect.

Distinct cytoplasms have been described in the rye and classified into two groups based on their restoration ability. The easily restored sterilizing cytoplasms are represented by Vavilovii (CMS-V), CMS-R (Kobyljanskij 1969), CMS-C (Łapiński 1972), and CMS-G (Adolf and Winkel 1985), and others, whereas Pampa (CMS-P) cytoplasm is the only representative of a hardly restored source. Among soft CMS only G source was used for the production of hybrid cultivars. The example here is the “Novus” cultivar registered in Germany in 2000 (Melz and Adolf 1991; Melz et al. 2001). Two other cultivars based on the G cytoplasm, “Hellvus” and “Helltop” are no longer exploited. It seems that one of the potentially relevant sources of CMS in rye could be the C cytoplasm (CMS-C) discovered by Łapiński (1972) in the old Polish cultivar “Smolickie.”

The CMS-P, characterized by a robust sterilizing effect, is widely explored in hybrid breeding since the 70’s (Geiger and Schnell 1970; Geiger and Miedaner 1996). It was found that restoration of pollen fertility in rye with CMS-P depends on QTLs mapped to the 1R, 3R, 4R, 5R, and 6R chromosomes (Miedaner et al. 2000). In European rye resources, the strong QTL explaining 54% of the phenotypic variation was detected in the German inbred line L18 on chromosome 1RS (Miedaner et al. 2000). The other helpful fertility restoration QTL (Rfp2) was mapped to the long arm of chromosome 4R in the Argentinian cultivar “Pico Gentario” and Iranian primitive rye accessions called IRAN IX (Rfp1) and Altevogt 14160 (Rfp3) (Geiger and Miedaner 1996). The Rfp QTLs in the European populations are rare (1–5%) (Geiger et al. 1995) but were successfully introgressed into elite pollinator germplasm from Iranian resources.

The precise mechanism of pollen fertility restoration and the role of genes involved in this process have not been assessed until now. In most tested species, the Rf genes encode mitochondria-targeted pentatricopeptide repeat (PPR) proteins (Zhao et al. 2018; Wang et al. 2006; Akagi et al. 2004; Brown et al. 2003; Desloire et al. 2003; Klein et al. 2005; Fujii et al. 2016; Melonek et al. 2019; Goryunov et al. 2019). They play a pivotal role in organelle RNA editing, RNA termination, splicing, and translation (Hammani and Giegé 2014). Most of the Rf encoding PPR proteins (Rf-PPR) belong to the P class repressing ORFs. The latter causes mitochondrial dysfunction and pollen abortion in flowering plants (Dahan and Mireau 2013; Kubo et al. 2020). Further, sorghum Rf1 (Klein et al. 2005) and barley Rfm1 (Rizzolatti et al. 2017, Melonek et al. 2019) are classified into the DYW subclass in the PLS subfamily (Lurin et al., 2004). These types of PPR have been associated with C-to-U RNA editing. They thus could prevent the accumulation of the sterility factor by either creating a stop codon within the CMS-inducing transcript or an amino acid substitution in the putative sterility protein (Rizzolatti et al. 2017). Still, to date, such a mechanism has not been supported by experimental data. The other candidate gene implicated in fertility restoration belongs to the mitochondrial transcription termination factor family (mTERF). The mTERF was closely linked to Rfp1 and Rfp3 in the rye (Hackauf et al. 2012, 2017) and Rfm3 in barley (Bernhard et al. 2019); however, its function in the context of fertility restoration has yet not been reported.

A high linkage drag of undesirable genes with the 4R QTL was observed in hybrids (Miedaner et al. 2000, 2017). Two markers, namely Xtc256739 and Xtc300731, flanking the Rfp1 gene at distances of 0.3 cM and 0.4 cM were evaluated (Hackauf et al. 2012). However, such a linkage was insufficient to eliminate the drag effect, and newly developed, more linked markers are not readily available. To identify such markers, either the F2 mapping populations based on CMS Pampa cytoplasm are required, or recombinant inbred line (RIL) mapping population is needed. Vast mapping populations are needed to achieve a high resolution of genetic maps based on the F2 progeny. Their genotyping is expensive if dense maps are needed as large populations need to be genotyped. The other option relies on advanced recombinant inbred lines. However, their evaluation may be problematic due to the inbreeding effect in the rye. Nevertheless, the employment of RILs, if available, should decrease genotyping and phenotyping costs.

The study aims to identify pollen fertility restoration QTLs in rye hybrids with the CMS Pampa cytoplasm. Furthermore, we were interested in identifying genes responsible for pollen fertility restoration. Finally, we converted markers for marker-assisted selection purposes.

Material and methods

Plant materials and phenotypic evaluation

The rye mapping population S60/08 encompassing 94 recombinant inbred lines (RIL F8 (S60/08): S 305 N/00 × SO 2R/05) was derived in Choryń by crossing the S305N/00 (maintainer) plant on non-sterilizing cytoplasm and male parent SO2R/05 with CMS Pampa (restorer line). The single-seed descents method allowed the development of the RIL mapping population up to the F8 generation. The test crosses between the sterile female source of CMS Pampa (S305P/00) and a given recombinant inbred line allow obtaining BC1F1 (S305P/00 × [RIL F8 (S60/08): S 305 N/00 × SO 2R/05]) materials. From 10 to 12 plants for each of the 94 cross combinations were grown during the vegetation season 2018/2019 in the field conditions. The average value of fertility/sterility based on ten plants of BC1F1 was assessed according to the bonitation scale proposed by Geiger and Morgenstern (1975). Extremely male-sterile plants were scored as 1, whereas plants with anthers that intensively released pollen were scored as 9. Partly fertile plants exhibited values in the 4–6 range.

The pollen fertility restoration trait normal distribution was tested using the Kolmogorov–Smirnov test implemented in XlStat software (XlStat 2019). The χ2 goodness-of-fit test was conducted to determine whether the data “fit” the expected 1:1 distribution using MapQTL 5 (Van Ooijen 2004).

DNA extraction

Genomic DNA was extracted from fresh leaf tissue of parental plants, and a single individual represented each of the 94 RILs (F8) using a DNeasy Plant Mini Kit 250 according to the manufacturer’s instructions. DNA concentration and purity were checked using the PicoDrop spectrophotometer. The integrity was assessed by electrophoresis, applying about 100 ng DNA on 1% agarose gels stained with EtBr (0.5 µg/ml) in TBE buffer.

Genotyping

DNA samples were sent to Diversity Arrays Technology (Pty) Ltd. in Canberra, Australia for genotyping using DArTseq™ technology, which is a combination of the DArT complexity reduction method (Jaccoud et al. 2001) and Next-Generation Sequencing (NGS) platform. The DArTseq protocol has been described in detail by Kilian et al. (2012) and Melville et al. (2017). It encompasses the following steps: (1) Digestion of DNA samples using the restriction enzymes (RE) combination most suitable for genome complexity reduction—at least one of the enzymes is methylation-sensitive, directing the analysis to the hypomethylated, gene-rich genome regions; (2) Ligation of specialized adaptors to the digested DNA; (3) Amplification of adaptor-ligated fragments by adapter-mediated polymerase chain reaction (PCR); (4) Sequencing of the amplification product for each sample on an Illumina HiSeq2500 (Illumina Inc., San Diego, USA).

The results consist of two groups of detected markers based on short DNA sequences. The first group is called SilicoDArT and represents markers classified as dominant and scored for the presence or absence of a single allele. Such markers arise due to the presence or absence of restriction sites. The second group encompasses codominant single nucleotide polymorphisms (SNPs) markers carrying a single nucleotide polymorphism identified within marker sequence. As our materials were highly homozygous, both SNP and SilicoDArT markers were coded as dominant markers in a 0–1 binary form.

Linkage map construction

The genetic map was constructed using MultiPoint Ultra-Dense software (Ronin et al. 2015). Markers exhibiting > 15% missing data were excluded. All SNP and silicoDArT loci that showed no or minimal deviation from the expected 1:1 segregation ratio (χ2 ≤ 19.2) were employed in the analysis.

Genetic map construction consisted of the following steps: (1) Markers with zero distance were grouped, and a “delegate” was selected from each group. Only markers with at least the same number of twins as the predefined threshold were selected as delegates and were defined as "skeleton." Markers exhibiting identical segregation patterns as the delegate/skeleton markers were assumed redundant; (2) All remaining markers, except for candidate twins, were removed to the Heap; (3) Most representative skeletons markers and their redundant counterparts were clustered, and the resultant linkage groups (LGs) were ordered; (4) Gaps were filled, and LG ends were extended using markers from the Heap (Heap contains markers due to, i.e., segregation problems or missings primarily removed from mapping procedure). Such markers were referred to as approximated to the map; (5) Markers violating map stability and monotonic growth of distance from a marker and its subsequent neighbors were removed.

Quantitative trait loci (QTL) analysis

Single marker analysis (SMA) using the Kruskal–Wallis rank-sum test (P < 0.005) in MapQTL version 5.0 (Van Ooijen 2004) and the composite interval mapping (CIM) procedure in Windows QTLCartographer software, version 2.5 (Wang et al. 2007) was performed to identify QTLs responsible for fertility restoration in the S60/08 population of rye. A backward regression method with a window size of 5 cM and a walking speed of 1 cM, with control markers equal to five, was used for CIM. The logarithm of the odds (LOD) thresholds for significance was obtained by MapQTL’s permutation test option (1000 permutations). The percentage of phenotypic variation explained by each QTL was estimated. The QTL mapping data are provided in Supplementary File S1.

Verification of the marker’s order for linkage groups with detected QTLs was performed using a high-density genetic map based on rye Lo7xLo225 mapping population and anchoring Lo7 WGS contigs (Bauer et al. 2017). For this purpose, a homology between marker sequences localized on the individual linkage groups of S60/08 and sequences of contigs localized on the individual rye chromosomes of the Lo7xLo225 genetic map was searched. Pearson’s correlation in Statistica ver. 13.3 was used to verify the extent to which the marker’s order of the two maps was comparable (TIBCO Software Inc. 2017).

Association mapping

Principal Coordinates Analysis (PCoA) was performed in PAST software to assess the genetic structure of the RILs forming a mapping population. Associations of SNP and silicoDArT markers with pollen fertility restoration were determined using a General Linear Model (GLM) in TASSEL version 3.0 (Bradbury et al. 2007). Significant associations were indicated by the Bonferroni test with p < 0.01 (0.01/number of markers). The degree of association was illustrated by the determination coefficient (R2). Marker’s position on the S60/08 genetic map and the high-density genetic map constructed by Bauer et al. (2017) were determined.

Bioinformatic analysis of the marker sequence homology

Sequence similarity search algorithm BLASTn available for use online at the National Center for Biotechnology Information (NCBI) website (https://blast.ncbi.nlm.nih.gov) were used to generate alignments between nucleotide sequences of 1098 SNPs and silicoDArTs linked to and/or associated with the fertility restoration and nucleotide sequences within a database. The level of alignments between query and subject sequences was determined by the percentage of the query sequence that covered a sequence in Genbank (QC%), percentage of identity over the length of the coverage area (I%), and probability value (E-value). The taxonomic category selected during searches was the Poaceae family.

Marker conversion to PCR-based assays

Markers linked to or/and associated with fertility restoration were converted into PCR-based assay. Initially, due to very short sequences (69-bp), SNPs and silicoDArTs were blasted with rye Lo7 WGS contigs (Bauer et al. 2017; https://webblast.ipk-gatersleben.de/ryeselect/). Lo7 WGS contigs which show 100% sequence homology with the marker sequence, were analyzed in Primer3web software version 4.1.0 (http://bioinfo.ut.ee/primer3/) to identify primer pairs for marker amplification. The primer design’s main criteria were as follows: primer size 18–22 bp, GC content 40–60%, no or negligible secondary structures, and product size ≥ 400 bp. The optimal annealing temperature was inferred using a gradient PCR with temperatures set between 51.0 and 65.0°C (Labcycler Gradient, SensoQuest GmbH). DNA of sterile (S305P/00) and fertile (SO2R/05) parental plants were used to test marker polymorphism. Reaction mixtures had a final volume of 10 μl consisting of 10 ng of total genomic DNA, 25 μM each of PCR primers, 2.5 mM dNTPs, 2.5 mM MgCl2, 1 X reaction buffer, and 0.25 U of Gold HotStart DNA Polymerase (Syngen Biotech Ltd.). The following reaction profile was applied: [95 °C–15′] [95 °C-30″; X°C–45″; 72 °C–45″] × 35 [72 °C–10′] [5 °C ∞], when “X” is the temperature selected based on PCR reaction in gradient profile. Amplification products were separated by electrophoresis at 5 V/cm for 1.5 h in TBE buffer in 1.2% agarose gels containing 0.5 µg/ml of ethidium bromide. PCR-based markers’ segregation profile was tested using DNA of 20 sterile and 20 fertile lines randomly chosen from a set of the S60/08 mapping population. Converted PCR-based markers have extended names with “c” in the end (i.e., 3744672 vs. 3744672c).

Results

Phenotyping

The BC1F1: S305P/00 × [RIL F8 (S60/08): S 305 N/00 × SO 2R/05] crosses were characterized by either sterile, partly fertile, or fully fertile values of the trait. According to the established criteria, 58 sterile, 3 partly fertile, and 30 fertile RILs were identified (Supplementary File S1). Phenotypes were not obtained for 3 out of the 94 cases. Statistical analysis showed that the trait does not follow normal distribution (W = 0.780, p value < 0.0001, α = 0.05).

Chi-square adjustment tests revealed that the population deviated significantly from the expected 1:1 sterile-to-fertile segregation ratio (χ2 = 4090, p < 0.01 at α < 0.05) when contrasting sterile (lines with 1–3 bonitation) and fertile (lines with 4–9 bonitation) phenotypic classes were analyzed.

Genotyping and genetic map construction

The segregation patterns for 47960 SNP and 165163 silicoDArT markers were tested using DArT-Seq™ technology to genotyping the S60/08 mapping population. After removing markers with monomorphic signals and exhibiting >75% missing data, about 16,100 SNP and 50 200 silicoDArT markers were employed for MultiPoint Ultra-Dense and Tassel software.

The genetic map of RIL F8: [(S60/08): S 305 N/00 × SO 2R/05] consists of seven linkage groups (LG) defined by 35,168 markers (272 skeletons, 2731 redundant, 6837 approximated, and 25,328 redundant to approximated markers) and covers 880.3 cM (Table 1, Figure S1). The most extended LG was constructed for chromosome 2R and spanned over 183.9 cM. The shortest LGs were for the 3R (65.9 cM) and 7R (69.1 cM) ones, with 23 and 36 markers, respectively. On average, one skeleton marker per 3.2 cM was located on the map. The average saturation range from one marker per 1.91 cM (LG-7R) to 4.48 cM (LG-2R).

Table 1 Arrangement of mapping data evaluated for the S60/08 mapping population

Pollen fertility restoration QTLs

The Kruskal–Wallis (K-W) analysis in the RIL F8: [(S60/08): S 305 N/00 × SO 2R/05] population revealed two, one, thirteen, one, and three significant markers (p ≤ 0.01) mapped to chromosomes 1R, 3R, 4R, 6R, and 7R, respectively. The maximum K-W statistic values for markers mapped to the 1R, 3R, 6R, and 7R chromosomes ranged from 7.4 to 10.6. The most significant markers (p ≤ 0.005) fall in the interval 112.9 cM-136.82 cM (Table 2) of the 4R chromosome. The region corresponds to the 4R QTL responsible for fertility restoration. The association values (Kmax2) for those markers were ≥ 9.28, with the highest value (Kmax2 = 29.414) for marker 7464414 at the position of 134.85 cM. Moreover markers 7468190 (136.16 cM), 5505157 (136.82 cM), 3744723 (128.85 cM) and 5044143 (130.87 cM) exceeded the Kmax2 value of twenty.

Table 2 The association between markers and fertility restoration trait in rye with CMS Pampa based on the Kruskal–Wallis test for markers with significance values (p) equal or less than 0.01

Composite interval mapping (CIM) indicated the presence of a QTL conferring fertility restoration in rye with CMS Pampa in the range of 128.85–136.8 cM. A highly significant QTL (QRfp-4R) with the LOD score of 10.15 (p = 1000; LOD = 3.1) mapped to the distal part of the long arm of the 4R chromosome (Fig. 1, Table 3). The QTL exhibited additive effects (A = 2.14) and explained 42.3% of the phenotypic variance for pollen fertility restoration. The silicoDArT markers 3588233 and 3577181 surrounded the QTL from both sides at a distance of 0.84 cM and 1.13 cM apart from the LOD maximum, respectively (Table 3). Besides, the QTL region was saturated with 25 redundant and 878 approximated markers.

Fig. 1
figure 1

Composite interval mapping (CIM) demonstrating the position (A) of the QRft-4R identified on the 4R chromosome (B) based on the RIL S60/08 mapping population. The main effect detected on chromosome 4R is an additive effect arising in restorer line SO2R/05 (C)

Table 3 Composite interval mapping (CIM) outcomes arrangement

A distance of 1.97 cM separates the markers 3588233 and 3577181 (and their redundant and approximated markers) mapped within the QRfp-4R. The marker sequences were identified on a high-density genetic map of rye Lo7xLo225 (Bauer et al. 2017) based on markers-contigs sequence homology. The region ranging from 132.88 to 134.85 cM on the RIL8 map corresponds to the 183.52–188.71 cM region on the Lo7xLo225 map as indicated by 56 approximated and two redundant markers. The markers 3341749 and 100060499, redundant to 3577181 highly linked with QRfp-4R, were identified at 183.52 cM and 184.31 cM of the Lo7xLo225 map. The two maps of the 4R chromosome were strongly correlated 0.994 (p < 0.01).

Association mapping

Principal Coordinate Analysis (PCoA) performed for all RIL F8: [(S60/08): S 305 N/00 × SO 2R/05] lines encompassing mapping population failed to detect significant data structuring (not shown). The GLM allowed identifying 241 SNP and silicoDArT markers associated with the pollen fertility trait at α ≤ 0.01. Most of those markers had redundant counterparts, increasing the total number of associated markers to 1098. The association coefficients (R2) of the markers that passed the Bonferroni test’s cut-off value (p = 4.91E − 06) ranged from 0.59 to 0.21 (Table 4). The R2 association value of the silicoDArTs closely linked to the QRfp-4R reached 0.4 and 0.29 for 3588233 and 3577181, respectively. The markers approximated to these two skeleton markers' position were also associated with the fertility restoration trait and reached R2 values from 0.22 to 0.59 (Table S1).

Table 4 SNP and silicoDArT markers with the highest association values for the male-fertility restoration in RIL S60/08 mapping population of rye with CMS Pampa

Blasting marker sequences associated with pollen fertility trait in the RIL8 against the Lo7 WGS contigs (Bauer et al. 2017) showed 65 out of 89 markers localized between 179.5 cM and 193.91 cM. Twenty-three markers with the highest association values (≤ E − 10) with the trait were within the range of 183.52–188.71 cM on the 4R chromosome of the Lo7xLo225 map.

Identifying marker sequence homology with known functional genes

DNA sequences of the markers linked to and/or associated with the pollen fertility restoration QTL assigned on the 4R chromosome were blasted against DNA sequence databases. Of these, 34% of the analyzed sequences showed sequence similarity to genomic sequences from Triticum aestivum, Triticum dicoccoides, Hordeum vulgare subsp. vulgare, and Aegilops tauschii subsp. tauschii (Table S1).

Four of the markers (5221039, 3744672, 5036421, 3749937) associated with the fertility restoration and identified in the QRfp-4R region showed sequences homology to the Rf1 protein-coding sequence from Triticum dicoccoides (Zhu et al. 2019). The markers were assigned to the 4R linkage group at 132.8 cM of the S60/08 and 187.9 cM of the Lo7xLo225 reference map (Bauer et al. 2017). Another twenty-six marker sequences (5500712, 3741145, 5227369_10:C > G, 3591947, 3362765, 3344372, 5037650, 3730066, 100062369, 5227369, 3738490, 5227369_7:G > A, 3585416, 5493400, 100089652, 3895196, 3577569_6:C > G, 3896198_10:G > T, 100104448_33:A > C, 3599981_67:C > G, 5504322, 3349918, 3887543_35:C > A, 12699528, 5227367_28:G > A, 3587492) were homologous to the sequence of Rfm1 restorer locus identified in Hordeum vulgare that carries two, three and one regions encoding pentatricopeptide (PPR) repeat-containing protein, putative beta-fructofuranosidase, and putative zinc finger with peptidase domain protein, respectively (Rizzolatti et al. 2017; GenBank: MF443757.1). Six (3349917, 5227368GA7, 5227368GA28, 5227368CG10, 5493409, 5504321) and two marker sequences (3362765, 3738490) were homologous to putative beta-fructofuranosidase and putative zinc finger with peptidase, respectively. The remaining markers were outside of coding regions, and any of the marker sequences matched the region coding PPR protein. Additionally, fifteen markers were mapped in the QRfp-4R region of the S60/08 genetic map and eleven in the Lo7xLo225 genetic map (183.51 cM and 185.52 cM).

Nineteen of the marker sequences (3357230, 3601082, 3885888, 7463614, 4489713_14:A > C, 3593839, 3351619, 3580257, 3358064, 3730381, 100089972, 3884378, 3358394, 5487641, 5139216, 3592314_41:C > G, 3576186_29:G > A, 100005251, 3576186) were homeologous to the sequences of transcription termination factor family (mTERF) genes from Triticum dicoccoides. Seven of them (3358394, 5487641, 5139216, 3592314_41:C > G, 3576186_29:G > A, 100005251, 3576186) were homologous to the MTERF8 gene (E-value from 1e − 13 to 1e − 27; Query cover 94–100%) and mapped at 180.33 cM position of the reference map, corresponding to the position of Lo7_v2_contig_62134. The markers covered the 23546 bp-long sequence of Lo7_v2_contig_62134 between 15415 bp and 15894 bp. Another seven marker sequences homologous to MTERF15 were identical or highly similar (E-value from 8e − 16 to 3e − 27; query cover 99–100%) to the nucleotide sequences of Lo7_v2_contig_4941 (3885888, 3593837, 3601082), Lo7_v2_contig_2489 (7463614, 3351619) and Lo7_v2_contig_4048 (3580257, 3357230) localized at the positions of 185.93 cM, 185.93 cM, and 186.72 cM of Lo7xLo225 genetic map, respectively.

Eight markers with one of the highest association values (3348274, 3358169, 3746061, 4096992, 3590786, 5037479, 7468019, 3586369_11:C > G) exhibited sequence similarity to the DNA sequence of the keratin-associated protein (KAP) 5–4-like and 5–5-like genes from Aegilops tauschii.

Conversion of SNP and silicoDArT markers to PCR-based assays

There were 38 markers linked to QRfp-4R and/or associated with fertility restoration selected for conversion to specific PCR conditions. The markers were segregated as their unconverted counterparts in ten cases and amplified fragments of expected sizes in the range of 363–632 bp (Table 5). The remaining cases resulted in segregation distinct from their SNP or silicoDArT counterparts (5 markers) or were monomorphic. The efficiency of marker conversion equaled 26.3%.

Table 5 Primer sequences and PCR conditions for markers linked to or/and associated with fertility restoration in rye with CMS Pampa

Four of the successfully converted markers show sequence homology with the Rf1 protein-coding sequence (3744672c) and the sequence of Rfm1 gene locus (5500712c, 3599981c, 3362765c) responsible for fertility restoration in Triticum dicoccoides and Hordeum vulgare, respectively. The 5500712c, 3599981c localized in the non-coding region of Rfm1, while primers based on 3362765 marker sequence amplified 417 bp fragment of the putative zinc finger with peptidase domain protein. One primer pair (Table 5) based on 3593839c marker sequence amplified the 531 bp region of the MTERF15 gene sequence. The amplified region of MTERF15 encompasses sequences complementary to four marker sequences (3593839, 3601082, 4489713_14:A > C, 3351619). Detailed results are presented in Table 5.

Discussion

The phenotype analysis of the cross S305P/00 x [RIL F8 (S60/08): S 305 N/00 × SO 2R/05] classified most BC1F1 crosses into two contrasting classes: entirely male sterile and fully male fertile. Most of the RILs exhibited sterile, whereas fully fertile plants were represented by 33% of the RILs. Interestingly, only three lines were classified as partially fertile. It was also noted that the trait failed to follow a normal distribution. Furthermore, the segregation ratio deviated from the monogenic mode of segregation, indicating that more than a single gene may participate in pollen restoration.

On the other hand, the presence of sterile and fertile RILs in the population and practically lack of partial phenotypes may suggest that a single QTL may act as a dominant one in the presence of some weak QTLs conferring sterility. Although phenotyping was conducted in a single environment, we did not expect significant errors related to the incorrect assessment of plants' fertility/sterility. It was documented that pollen fertility restoration in rye with CMS Pampa determined by QTL with significant effects on the trait’s phenotypic variation is less affected by environmental conditions (Geiger et al. 1995). The high correlation of the phenotyping results in two environments was obtained by Stracke et al. (2003) (r = 0.94 and r = 0.92 for mapping populations included 651 and 498 individuals, respectively) and Miedaner et al. (2000) (r ≥ 0.9 for three F2 populations of 131, 134, and 100 individuals). Genetic map construction is necessary to verify the number of QTLs responsible for pollen fertility restoration in our RIL8 mapping population.

We have successfully applied SNP and silicoDArT markers for genotyping RIL-based mapping populations of rye (Niedziela et al. 2021) and triticale (Wasiak et al. 2021). The NGS markers proved to efficient in identifying many markers that were more or less evenly distributed along species chromosomes saturated maps with 1.66 (rye) and 1.70 (triticale) markers per every one cM (Niedziela et al. 2021; Wasiak et al. 2021). Preliminary, the S60/08 mapping population encompassed 130 lines, but due to the inbreeding depression, a phenomenon typical for allogamous species such as rye (Husband and Schemske 1996), 30 lines of mapping population was lost. As a result, map saturation was lower than in our previous study, whereas many as 175 RILs were used (Niedziela et al. 2021). Still, all linkage groups' marker saturation was sufficient for quantitative analysis, possibly due to many recombinations typical for such populations (Xu et al. 2017). However, some gaps ranging from 6.7 cM and 25.1 cM were not eliminated, but their sizes in most cases were limited. The gaps were present close to centromeric regions, which is quite typical in genetic mapping as these regions recombine with lower frequency (Stapley et al. 2017).

A single marker analysis (SMA) approach was applied to identify genomic regions responsible for pollen fertility restoration present in the S60/08 mapping population. The analysis indicates that the trait might be conferred by regions mapped to the 1R, 3R, 4R, 6R, and 7R chromosomes. However, the central region, represented by the prevailing number of markers conferring ca. 8 cM was present on the 4R chromosome. The result agrees with our phenotypic data suggesting the presence of a single region conferring pollen fertility and some set of other, less critical genes. The presented data are in line with results presented by the others where the same chromosomes conferring the trait were identified (Miedaner et al. 2000).

Furthermore, the biparental population has a significant genomic region on the 4R, which is not typical for the European rye population (Miedaner et al. 2000; Hackauf et al. 2017). The presented data is congruent with composite interval mapping. The QTL explaining the vast part of the trait’s phenotypic variance (42.3%), exhibiting additive effect, was identified on chromosome 4R. The CIM and SMA approach indicated the same genomic region of the chromosome. It should be stressed that for non-normally distributed traits, CIM is not an adequate approach (Broman 2003). However, in the case of pollen fertility, the approach works well and gives comparable results with SMA (Myśków et al. 2014; Stojałowski et al. 2017; Masojć et al. 2020). The markers tightly linked to QTL were either in the QTL maximum (RecL) or 0.02 cM (RecR) apart from the QTL maximum.

Furthermore, CIM (and SMA) showed that the QTL most significant region spans over ca. 2 cM. It was possible to detect such a narrow region thanks to the advanced recombination inbred line-based mapping population. Interesting, the markers covering the 4R chromosome, including those linked to the QRfp-4R, were in the same order in the case of the Lo7xLo225 population (Bauer et al. 2017), supporting congruency of the presented data. Both SMA and CIM succeeded in identifying ca. 1960 markers in the QRfp-4R region between 128.85 and 136.8 cM that could be utilized for marker conversion and then used for marker-assisted selection purposes.

The approach used for identifying markers linked to the trait of interests may result in eliminating some valuable markers due to marker segregation problems or missing data (Ronin et al. 2015). In the current study, as many as 70% of markers were omitted. The very similar results were presented by the others (Bolibok-Brągoszewska et al. 2009; Milczarski et al. 2011; Stojałowski et al. 2017). These drawbacks could be overcome by association mapping. Biparental mapping populations usually should lack any population structure (Che and Xu 2012; Stadlmeier et al. 2018), allowing for implementing the general likelihood method for identifying associated markers. The approach proved to be valuable in the S60/08 mapping population, allowing for recognizing extra 22 markers with association values close to 0.6. The vast majority of the significant markers with the highest values of association with pollen fertility were directly identified in the S60/08 as redundant or approximated markers or were present within the respective region in the case of the Lo7xLo225 reference genetic map (Bauer et al. 2017).

An essential aspect of the study was the opportunity to identify genes involved in pollen fertility restoration in rye with CMS Pampa. Twenty-six marker sequences localized within the QRfp-4R region revealed sequence homology with the sequence of Rfm1 locus responsible for the fertility restoration in barley with male sterility maternal 1 (msm1) sterile cytoplasm (Matsui et al. 2001) and carries, among others, two tandem repeat genes coding for pentatricopeptide repeat (PPR) protein (Rizzolatti et al. 2017). Previous synteny-based studies showed the homology between the barley 6HS chromosome region (Martis et al. 2013) carrying Rfm1 locus and 4RL of rye where the Rfp1 and Rfp3 genes were mapped (Hackauf et al. 2017). However, a slight shift of Rfp1 and Rfp3 to the Rfm1 locus (Hackauf et al. 2012, 2017) suggests that the CMS systems in barley and rye base on tightly linked but independent restorer genes or the shift results from local duplication of the same restorer gene (Rizzolatti et al. 2017). It was documented that Rfm1-orthologous regions of Brachypodium (Bradi3g00900) and rice (Os02g0106300) encodes PPR proteins, while the Rfp1 syntenic interval present in those two species does not harbor any PPR genes (Rizzolatti et al. 2017; Hackauf et al. 2017). The precise position of silicoDArT and SNP sequences within the Rfm1 (GenBank: MF443757.1) showed that any of them matched the region coding PPRs. Nevertheless, nine of the marker sequences (3895196; 100104448_33:A > C; 3896198_10:G > T; 5227369_7:G > A, 5227369_28:G > A, 5227369_10:C > G; 5493400; 5504322; 3349918) matched three regions coding putative beta-fructofuranosidase (proteins id.: AVY91566.1, AVY91567.1, and AVY91565.1) whereas two other marker sequences (3362765; 3738490) were homologous to the region coding putative zinc finger with peptidase domain protein (protein id. AVY91564.1).

The mTERF proteins are the second vital candidates for fertility restoration in cereals. Genome-wide association studies in a multiparental mapping population of barley revealed three markers located very close to the mTERF gene or lying directly within the mTERF gene sequence from the Rfm3 restorer locus (Bernhard et al. 2019). In wheat with male sterility induced by the cytoplasm of Triticum timopheevii, the genes encoding mTERF and PPR proteins were detected within the Rf9 locus on chromosome 6AS (Shahinnia et al. 2020). The candidate genes coding for mTERF proteins located between the flanking markers of the Rf9 in wheat was highly expressed in the spikes and grains. Six additional, tandemly duplicated genes predicted to encode for mTERF proteins are located on rice chromosome 6 within a region orthologous to the rye Rfp1 (Hackauf et al. 2012). Nineteen of the markers exhibiting sequence similarity to the sequence of the mTERF family were present within QRfp-4R. There were approximated to the S60/08 map mainly at the positions 132.88 cM (7 markers) and 134.85 cM (6 markers). Further bioinformatic analysis indicated two possible positions of mTERF on the rye genetic map based on the Lo7xLo225 mapping population (180.33 cM and 185.93 cM) (Bauer et al. 2017). One of the silicoDArT (3593839c) converted into PCR condition assay showed polymorphic segregation compatible with the segregation before conversion. Surprisingly, one of the highest association values (R2 = 0.53) was obtained for markers that exhibited similarity to the keratin-associated protein’s DNA sequence (KAP) 5–4-like and 5–5-like from Aegilops tauschii subsp. tauschii. So far, the role of those proteins in plants has been poorly understood. The expression study results in maize suggested that KAP5-4 participated in response to water-deficit stress (Yang et al. 2012). BLAST analysis show homology (54.84%) of maize KAP5-4 with the qPE9-1 encoding KAP5-5 in rice. The qPE9-1, which is allelic to DEP1 (DENSE AND ERECT PANICLE 1), plays an integral role in the regulation of rice plant architecture, including panicle erectness (Zhou et al. 2009), and it is a positive regulator of rice grain length and weight (Li et al. 2019). Nevertheless, comparative study of KAP 5–4 and 5–5 protein sequences of maize and rice do not indicate significant homology with those proteins for Aegilops tauschii subsp. tauschii.

Besides scientific interest, the current study has practical application. We succeeded in converting some of our linked/associated markers to PCR-specific conditions with nearly 30% of success typical for other plant species (Fiust et al. 2015; Niedziela et al. 2015). In the current study, we used marker sequences to identify their homologous sequences in databases. The respective database sequences were used to design primers amplifying DNA fragments ranging from 363 to 632 bp. Unfortunately, many of the amplified fragments were monomorphic, probably because we failed to amplify polymorphic regions. Some markers had segregation patterns that did not follow the pattern of the original markers. Such a situation is most probably related to copies of the amplified sequence, which is not surprising in the rye (Guidet et al. 1991; Bauer et al. 2017).

In conclusion, the highly significant QTL responsible for fertility restoration in line SO2R/05 with CMS Pampa (restorer line) mapped to the distal part of the 4R chromosome. A group of silicoDArT and SNP markers was tightly linked to the QRfp-4R and/or associated with the trait. Successful conversion led to obtaining ten markers potentially applicable in commercial breeding. There were 5500712c, 3599981c, 3362765c, and 3744672c markers that revealed sequence homology with the sequences of fertility restoration genes identified in the other cereals. The markers can be helpful as a tool for further genetic analyses of the CMS-Pampa system and the QTL selection on the 4R chromosome.