Introduction

The mechanisms of meiosis, with a few notable exceptions, are highly conserved among sexually reproducing eukaryotes such as fungi, plants and animals (Gerton and Hawley 2005; Villeneuve and Hillers 2001). These mechanisms include sister chromatid cohesion, homologous chromosome pairing, formation of the synaptonemal complex, double-strand break (DSB) formation and processing, cross-over (CO) formation and resolution, and two-step segregation of chromosomes, which together distinguish meiosis from mitosis. A common set of meiosis-specific genes is therefore typically found in all sexually reproducing organisms.

Formation of programmed DSBs during prophase I lies upstream of many meiotic processes. First discovered in the budding yeast Saccharomyces cerevisiae, DSB initiation is catalysed by the highly conserved protein SPO11 (Bergerat et al. 1997; de Massy et al. 1995; Keeney et al. 1997; Keeney and Kleckner 1995; Liu et al. 1995). In plants, several proteins that function in DSB formation have been isolated so far: PHS1/Rec114, PRD1/Mei1, PRD2/Mei4, PRD3/PAIR1/Mer2, DFO, PCH2 and MTOPVIB, of which DFO has so far been described only in plants. DSB sites are subsequently loaded with the recombinases RAD51 and DMC1. DMC1-mediated DNA repair using a non-sister homologous chromatid appears to be the predominant pathway during Arabidopsis thaliana meiosis (Mercier et al. 2015). The chromosome axis mediates the formation of DSBs and their subsequent repair, resulting in the formation of inter-homolog COs. The axis is built from cohesin complexes and axial element protein complexes. The cohesin complex is formed by SMC1, SMC3, an alpha-kleisin subunit (SCC1/REC8) and SCC3 (Chelysheva et al. 2005; Onn et al. 2008). ASY1 and ASY2 are the HORMA domain-containing axis proteins. ASY3 and ASY4 are the axis core proteins, essential for the recruitment of the HORMA domain proteins and for axis formation (Caryl et al. 2000; Chambon et al. 2018; Ferdous et al. 2012; Sanchez-Moran et al. 2008, 2007; West et al. 2019). As prophase I progresses, chromosomes synapse and the axes of each homolog pair are connected to each other by coiled-coil transverse filaments (Dong and Roeder 2000; Liu et al. 1996; Meuwissen et al. 1992; Sym et al. 1993). ZYP1A and ZYP1B have been identified as the proteins involved in the formation of the synaptonemal complex (SC) in A. thaliana (Capilla-Perez et al. 2021; France et al. 2021; Higgins et al. 2005). There are two pathways for CO formation: the interference-sensitive Class I pathway and the interference-insensitive Class II pathway. Class I is the major pathway and depends on the ZMM proteins (HEI10, HEIP1, MER3, MSH4, MSH5, PTD, ZIP2/SHOC1, ZIP4) together with MLH1 and MLH3 (Börner et al. 2004; Chelysheva et al. 2012; Dion et al. 2007; Franklin et al. 2006; Higgins et al. 2004, 2008b; Kuromori et al. 2008; Li et al. 2018; Lu et al. 2014; Macaisne et al. 2008; Mercier et al. 2005). Numerous DSBs are formed, of which only a few are processed into COs. CO designation is still poorly understood (Berchowitz et al. 2007; Higgins et al. 2008a).

Understanding meiosis in plants can form a basis for advances in reproduction, fertility, genetics and breeding, and can thereby accelerate agricultural applications (Sanchez-Moran et al. 2008). Plants are also considered a good model system for studying meiosis because, in meiotic mutants, meiosis proceeds to the end of the tetrad formation stage with defects such as massive chromosome mis-segregation, but without the confounding effects of meiotic arrest and apoptosis seen in mammals (Higgins et al. 2004; Mercier and Grelon 2008). The kingdom Plantae, or Archaeplastida in a broader sense, includes freshwater unicellular algae (glaucophytes), photoautotrophic red algae (rhodophytes) and Viridiplantae, which comprises the paraphyletic group of green algae (chlorophytes and charophytes) and land plants. Land plants can be further classified into bryophytes (liverworts, hornworts, mosses), lycophytes, pteridophytes (ferns) and spermatophytes (gymnosperms and angiosperms) (Puttick et al. 2018). Plants are highly diverse: land plants alone are estimated at approximately 500,000 species, compared with about 5400 mammalian species in total (Corlett 2016). Among plants, most studies of meiosis have been carried out in angiosperms, and the vast majority of studies characterising meiotic genes have been done in the model plant A. thaliana, and also in rice, maize, wheat and barley, among others (Mercier and Grelon 2008). In total, around 100 genes involved in meiosis have been functionally studied in A. thaliana (Zhang et al. 2018). However, considering the diversity of plants, studying a few angiosperm models alone will not be sufficient to understand the evolution of meiosis in this kingdom. Functionally studying representative meiotic proteins from all plant lineages would be nearly impossible for practical reasons. Nevertheless, revolutionary advances in genomics mean that sequence information is rapidly accumulating for many members of the Viridiplantae (green plants), and homology searches can provide insights into the presence of orthologs of the meiotic machinery in a wide range of organisms.

To date, no comprehensive study has aimed to search for and detect core meiotic genes across all the main groups of the plant kingdom. Therefore, in this study, we searched for homologs of well-studied angiosperm meiotic genes among different plant lineages, from algae to angiosperms. Note that this paper discusses only Viridiplantae; rhodophytes and glaucophytes were included in our analysis as an outgroup. Our approach has allowed us to trace the conservation of the ancestral molecular machinery of plant meiosis and to relate the presence or absence of meiotic homologs across Viridiplantae to the evolution of meiosis. We found that proteins involved in DSB formation, chromosome axis formation and the ZMM pathway are not detected in some early plant lineages, suggesting that they are either missing or have evolved rapidly during the diversification of the plant kingdom. Remarkably, our analysis confirms that land plants have two meiosis-expressed SPO11 paralogs, both essential for meiotic DSB formation and likely to act as a heterodimer, but only one homolog is retained in chlorophytes and charophytes. Our study shows how systematic analysis of the similarities and differences in meiotic regulation among plant species can provide insights into the fundamental elements of this critical process across evolution.

Materials and methods

Homology search using NCBI PSI-BLAST and phylogenetic tree construction

Twenty-seven genes with key meiotic functions reported in A. thaliana were chosen for this study. Based on their function, the proteins were categorised into four pathways: chromosome axis/synaptonemal complex; double-strand break formation; strand invasion; and ZMM (Table 1). Protein sequences were downloaded from either UniProtKB or TAIR. For each protein, TAIR provides a list of plant homologs derived from the gene families of the PANTHER 16.0 release; this list was used to create the initial multiple alignment file with MAFFT (Katoh and Standley 2013). NCBI PSI-BLAST was performed against selected species (Supplementary Table 1) representing all plant lineages, using the A. thaliana protein sequence as the query. The initial MAFFT alignment was uploaded as the PSSM. A maximum E-value threshold of 5e-05 and the BLOSUM62 matrix were used for the analysis. PSI-BLAST iterations were continued until the desired hits were obtained or until PSI-BLAST found no further significant hits. The FASTA sequences of all hits were downloaded, aligned with MAFFT and trimmed with trimAl (Capella-Gutierrez et al. 2009), and a phylogenetic tree was constructed using IQ-TREE (Nguyen et al. 2015). In cases where the tree could not be resolved, clustering analysis was performed using CLANS (Frickey and Lupas 2004). The cluster containing the initial query was extracted, and the phylogenetic tree was constructed as described above. The trees were interpreted manually one by one.
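
The iteration rule described above (repeat the search until no further significant hits appear) can be illustrated with a minimal Python sketch. It does not call PSI-BLAST; the callback run_round, the cap max_rounds and the toy IDs and E-values are hypothetical and serve only to show the stopping logic. Only the 5e-05 threshold is taken from the text.

```python
from typing import Callable, Dict, Set, Tuple

E_VALUE_MAX = 5e-05  # inclusion threshold stated in the text


def iterate_search(
    run_round: Callable[[int, Set[str]], Dict[str, float]],
    max_rounds: int = 10,  # assumed safety cap, not taken from the text
) -> Tuple[Set[str], int]:
    """Repeat a profile search until a round adds no new significant hit.

    run_round(round_no, included_ids) is a hypothetical callback that
    returns {sequence_id: e_value} for one search iteration.
    """
    included: Set[str] = set()
    for round_no in range(1, max_rounds + 1):
        hits = run_round(round_no, included)
        significant = {sid for sid, e in hits.items() if e <= E_VALUE_MAX}
        new_hits = significant - included
        if not new_hits:
            return included, round_no  # converged: nothing new to add
        included |= new_hits
    return included, max_rounds


# Toy stand-in for a search engine (made-up IDs and E-values).
def fake_round(round_no: int, included: Set[str]) -> Dict[str, float]:
    rounds = {
        1: {"seq_a": 1e-10, "seq_b": 1e-3},
        2: {"seq_a": 1e-12, "seq_b": 1e-6, "seq_c": 0.1},
    }
    return rounds[min(round_no, 2)]


ids, rounds_run = iterate_search(fake_round)
```

In this toy run, seq_b becomes significant only in the second round, once the profile has been updated, and the loop stops at the first round that adds nothing new. The alternative stopping condition in the text (stopping once the desired hits are obtained) is a manual judgement and is not modelled here.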

Table 1 List of meiotic proteins used in this study

Similarity search with HMMER package and phylogenetic inference

HMMER is a more sensitive approach because it uses a whole profile of sequences as the query for similarity searches (Eddy 2011). In this way, the program takes advantage of the diversity of amino acids at each position to find sequences with a lower level of conservation or more distantly related sequences. This is particularly important for comparisons across large assemblages of lineages and for studies of large-scale evolution. To build a profile for HMMER searches, an initial trimmed multiple sequence alignment is required; in this pipeline we used MAFFT (Katoh and Standley 2013) for alignment and trimAl (Capella-Gutierrez et al. 2009) for trimming. This initial file is used as input for the hmmbuild tool to generate the profile. The profile is then used for searches against a database with the hmmsearch tool. IDs obtained as output of hmmsearch are selected up to an arbitrary threshold (normally e-6) and used to recover the complete sequences from the database with another tool of the package, esl-sfetch. Sequences obtained in this way can be used for further analyses, especially phylogenetic inference. For phylogenetic inference, the sequences are aligned and trimmed using the same methods as above and used as input for IQ-TREE (Minh et al. 2020). The phylogenies obtained in this way are then analysed one by one for evolutionary patterns.
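
The ID selection step can be sketched in Python as follows. This is an illustration rather than the script used in the pipeline: it assumes that hmmsearch was run with the --tblout option, that the threshold is applied to the full-sequence E-value (fifth column of that table) and that "e-6" means 1e-6. The function name and the example lines are hypothetical.

```python
from typing import Iterable, List


def select_hit_ids(tblout_lines: Iterable[str], max_evalue: float = 1e-6) -> List[str]:
    """Return target IDs whose full-sequence E-value is <= max_evalue.

    Expects lines in the hmmsearch --tblout layout: whitespace-separated
    columns with the target name first and the full-sequence E-value fifth;
    comment lines start with '#'.
    """
    ids: List[str] = []
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue and target not in ids:
            ids.append(target)  # keep first occurrence, preserve rank order
    return ids


# Made-up example lines in --tblout layout (not real search results).
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seqA - query_profile - 1.2e-30 105.3 0.1 2.0e-30 104.6 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein",
    "seqB - query_profile - 3.0e-04 15.2 0.0 4.1e-04 14.8 0.0 1.1 1 0 0 1 1 1 0 -",
]
selected = select_hit_ids(example)
```

Written one per line to a file, the selected IDs can then be passed to esl-sfetch to retrieve the full-length sequences from an indexed database.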

A comprehensive homology search was carried out with PSI-BLAST and HMMER throughout Archaeplastida. The results from both analyses were compiled in the final figure. For simplicity, in some cases only a few representatives of a lineage are shown in the final figure and the rest are grouped under “Others” (Fig. 1A, B). For further details, we refer readers to Supplementary Table 2 and the phylogenetic trees (https://data.cyverse.org/dav-anon/iplant/home/gokilavani/Tracing_the_evolution_of_the_plant_meiotic_molecular_machinery). Glaucophytes and rhodophytes were included to provide a root for our analyses and, as mentioned above, this paper focusses only on discussing the meiotic machinery in Viridiplantae.

Fig. 1

Tracing the conservation of the meiotic machinery among plants. A Representative illustration of the phylogenetic relationships among the main plant lineages, showing the evolutionary events affecting important meiotic proteins. Loss of SPO11-1 in Chlorophyta and Charophyta is indicated. Yellow stars represent the possible emergence of the meiotic proteins described only in plants so far, HEIP1 and DFO. B Using the protein homology searches PSI-BLAST and HMMER, we inferred either presence (coloured circles) or absence (empty circles) of meiosis-specific proteins in all main Viridiplantae lineages. In the case of chlorophytes and charophytes, only representative species are shown and the rest are represented as “Others” for chlorophytes. Members of Glaucophyta and Rhodophyta were included in the analysis and are represented as outgroups in the figure. See Supplementary Table 1 for the whole list of species used in the analysis. Additional information about non-plant homologs, obtained from a literature review, is added to the figure. The colour code represents the four meiotic pathways into which the proteins are classified in our analysis. Fully coloured circles = ortholog detected in our analysis; light coloured circles = a homolog was obtained as a hit, but we are unsure whether it is the right ortholog; white (empty) circles = ortholog not detected. C Phylogenetic tree of SPO11 showing its pattern of duplication across different lineages. Note that the meiosis-specific SPO11-1 is missing in chlorophytes and charophytes.

Results and discussion

Chromosome axis and synaptonemal complex elements are structurally highly conserved but markedly divergent at the sequence level

ASY1, ASY3, REC8 and ZYP1 were detected in all species, or in at least one representative species, of every major Viridiplantae lineage used in the analysis. As an exception, we detected ASY4 only in streptophytes and not in chlorophytes (Fig. 1B). Consistent with our analysis, ASY4 had previously not been identified outside land plants (Chambon et al. 2018). In contrast, ASY3, which interacts with ASY4 (Chambon et al. 2018), was detected in chlorophytes as well. It is important to consider that ASY4 is reported to lack functional domains, which usually constitute the most conserved regions of a protein sequence. Sequence divergence is a feature of the chromosome axis proteins: axial elements and central elements of the SC show poor similarity between species at the sequence level, but their structure and function are widely conserved (Chambon et al. 2018). For example, A. thaliana ASY3, mammalian SYCP2 and yeast Red1 fulfil the same function but lack sequence similarity, as do A. thaliana ASY4 and mammalian SYCP3 (Chambon et al. 2018). This low sequence conservation could explain why we did not detect homologs of A. thaliana ASY4 in distant algal species. We therefore cannot rule out that functional counterparts exist in these species; detecting them is beyond the scope of the sequence-based algorithms used in our analysis.

The evolution of the meiotic DSB machinery in plants

Among the eight DSB formation proteins we analysed, DFO was not detected in Chlorophyta, Charophyta and Bryophyta; PHS1 and PRD2 were not detected in Chlorophyta; and PRD3 and SPO11-1 were not detected in Chlorophyta and Charophyta. The rest of the candidates were detected in all Viridiplantae lineages. DFO is a plant-specific protein involved in the formation of DSBs and has not yet been reported in other eukaryotic super-groups (Zhang et al. 2012). In our analysis, DFO homologs were detected only in vascular plants and not in other plant lineages, suggesting that DFO evolved in the common ancestor of vascular plants. The homologs of the other three missing candidates, PHS1/Rec114, PRD2/Mei4 and PRD3/Mer2, have been described to interact with each other and form the RMM complex in Saccharomyces cerevisiae (Maleki et al. 2007; Yadav and Claeys Bouuaert 2021). It was recently described that PHS1, PRD2 and the plant-specific DFO also form an RMM-like complex in A. thaliana. PRD3 does not interact with the RMM-like proteins and is proposed to have a different role, likely in coordinating DSB formation and repair mechanisms in A. thaliana. PHS1/Rec114 has a role in DSB formation in the species studied so far, including maize, except in A. thaliana, where it is proposed to be unnecessary for DSB formation but to act in regulating meiotic recombination (Vrielynck et al. 2021). It is therefore evident that components of the RMM complex, such as PRD3 and PHS1, have divergent roles in some cases. Notably, PHS1/Rec114, PRD2/Mei4 and PRD3/Mer2 homologs are conserved across different phyla, but their conservation at the protein sequence level is very weak (Vrielynck et al. 2021). PRD2 and PRD3 have no functional domains reported, except for the presence of several alpha helices and coiled-coil motifs (De Muyt et al. 2009; Jiang et al. 2009; Vrielynck et al. 2021). The divergence observed among RMM proteins and the absence of conserved domains in PRD2 and PRD3 could explain why we did not detect RMM homologs or the plant-specific DFO, which is part of the A. thaliana RMM-like complex, in the distant relatives included in our analysis, and reconfirm the minimal conservation of RMM proteins.

SPO11 heterodimerisation has likely evolved in land plants

SPO11 is encoded by a single gene in most organisms (Malik et al. 2007); however, plants differ from yeasts and animals in having several SPO11 homologs. Two paralogs (SPO11-1 and SPO11-2) are involved in meiosis in A. thaliana (Grelon et al. 2001; Hartung and Puchta 2001; Hartung et al. 2007; Stacey et al. 2006), where they seem to form a heterodimer that is required for meiotic DSB formation, whereas SPO11-3 is involved in somatic DNA metabolism (Hartung et al. 2007; Sugimoto-Shirasu et al. 2002; Yin et al. 2002). However, the exact origin of the SPO11-1 and SPO11-2 duplication and its relation to heterodimerisation in plants outside A. thaliana remained unresolved. We therefore expanded our phylogenetic analysis by including more non-plant representatives from amoebae and archaea, which helped us to trace the origin of the SPO11 duplication in plants. SPO11-3 (Fig. 1C), which is very similar to archaeal sequences, was detected in all the lineages analysed. Remarkably, among Viridiplantae lineages, our analysis detected both SPO11-1 and SPO11-2 only in land plants, except for Marchantia polymorpha, whereas chlorophytes and charophytes have only SPO11-2 and seem to lack SPO11-1 (Fig. 1C). This suggests two scenarios: (1) heterodimerisation of SPO11 evolved in land plants, or (2) heterodimerisation evolved earlier in eukaryotes but was later lost independently in several lineages and replaced by a homodimer. However, the duplication of SPO11 is ancestral to eukaryotes, or happened very early in the evolution of eukaryotes, as suggested by our phylogenetic analysis and in agreement with an earlier report (Malik et al. 2007). Members of Amoebozoa, glaucophytes and red algae (grouped under other eukaryotes in Fig. 1B, C) share the same duplication with land plants and have both SPO11-1 and SPO11-2 paralogs (Fig. 1C). Thus, we propose that the duplication of SPO11 is ancestral to eukaryotes and that the SPO11-1 gene was most likely lost in both the chlorophyte and charophyte lineages after the duplication event. Whether SPO11 functions as a homodimer in these two lineages needs further investigation.

Strand invasion is the most conserved meiotic pathway

HOP2, MND1, DMC1 and PCH2 are the proteins of the strand invasion pathway included in our analysis. Notably, this is the only group in which all proteins were found in all lineages, apart from a few specific cases (Fig. 1B). DMC1 was not detected in the glaucophytes analysed, but the absence of a complete genome for these species makes it difficult to draw a conclusion. DMC1 is the meiosis-specific homolog of bacterial RecA and is required for meiotic homologous recombination. The MND1-HOP2 heterodimer promotes DMC1 activity at DSB sites and promotes stable strand invasion and inter-homolog bias (Kerzendorfer et al. 2006). However, some organisms lack DMC1, for example Drosophila melanogaster, Caenorhabditis elegans, Sordaria macrospora and Neurospora crassa, which shows that DMC1 can be dispensable. These organisms also lack the accessory factors HOP2 and MND1. In contrast, Viridiplantae and mammals have been reported to have DMC1 (Brown and Bishop 2014; Neale and Keeney 2006). Our analysis also shows that all the major Viridiplantae lineages have DMC1 along with HOP2 and MND1, suggesting that it may be essential for meiotic homologous recombination in Viridiplantae. PCH2 has a role in chromosome remodelling during SC formation. The initial characterisation of all these proteins in A. thaliana revealed their conservation among eukaryotes and functional similarity with their non-plant orthologs (Couteau et al. 1999; Kerzendorfer et al. 2006; Lambing et al. 2015; Schommer et al. 2003). Our analysis reaches the same conclusion: strand invasion proteins are the most conserved of the meiotic proteins we analysed, even at the sequence level. We speculate that such high conservation is linked to their enzymatic function.

The ZMM pathway is highly conserved and detectable in all plant lineages

Of the ten ZMM pathway proteins analysed, PTD, HEI10, MER3, MLH1, MLH3, MSH4, MSH5, SHOC1 and ZIP4 were found to be highly conserved in all the major plant lineages, whereas HEIP1 was not detected in chlorophytes (Fig. 1B). HEIP1 was identified as an interacting partner of HEI10 and was suggested to be a member of the ZMM pathway because mutants showed reduced chiasma frequency in rice. It contains a potentially plant-specific domain (GCK domain) and has not been reported outside the plant kingdom so far (Li et al. 2018). Our analysis confirms this: HEIP1 was not detected outside plants, nor in the whole chlorophyte lineage. HEIP1 was also not detected in some species other than chlorophytes, but at least one species in each of the other major Viridiplantae lineages had its ortholog. Based on this pattern, we propose that HEIP1 is a member of the ZMM pathway with a possible emergence during the diversification of chlorophytes. PTD orthologs are distant relatives of ERCC1 proteins, which are present in both plants and animals (Lu et al. 2014; Wijeratne et al. 2006). SHOC1, the interacting partner of PTD, is a member of the XPF superfamily, which is widely present among eukaryotes (Macaisne et al. 2011), and was detected in all plant lineages of our study. However, in our analysis, PTD was absent from most of the chlorophytes. PTD may have been lost independently in these algae, or the protein sequence may be too divergent to be detected by the algorithms, given that PTD lacks the conserved motif for endonuclease activity (Wijeratne et al. 2006). Considering that both ERCC1 and XPF are structure-specific endonucleases belonging to the XPF superfamily, this difference in the conservation of PTD and SHOC1 implies that individual proteins of the same complex can have different evolutionary trajectories. Another interesting observation is that MER3 was not detected in Cycas panzhihuaensis and Taxus sinensis. MER3 is highly conserved, and orthologs of A. thaliana MER3 were detected even in the most distant algal species used in our analysis. Its non-detection in the two species mentioned above may therefore indicate an independent loss.

Final remarks

Our comprehensive analysis characterised the SPO11 duplication in plant lineages. Within Viridiplantae, SPO11-1 is retained only in land plants, where the heterodimerisation of SPO11-1 and SPO11-2 possibly occurs. We could also trace the possible origin of the meiotic genes DFO and HEIP1, which have so far been described only in plants. It should be kept in mind, however, that failure to detect a protein does not necessarily mean that it is absent. Notwithstanding the ever-growing volume of genome sequence information, some genomes remain incompletely annotated, which may result in the apparent absence of some proteins from the genome or proteome. Thus, although our results are based on more than one homology search approach, the non-detection of protein homologs in our analysis does not always imply their absence in a given species. Indeed, in a few instances, our failure to detect homologs seems suspicious, for example the absence of MSH5 in Cycas panzhihuaensis and of PCH2 in Physcomitrella patens, among others. These candidates are highly conserved and were detected in all other species analysed. In such cases it is difficult to conclude whether the result reflects an independent loss or an artefact, and more studies are needed to give a concrete answer. Other cases discussed showed a clear pattern: ASY4, DFO, PHS1, PRD2, PRD3 and HEIP1 were not detected in any species of a particular lineage. Here we can be more confident that they are either putatively absent or too divergent in sequence to be identified by the algorithms. If meiosis is an ancestral characteristic of eukaryotes, this raises the question of why some proteins in the highly conserved meiotic pathways are putatively absent or not recognised in certain lineages. A possible explanation is that they are either poorly conserved or evolved in an ancestor of the land plants and are absent from the other lineages. If sequence divergence is the case, then it remains to be determined why, within the same pathway, some proteins are more divergent than others; moreover, such an explanation potentially hints at other, yet to be identified, evolutionary pressures determining the evolution of these proteins. Most of the meiotic proteins that have an enzymatic function or a described functional domain, for example ASY1, SPO11, HEI10, MLH1 and MLH3, among others, were highly conserved in our analysis, whereas proteins such as PRD2, PRD3 and ASY4, which are reported to lack functional domains and have no enzymatic function, were less conserved. What also remains to be elucidated is the relevance of lineage-specific loss or gain of certain proteins for meiotic adaptation. Functional validation of selected candidates will be necessary to answer these open questions and to obtain a complete picture of the different meiotic strategies that have evolved across the vast plant kingdom. We hope that our homology search provides a first overview of the core meiotic proteins across the kingdom.

Limitations of the study

Arabidopsis thaliana protein sequences were used as the initial queries in the analysis. We considered using yeast homologs as queries. However, although the meiotic machinery is conserved, not all proteins are conserved at the sequence level between yeast and plants. In some cases, past studies have reported that yeast and Arabidopsis homologs are functionally conserved but divergent at the sequence level. Conversely, a plant-specific protein such as DFO has not been reported in yeast. Considering these points, we narrowed our aim to searching other Viridiplantae lineages only for the proteins reported in the model plant Arabidopsis thaliana, rather than for all reported meiotic proteins. The latter would be very interesting, but the sequence-based homology search algorithms used in this work would not be sufficient for that purpose. Structure-based algorithms and a careful case-by-case search for the functional domains of each protein could be considered, but are beyond the scope of this manuscript.

The sensitivity of the algorithms decreases in lineages that are evolutionarily distant from Arabidopsis thaliana because of sequence divergence, and this could bias our findings. To increase the chances of finding orthologs, most of the algae for which omics data were available were included in our analysis. However, the data sets available for algae were limited. In many cases, the available data set was either a vegetative transcriptome or a draft genome. This was particularly the case for Coleochaete and glaucophytes. Since we are dealing with meiosis-specific candidates, transcriptome data from the vegetative phase may not capture their expression, and thus no hits will be obtained. All cases in which hits were not obtained were carefully considered. Owing to the limitations of the analysis, the absence of hits does not necessarily mean that the protein is absent. Unless specifically mentioned, only cases in which no hits were obtained across a whole lineage were considered a clear pattern and interpreted further.
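
The lineage-level rule stated above can be written out as a short Python sketch. It is only an illustration of the decision logic: the function name, the labels and the placeholder species names are hypothetical, and the manual inspection of trees that preceded each call is not captured.

```python
from typing import Dict


def lineage_call(detected_by_species: Dict[str, bool]) -> str:
    """Summarise per-species detection of one protein for one lineage.

    A lineage counts as 'detected' if at least one species has a hit, and
    as 'not detected in lineage' only if every sampled species lacks one.
    """
    if not detected_by_species:
        return "no data"  # nothing sampled, so no statement is possible
    if any(detected_by_species.values()):
        return "detected"
    return "not detected in lineage"


# Placeholder species names and made-up detection flags, for illustration.
mixed = lineage_call({"species_1": True, "species_2": False})
none_found = lineage_call({"species_1": False, "species_2": False})
```

Under this rule, a single missing hit inside an otherwise positive lineage is not treated as a pattern, which matches the cautious reading of isolated non-detections given above.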

Author contribution statement

GT and PGH performed the analysis. GT and AM wrote the first draft with subsequent input from PGH and RM. RM and AM conceived and coordinated the study.