Transcriptomics of manually isolated Amborella trichopoda egg apparatus cells

A protocol for the isolation of egg apparatus cells from the basal angiosperm Amborella trichopoda to generate RNA-seq data for evolutionary studies of fertilization-associated genes. Sexual reproduction is particularly complex in flowering plants (angiosperms). Studies in eudicot and monocot model species have significantly contributed to our knowledge of cell fate specification of gametophytic cells and of the numerous cellular communication events necessary to deliver the two sperm cells into the embryo sac and to accomplish double fertilization. However, for a deeper understanding of the evolution of these processes, morphological, genomic and gene expression studies in extant basal angiosperms are indispensable. The basal angiosperm Amborella trichopoda is of special importance for evolutionary studies, as it is likely sister to all other living angiosperms. Here, we report a method to isolate Amborella egg apparatus cells and genome-wide gene expression profiles of these cells. Our transcriptomics data revealed Amborella-specific genes and genes conserved in eudicots and monocots. Gene products include secreted proteins, such as small cysteine-rich proteins previously reported to act as extracellular signaling molecules with important roles during double fertilization. The detection of transcripts encoding EGG CELL 1 (EC1) and related prolamin-like family proteins in Amborella egg cells demonstrates the potential of the generated data set to study conserved molecular mechanisms and the evolution of fertilization-related genes and their encoded proteins.


Introduction
Molecular and fossil evidence indicates that flowering plants (angiosperms) arose abruptly in the late Jurassic/early Cretaceous from a not yet identified ancestral lineage. Since then, angiosperms rapidly diversified to form over 350,000 species alive today (Bell et al. 2005;Doyle 2012; Willis and McElwain 2013;Scutt 2018) and represent the most abundant and ecologically successful group of plants on earth. Amborella trichopoda, a woody shrub endemic to New Caledonia, holds a key position in the angiosperm evolutionary tree as the most basal angiosperm. It is the sole living species of the Amborellaceae family which is strongly supported as the sister lineage of all other extant flowering plants (Amborella Genome Project 2013). For this reason, Amborella has been the focus of many phylogenetic, genomic and reproductive biology studies, providing a deeper understanding regarding the evolution of flowering plants (for review, see Scutt 2018).

Among the defining features of angiosperms are carpels, which enclose and protect the ovules, and the process of double fertilization involving two pairs of male and female gametes, respectively (Dresselhaus et al. 2016;Scutt 2018). The highly reduced female gametophyte (embryo sac) develops within the ovule and contains the egg cell and central cell. Both cells become fertilized and give rise to the embryo and the embryo-nourishing endosperm of the seed, respectively. A large majority of extant angiosperms produce the monosporic Polygonum-type embryo sac, which was first described in Polygonum divaricatum (Strasburger 1879). This originates from a single haploid spore (the functional megaspore), which undergoes three incomplete mitotic division cycles to develop into an eight-nucleate syncytium. After cellularization, the Polygonum-type embryo sac is seven celled but eight nucleate, with three cells at each pole and a large binucleate central cell. The egg apparatus is located close to the micropylar entrance point of the pollen tube and comprises one egg cell and two accessory synergid cells, which are required for pollen tube attraction and reception (for review, see Higashiyama and Yang 2017). Three antipodal cells form at the opposite chalazal end of the embryo sac, but little is known about their function (Sprunck and Groß-Hardt 2011).
Although the Polygonum-type embryo sac is most prevalent in flowering plants, embryo sac formation is highly diverse across different plant taxa, suggesting an extensive degree of developmental experimentation during angiosperm evolution (Friedman 2006). Embryo sacs can also develop from two or four functional megaspores (bisporic and tetrasporic development, respectively), and/or from a varying number of mitotic divisions following meiosis. Within the embryo sac, the cells can furthermore occupy different positions (Maheshwari 1950). Notably, none of the members of the earliest extant angiosperm clades produces a Polygonum-type embryo sac (Friedman 2006). The four-celled Nuphar/Schisandra-type embryo sac, comprising one egg cell, two synergid cells and a uninucleate central cell, appears to be universal among extant members of the Nymphaeales and also occurs in Austrobaileyales (e.g., Williams and Friedman 2004; Tobe et al. 2007; Friedman 2008; Rudall et al. 2008). According to present knowledge, the nine-nucleate, eight-celled Amborella-type female gametophyte is restricted to Amborella (Friedman and Ryerson 2009) and might represent a critical link between angiosperms and gymnosperms (Friedman 2006). Detailed structural analyses revealed that Amborella embryo sac development initially parallels that of the Polygonum-type, since immature embryo sacs are eight nucleate and seven celled, with three cells each at the chalazal and the micropylar pole. However, prior to embryo sac maturation, one of the three cells at the micropylar pole undergoes a final mitotic and cytokinetic division to produce two daughter cells: a third synergid cell and the egg cell. At maturity, the Amborella embryo sac is therefore eight celled, with three antipodal cells, a large binucleate central cell and a unique four-celled egg apparatus comprising three synergid cells and the egg cell, which is the lineal sister cell of a synergid cell (Friedman 2006).
From studies in model plants such as Arabidopsis and maize, we know that cell fate specification processes in the developing embryo sac are tightly controlled, and a few of the genes involved in them have been identified (for review, see Tekleyohans et al. 2017; Zhou et al. 2017). Previous studies furthermore showed that synergid cells, egg cells and central cells fulfill unique functions during flowering plant reproduction, which are reflected by distinct gene expression profiles (Márton et al. 2005; Okuda et al. 2009; Wuest et al. 2010; Anderson et al. 2013; Chen et al. 2017). These include molecules mediating pollen tube guidance to the ovule, induction of pollen tube burst, gamete interaction, activation and fusion (Okuda et al. 2009; Amien et al. 2010; Márton et al. 2012; Sprunck et al. 2012; Takeuchi and Higashiyama 2012; Ge et al. 2017). Nevertheless, a deeper molecular understanding of the evolution of flowering plant embryo sac formation is missing, and so far, the transcriptomes of individual embryo sac cells from basal angiosperms have remained unexplored.
Here, we describe a method to manually isolate viable Amborella egg apparatus cells. Their morphology was investigated, and they were subsequently used to generate genome-wide gene expression profiles from a low number of cells for bioinformatics analysis. In this pioneer study, we identified Amborella-specific genes preferentially expressed in egg apparatus cells and first candidate genes for comparative evolutionary and functional studies.

Plant material and growth conditions
Flowers were harvested from a female Amborella plant (one flowering period per year), grown in the greenhouse under controlled conditions of 12 h of light, 16.2 °C and constant humidity of 66%. Control tissues of tepals, leaves and roots were harvested and immediately frozen in liquid nitrogen.

Isolation of egg apparatus cells
The isolation procedure was performed only on the most recently fully opened flowers with visible wet stigmas, harvested in the morning (8:00 am). Flowers were harvested in a glass petri dish on wet filter paper to maintain high humidity until dissection. All following steps were performed at room temperature, without any breaks and in an expeditious manner (approximately 1.5 h for processing ten carpels). Under a Nikon ZM645 stereo microscope, individual carpels were gently removed from the inflorescence using disposable hypodermic needles (0.4 mm diameter, 13 mm length) and placed in a new glass petri dish. Subsequently, carpels were cross-sectioned using a scalpel (No. 11 sterile carbon steel surgical blade). The cutting angle and position of the cross section are indicated in Fig. 1b. The lower portion of dissected carpels was immediately transferred to a microscopy Imaging Dish GC 1.5 (130-098-284, Miltenyi Biotec GmbH), filled with 0.3 M filter-sterilized mannitol solution. Subsequently, hypodermic needles (0.4 mm diameter, 13 mm length) were used to further dissect the lower portion of the carpel in mannitol solution. The inner integument was separated from the outer integument, and the inner part of the ovule was gently lifted out of the carpel. The remains of the outer ovary sections were removed. Subsequently, the release of egg apparatus cells from the ovule tips was performed under a Nikon Eclipse TE2000-S inverted microscope equipped with a 20× objective (Plan Fluor ELWD 20×/0.45, DIC L/N1). For the isolation procedure, fine-tipped glass needles were prepared from borosilicate glass rods (4 mm diameter, 10 cm length, Hilgenberg GmbH) by using a heating device (Bunsen burner). To release egg apparatus cells from the ovule tip, one glass needle was used to hold the inner integument in place. The cells were then released by gently tapping on the outer side of the ovule with the second glass needle.
Once egg apparatus cells were released, their diameter was recorded by using a 40× objective (Plan Fluor ELWD 40×/0.60, DIC L/N1, Nikon). Cells were immediately collected using a CellTram® Oil (Eppendorf) equipped with a Ringcaps® glass 50-micron capillary (Hirschmann Instruments™ 9600150). The fine tip of the glass capillary was prepared using a micropipette puller as described elsewhere (Englhart et al. 2017). Individual cells were transferred to 0.5 ml LoBind reaction tubes (Sigma Aldrich) in a minimal amount of mannitol solution (max. 3-5 µl) and immediately frozen in liquid nitrogen for further analyses. Frozen cells were stored at −80 °C.

RNA isolation, library preparation and sequencing
For control tissues, total RNA was isolated from three biological replicates of tepals, leaves and roots with a [...]. Total RNA from egg apparatus cells was extracted according to the "Purification of total RNA from animal and human cells" protocol of the RNeasy Plus Micro Kit (Qiagen). Twenty-eight cells each of the L- (diameter from 25.85 to 17.96 µm) and M- (diameter from 17.65 to 16.35 µm) size categories and 26 cells of the S-size category (diameter from 16.17 to 13.13 µm) were used for RNA extraction. Finally, total RNA was eluted in 12 μl of nuclease-free water. RNA quality control measurements on the Bioanalyzer were not feasible, since the RNA quantities obtained from the low number of egg apparatus cells were below the assay sensitivity. The SMARTer Ultra Low Input RNA Kit for Sequencing v4 (Clontech Laboratories, Inc.) was used to generate first-strand cDNA from 500 pg total RNA. Double-stranded cDNA was amplified by LD PCR (14 cycles) and purified via magnetic bead cleanup. Library preparation was carried out as described in the Illumina Nextera XT Sample Preparation Guide (Illumina, Inc.). A total of 150 pg of input cDNA was tagmented (tagged and fragmented) by the Nextera XT transposome. Products were purified and amplified via a limited-cycle PCR program to generate multiplexed sequencing libraries. For the PCR step, 1:5 dilutions of index 1 (i7) and index 2 (i5) primers were used.
Libraries from egg apparatus cells and control tissues were quantified using the KAPA SYBR FAST ABI Prism Library Quantification Kit. Equimolar amounts of each library were used for cluster generation on the cBot (TruSeq SR Cluster Kit v3). Sequencing runs were performed on a HiSeq 1000 instrument using the indexed, 2 × 100 cycles paired end (PE) protocol and the TruSeq SBS v3 Kit. Image analysis and base calling resulted in ".bcl" files, which were converted into ".fastq" files using the CASAVA 1.8.2 software. RNA extraction of egg apparatus cells and Illumina deep sequencing were carried out at the genomics core facility of the University of Regensburg (Center for Fluorescent Bioanalytics KFB; www.kfb-regensburg.de).

RNA-seq data analysis
RNA-seq data were processed using Kallisto (Bray et al. 2016). Reads were mapped to the Amborella genome v1.0 (Amborella Genome Project 2013), obtained from Ensembl Plants (Kersey et al. 2016). Gene expression values were determined as Transcripts Per Million (TPM) (Conesa et al. 2016) and clustered using the Python Seaborn software package with "Euclidean" distance and the "complete" linkage method; the same package was used to generate the heat map shown in Fig. 3c. A comprehensive overview of normalized expression data in Amborella egg apparatus cells, leaves, roots and tepals is shown in Supplementary Table 1. RNA-seq data will be publicly available in the open source web server CoNekT (Proost and Mutwil 2018).
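For readers unfamiliar with the TPM unit, the following minimal sketch illustrates its definition: each transcript's read count is divided by its effective length, and the resulting rates are rescaled so that they sum to one million within a sample. Kallisto reports TPM values itself, so this is not the pipeline used here; the function name `tpm` and the toy counts and lengths are assumptions made for illustration only.

```python
def tpm(counts, eff_lengths):
    """Convert estimated read counts to transcripts per million (TPM).

    counts      : estimated read counts, one per transcript
    eff_lengths : effective transcript lengths (same order as counts)
    """
    if len(counts) != len(eff_lengths):
        raise ValueError("counts and eff_lengths must have the same length")
    # Length-normalized rate for every transcript
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Rescale so that all values of one sample sum to 1e6
    return [r / total * 1e6 for r in rates]


# Toy input (hypothetical values, not taken from the data set)
counts = [100.0, 300.0, 600.0]
eff_lengths = [1000.0, 1500.0, 2000.0]
values = tpm(counts, eff_lengths)
print([round(v, 1) for v in values])
```

Because the values of each sample sum to the same total, TPM values can be compared between the egg apparatus cell pools and the control tissues despite very different sequencing depths.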

Isolation of Amborella egg apparatus cells
We report here an isolation procedure for Amborella egg apparatus cells by manually dissecting carpels of female flowers, without treating ovules with cell wall-degrading enzymes. Enzyme-free methods have been successfully established mainly for the isolation of egg cells from angiosperm species such as Brassica napus, Hordeum vulgare, Oryza sativa, Plumbago zeylanica and Triticum aestivum (Table 1). After gently removing carpels from female flowers of Amborella (Fig. 1a), their characteristic shape with a flat adaxial and round abaxial side became apparent. Carpels were placed laterally to perform a cross section through ovaries as indicated in Fig. 1b. It was very clear in our studies that position and angle of cross sections determined later success of releasing egg apparatus cells. Cross sections traverse the lumen of central cells (Fig. 1c), which will be destroyed and, thus, cannot be isolated by the described protocol. Immediate transfer of the lower part of carpels into 0.3 M mannitol solution is essential to prevent osmotic stress and to avoid the irreversible collapse of egg apparatus cells by drying out (Fig. 1d).
In pilot experiments, we tested different mannitol molarities and determined 0.3 M mannitol to be optimal to maintain the shape and size of released egg apparatus cells. Similar mannitol concentrations have been reported for the isolation of egg apparatus cells from rice and spider lily (Uchiumi et al. 2006; Ohshika and Ikeda 1994). By contrast, egg apparatus cells from other flowering plant species require higher osmolarities, such as 0.55 M mannitol for wheat (Kovács et al. 1994), or even 0.7 M mannitol for Plumbago zeylanica and rapeseed (Cao and Russell 1997; Katoh et al. 1997). We used one hypodermic needle to hold the outer part of the carpel in place. With a second hypodermic needle, we gently separated the inner integument from the outer integument and lifted ovule tips out of carpel sections (Fig. 1e). After removing remains of carpels from the mannitol solution, release of egg apparatus cells was performed and observed under an inverted microscope. Two fine-tipped glass needles were used to hold one ovule tip and to gently push the egg apparatus cells out of the tip, respectively. One egg apparatus cell, just being released from an ovule tip, is shown in Fig. 1f. Subsequently, egg apparatus cells were imaged, transferred into a 0.5 ml reaction tube by micropipette aspiration and immediately frozen in liquid nitrogen. Approximately 5-10 apparently intact egg apparatus cells could be isolated from 50 carpels.
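As a rough planning aid, the reported yield (approximately 5-10 intact cells per 50 carpels) and handling time (approximately 1.5 h per ten carpels) can be combined into a back-of-the-envelope estimate of the effort needed for one pool of 28 cells. The sketch below assumes a constant yield and assumes that every isolated cell falls into the desired size category, which will not hold in practice, so the result is best read as a lower bound. The function names are hypothetical.

```python
import math


def carpels_needed(target_cells, cells_per_50_carpels):
    """Carpels to dissect for a target cell number at a given yield per 50 carpels."""
    return math.ceil(target_cells * 50 / cells_per_50_carpels)


def hours_needed(carpels, hours_per_10_carpels=1.5):
    """Handling time, assuming about 1.5 h per ten carpels."""
    return carpels / 10 * hours_per_10_carpels


# Lower (5) and upper (10) bound of the reported yield per 50 carpels
for yield_per_50 in (5, 10):
    n_carpels = carpels_needed(28, yield_per_50)
    print(yield_per_50, n_carpels, hours_needed(n_carpels))
```

Under these simplifying assumptions, a single pool of 28 cells corresponds to roughly 140 to 280 carpels, or about 21 to 42 h of dissection.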

Morphology and identity of isolated egg apparatus cells
Depending on the plant species, either egg or synergid cells were reported to be slightly larger in diameter (Table 1). Only rice and petunia egg apparatus cells were reported to be rather similar in size (Van Went and Kwee 1990; Uchiumi et al. 2006). Furthermore, egg cells and synergid cells isolated from Arabidopsis, or rice, showed distinct cellular morphologies (Englhart et al. 2017;Ohnishi et al. 2011).
To be able to distinguish Amborella egg cells and synergid cells without the help of molecular markers, we studied the morphology of isolated egg apparatus cells in more detail. Typically, intact cells exhibited a plasma membrane, a vacuole of variable size and a nucleus, which could be stained by SYBR Green I (Fig. 1g). Comparable to previous reports on Plumbago, wheat, barley and rice egg cells that were isolated by non-enzymatic cell isolation techniques (e.g., Holm et al. 1994;Cao and Russell 1997;Sprunck et al. 2005;Leljak-Levanic et al. 2013;Ohnishi et al. 2011), egg apparatus cells from Amborella became immediately round during the isolation process, suggesting that either a very thin primary cell wall or no cell wall was present.
Occasionally, a group of spherical cells was simultaneously released from one ovary tip (Fig. 2). These cells exhibited slightly different morphologies in terms of cell size and granular, contrast-rich structures in their cytoplasm (Fig. 2a). The diameter of "large" cells ranged from 18 to 25 µm (± 2.23), and they were frequently attached to either one, two or three smaller cells (Fig. 2a, b, d). The cytoplasm of these "large" cells, which we considered as egg cells (see below), contained relatively few granular particles and a large vacuole that pushed the nucleus (with its clearly recognizable nucleolus) to the cell periphery (Fig. 2d). Diameters of "small" egg apparatus cells ranged from 13 to 15 µm (± 1.28). Furthermore, these cells featured fewer vacuoles and more granular particles in the cytoplasm, and were sometimes released in a group of three (Fig. 2c). Based on their similar morphology and the fact that the female gametophyte of A. trichopoda contains three synergid cells (Friedman and Ryerson 2009), we classified the cells with diameters ranging from 13 to 15 µm as synergid cells. Occasionally, we observed clusters of egg apparatus cells where one of the three synergid cells was larger than the other two (Fig. 2a). Furthermore, a substantial number of isolated egg apparatus cells did not exhibit unambiguous morphological features and cell diameters, which prevented us from classifying them either as egg cells or as synergid cells. However, the Amborella egg cell is unique in that it differentiates very late, only after a terminal cell division of one of the three micropylar cells of the seven-celled, eight-nucleate immature embryo sac present in pre-anthesis floral buds (Friedman 2006). The Amborella egg cell is therefore a lineal sister of the third synergid cell, while in all other angiosperm embryo sacs, the egg cell is the mitotic sister of a polar nucleus of the central cell (Friedman 2006).
Considering this uniqueness, the close relation between the egg cell and the third synergid cell of Amborella might be reflected by a more similar morphology, especially when the two daughter cells are isolated shortly after the final cell division. For these reasons, we assumed that the "intermediate-sized" and less vacuolated group of egg apparatus cells includes young egg cells and their mitotic sister synergid cells, which cannot be unequivocally distinguished by their size and/or morphology. To obtain high-quality RNA-seq data, we had to pool at least 25 cells per replicate. We therefore assigned three egg apparatus cell size categories, based on the cell diameter (Fig. 3a). We pooled 28 cells with a diameter ranging from 17.96 to 25.85 µm as the "Large" (L) group, predominantly comprising egg cells. A group of 28 cells with diameters of 16.35 to 17.65 µm was classified as the "Medium" (M) group, enriched in young egg cells and third synergid cells. Finally, 26 cells with a diameter ranging from 13.13 to 16.17 µm were assigned to the "Small" (S) group, predominantly consisting of synergid cells.
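The size-based grouping can be expressed as a simple lookup. The sketch below uses the diameter ranges of the pooled cells as reported above; these are observed ranges rather than formal cutoffs, so diameters that fall between or outside them are deliberately left unassigned. The names `SIZE_RANGES_UM` and `size_category` are hypothetical.

```python
# Observed diameter ranges (µm) of the pooled cells, as reported in the text
SIZE_RANGES_UM = {
    "L": (17.96, 25.85),  # predominantly egg cells
    "M": (16.35, 17.65),  # young egg cells and third synergid cells
    "S": (13.13, 16.17),  # predominantly synergid cells
}


def size_category(diameter_um):
    """Return 'L', 'M' or 'S' for a measured diameter, or None if unassigned."""
    for label, (low, high) in SIZE_RANGES_UM.items():
        if low <= diameter_um <= high:
            return label
    # Falls into a gap between the observed ranges or outside all of them
    return None


for d in (20.0, 17.0, 14.5, 17.8):
    print(d, size_category(d))
```

Returning `None` instead of forcing a label mirrors the observation that a substantial number of cells could not be classified unambiguously.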

RNA-seq analysis of isolated embryo sac cells
After RNA extraction, library preparation and Illumina sequencing, we obtained a total of 57.58 million (M) reads for the three groups of egg apparatus cells that mapped to Amborella genome v1 (Amborella Genome Project 2013). We also generated RNA-seq data from sporophytic control tissues (three biological replicates of leaves, roots and tepals, respectively) and obtained a total number of 22.62 M mapped reads for leaves, 19.07 M for roots and 50.38 M for tepals. Supplementary Table 1 provides Amborella gene symbols and normalized transcripts per million (TPM) values obtained for the three egg apparatus cell samples and the sporophytic control tissues.
A principal component analysis (PCA) performed on the expression data revealed that the three egg apparatus samples group closely together and are clearly separated from the clusters formed by the biological replicates of the three control tissues (Supplementary Figure 2). Among the egg apparatus cell size categories, L- and M-size cell types grouped together more closely.
We employed a threshold level of ≥ 1 TPM to estimate the number of expressed genes in the three groups of egg apparatus cells. Thereby, we detected 11,841 expressed genes in the category of large (L) cells, 11,808 expressed genes in the intermediate-sized cell category (M) and 12,501 expressed genes in the group of small (S) egg apparatus cells (Fig. 3b). Overall, 15,409 genes were expressed in the Amborella egg apparatus cells. Considering a total number of 27,313 coding genes in the genome of A. trichopoda (Ensembl Plants database release 41; September 2018), the number of egg apparatus-expressed genes comprises 56.42% of all Amborella genes.
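The counting step described here amounts to thresholding each sample at 1 TPM and taking the union across samples. The following sketch reproduces that logic on toy data and re-derives the reported percentage from the two totals given in the text. The gene names and TPM values in `samples` are hypothetical placeholders.

```python
def expressed_genes(tpm_by_gene, threshold=1.0):
    """Set of genes whose TPM reaches the threshold (>= 1 TPM in the text)."""
    return {gene for gene, value in tpm_by_gene.items() if value >= threshold}


def percent_of_genome(n_expressed, n_coding_genes):
    """Share of annotated coding genes that are expressed, in percent."""
    return round(100 * n_expressed / n_coding_genes, 2)


# Toy TPM values per size category (hypothetical, for illustration only)
samples = {
    "L": {"g1": 5.0, "g2": 0.4, "g3": 1.0},
    "M": {"g1": 2.0, "g2": 1.5, "g3": 0.0},
    "S": {"g1": 0.2, "g2": 0.9, "g3": 3.0},
}
per_sample = {name: expressed_genes(tpm) for name, tpm in samples.items()}
union = set().union(*per_sample.values())
print({name: len(genes) for name, genes in per_sample.items()}, len(union))

# Totals reported in the text: 15,409 expressed genes of 27,313 coding genes
print(percent_of_genome(15409, 27313))
```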
When comparing gene expression patterns in the three cell size categories, we detected 8737 genes to be conjointly expressed in large, medium and small egg apparatus cells (Table 2). Despite this rather large overlap (56.7% of 15,409 egg apparatus-expressed genes), a comparison of the top 6000 genes expressed in each cell size category identified prominent gene clusters with differential expression in either the L-, M- or S-group of egg apparatus cells (Fig. 3c). These distinct transcriptome signatures could reflect a cell type-dependent accumulation of transcripts necessary for the egg cells and synergid cells to fulfill their specialized roles during double fertilization. However, more biological replicates are needed to analyze the differential gene expression in Amborella egg apparatus cell types in detail, as only single replicates of each cell size category have been generated during this pioneer study.
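The gene clusters in the heat map result from hierarchical clustering with Euclidean distance and complete linkage, as stated in the methods. The sketch below is a plain-Python illustration of that procedure on row-wise z-scores; it is not the Seaborn call used for Fig. 3c, and the use of the population standard deviation, the function names and the toy expression matrix are all assumptions.

```python
import math
from statistics import mean, pstdev


def zscore(row):
    """Row-wise z-score; constant rows map to zeros."""
    mu, sd = mean(row), pstdev(row)
    if sd == 0:
        return [0.0] * len(row)
    return [(x - mu) / sd for x in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(vectors):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance of the two farthest members
                d = max(euclidean(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy matrix: rows are genes, columns are the L, M and S samples (hypothetical)
expression = [
    [10.0, 8.0, 1.0],   # gene 0: high in L and M
    [20.0, 17.0, 3.0],  # gene 1: same pattern, higher level
    [1.0, 2.0, 9.0],    # gene 2: high in S
    [2.0, 3.0, 20.0],   # gene 3: same pattern, higher level
]
z_rows = [zscore(row) for row in expression]
merges = complete_linkage(z_rows)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

The z-score step makes genes with the same expression pattern cluster together even when their absolute expression levels differ, as for genes 0 and 1 in the toy matrix.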

Egg apparatus-enriched genes encode CRPs, novel proteins and proteins with homologs in eudicots and/or monocots
We next analyzed the most strongly expressed genes that were specifically enriched in the respective size category of egg apparatus cells but were not expressed in the sporophytic control tissues. To this end, we excluded all genes with a TPM ≥ 3 in roots, leaves and tepals. In Table 2, we summarize the ten most strongly expressed Amborella genes enriched in each cell size category and their best match in sequence similarity searches. Table 2 also provides information on three more plant species annotated for subsequent BLASTP hits. The division of the best BLAST matches to Amborella sequences, approximately equally between eudicots and monocots, is in agreement with Amborella's phylogenetic position as a basal angiosperm, equally distantly related to these two groups (Amborella Genome Project 2013). Notably, six of the 30 genes shown in Table 2 and Supplementary Table 3 were annotated to encode "hypothetical proteins" in A. trichopoda and did not reveal significant sequence similarities to any protein from other flowering plant species. More than one-third of proteins encoded by the 30 most abundantly enriched transcripts in Amborella egg apparatus cells were predicted to be directed to the secretory pathway, or to localize in the extracellular space.
These include a potential Amborella-specific small secreted cysteine-rich protein (CRP) of 109 amino acids (encoded by AMTR_s00059p00189800) and another small secreted CRP with eight conserved cysteine residues (AMTR_s00162p00083730) showing sequence similarity to nonspecific lipid transfer proteins (nsLTPs), which are a part of the prolamin superfamily and abundant in liverworts, mosses and all other investigated land plants (Edstam et al. 2011).
Fig. 3c Heat map comparing the top 6000 genes expressed in each cell size category. Genes are present in rows, while samples are present in columns. Z-score normalized values are shown, ranging from 0 (mean expression, in white) to 1 (maximum standard deviation away from the mean of expression, in red)
A defensin-like (DEFL) protein and a putative RALF (RAPID ALKALINIZATION FACTOR), encoded by AMTR_s00067p00126450 and AMTR_s00067p00138490, respectively, were detected in the S-group (enriched in synergid cells). RALFs are CRPs of around 5 kDa that have been identified as extracellular ligands of the Catharanthus roseus RLK1-like (CrRLK1L) family of receptor-like kinases, known to regulate cell expansion, pollen tube cell integrity and burst (Haruta et al. 2014; Ge et al. 2017). The Amborella genome was reported to encode nine RALF family proteins belonging to the major clades II, III and IV (Campbell and Turner 2017). Multiple sequence alignment of S-group-enriched clade IV-C AMTR_s00067p00138490 with other clade IV RALFs from maize, rice and Arabidopsis revealed that it contains the four cysteine residues typically conserved in RALF proteins, but lacks the YISY motif conserved in clade I-III RALFs (Supplementary Figure 3). When we investigated all nine RALFs for their expression in Amborella sporophytic tissues and egg apparatus cells, the second clade IV-C RALF-related AMTR_s00067p00137560 also showed weak, but selective expression in egg apparatus cells, while clade II-B (AMTR_s00045p00202280) and clade III-C RALFs (AMTR_s00017p00211590) were almost ubiquitously expressed. Strikingly, clade IV-B RALF-related AMTR_s00045p00207290 exhibited extremely strong expression values in the synergid-enriched group of egg apparatus cells (Fig. 4). It therefore seems likely that the respective RALF-related protein holds an important function as extracellular ligand in Amborella synergid cells.
Table 2 Top ten Amborella genes enriched in large (L), medium (M), and small (S) cell size categories, respectively, but with average TPM < 3 in tepals, leaves and roots.
Abbreviations: AA, amino acids; ACA, alpha carbonic anhydrase; CYP450, cytochrome P450; ERF, ethylene response factor; nsLTP, non-specific lipid transfer protein; RALF, rapid alkalinization factor; TPM, transcripts per million
*Excluding the top match to the query protein if annotated as "unknown," "uncharacterized" or "hypothetical," except when no other significant matches were found
# tblastn search result
The above-mentioned DEFL did not reveal any similarity to synergid-expressed DEFL family proteins with known functions such as LUREs, which act as species-specific pollen tube attractants in Torenia and Arabidopsis (Okuda et al. 2009; Takeuchi and Higashiyama 2012), or maize ES1-4 (EMBRYO SAC1-4), which induce pollen tube tip burst (Amien et al. 2010). However, related functions cannot be excluded, as DEFLs are highly polymorphic proteins, making it difficult to determine exact relationships of gene orthology (Higashiyama and Takeuchi 2015). Furthermore, other DEFLs with enriched expression in Amborella synergid cells might be present among the egg apparatus-expressed genes that also have transcripts in roots, leaves or tepals (Supplementary Table 4).
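The enrichment filter used in this section (average TPM below 3 in each control tissue, followed by ranking on expression within one cell size category) can be sketched as follows. The data layout, the function name `enriched_genes` and the gene identifiers are hypothetical, and genes without any control measurement are kept by this sketch, which is an assumption rather than a documented rule.

```python
from statistics import mean


def enriched_genes(cell_tpm, control_tpm, control_cutoff=3.0, top_n=10):
    """Rank genes of one size category after removing control-expressed genes.

    cell_tpm    : gene -> TPM in the egg apparatus cell pool
    control_tpm : gene -> {tissue: [replicate TPM values]}
    """
    kept = []
    for gene, value in cell_tpm.items():
        tissues = control_tpm.get(gene, {})
        # Keep the gene only if the replicate average is below the cutoff everywhere
        if all(mean(reps) < control_cutoff for reps in tissues.values()):
            kept.append((gene, value))
    kept.sort(key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in kept[:top_n]]


# Toy values (hypothetical genes and TPMs, for illustration only)
cell_tpm = {"gene_A": 500.0, "gene_B": 900.0, "gene_C": 50.0}
control_tpm = {
    "gene_A": {"leaves": [0.1, 0.2, 0.0], "roots": [0.0, 0.0, 0.0], "tepals": [2.0, 1.0, 2.5]},
    "gene_B": {"leaves": [10.0, 12.0, 9.0], "roots": [0.0, 0.0, 0.0], "tepals": [0.0, 0.0, 0.0]},
    "gene_C": {"leaves": [0.0, 0.0, 0.0], "roots": [0.0, 0.0, 0.0], "tepals": [0.0, 0.0, 0.0]},
}
top = enriched_genes(cell_tpm, control_tpm, top_n=2)
print(top)
```

A strict cutoff of this kind necessarily discards genes that are enriched in egg apparatus cells but also detectable in sporophytic tissues, which is why additional candidates may remain among the genes listed in Supplementary Table 4.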

Amborella egg cells express EGG CELL 1 and other DUF784/DUF1278 gene family members
Importantly, we also identified four genes encoding CRPs with a prolamin-like domain (PF05617) to be specifically expressed in Amborella egg apparatus cells (Supplementary Table 3). These include two of the most strongly expressed genes in egg cells (L-group: AMTR_s00067p00156690 and AMTR_s00032p00113070), the most strongly expressed gene in the M-group (AMTR_s00017p00240840) and one gene in the S-group (AMTR_s00067p00155100). A fifth CRP with prolamin-like characteristics was detected when we manually inspected the predicted protein sequence encoded by AMTR_s00058p00172040, which is the most strongly expressed gene in egg cells (L-group) (Table 2).

Fig. 4 Expression pattern of Amborella genes encoding RALFs. The classification of RALF family proteins into four major clades is according to Campbell and Turner (2017). Nine genes encoding RALF and RALF-related proteins were identified in the Amborella genome, belonging to clades II-B, III-B to III-D and IV-B and IV-C, respectively. Clade IV-B RALF-related AMTR_s00045p00207290 is expressed in the control tissues (tepals, leaves, roots), but highly enriched in S-group egg apparatus cells (synergids) when compared with all other RALF and RALF-related genes. Clade IV-C RALF-related genes AMTR_s00067p00138490 and AMTR_s00067p00137560 are selectively present in egg apparatus cells, without detectable expression in the control tissues
Prolamin-like CRPs comprise Domain of Unknown Function 784 (DUF784) and DUF1278 proteins, which are often collectively termed ECA1 gametogenesis-related proteins (Zhang 2009). In Arabidopsis, DUF784 and DUF1278 proteins are encoded by embryo sac-specific gene families (Jones-Rhoades et al. 2007). So far, a function for prolamin-like CRPs has only been assigned to five Arabidopsis DUF1278 family members termed EGG CELL 1 (EC1). EC1 proteins are secreted by the egg cell upon sperm cell arrival and are essential for successful gamete interactions during double fertilization (Sprunck et al. 2012; Rademacher and Sprunck 2013). Notably, sequence similarity searches using both AMTR_s00067p00156690 and AMTR_s00017p00240840 resulted in matches to proteins annotated as Egg cell-secreted protein 1.4 (Table 2). To investigate Amborella prolamin-like proteins in more detail, we performed multiple sequence alignments and maximum likelihood phylogenies (Fig. 5). For sequence comparisons, we included EC1 proteins with validated expression in wheat, Arabidopsis, maize and rice egg cells (Sprunck et al. 2005; Ohnishi et al. 2011; Sprunck et al. 2012; Chen et al. 2017; Resentini et al. 2017), but also Arabidopsis DUF784 and DUF1278 proteins with validated expression in synergid cells (Jones-Rhoades et al. 2007; Steffen et al. 2007). The phylogenetic tree in Fig. 5 shows individual clades formed by EC1 proteins, synergid-expressed DUF1278 proteins and DUF784 proteins, respectively. Four Amborella prolamin-like proteins do not locate to these clades. However, the protein encoded by AMTR_s00017p00240840 forms a subclade with EC1 proteins from maize and rice, suggesting that it represents a true Amborella EC1 ortholog (Fig. 5). Although a similar function for Amborella EC1 during double fertilization remains to be experimentally confirmed, our findings strongly suggest that the molecular mechanism of EC1-mediated sperm activation by the egg cell, as previously reported for Arabidopsis EC1 (Sprunck et al. 2012; Rademacher and Sprunck 2013), existed already in the most recent common ancestor of all extant flowering plants.

Conclusions
We have shown that the described method for the isolation of living egg apparatus cells from the basal angiosperm Amborella is suitable for subsequent investigations including omics approaches, but it will also be valuable for in vitro fertilization experiments to generate zygote and embryo stages. A limited number of Amborella egg apparatus cells was used to generate RNA-seq data and to identify cell type-enriched transcripts. The presented data set revealed first candidate genes for comparative evolutionary and functional studies in angiosperms, and more genes remain to be identified. It will also allow the search for molecular footprints of non-flowering seed plants, for example genes expressed in archegonia of gymnosperms. For future transcriptome studies of Amborella embryo sac cells, it is important to generate more biological replicates. A detailed comparison of gene expression programs in egg cells and synergid cells from Amborella with those from eudicot and monocot egg cells and synergid cells will contribute to the identification of evolutionarily conserved molecular mechanisms involved in their specific functions during double fertilization. Future transcriptome studies should also include the Amborella central cell. This will require technical adaptations of the presented method to avoid the destruction of this large and highly vacuolated cell that occupies most of the volume of the embryo sac. Furthermore, the unambiguous classification of Amborella egg cells and synergid cells will be essential to prevent cross-contamination between pooled cell populations. Nevertheless, single-cell RNA sequencing technologies are rapidly evolving and will soon allow the profiling and classification of individual embryo sac cells, isolated from Amborella and other angiosperms.

Author contribution statement
SS and TD designed the study. MFT established the cell isolation protocol and performed experiments. SP and MM processed raw RNA-seq data to enable data analyses by MFT, MM and SS. CS provided the female Amborella plant used for cell isolation. SS wrote the manuscript, with input from TD, MFT and CS. All authors approved the manuscript.