Overview of the transcriptome in seeds
To identify transcripts and biological processes involved in early zygotic embryogenesis in Scots pine, RNA was isolated from embryos and megagametophytes representing four developmental stages (Fig. 1). Nine RNA-seq libraries were sequenced using Roche 454 sequencing technology. A total of 6.6 million raw reads were generated and assembled into 121,938 transcripts ranging in length from 150 to 18,101 bp, with a mean length of 1242 bp (Additional file 1: Tables S1, S2, S3 and S4).
In total, 36,106 transcripts containing ORFs were identified in the seed transcriptome, of which 28,190 transcripts (78%) had significant alignments to the Arabidopsis thaliana TAIR10 database and 7404 transcripts (20%) to the Plant Transcription Factor Database (Table 1). A total of 26,743 transcripts (74%) were annotated with GO terms in at least one of the three main categories: 22,362 transcripts (62%) displayed one or more ontologies related to Biological Process, 24,259 (67%) to Molecular Function and 19,301 (53%) to Cell Component.
Table 1 Summary of RNA-seq seed transcriptome data
Transcript expression values were calculated as RPKM, resulting in 81,120 assembled transcripts with detectable expression signals (RPKM > 0) in at least one of the developmental stages (Table 1). In total, 74,150 and 59,526 transcripts were detected in embryos and megagametophytes, respectively. Most of the transcripts (65%) were detected in both tissues; however, the number of unique transcripts was threefold higher in embryos than in megagametophytes (Fig. 2a). The number of identified transcription factors (TFs) was also higher in embryos (Fig. 2b).
The total number of transcripts detected at each developmental stage during seed development increased in embryos, but decreased in megagametophytes (Table 2). Around 15,000 transcripts were expressed at all developmental stages both in embryos and in megagametophytes. The number of unique transcripts detected at specific developmental stages was fairly constant in the embryos, but decreased in the megagametophytes during seed development from 10,907 transcripts at stage M1 to 3201 transcripts at stage M4 (Additional file 1: Figure S1A and B). Out of 7404 TFs identified during early embryo development, 3734 TFs (50%) were detected at all developmental stages, and about 140 TFs were detected at only one developmental stage (Additional file 1: Figure S1C). In megagametophytes, 3775 TFs (56%) were detected at all developmental stages; however, the number of TFs detected at only one developmental stage decreased during seed development (Additional file 1: Figure S1D).
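As an illustration of the RPKM measure and the detection threshold used above, the following minimal sketch applies the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The function names and the example inputs are hypothetical and do not come from the analysis pipeline of this study. The final lines apply inclusion-exclusion to the counts reported in the text, under the assumption that 81,120 is the union of the transcripts detected in the two tissues.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def detected(rpkm_by_stage, threshold=0.0):
    """True if the transcript exceeds the threshold in at least one stage."""
    return any(value > threshold for value in rpkm_by_stage.values())


# Hypothetical transcript: 50 reads, 1242 bp long, 700,000 mapped reads in the library.
example_rpkm = rpkm(50, 1242, 700_000)

# Counts reported in the text; treating 81,120 as the union of both
# tissues is an assumption made for this sketch.
embryo, megagametophyte, union = 74_150, 59_526, 81_120
shared = embryo + megagametophyte - union      # inclusion-exclusion
embryo_only = embryo - shared
megagametophyte_only = megagametophyte - shared
```

Under that assumption, 52,556 transcripts (65% of 81,120) are shared, 21,594 are embryo-specific and 6970 are megagametophyte-specific, which is consistent with the threefold difference reported above.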
Table 2 Number of transcripts and TFs detected in embryos and megagametophytes at different developmental stages (RPKM > 0)
To test the reliability of the RNA-seq results, 30 transcripts (23 transcripts in embryos and 7 transcripts in megagametophytes) were selected for examination by qRT-PCR. The Pearson correlation coefficient between the expression profiles obtained by RNA-seq and qRT-PCR was calculated for each transcript separately (Additional file 2: Table S6). The correlation coefficient obtained was similar for most transcripts, except for a few transcripts at some time points.
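The per-transcript comparison described above can be sketched as follows. This is a plain implementation of the Pearson correlation coefficient; the two expression profiles are invented values for illustration and are not data from this study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / sqrt(var_x * var_y)


# Hypothetical profiles for one transcript across five embryo samples.
rnaseq_rpkm = [120.0, 45.0, 30.0, 28.0, 12.0]
qrtpcr_relative = [1.00, 0.41, 0.22, 0.25, 0.08]
r = pearson(rnaseq_rpkm, qrtpcr_relative)
```

Because the coefficient is scale-invariant, RPKM values and relative qRT-PCR expression can be compared directly without converting them to a common unit.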
Changes in transcript accumulation during seed development
Differentially expressed transcripts in pairwise comparisons between embryos and megagametophytes during seed development
To identify differentially expressed transcripts (DETs) we performed pairwise comparisons between embryos and megagametophytes at the same developmental stage. In total 18,638 transcripts were up-regulated with a fold change higher than 2 (FC > 2) in at least one of the pairwise comparisons between embryos and megagametophytes (Additional file 3: Figure S2A, Additional file 4). 12,906 transcripts were up-regulated in embryos and 5732 in megagametophytes. The greatest difference in the number of up-regulated transcripts between embryos and megagametophytes was observed at developmental stage 2 (Additional file 3: Figure S2B).
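The fold-change criterion used in these pairwise comparisons can be sketched as below. How zero RPKM values were handled is not stated here, so the pseudocount in this sketch is an assumption, as are the function names and the example values.

```python
def fold_change(numerator_rpkm, denominator_rpkm, pseudocount=1.0):
    """Ratio of two RPKM values; the pseudocount (an assumption) avoids division by zero."""
    return (numerator_rpkm + pseudocount) / (denominator_rpkm + pseudocount)


def classify(embryo_rpkm, megagametophyte_rpkm, threshold=2.0, pseudocount=1.0):
    """Label a transcript for one embryo versus megagametophyte comparison."""
    fc = fold_change(embryo_rpkm, megagametophyte_rpkm, pseudocount)
    if fc > threshold:
        return "up in embryo"
    if fc < 1.0 / threshold:
        return "up in megagametophyte"
    return "not differential"


# Hypothetical RPKM values for one transcript at the same developmental stage.
label = classify(embryo_rpkm=30.0, megagametophyte_rpkm=5.0)
```

A transcript would then count as a DET if it receives a non-neutral label in at least one of the stage-matched comparisons.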
About 54% of the DETs up-regulated in embryos and 58% of the DETs up-regulated in megagametophytes could be GO annotated (Additional file 3: Figure S2A). Cellular and metabolic processes were the dominant groups in the Biological Process category in both embryos and megagametophytes (Fig. 3). Furthermore, transcripts assigned to response to stimulus were over-represented in megagametophytes. In both embryos and megagametophytes, enriched GO terms in the Molecular Function category included catalytic and binding activities, and in the Cell Component category the subcategories cell and cell part were the most abundant.
By increasing the GO annotation level, it was found that transcripts up-regulated in embryos were enriched for diverse Biological Processes such as cellular component biogenesis and cellular and metabolic processes related to chromosome organization, DNA packaging, translation and gene expression (Fig. 4a and Additional file 3: Figure S3). In the megagametophytes the up-regulated transcripts were highly enriched in response to stimulus, such as response to stress and to chemical and endogenous stimulus, including response to abscisic acid (ABA) (Fig. 4b and Additional file 3: Figure S4). In the Molecular Function category, assignments in the embryos were mainly related to DNA binding and structural constituent of ribosome. Both activities are highly related to gene expression and protein synthesis. In the megagametophytes, transcripts functioning in nutrient reservoir activity were highly over-represented (FDR = 2.82e-92). Transcripts identified in embryos for the Cell Component category showed enrichment for nucleus, ribosome and protein-DNA complex (Fig. 4a) and transcripts in megagametophytes were enriched mainly in protein body component (Fig. 4b). As expected, transcripts up-regulated in embryos showed GO enrichment for different cellular processes and functions in DNA-packaging, translation and gene expression. These processes are important during active cell proliferation [29]. Transcripts up-regulated in megagametophytes were enriched for accumulation of storage material and response to chemical and endogenous stimuli. This might indicate that the megagametophyte, in a similar way as the endosperm, can sense environmental signals and induce the corresponding signalling pathways for regulating embryo development [30].
We carried out pairwise comparisons between the group of transcripts showing the highest differences in abundance between embryos and megagametophytes at each developmental stage (Additional file 5). Transcripts related to members of the Arabidopsis cytochrome P450 gene family (CYP78A7, CYP78A8 and CYP71B22) showed, at all developmental stages, high accumulation in embryos but low or no accumulation (RPKM close to 0) in megagametophytes. Up-regulated transcripts in megagametophytes were mainly related to the Arabidopsis 12S seed storage protein family (CRB, CRC, CRD), also known as cruciferins. These proteins are involved in nutrient reservoir activity and are the major sources of nitrogen and carbon during early seed germination [31]. The RPKM values of cruciferin-related transcripts were similar at all developmental stages. The majority of the transcripts up-regulated in the megagametophytes had no hits against the TAIR database (Additional file 5).
At stages E1, E2 and E3DO, transcripts related to genes involved in cell wall modification (expansins, cellulose metabolism, endoglucanase, pectin-acetylesterase and pectin-lyase) were detected. Specifically at stage E1, a putative homolog of SOMATIC EMBRYOGENESIS RECEPTOR-LIKE KINASE1 (SERK1), as well as transcripts related to genes involved in response to auxin and other hormones, such as INDOLE-3-ACETATE O-METHYLTRANSFERASE 1 (IAMT1), SKP1-LIKE PROTEIN 1A (SKP1A), GAMMA-VACUOLAR-PROCESSING ENZYME (GAMMA-VPE), GIBBERELLIN-REGULATED PROTEIN 2 (GASA2) and GLUTATHIONE S-TRANSFERASE U17 (GSTU17), were highly abundant (Additional file 5, Up in E1). Transcripts related to nucleosome assembly (histones) were detected at all developmental stages except stage E1. Other transcripts up-regulated from stage E2 onwards coded for proteins related to stress responses, such as non-specific LIPID-TRANSFER PROTEIN 3 (LTP3), SUGAR TRANSPORT PROTEIN 13 (STP13) and ABSCISIC ACID INSENSITIVE 4 (ABI4). Transcripts up-regulated at stage E4 included transcripts related to cell signalling, negative regulation of cell division and cell wall loosening, as well as transcripts related to development such as PROTEIN RALF-like 34 (RALFL34), FAMA, PLANTACYANIN (ARPN) and PECTIN ACETYLESTERASE (PAE9) (Additional file 5, Up in E4).
In total, 7404 TFs were detected during early seed development (Table 1). Out of these TFs, 2890 were differentially expressed with a fold change higher than two between embryos and megagametophytes (Additional file 6). The differentially expressed TFs belonged to 78 families, of which the bHLH, FAR1, TRAF and NAC families were the largest (Additional file 7: Figure S5 and Additional file 6, Family distribution). In general, the number of TFs belonging to each family was higher in embryos than in megagametophytes. Interestingly, some of the TF families, e.g. bHLH, C3H, NAC, AP2-EREBP and TRAF, were enriched differently in embryos and megagametophytes during seed development (Additional file 7: Figure S6). In addition, sixteen TF families were detected only in embryos and four TF families were detected only in megagametophytes. In general, several TFs belonging to families specifically expressed in embryos were involved in plant growth and development, while TF families detected only in megagametophytes were related to responses to stress and other stimuli [32–34].
Differentially expressed transcripts during embryo development
In total, 18,234 DETs with a fold change higher than two were identified in the pairwise comparisons between embryos at different developmental stages (Additional file 8: Figure S7). When including only transcripts with an RPKM > 10, 6669 DETs were detected. To provide an overview of the expression patterns of these DETs during embryo development, k-means clustering analysis was performed (DETs from subordinate embryos were excluded from this analysis). Four types of expression profiles were detected, where types I and II included four clusters each and types III and IV included two clusters each (Fig. 5). The accumulation of transcripts belonging to type I increased throughout the course of embryo development. Transcripts in clusters 1 and 8 were specifically enriched for processes related to response to abiotic stress, and transcripts in clusters 3 and 7 were highly enriched for nutrient reservoir activity (FDR = 2.10e-47) and for response to ABA and other hormones. The expression of type II transcripts decreased during embryo development. However, the accumulation pattern differed among the four clusters. Clusters 9 and 12 were rich in transcripts related to cell wall modification, toxin and carbohydrate metabolic processes, while cluster 11 included a higher number of transcripts with a function in structural constituents of ribosomes. Type III transcripts showed high accumulation at only one intermediate developmental stage (E2 or E3DO). Transcripts within cluster 5, which mainly accumulated at stage E2, were highly enriched for nutrient reservoir activity. However, no significant GO enrichment was obtained for cluster 4. The expression level of type IV transcripts was either high or low at both the E2 and E3DO stages. Cluster 2 included transcripts involved in DNA packaging and protein-DNA complex assembly.
Together, the GO enrichment analyses of the clusters showed that the abundance of transcripts related to stress response and nutrient reservoir activity increased during embryo development, while the abundance of transcripts related to cell wall modification decreased.
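The k-means clustering of expression profiles described above can be sketched as follows. The only detail taken from the text is that k-means was used; the per-transcript standardization, the deterministic farthest-first initialisation, the function names and the example profiles are all assumptions made for illustration, and the software and settings used in the study may differ.

```python
from math import sqrt


def standardize(profile):
    """Scale a profile to zero mean and unit variance (assumed preprocessing)."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in profile]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm with farthest-first initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no transcript changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical RPKM profiles over four stages (E1, E2, E3DO, E4).
profiles = [
    [1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 45],   # increasing
    [4, 3, 2, 1], [8, 6, 4, 2], [40, 30, 22, 10],   # decreasing
]
assignment, centroids = kmeans([standardize(p) for p in profiles], k=2)
```

Standardizing each profile makes the clusters reflect the shape of the expression pattern rather than the absolute level, which matches the way the profile types are described here (increasing, decreasing, or peaking at an intermediate stage).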
When comparing embryos at consecutive developmental stages, including subordinate embryos, 4411 DETs were detected. The highest number of DETs (2667) was detected in the comparison between embryos at stages E1 and E2, and 80% (2152) of these DETs were detected only in this pairwise comparison (Fig. 6a and b). DETs that accumulated highly at stage E1 were enriched for Biological Processes related to cell wall loosening, organization and modification, with a beta-expansin (EXPB1)-related transcript having the highest fold change (Additional file 9, E1xE2 Up). Furthermore, 28 TFs involved in several developmental processes were detected, out of which transcripts related to LOB DOMAIN-CONTAINING PROTEIN 29 (LBD29) and SERK1, as well as some members belonging to the homeobox-leucine zipper protein family (HAT5 and HB13), showed a high fold change (Additional file 8: Table S7 and Additional file 10, E1xE2 Up). Transcripts that were over-represented in E2 were enriched for processes related to response to ABA, hormone stimulus, nucleosome organization and nutrient reservoir activity (Additional file 9, E1xE2 Down). These DETs included 21 TFs that were GO annotated for developmental processes (Additional file 8: Table S7 and Additional file 10, E1xE2 Down).
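The number of DETs found in only one pairwise comparison, as reported above for E1 versus E2, amounts to a set difference. The sketch below uses invented transcript identifiers and a hypothetical helper function; only the comparison labels follow the naming used in the additional files.

```python
def unique_to(comparison, dets_by_comparison):
    """DETs detected in one pairwise comparison and in no other."""
    others = set().union(
        *(ids for name, ids in dets_by_comparison.items() if name != comparison)
    )
    return dets_by_comparison[comparison] - others


# Hypothetical transcript identifiers per pairwise comparison.
dets = {
    "E1xE2": {"t1", "t2", "t3", "t4", "t5"},
    "E2xE3DO": {"t5", "t6"},
    "E3DOxE4": {"t4", "t7"},
}
only_e1_e2 = unique_to("E1xE2", dets)
share = len(only_e1_e2) / len(dets["E1xE2"])
```

Applied to the counts in the text, the same ratio is 2152 / 2667, or roughly 0.81.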
Close to 660 DETs were identified when comparing embryos at stage E2 and E3DO (Fig. 6a). Transcripts assigned to response to ABA and hormone stimulus showed higher accumulation at stage E2, and those involved in response to abiotic stress were enriched at stage E3DO (Additional file 9, E2xE3DO). When comparing dominant embryos at stage E3DO and stage E4, 1087 DETs were detected (Fig. 6a). Transcripts up-regulated in E3DO embryos were mainly related to axis specification processes, while transcripts up-regulated at stage E4 were involved in processes related to response to hormone stimulus and lipid transport (LTP3 and LTP4) (Additional file 9, E3DOxE4 Down). TFs, differentially expressed in embryos at stage E3DO and E4, which were annotated to developmental processes, included transcripts related to AUXIN RESPONSIVE FACTOR 2 (ARF2), LEUNIG (LUG), WUSCHEL-RELATED HOMEOBOX (WOX), CYP78A7 and ARABIDOPSIS NAC DOMAIN CONTAINING PROTEIN 9 (ANAC009) (Additional file 8: Table S7 and Additional file 10).
By comparing dominant and subordinate embryos at stage E3, it was possible to detect 748 DETs (Fig. 6a). Many of the transcripts up-regulated in dominant embryos were related to carbohydrate metabolic processes and axis specification processes (Additional file 9, E3DOxE3SU Up). DETs enriched in subordinate embryos were involved in response to water stress (including water deprivation) and lipid transport. NAC and HB were the largest TF families in dominant embryos, while in subordinates MYB-related factors were the most abundant (Additional file 10, TF families).
A schematic summary of the results obtained from the pairwise comparisons between consecutive stages during embryo development is presented in Fig. 7. Together, our results show that processes involved in cell wall modification, hormone signalling, axis specification and stress-induced responses are activated during early embryo development. Strict regulation of cell division, elongation and adhesion is critical during embryonic pattern formation. Auxin is perhaps the most pervasive signalling molecule in plants and has been implicated in many developmental processes, including embryogenesis in both angiosperms and conifers [35–37]. Several studies have shown that genes related to stress are over-represented during early embryo development [12, 16, 17, 38]. Furthermore, many of the differentially expressed TFs that belong to the largest families (bHLH, FAR1, NAC and AP2-EREBP) are related to cellular and developmental processes, hormone signalling and stress responses [39–42].
Differentially expressed transcripts between different developmental stages in megagametophytes
DETs identified in the pairwise comparisons between megagametophytes at consecutive developmental stages were also subjected to k-means clustering, resulting in 12 different clusters grouped into five types of expression profiles (Additional file 11: Figure S8). No significantly enriched GO processes or functions (FDR < 0.05) were found in any of the clusters. The accumulation of type I transcripts increased from stage M1 to stage M4. Transcripts related to response to stimulus and regulation of biological process were the most abundant. The expression of type II transcripts decreased from stage M1 to stage M4. Type II clusters were rich in transcripts with GO terms associated with cell wall organization, and reproductive and developmental processes. Type III transcripts accumulated either at stage M2 or stage M3. GO terms assigned to clusters in type III were mainly related to response to stimulus. Types IV and V included only one small cluster each. Cluster 6, with transcripts accumulating at both stages M2 and M3, presented a high percentage of DETs responding to stimulus, while transcripts from cluster 11 were annotated only for metabolic and cellular processes.
A total of 600 DETs (FC > 2, RPKM > 10) were detected in the pairwise comparisons between megagametophytes at consecutive developmental stages. No significantly enriched processes or functions were found in any of the pairwise comparisons. Similar to embryos, the highest number of DETs was detected during the transition from stage M1 to M2, and 85% of the DETs were specifically detected in this pairwise comparison (Additional file 11: Figure S9A and B). The number of transcripts annotated for developmental processes decreased from 10 at stage M1 to 2 at stage M4. Meanwhile, the number of DETs with GO terms associated with response to stress and stimulus remained more constant (Additional file 11: Figure S9C). In addition, transcripts encoding proteins belonging to the small Heat Shock Protein (sHSP) family, known for its role in stress response, showed similar accumulation during all developmental stages (Additional file 12, M1xM2 Down, M2xM3 Up).
Ten DETs with assigned GO terms related to development were specifically detected at stage M1, in which putative homologs to expansins (EXPB1s) and AGAMOUS-like MADS-box (AGL11) were included. AGL11 was not detected in embryos at any developmental stage. In accordance, AGL11 was expressed in the endosperm but not embryos of Brachypodium distachyon (Expression Atlas data from EMBL-EBI). Several transcripts related to metal ion transport, e.g. Copper Transporter 5 (COPT5) and Zinc Transporter 11 (ZIP11), were detected at stage M1 (Additional file 12, M1xM2 Up). ZIP transporters participating in ion translocation during embryo and endosperm development have been detected in maize seeds [43]. At stage M4, a transcript related to AtEP3, encoding for an endochitinase, was highly abundant (Additional file 12, M3xM4 Down). A homologous gene, Chia4-Pa1, has been shown to be expressed in the single cell-layered zone surrounding the corrosion cavity in the megagametophyte in Norway spruce seeds [44].
Expression of selected transcripts during early embryo development
The transcript levels of selected DETs were tested by qRT-PCR in four biological replicates. The selection was based on the estimated expression (RPKM values) obtained from the transcriptome data and on functional annotations of homologous genes in other species, mainly Arabidopsis, that have been related to embryo development. Gene sequence information in conifers is limited; thus, for convenience, we name each conifer transcript after the Arabidopsis gene with which it shares the highest sequence similarity. This approach gives a general idea of which processes might be important during early embryo development. The results generated from the qRT-PCR analysis are presented in Fig. 8.
Transcripts related to ENDO-BETA-MANNANASE 7 (MAN7), TRANSPARENT TESTA7 (TT7), EXPB1, SERK1, LTP4, and HAP3A were highly abundant at stage E1 and decreased significantly at stage E2.
In Arabidopsis seeds, the mannanase-encoding gene, AtMAN7, is expressed in the micropylar endosperm and in the radicle tip just before radicle emergence [45]. We assume that the high expression of PsMAN7 in E1 embryos might facilitate their penetration into the nutritious megagametophyte.
Early embryogenesis is a critical developmental phase when the apical-basal polarity is established through directional auxin transport, mainly mediated by auxin influx and efflux carriers [46]. In addition, flavonols can act as negative regulators of auxin transport [47]. AtTT7 encodes flavonoid 3′-hydroxylase, a flavonol biosynthetic enzyme. Down-regulation of PsTT7 from stage E2 might reflect that auxin transport is increased from the cleavage stage and during further embryo development. Previous studies have associated the action of expansins with cell wall loosening, expansion, disassembly or separation [48, 49]. A high expression of PsEXPB1 at stage E1 might indicate the importance of loosening the cell walls to allow separation of the four early embryos. AtSERK1 marks cells that are competent to form embryos, and it also influences the competence of the cells to differentiate into embryos [50]. The high expression of PsSERK1 at stage E1 might be important for stimulating the four apical cells to differentiate into separate embryos and thereby stimulate the cleavage process to start. Directly after the first cleavage, the four new embryos should develop further. We assume that down-regulation of PsSERK1 is important for blocking a second round of cleavage.
Another set of transcripts, represented by putative homologs to CYP78A7, DWARF IN LIGHT 1 (DFL1) and ROP-INTERACTIVE CRIB MOTIF-CONTAINING PROTEIN 3 (RIC3), were also highly abundant at stage E1 but declined successively during later developmental stages.
Cytochrome P450s are involved in the metabolism of most phytohormones and many secondary metabolites in plant cells. Overexpression of a member of the CYP78A family in rice (Oryza sativa) promotes cell proliferation but reduces the size of the embryos [51]. Furthermore, the gene product of AtDFL1, which is involved in auxin signal transduction, can inhibit cell elongation [52]. Before the development of the dominant embryo, the four early embryos are equal-sized. Although it is not known which mechanisms are restricting the growth of the embryonal mass of stage 2 embryos, our results indicate that PsCYP78A and PsDFL1 might be involved. AtRIC3 is important for tip growth of pollen tubes [53]. Early embryos in Pinus, developing after the cleavage process, begin their development by apical cell growth [7]. A high expression of PsRIC3 at stage E1 and E2 might reflect that these embryos develop by apical cell growth.
The levels of transcripts related to NAC009, FAMA, PROTODERMAL FACTOR2 (PDF2), and VIVIPAROUS1 (VP1) were low at stage E1 but increased during later stages. Putative homologs of PLANTACYANIN (ARPN) and GLYOXALASE I (GLOI) showed a higher accumulation at stage E3DO. In addition, a peak of transcript abundance in subordinate embryos was observed for transcripts related to WOX2, WOX8/9, AINTEGUMENTA-like 5 (PsAIL5), PHOSPHOGLUCAN WATER DIKINASE (PWD), MYO-INOSITOL OXYGENASE 1 (MIOX1), ALFIN-like 3 (AL3) and a transcript encoding Auxin-dormancy-related protein.
ANAC009 is expressed in root cap stem cells where it promotes periclinal cell divisions [54]. FAMA, a basic helix-loop-helix protein, regulates a critical switch between division and differentiation during stomatal development [55]. In Norway spruce, the apical-basal polarization during early embryogeny proceeds through the establishment of the meristematic cells of the embryonal mass and the terminally differentiated, expanding suspensor cells [56]. The high expression of PsNAC009 and PsFAMA at stages E3DO and E4, but not at stage E3SU, may reflect the importance of a correct cell division pattern for the development of dominant embryos. PaWOX8/9 regulates the orientation of the cell division plane in the basal part of the embryonal mass during early and late embryogeny in Norway spruce [57]. In accordance, PsWOX8/9 was expressed at all analysed developmental stages; however, the expression was significantly higher at stage E3SU. Although we do not know how overexpression of PaWOX8/9 affects early embryo development, it is tempting to assume that the high expression of PsWOX8/9 in subordinate embryos either inhibits further development of the embryos or is only a consequence of a blocked development caused by other factors. In Arabidopsis plants overexpressing AtARPN, the endothecium degenerates, probably as a consequence of plantacyanin-induced precocious PCD [58]. The terminally differentiated suspensor cells in early embryos of Scots pine and Norway spruce are eliminated by PCD [9, 59]. The high expression of PsARPN in E3DO and E4 embryos, but not in E3SU, coincides with the degeneration of the suspensor cells. Taken together, these results suggest that the apical-basal polarization is strictly regulated in dominant embryos but not in subordinate embryos.
Picea abies HOMEOBOX 1 (PaHB1), a homolog of AtPDF2, and PaWOX2 are important for specification of the protoderm in somatic embryos of Norway spruce [60, 61]. Furthermore, the expression of a Norway spruce LTP gene (Pa18) switches from a uniform expression in proembryogenic masses to a protoderm-specific localization in developing somatic embryos [62]. We assume that the differential expression of PsLTP4 and PsPDF2 reflects specification of the protoderm, which would indicate that radial patterning is regulated in a similar way in dominant and subordinate embryos. The expression pattern of PsWOX2 was similar to that of PsWOX8/9: both transcripts were expressed at all developmental stages, but with significantly higher levels in subordinate embryos. This is probably related to the blocked development of the subordinate embryos.
We have previously shown that PsHAP3A is expressed during the morphogenic phase and PsVP1 during the maturation phase [63]. PsHAP3A was down-regulated in both E3DO and E3SU embryos, while PsVP1 was up-regulated in E3DO but not in E3SU embryos. Overexpression of the Arabidopsis EMBRYOMAKER (AtEMK), which is identical to AtAIL5, results in the formation of embryo-like structures on seedlings [64]. The authors concluded that AtEMK is a key player to maintain embryonic identity. The PsAIL5 transcript was down-regulated in E3DO but not in E3SU embryos. Taken together, the low expression of PsVP1 and high expression of PsAIL5 in the subordinate embryos indicate that the transition from the morphogenic phase to the maturation phase is not completed in the subordinate embryos.
Although the functions of genes encoding proteins related to PWD, MIOX1, AL3 and an Auxin-dormancy-related protein during embryo development are not known, the fact that they are differentially expressed in E3DO and E3SU embryos indicates differences in metabolic processes between dominant and subordinate embryos.
Based on the processes identified in the GO enrichment analyses and the expression of the selected transcripts we suggest that processes related to embryogenic competence and cell wall loosening are involved in activating the cleavage process. Directly after cleavage, the growth of the embryos is restricted. Apical-basal polarization is strictly regulated in the dominant embryo, which has reached the maturation phase. However, functional studies must be performed before we will understand the processes controlling the successive development of the early embryos.