Background

Temperate trees face a wide range of environmental conditions, including strongly contrasting seasonal changes. Among the strategies to enhance survival under unfavourable climatic conditions, bud dormancy is crucial for perennial plants since its progression over winter determines optimal growth, flowering and fruit production during the subsequent season. Bud dormancy has long been compared to an unresponsive physiological phase, in which metabolic processes within the buds are halted by cold temperature and/or short photoperiod. However, several studies have shown that bud dormancy progression can be affected in a complex way by temperature, photoperiod or both, depending on the tree species [1,2,3,4,5]. Bud dormancy has traditionally been separated into three main phases: (i) paradormancy, also named “summer dormancy” [6]; (ii) endodormancy, mostly triggered by internal factors; and (iii) ecodormancy, controlled by external factors [7, 8]. Progression through endodormancy requires cold accumulation whereas warmer temperatures, i.e. heat accumulation, drive the competence to resume growth over the ecodormancy phase. Dormancy is thus highly dependent on external temperatures, and changes in the seasonal timing of bud break and blooming have been reported in relation to global warming. Notably, advances in bud break and blooming dates in spring have been observed for tree species such as apple, cherry, birch, oak or Norway spruce in the northern hemisphere, thus increasing the risk of late frost damage [9,10,11,12,13,14], while insufficient cold accumulation during winter may lead to incomplete dormancy release associated with bud break delay and low bud break rate [15, 16]. These phenological changes directly impact the production of fruit crops, leading to large potential economic losses [17]. Consequently, it is becoming urgent to better understand bud responses to temperature stimuli in the context of climate change in order to tackle fruit losses and anticipate future production changes.

In recent years, an increasing number of studies have investigated the physiological and molecular mechanisms of bud dormancy transitions in perennials using RNA sequencing technology, thereby giving new insight into potential pathways involved in dormancy. The results suggest that the transitions between the three main bud dormancy phases (para-, endo- and ecodormancy) are mediated by pathways related to DORMANCY ASSOCIATED MADS-box (DAM) genes [18], phytohormones [19,20,21,22], carbohydrates [22, 23], temperature [24, 25], photoperiod [26], reactive oxygen species [27, 28], water deprivation [26], cold acclimation and epigenetic regulation [29]. Owing to these studies, a better understanding of bud dormancy has been established in different perennial species [18, 30, 31]. However, we are still missing a fine-resolution temporal understanding of transcriptomic changes happening over the entire bud development, from bud organogenesis to bud break.

Indeed, the small number of sampling dates in existing studies seems insufficient to capture all the changes occurring throughout the dormancy cycle, which most likely corresponds to a chain of biological events rather than an on/off mechanism. Many unresolved questions remain: What are the fine-resolution dynamics of gene expression related to dormancy? Are specific sets of genes associated with dormancy stages? Since the timing of the response to environmental cues is cultivar-dependent [32, 33], do transcriptomic profiles during dormancy differ between cultivars with contrasting flowering dates?

To explore these mechanisms, we conducted a transcriptomic analysis of sweet cherry (Prunus avium L.) flower buds from bud organogenesis until the end of bud dormancy using next-generation sequencing. Sweet cherry is a perennial species highly sensitive to temperature [34] and we focused on three sweet cherry cultivars displaying contrasting flowering dates. We carried out a fine-resolution time-course spanning the entire bud development, from flower organogenesis in July to flowering in spring of the following year (February to April), encompassing the para-, endo- and ecodormancy phases. Our results indicate that transcriptional changes happening during dormancy are conserved between different sweet cherry cultivars, opening the way to the identification of key factors involved in the progression through bud dormancy.

Results

Transcriptome accurately captures the dormancy state

In order to define transcriptional changes happening over the sweet cherry flower bud development, we performed a transcriptome-wide analysis using next-generation sequencing (RNA-seq) from bud organogenesis to flowering. According to bud break percentage (Fig. 1a), morphological observations (Fig. 1b), average temperatures (see Additional file 1: Figure S1a) and descriptions from Lang et al. (1987), we assigned five main stages to the flower bud samples (Fig. 1c): i) flower bud organogenesis, which occurs in July and August; ii) paradormancy, which corresponds to the period of growth cessation and which we arbitrarily delimited to September; iii) endodormancy, initiated in October, during which buds are unresponsive to forcing conditions; iv) dormancy release, which the increasing bud break percentage under forcing conditions suggests occurred on 9th December 2015, 29th January 2016 and 26th February 2016 for the three cultivars ‘Cristobalina’, ‘Garnet’ and ‘Regina’, respectively; and v) ecodormancy, starting from the estimated dormancy release date until flowering. We harvested buds at 11 dates spanning all these bud stages for the sweet cherry cultivars ‘Cristobalina’, ‘Garnet’ and ‘Regina’, and generated a total of 81 transcriptomes (RNA-seq samples in Additional file 2: Table S1). First, in order to explore the transcriptomic characteristics of each bud stage separately from the cultivar effect, we focused the analysis on the early flowering cultivar ‘Garnet’.

Fig. 1

Dormancy status under environmental conditions and RNA-seq sampling dates. a Evaluation of bud break percentage under forcing conditions was carried out for three sweet cherry cultivars displaying different flowering dates: ‘Cristobalina’, ‘Garnet’ and ‘Regina’ for the early, medium and late flowering cultivars, respectively. The dashed and dotted lines correspond to the dormancy release date, estimated at 50% of buds at BBCH stage 53 [35], and the flowering date, respectively. b Pictures of the sweet cherry buds corresponding to the different sampling dates. c Sampling time points for the transcriptomic analysis are represented by coloured stars: red for ‘Cristobalina’, green for ‘Garnet’ and blue for ‘Regina’

Using DESeq2 and a threshold of 0.05 on the adjusted p-value, we identified 6683 genes that are differentially expressed (DEGs) between the dormant and non-dormant bud stages for the sweet cherry cultivar ‘Garnet’ (Additional file 2: Table S2). When projected into a two-dimensional space (Principal Component Analysis, PCA), data for these DEGs show that transcriptomes of samples harvested at a given date are projected together (Fig. 2), indicating the high quality of the biological replicates and showing that different trees are in a very similar transcriptional state at the same date. Interestingly, we also observe that flower bud stages are clearly separated on the PCA, with the exception of organogenesis and paradormancy, which are projected together (Fig. 2). The first dimension of the analysis (PC1) explains 41.63% of the variance and clearly represents the strength of bud dormancy: samples on the right of the axis are in late endodormancy (Dec) or dormancy release stages, while samples on the left of the axis are in organogenesis and paradormancy. Samples harvested at the beginning of endodormancy (Oct and Nov) are mid-way between samples in paradormancy and in late endodormancy (Dec) on PC1. The second dimension of the analysis (PC2) explains 20.24% of the variance and distinguishes two main phases of bud development: before and after dormancy release. We obtain very similar results when performing the PCA on all genes (Additional file 1: Figure S2). These results indicate that the transcriptional state of DEGs accurately captures the dormancy state of flower buds.
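
To illustrate what a 0.05 threshold on the adjusted p-value means, the following is a minimal sketch assuming the Benjamini-Hochberg procedure, which is the default adjustment in DESeq2. The gene names and raw p-values are hypothetical and serve only to show that a gene with a raw p-value below 0.05 can fail the adjusted threshold.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039, "geneD": 0.041, "geneE": 0.60}
padj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Keep genes whose adjusted p-value is below the 0.05 threshold.
degs = [gene for gene, p in padj.items() if p < 0.05]
```

In this toy example, geneC and geneD have raw p-values below 0.05 but are not retained once the correction for multiple testing is applied.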

Fig. 2

Separation of samples by dormancy stage using differentially expressed genes. The principal component analysis was conducted on the TPM (transcripts per million reads) values for the differentially expressed genes in the cultivar ‘Garnet’ flower buds, sampled on three trees between July and March. Samples in organogenesis are red points, samples in paradormancy are yellow points, samples in endodormancy are dark blue points, samples at dormancy release are light blue points and samples in ecodormancy are green points. Each point corresponds to one sampling time in a single tree

Bud stage-dependent transcriptional activation and repression are associated with different pathways

We further investigated whether specific genes or signalling pathways could be associated with the different flower bud stages. For this, we performed a hierarchical clustering of the DEGs based on their expression in all samples. We could group the genes into ten clusters showing clearly distinct expression profiles throughout bud development (Fig. 3). Overall, three main types of clusters can be discriminated: the ones with a maximum expression level during organogenesis and paradormancy (cluster 1: 1549 genes; cluster 2: 70 genes; cluster 3: 113 genes; cluster 4: 884 genes and cluster 10: 739 genes, Fig. 3), the clusters with a maximum expression level during endodormancy and around the time of dormancy release (cluster 5: 156 genes; cluster 6: 989 genes; cluster 7: 648 genes and cluster 8: 612 genes, Fig. 3), and the clusters with a maximum expression level during ecodormancy (cluster 9: 924 genes and cluster 10: 739 genes, Fig. 3). This result shows that different groups of genes are associated with these three main flower bud phases. Interestingly, we also observed that during the endodormancy phase, some genes are expressed in October and November then repressed in December (cluster 4, Fig. 3), whereas another group of genes is expressed in December (clusters 8, 5, 6 and 7, Fig. 3), therefore separating endodormancy into two periods with distinct transcriptional states, which supports the PCA observation.
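
The grouping of genes by expression profile can be sketched as follows. This is a minimal illustration only: it assumes average-linkage agglomerative clustering with a distance of one minus the Pearson correlation, which is not necessarily the linkage or distance used in this study, and the gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_profiles(profiles, k):
    """Average-linkage agglomerative clustering, stopped at k clusters.

    profiles maps a gene name to its expression values across samples.
    """
    genes = list(profiles)
    # Pairwise distance: 1 - correlation, so similar shapes are close.
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in genes for b in genes}
    clusters = [[g] for g in genes]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all gene pairs between two clusters.
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical profiles over six sampling dates (illustrative only).
toy = {
    "early1": [9, 8, 3, 1, 1, 1],
    "early2": [8, 9, 4, 1, 2, 1],
    "endo1": [1, 2, 8, 9, 2, 1],
    "endo2": [1, 1, 7, 9, 3, 1],
    "eco1": [1, 1, 1, 2, 8, 9],
    "eco2": [2, 1, 1, 3, 9, 8],
}
clusters = cluster_profiles(toy, k=3)
```

With these toy profiles, genes peaking at the same period end up in the same cluster, which mirrors the logic behind the three main types of clusters described above.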

Fig. 3

Clusters of expression patterns for differentially expressed genes in the sweet cherry cultivar ‘Garnet’. Heatmap for ‘Garnet’ differentially expressed genes during bud development. Each column corresponds to the gene expression for flower buds from one single tree at a given date. Each row corresponds to the expression pattern across samples for one gene. Clusters of genes are ordered based on the chronology of the expression peak (from earliest – July, 1-dark green cluster – to latest – March, 9 and 10). Expression values were normalized and z-scores are represented here

In order to explore the functions and pathways associated with the gene clusters, we performed a GO enrichment analysis for each of the ten identified clusters (Fig. 4, Additional file 1: Figure S3). GO terms associated with the response to stress as well as biotic and abiotic stimuli were enriched in clusters 2, 3 and 4, with genes mainly expressed during organogenesis and paradormancy. In addition, we observed high expression of genes associated with floral identity before dormancy, including AGAMOUS-LIKE20 (PavAGL20) and the bZIP transcription factor PavFD (Fig. 5). In contrast, at the end of the endodormancy phase (clusters 6, 7 and 8), we highlighted different enrichments in GO terms linked to basic metabolism, such as nucleic acid metabolic processes or DNA replication, but also to the response to alcohol and abscisic acid (ABA). For example, ABA BINDING FACTOR 2 (PavABF2), Arabidopsis thaliana HOMEOBOX 7 (PavATHB7) and ABA 8′-hydroxylase (PavCYP707A2), associated with the ABA pathway, as well as the stress-induced gene PavHVA22, were highly expressed during endodormancy (Fig. 5). During ecodormancy, genes in clusters 9 and 10 are enriched in functions associated with transport, cell wall biogenesis as well as oxidation-reduction processes (Fig. 4; Additional file 1: Figure S3). Indeed, we identified the GLUTATHION S-TRANSFERASE8 (PavGST8) gene and a peroxidase specifically activated during ecodormancy (Fig. 5). However, oxidation-reduction processes are likely to occur during endodormancy as well, as suggested by the expression patterns of GLUTATHION PEROXIDASE 6 (PavGPX6) and GLUTATHION REDUCTASE (PavGR). Interestingly, AGAMOUS (PavAG) and APETALA3 (PavAP3) showed an expression peak during ecodormancy (Fig. 5). These results show that different functions and pathways are specific to flower bud development stages.
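
The enrichment of a GO term in a cluster can be illustrated with a one-sided Fisher exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below is a generic implementation of that test and not the code of the topGO package; the function name and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided Fisher exact test (hypergeometric upper tail).

    N: annotated genes in the background
    K: background genes carrying the GO term
    n: genes in the cluster
    k: cluster genes carrying the GO term
    Returns the probability of observing k or more annotated genes by chance.
    """
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 20 background genes, 5 with the term,
# a cluster of 4 genes of which 3 carry the term.
p = enrichment_pvalue(k=3, n=4, K=5, N=20)
```

A small p-value indicates that the term is found in the cluster more often than expected from its frequency in the background. Because many terms are tested, such p-values are usually interpreted with care or corrected for multiple testing.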

Fig. 4

Enrichments in gene ontology terms for biological processes and average expression patterns in the different clusters in the sweet cherry cultivar ‘Garnet’. a Using the topGO package [36], we performed an enrichment analysis on GO terms for biological processes based on a classic Fisher algorithm. Enriched GO terms with the lowest p-value were selected for representation. Dot size represents the number of genes belonging to the clusters associated with the GO term. b Average z-score values for each cluster. The coloured dotted line corresponds to the estimated date of dormancy release

Fig. 5

Expression patterns of key genes involved in sweet cherry bud dormancy. Expression patterns, expressed in transcripts per million reads (TPM), were analysed for the cultivar ‘Garnet’ from August to March, covering bud organogenesis (O), paradormancy (P), endodormancy (Endo), and ecodormancy (Eco). Dashed lines represent the estimated date of dormancy release

We further investigated whether dormancy-associated genes were specifically activated and repressed during the different bud stages. Among the six annotated DAM genes, four were differentially expressed in the dataset. PavDAM1, PavDAM3 and PavDAM6 were highly expressed during paradormancy and at the beginning of endodormancy (cluster 4, Fig. 5) whereas the expression peak for PavDAM4 was observed at the end of endodormancy (cluster 6, Fig. 5). In addition, we found that genes coding for 1,3-β-glucanases from the Glycosyl hydrolase family 17 (PavGH17), as well as a PLASMODESMATA CALLOSE-BINDING PROTEIN 3 (PavPDCB3) gene were repressed during dormancy (clusters 1 and 10, Fig. 5).

Specific transcription factor target genes are expressed during the main flower bud stages

To better understand the regulation of genes that are expressed at different flower bud stages, we investigated whether some transcription factors (TFs) targeted genes in specific clusters. Based on a list of predicted regulation between TFs and target genes that is available for peach in PlantTFDB [37], we identified the TFs with enriched targets in each cluster (Table 1). We further explored these target genes and their biological functions with a GO enrichment analysis (Additional file 2: Tables S3, S4). Moreover, to have a complete overview of the TFs’ targets, we also identified enriched target promoter motifs in the different gene clusters (Table 2), using motifs we discovered with Find Individual Motif Occurrences (FIMO) [39] and reference motifs obtained from PlantTFDB 4.0 [37]. We decided to focus on results for TFs that are themselves DEGs between dormant and non-dormant bud stages. Results show that different pathways are activated throughout bud development.

Table 1 Transcription factors with over-represented targets in the different clusters
Table 2 Transcription factors with over-represented target motifs in the different clusters

Among the genes expressed during the organogenesis and paradormancy phases (clusters 1, 2, 3 and 4), we observed an enrichment for motifs targeted by several MADS-box TFs such as AGAMOUS (AG), APETALA3 (AP3) and SEPALLATA3 (SEP3), several of them potentially involved in flower organogenesis [40]. On the other hand, for the same clusters, results show an enrichment in MYB-related targets, WRKY and ethylene-responsive element (ERF) binding TFs (Table 1, Table 2). Several members of these TF families have been shown to participate in the response to abiotic factors. Similarly, we found in the cluster 4 target motifs enriched for DEHYDRATION RESPONSE ELEMENT-BINDING2 (PavDREB2C), potentially involved in the response to cold [41]. PavMYB63 and PavMYB93 transcription factors, expressed during organogenesis and paradormancy, likely activate genes involved in secondary metabolism (Table 1, Additional file 2: Tables S3, S4).

During endodormancy, we found that PavMYB14 and PavMYB40 specifically target genes from cluster 10 that are involved in secondary metabolic processes and growth (Additional file 2: Tables S3, S4). Expression profiles suggest that PavMYB14 and PavMYB40 repress expression of these target genes during endodormancy (Additional file 1: Figure S4). This is consistent with the function of Arabidopsis thaliana MYB14, which negatively regulates the response to cold [42]. One of the highlighted TFs was PavWRKY40, which is activated before endodormancy and preferentially regulates genes associated with oxidative stress (Table 1, Additional file 1: Figure S4, Additional file 2: Table S4).

Interestingly, we observed a global response to cold and stress during endodormancy since we identified an enrichment of genes with motifs for several ethylene-responsive element binding TFs such as PavDREB2C in cluster 5. We also observed an enrichment in the same cluster for PavABI5-targeted genes (Table 2). All these TFs are involved in the response to cold, in agreement with the fact that genes in cluster 5 are expressed during endodormancy. Genes belonging to clusters 6, 7 and 8 are highly expressed during deep dormancy and we found targets and target motifs for many TFs involved in the response to abiotic stresses. For example, we found motifs enriched in cluster 7 for a TF of the C2H2 family, which is potentially involved in the response to a wide spectrum of stress conditions, such as extreme temperatures, salinity, drought or oxidative stress (Table 2) [43, 44]. Similarly, in cluster 8, we also identified an enrichment in targets and motifs of many TFs involved in the response to ABA and to abiotic stimuli, such as PavABF2, PavAREB3, PavABI5, and PavDREB2C (Table 1, Additional file 2: Tables S3, S4) [41, 45]. Their targets include the ABA-related genes HIGHLY ABA-INDUCED PP2C GENE 1 (PavHAI1), PavCYP707A2, which is involved in ABA catabolism, PavPYL8, a component of ABA receptor 3, and LATE EMBRYOGENESIS ABUNDANT PROTEIN (PavLEA), involved in the response to desiccation [4].

We also observe during endodormancy an enrichment for targets of PavRVE1, involved in the response to light and temperature (Table 1) [5, 46], and of PavRVE8, which preferentially targets genes involved in cellular transport like LIPID TRANSFER PROTEIN1 (PavLP1, Additional file 2: Table S3). Interestingly, we found that among the TFs with enriched targets in the clusters, only ten display changes in expression during flower bud development (Table 1), including PavABF2, PavABI5 and PavRVE1. Expression profiles for these three genes are very similar to each other and to those of their target genes, with a peak of expression around the estimated dormancy release date, suggesting that these TFs positively regulate their targets (see Additional file 1: Figure S4).

Expression patterns highlight bud dormancy similarities and disparities between three cherry tree cultivars

Since temperature changes and progression through the flower bud stages happen synchronously, it is challenging to discriminate transcriptional changes that are mainly associated with one or the other. In this context, we also analysed the transcriptome of two other sweet cherry cultivars: ‘Cristobalina’, characterized by very early flowering dates, and ‘Regina’, with a late flowering time. The span between flowering periods for the three cultivars is also found in the transition between endodormancy and ecodormancy since 10 weeks separated the estimated dates of dormancy release between the cultivars: 9th December 2015 for ‘Cristobalina’, 29th January 2016 for ‘Garnet’ and 26th February 2016 for ‘Regina’ (Fig. 1a). The three cultivars present differences in the chilling requirements for dormancy release (Fig. 1, Additional file 1: Figure S1b), and in the heat accumulation before flowering (Fig. 1, Additional file 1: Figure S1c). The transition from organogenesis to paradormancy is not well documented and many studies suggest that endodormancy onset is under the strict control of the environment in Prunus species [3]. Therefore, we considered that these two transitions occurred at the same time in all three cultivars. However, the difference of two and a half months in the date of transition from endodormancy to ecodormancy between the cultivars allows us to look for transcriptional changes associated with this transition independently of environmental conditions. Since this transition happens at different dates for the three cultivars, buds in the same dormancy stage were harvested at different dates. In that case, expression patterns that are similar in the three cultivars would indicate that transcriptional states reflect the dormancy stage and not the harvest period. To test this, we analysed transcriptomes from buds harvested at ten dates for the cultivar ‘Cristobalina’, and eleven dates for the cultivar ‘Regina’, spanning all developmental stages from bud organogenesis to flowering. We compared the expression patterns between the three contrasting cultivars throughout flower bud stages for the genes we identified as differentially expressed in the cultivar ‘Garnet’ (Fig. 1b).

When projected onto the plane of the first two principal components, all samples harvested from buds at the same stage cluster together, whatever the cultivar (Fig. 6 and Additional file 1: Figure S5), suggesting that the stage of the bud has more impact on the transcriptional state than time or external conditions. Interestingly, the 100 genes that contributed the most to the PCA dimensions 1 and 2 were very specifically associated with each dimension (Additional file 1: Figure S6, Additional file 2: Table S5). We further investigated which clusters were over-represented in these genes (Additional file 1: Figure S6b) and we found that genes belonging to clusters 6 and 8, associated with endodormancy, were particularly represented among the best contributors to dimension 1. In particular, we identified genes involved in oxidation-reduction processes like PavGPX6, and stress-induced genes such as PavLEA14, together with genes potentially involved in leaf and flower development, including GROWTH-REGULATING FACTOR7 (PavGRF7) and PavSEP1 (Table S5). In contrast, genes that best contributed to dimension 2 strictly belonged to clusters 9 and 10, which are characterized by high expression during ecodormancy (Additional file 1: Figure S6). These results suggest that bud stages can mostly be separated by two criteria: dormancy depth before dormancy release, defined by genes highly expressed during endodormancy, and the dichotomy defined by the status before/after dormancy release.
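
The notion of genes contributing the most to a PCA dimension can be illustrated with a small, dependency-free sketch. It assumes that the contribution of a gene to a dimension is measured by its squared loading, and it computes the components by power iteration with deflation rather than with a statistical package; the gene names and values are hypothetical.

```python
def pca_power(samples, n_components=2, n_iter=500):
    """PCA by power iteration with deflation.

    samples: list of rows (one per sample), columns are genes.
    Returns (components, explained_variance_ratios).
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Gene-by-gene covariance matrix.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    total_var = sum(cov[a][a] for a in range(p))
    components, ratios = [], []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(p)]  # deterministic start vector
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        # Rayleigh quotient: variance carried by this component.
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
        components.append(v)
        ratios.append(eigval / total_var)
        # Deflate: remove the variance captured by this component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    return components, ratios


def top_contributors(component, gene_names, n_top):
    """Genes with the largest squared loadings on one component."""
    ranked = sorted(zip(gene_names, component), key=lambda t: -t[1] ** 2)
    return [gene for gene, _ in ranked[:n_top]]


# Hypothetical expression matrix: six samples (rows) by four genes (columns).
names = ["endoA", "endoB", "ecoA", "flat"]
toy = [
    [1.0, 2.0, 1.0, 5.0],
    [2.0, 3.0, 1.0, 5.1],
    [9.0, 8.0, 1.5, 5.0],
    [10.0, 9.0, 1.0, 4.9],
    [2.0, 2.0, 6.0, 5.0],
    [1.0, 1.0, 7.0, 5.1],
]
components, ratios = pca_power(toy)
pc1_genes = top_contributors(components[0], names, n_top=2)
pc2_genes = top_contributors(components[1], names, n_top=1)
```

In this toy matrix, the two genes that vary together dominate the first dimension and the remaining variable gene dominates the second, which is the kind of dimension-specific association described above.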

Fig. 6

Separation of samples by dormancy stage and cultivar using differentially expressed genes. The principal component analysis was conducted on the TPM (transcripts per million reads) values for the differentially expressed genes in the flower buds of the cultivars ‘Cristobalina’ (filled squares), ‘Garnet’ (empty circles) and ‘Regina’ (stars). Samples in organogenesis are red points, samples in paradormancy are yellow points, samples in endodormancy are dark blue points, samples at dormancy release are light blue points and samples in ecodormancy are green points. Each point corresponds to one sampling time in a single tree

To go further, we compared transcriptional profiles throughout the time course in all cultivars. For this, we analysed the expression profiles in each cultivar for the clusters previously identified for the cultivar ‘Garnet’ (Fig. 7, see also Additional file 1: Figure S7). In general, averaged expression profiles for all clusters are very similar in all three cultivars, with the peak of expression happening at a similar period of the year. However, we can distinguish two main phases according to similarities or disparities between cultivars. First, averaged expression profiles are nearly identical in all cultivars between July and November. This is especially the case for clusters 1, 4, 7, 8 and 9. On the other hand, we can observe a temporal shift in the peak of expression between cultivars from December onward for genes in clusters 1, 5, 6, 8 and 10. Indeed, in these clusters, the peak or drop in expression happens earlier in ‘Cristobalina’, and slightly later in ‘Regina’ compared to ‘Garnet’ (Fig. 7), in correlation with their dormancy release dates. These results seem to confirm that the organogenesis and paradormancy phases occur concomitantly in the three cultivars while temporal shifts between cultivars are observed after endodormancy onset. Therefore, similarly to the PCA results (Fig. 6), the expression profile of these genes is more associated with the flower bud stage than with external environmental conditions.
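
The comparison of averaged profiles between cultivars can be sketched as follows. The example assumes that each gene is z-scored across sampling dates (using the sample standard deviation) before averaging within a cluster; the cultivar labels, dates and TPM values are hypothetical and only illustrate how a temporal shift in the expression peak would be detected.

```python
from statistics import mean, stdev


def zscore(values):
    """Centre and scale one gene's expression across sampling dates."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def cluster_profile(tpm_by_gene):
    """Average z-score per sampling date across the genes of one cluster."""
    z = [zscore(values) for values in tpm_by_gene.values()]
    return [mean(col) for col in zip(*z)]


def peak_date(profile, dates):
    """Sampling date at which the averaged profile is highest."""
    return dates[max(range(len(profile)), key=profile.__getitem__)]


dates = ["Oct", "Nov", "Dec", "Jan", "Feb"]

# Hypothetical TPM values for two genes of one cluster in two cultivars.
early_cultivar = {"g1": [5, 10, 40, 12, 6], "g2": [2, 6, 20, 8, 3]}
late_cultivar = {"g1": [5, 6, 12, 25, 40], "g2": [2, 3, 5, 14, 20]}

early_peak = peak_date(cluster_profile(early_cultivar), dates)
late_peak = peak_date(cluster_profile(late_cultivar), dates)
```

Z-scoring puts genes with very different absolute expression levels on a common scale, so that the averaged profile reflects the shared shape of the cluster rather than its most highly expressed genes.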

Fig. 7

Expression patterns in the ten clusters for the three cultivars. Expression patterns were analysed from August to March, covering bud organogenesis (O), paradormancy (P), endodormancy (Endo), and ecodormancy (Eco). Dashed lines represent the estimated date of dormancy release, in red for ‘Cristobalina’, green for ‘Garnet’ and blue for ‘Regina’. Average z-score patterns (line) and standard deviation (ribbon), calculated using the TPM values from the RNA-seq analysis, are shown for the genes belonging to the ten clusters

Flower bud stage can be predicted using a small set of marker genes

We have shown that flower buds in organogenesis, paradormancy, endodormancy and ecodormancy are characterised by specific transcriptional states. In theory, we could therefore use transcriptional data to infer the flower bud stage. For this, we selected seven marker genes, one for each of the clusters 1, 4, 5, 7, 8, 9 and 10 (identified in Fig. 3), choosing in each cluster the gene whose expression presented the best correlation with the average expression profile of the cluster (Fig. 8). We aimed to select the minimum number of marker genes sufficient to infer the flower bud stage, therefore excluding clusters 2, 3 and 6 as they either had a very small number of genes or had expression profiles very similar to another cluster.
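
The selection of one marker per cluster can be illustrated with the sketch below. It assumes that expression values are already normalised (for example as z-scores) and that the Pearson correlation is the measure of agreement with the cluster average; both are assumptions made for the illustration, and the gene names and values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two profiles (neither may be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pick_marker(cluster_expr):
    """Gene whose profile correlates best with the cluster's average profile."""
    avg = [mean(col) for col in zip(*cluster_expr.values())]
    return max(cluster_expr, key=lambda gene: pearson(cluster_expr[gene], avg))


# Hypothetical normalised profiles for three genes of one cluster.
cluster = {
    "geneA": [-1.0, -0.5, 1.5, 0.5, -0.5],
    "geneB": [-1.2, -0.3, 1.4, 0.4, -0.3],
    "geneC": [0.5, -1.0, 1.0, -1.0, 0.5],
}
marker = pick_marker(cluster)
```

Repeating this selection for each retained cluster yields one representative gene per expression pattern.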

Fig. 8

Expression patterns for the seven marker genes in the three cultivars. Expression patterns were analysed from August to March, covering bud organogenesis (O), paradormancy (P), endodormancy (Endo), and ecodormancy (Eco). Dashed lines represent the estimated date of dormancy release, in red for ‘Cristobalina’, green for ‘Garnet’ and blue for ‘Regina’. TPM were obtained from the RNA-seq analysis for the seven marker genes from clusters 1, 4, 5, 7, 8, 9 and 10. Lines represent the average TPM, dots are the actual values from the biological replicates. SRP: STRESS RESPONSIVE PROTEIN; TCX2: TESMIN/TSO1-like CXC 2; CSLG3: Cellulose Synthase like G3; GH127: Glycosyl Hydrolase 127; PP2C: Phosphatase 2C; UDP-GalT1: UDP-Galactose transporter 1; MEE9: maternal effect embryo arrest 9

Expression for these marker genes not only recapitulates the average profile of the cluster they originate from, but also the temporal shifts in the profiles between the three cultivars (Fig. 8). To determine whether these genes carry as much information as the full transcriptome, or all DEGs, we performed a PCA of all samples harvested for all three cultivars using expression levels of these seven markers (Additional file 1: Figure S8). The clustering of samples along the two main axes of the PCA using these seven markers is very similar, if not almost identical, to the PCA results obtained using expression for all DEGs (Fig. 6). This indicates that the transcriptomic data can be reduced to only seven genes and still provide accurate information about the flower bud stages.

To test if these seven markers can be used to define the flower bud stage, we used a multinomial logistic regression modelling approach to predict the flower bud stage in our dataset based on the expression levels for these seven genes in the three cultivars ‘Garnet’, ‘Regina’ and ‘Cristobalina’ (Fig. 9). For this, we trained and tested the model, on randomly picked sets, to predict the five bud stage categories, and obtained a very high model accuracy (100%; Additional file 1: Figure S9). These results indicate that the bud stage can be accurately predicted based on expression data by just using seven genes. In order to go further and test the model in an independent experiment, we analysed the expression for the seven marker genes by RT-qPCR on buds sampled from another sweet cherry tree cultivar ‘Fertard’ for two consecutive years (Fig. 9a, b). Based on these RT-qPCR data, we predicted the flower bud developmental stage using the parameters of the model obtained from the training set on the three cultivars ‘Garnet’, ‘Regina’ and ‘Cristobalina’. We achieved a high accuracy of 71% for our model when tested on RT-qPCR data to predict the flower bud stage for the ‘Fertard’ cultivar (Fig. 9c and Additional file 1: Figure S9c). In particular, the chronology of bud stages was very well predicted. This result indicates that these seven genes can be used as a diagnostic tool in order to infer the flower bud stage in sweet cherry trees.
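
The principle of multinomial logistic regression for stage prediction can be sketched in a few lines. This is a minimal, dependency-free illustration fitted by batch gradient descent, not the model or software used in this study: it uses three hypothetical markers and three stage labels instead of seven markers and five stages, and all expression values are invented.

```python
from math import exp


def softmax(scores):
    """Convert class scores into probabilities that sum to one."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_softmax(X, y, n_classes, lr=0.5, epochs=500):
    """Fit multinomial logistic regression by batch gradient descent.

    X: list of feature rows (scaled marker expression); y: integer labels.
    Returns one weight vector per class, with the bias as last entry.
    """
    n_feat = len(X[0])
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for row, label in zip(X, y):
            xb = row + [1.0]  # append the bias input
            probs = softmax([sum(w * v for w, v in zip(Wc, xb)) for Wc in W])
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(xb):
                    grad[c][j] += err * v
        for c in range(n_classes):
            for j in range(n_feat + 1):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W


def predict(W, row):
    """Index of the class with the highest score for one sample."""
    xb = row + [1.0]
    scores = [sum(w * v for w, v in zip(Wc, xb)) for Wc in W]
    return max(range(len(W)), key=scores.__getitem__)


# Hypothetical scaled expression of three markers; labels are stage indices.
stages = ["pre-dormancy", "endodormancy", "ecodormancy"]
X = [[0.9, 0.1, 0.1], [0.8, 0.2, 0.1], [1.0, 0.1, 0.2],
     [0.1, 0.9, 0.1], [0.2, 0.8, 0.2], [0.1, 1.0, 0.1],
     [0.1, 0.1, 0.9], [0.2, 0.1, 0.8], [0.1, 0.2, 1.0]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W = fit_softmax(X, y, n_classes=len(stages))
predicted_stage = stages[predict(W, [0.15, 0.1, 0.95])]
```

In practice, the model would be fitted on one set of samples and evaluated on held-out samples or on an independent experiment, as done above with the RT-qPCR data for ‘Fertard’.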

Fig. 9

Expression for the seven marker genes allows accurate prediction of the bud dormancy stages in the late flowering cultivar ‘Fertard’ during two bud dormancy cycles. a Relative expressions were obtained by RT-qPCR and normalized by the expression of two reference constitutively expressed genes PavRPII and PavEF1. Data were obtained for two bud dormancy cycles: 2015/2016 (orange lines and symbols) and 2017/2018 (blue lines and symbols). b Evaluation of the dormancy status in ‘Fertard’ flower buds during the two seasons using the percentage of open flower buds (BBCH stage 53). c Predicted vs experimentally estimated bud stages. SRP: STRESS RESPONSIVE PROTEIN; TCX2: TESMIN/TSO1-like CXC 2; CSLG3: Cellulose Synthase like G3; GH127: Glycosyl Hydrolase 127; PP2C: Phosphatase 2C; UDP-GalT1: UDP-Galactose transporter 1; MEE9: maternal effect embryo arrest 9

Discussion

In this work, we have characterised transcriptional changes at a genome-wide scale happening throughout cherry tree flower bud dormancy, from organogenesis to the end of dormancy. To do this, we have analysed expression in flower buds at 11 dates from July 2015 (flower bud organogenesis) to March 2016 (ecodormancy) for three cultivars displaying different dates of dormancy release, generating 81 transcriptomes in total. This resource, with a fine time resolution, reveals key aspects of the regulation of cherry tree flower buds during dormancy (Fig. 10). We have shown that buds in organogenesis, paradormancy, endodormancy and ecodormancy are characterised by distinct transcriptional states (Figs. 2, 3) and we highlighted the different pathways activated during the main cherry tree flower bud dormancy stages (Fig. 4 and Table 1). Finally, we found that just seven genes are enough to accurately predict the main cherry tree flower bud dormancy stages (Fig. 9).

Fig. 10

From bud formation to flowering: transcriptomic regulation of flower bud dormancy. Our results highlighted seven main expression patterns corresponding to the main dormancy stages. During organogenesis and paradormancy (July to September), signalling pathways associated with flower organogenesis and ABA signalling are upregulated. Distinct groups of genes are activated during different phases of endodormancy, including targets of transcription factors involved in ABA signalling, cold response and circadian clock. ABA: abscisic acid

Our results show that the transcriptional state reflects the dormancy stage of the bud independently of the chilling requirement. Indeed, samples of the three cultivars at the same dormancy stage are very similar in terms of expression patterns, even if they correspond to samples harvested at different dates. Given this observation, we can speculate that the genes and pathways we find to be regulated at each dormancy stage are potentially involved in the control of this dormancy stage, and not just in the response to environmental conditions. We discuss below the main functions we find to be associated with each dormancy stage.

DAMs, floral identity and organogenesis genes characterize the pre-dormancy stages

To our knowledge, this is the first report on the transcriptional regulation of early stages of flower bud development in temperate fruit trees. Information on dormancy onset and pre-dormancy bud stages is scarce and we arbitrarily delimited the organogenesis and paradormancy in July/August and September, respectively. However, based on transcriptional data, we could detect substantial discrepancies suggesting that the definition of the bud stages can be improved. Indeed, we observe that samples harvested from buds during phases that we defined as organogenesis and paradormancy cluster together in the PCA, but away from samples harvested during endodormancy. Moreover, most of the genes highly expressed during paradormancy are also highly expressed during organogenesis. This is further supported by the fact that paradormancy is a flower bud stage predicted with less accuracy based on expression level of the seven marker genes. In detail, paradormancy is defined as a stage of growth inhibition originating from surrounding organs [7]; it is therefore strongly dependent on the position of the buds within the tree and the branch. Our results suggest that defining paradormancy for multiple cherry flower buds based on transcriptomic data is difficult and even raise the question of whether paradormancy can be considered as a specific flower bud stage. Alternatively, we propose that the pre-dormancy period should rather be defined as a continuum between organogenesis, growth and/or growth cessation phases. Further physiological observations, including flower primordia developmental context [47], could provide crucial information to precisely link the transcriptomic environment to these bud stages. Nonetheless, we found very few, if any, differences between the three cultivars for the expression patterns during organogenesis and paradormancy, supporting the hypothesis that pre-dormancy processes are not associated with the different timing in dormancy release and flowering that we observe between these cultivars.

Our results showed that distinct pathways were activated before dormancy onset. The key role of ABA in the control of bud set and dormancy onset has been known for decades and we found that the ABA-related transcription factor PavWRKY40 is expressed as early as during organogenesis. Several studies have highlighted a role for the Arabidopsis homolog of PavWRKY40 in ABA signalling, in relation with light transduction [48, 49] and biotic stresses [50]. These results suggest that there might be an early response to ABA in flower buds. Furthermore, we uncovered the upregulation of several pathways linked to organogenesis during the summer months, including PavMYB63 and PavMYB93, expressed during early organogenesis, with potential roles in secondary wall formation [51] and root development [52]. Interestingly, TESMIN/TSO1-like CXC 2 (PavTCX2), defined here as a marker gene for organogenesis and paradormancy, is the homolog of an Arabidopsis TF potentially involved in stem cell division [53]. We found that targets for PavTCX2 may be over-represented in genes up-regulated during endodormancy, thus suggesting that PavTCX2 acts on bud development by repressing dormancy-associated genes. In accordance with the documented timing of floral initiation and development in sweet cherry [54], several genes involved in floral identity and flower development, including PavAGL20, PavFD, as well as targets of PavSEP3, PavAP3 and PavAG, were markedly upregulated during the early stages of flower bud development. Many studies conducted on fruit trees support the key role of DAM genes in the control of dormancy establishment and maintenance [18] and we found expression patterns very similar to the peach DAM genes with PavDAM1 and PavDAM3, as well as PavDAM6, expressed mostly during summer [55]. The expression of these three genes was highest before endodormancy and seems to be inhibited by cold exposure from October onward, similarly to previous results obtained in sweet cherry [56], peach [57], Japanese apricot [58] and apple [59]. These results further suggest a major role for PavDAM1, PavDAM3 and PavDAM6 in dormancy establishment, bud onset and growth cessation in sweet cherry.

Integration of environmental and internal signals through a complex array of signaling pathways during endodormancy

Previous studies have demonstrated the key role of a complex array of signaling pathways in the regulation of endodormancy onset and maintenance that subsequently lead to dormancy release, including genes involved in cold response, phytohormone-associated pathways and oxidation-reduction processes. Genes associated with the response to cold, notably, have been shown to be up-regulated during endodormancy, such as dehydrins and DREB genes identified in oak, pear and leafy spurge [24, 27, 60]. We observe an enrichment for GO terms involved in the response to abiotic and biotic stimuli, as well as an enrichment for targets of many TFs involved in the response to environmental factors. In particular, our results suggest that PavMYB14, which has a peak of expression in November just before the cold period starts, is repressing genes that are subsequently expressed during ecodormancy. This is in agreement with the fact that AtMYB14, the PavMYB14 homolog in Arabidopsis thaliana, is involved in cold stress response regulation [42]. Although these results were not confirmed in Populus [61], two MYB DOMAIN PROTEIN genes (MYB4 and MYB14) were also up-regulated during the induction of dormancy in grapevine [62]. Similarly, we identified an enrichment in genes highly expressed during endodormancy with target motifs of a transcription factor belonging to the CBF/DREB family. These TFs have previously been implicated in cold acclimation and endodormancy in several perennial species [60, 63]. These results are in agreement with the previous observation showing that genes responding to cold are differentially expressed during dormancy in other tree species [24]. Cold acclimation is the ability of plants to adapt to and withstand freezing temperatures and is triggered by decreasing temperatures and photoperiod. Therefore, mechanisms associated with cold acclimation are usually observed concomitantly with the early stages of endodormancy. The stability of membranes and a strict control of cellular homeostasis are crucial for bud survival under cold stress and we observe that genes associated with cell wall organization and nutrient transporters are up-regulated at the beginning of endodormancy, including the CELLULOSE SYNTHASE-LIKE G3 (PavCSLG3) marker gene.

Similarly to seed dormancy processes, hormonal signals act in a complex way to balance dormancy maintenance and growth resumption. In particular, ABA levels have been shown to increase in response to environmental signals such as low temperatures and/or shortening photoperiod, and trigger dormancy induction [64,65,66]. Several studies have also shown that a subsequent drop in ABA concentration is associated with dormancy release [65, 67]. These results are supported by previous reports where genes involved in ABA signaling are differentially expressed during dormancy in various tree species (e.g. [19, 20, 22, 24, 68]). We find ABA-related pathways to be central in our transcriptomic analysis of sweet cherry bud dormancy, with the enrichment of GO terms related to ABA found in the genes highly expressed during endodormancy. These genes, including ABA-degradation gene PavCYP707A2, ABA-response factor PavABF2, and the Protein phosphatase 2C (PavPP2C) marker gene, are then inhibited after dormancy release in the three cultivars. Accordingly, we identified a key role for ABA-associated genes PavABI5 and PavABF2 in the regulation of dormancy progression in our dataset. These two transcription factors are mainly expressed around the time of dormancy release, like their targets, and their homologs in Arabidopsis are involved in key ABA processes, especially during seed dormancy [69]. These results are consistent with records that PmABF2 is highly expressed during endodormancy in Japanese apricot [22]. Interestingly, both positive regulators of ABA, including PavABF2 and PavABI5, and negative regulators of ABA, such as PavCYP707A2, are highly expressed during endodormancy. These results show an increased regulation of ABA levels during endodormancy. They also suggest that elevated ABA levels may then be present in the buds and that they are correlated with deep dormancy, as previously shown in other studies [70,71,72,73,74]. In addition, PavCYP707A2 is upregulated at the same dormancy stages, which is consistent with the hypothesis that ABA catabolism is activated concomitantly with increased ABA biosynthesis to maintain its homeostasis [75]. Previous reports showed an activation of ABA-induced dormancy by DAM genes [65, 74] and we observed that the PavDAM4 expression pattern is very similar to that of ABA-related genes. We can therefore hypothesize that PavDAM4 has a key role in dormancy onset and maintenance, potentially by regulating ABA metabolism. On the other side of the pathway, ground-breaking works have revealed that ABA signaling is crucial in triggering dormancy onset by inducing plasmodesmata closure, potentially through callose deposition [66, 76]. Accordingly, we found that PavGH17 genes involved in callose degradation are highly activated before and after endodormancy while their expression is inhibited during endodormancy, thus suggesting that callose deposition is activated during endodormancy in sweet cherry flower buds.

In plants, response to environmental and developmental stimuli usually involves pathways associated with circadian clock regulation. This is also true for bud dormancy, where the interplay between environmental and internal signals necessitates circadian clock genes for an optimal response [4, 77,78,79,80]. Indeed, transcriptomic analyses conducted in poplar showed that the genes up-regulated during endodormancy included genes with EVENING ELEMENT (EE) motifs, which are important regulators of circadian clock and cold-responsive genes, and components of the circadian clock, including LATE ELONGATED HYPOCOTYL (LHY) and ZEITLUPE (ZTL) [61, 68]. We identified an enrichment of targets for PavRVE8 and PavRVE1 among the genes expressed around the time of dormancy release. Homologs of RVE1 are also up-regulated during dormancy in leafy spurge [46] and apple [81]. These TFs are homologs of Arabidopsis MYB transcription factors involved in the circadian clock. In particular, AtRVE1 seems to integrate several signalling pathways including cold acclimation and auxin [82,83,84] while AtRVE8 is involved in the regulation of the circadian clock by modulating the pattern of H3 acetylation [85]. Our findings that genes involved in the circadian clock are expressed and potentially regulate genes at the time of dormancy release strongly support the hypothesis that environmental cues might be integrated with internal factors to control dormancy and growth in sweet cherry flower buds.

Consistent with observations that elevated levels of the reactive oxygen species H2O2 are strongly associated with dormancy release [86], oxidative stress is considered as one of the important processes involved in the transition between endodormancy and ecodormancy [30, 87, 88]. In line with these findings, we identified genes involved in oxidation-reduction processes that are up-regulated just before endodormancy release, including PavGPX6 and PavGR, which are involved in the detoxification systems. In their model for the control of dormancy, Ophir and colleagues [88] hypothesize that respiratory stress, ethylene and ABA pathways interact to control dormancy release and growth resumption. Our results concur with this hypothesis to some extent, albeit the key role of DAM genes should be further explored. Co-regulation analyses will be needed to investigate whether oxidative stress signalling is involved upstream to trigger dormancy release or downstream as a consequence of cellular activity following dormancy release in sweet cherry buds, leading to a better understanding of how other pathways interact or are directly controlled by oxidative cues.

Global cell activity characterizes the ecodormancy stage in sweet cherry flower buds

Following the release of endodormancy, buds enter the ecodormancy stage, which is a state of inhibited growth controlled by external signals that can therefore be reversed by exposure to growth-promoting signals [7]. This transition towards the ability to grow is thought to be associated with the prolonged downregulation of DAM genes (see [18] for review), regulated by epigenetic mechanisms such as histone modifications [63, 89,90,91] and DNA methylation [56], in a similar way to FLC repression during vernalization in Arabidopsis. We observe that the expression of all PavDAM genes is inhibited before dormancy release, thus supporting the hypothesis that DAM genes may be involved in dormancy maintenance. In particular, the transition to ecodormancy coincides with a marked decrease in PavDAM4 expression, which suggests that the regulation of its expression is crucial in the progression of dormancy towards growth resumption. However, other MADS-box transcription factors were found to be up-regulated during ecodormancy, including PavAG and PavAP3, similarly to previous results obtained in Chinese cherry (Prunus pseudocerasus) [28]. We also found that the marker gene PavMEE9, expressed during ecodormancy, is orthologous to the Arabidopsis gene MATERNAL EFFECT EMBRYO ARREST 9 (MEE9), required for female gametophyte development [92], which could suggest active cell differentiation during the ecodormancy stage.

As mentioned before, in-depth studies conducted on poplar have led to the discovery that the regulation of movements through the plasma membrane plays a key role not only in dormancy onset but also in dormancy release [93]. This is also true for long-distance transport with the observation that in peach, for example, active sucrose import is renewed during ecodormancy [94]. In sweet cherry, our results are consistent with these processes since we show that GO terms associated with transmembrane transporter activity are enriched for genes highly expressed during ecodormancy. Transmembrane transport capacity belongs to a wide range of membrane structure modifications tightly regulated during dormancy. For example, lipid content, linoleic and linolenic acids composition and unsaturation degree of fatty acids in the membrane are modified throughout dormancy progression [30] and these changes in the membrane structure may be associated with modifications in the cytoskeleton [93]. Consistently, we find that genes involved in microtubule-based processes and cell wall organization are up-regulated during ecodormancy in sweet cherry flower buds. For example, the marker gene PavUDP-GalT1, orthologous to a putative UDP-galactose transmembrane transporter, is highly expressed after dormancy release in all three cultivars.

Overall, all processes triggered during ecodormancy are associated with cell activity. The trends observed here suggest that after endodormancy release, transmembrane and long distance transports are reactivated, thus allowing an active uptake of sugars, leading to increased oxidation-reduction processes and cell proliferation and differentiation.

Development of a diagnostic tool to define the flower bud dormancy stage using seven genes

We find that sweet cherry flower bud stage can be accurately predicted with the expression of just seven genes. It indicates that combining expression profiles of just seven genes is enough to recapitulate all transcriptional states in our study. This is in agreement with previous work showing that transcriptomic states can be accurately predicted using a relatively low number of markers [95]. Marker genes were not selected on the basis of their function and indeed, two genes are orthologous to Arabidopsis proteins of unknown function: PavSRP (Stress responsive A/B Barrel Domain-containing protein) and PavGH127 (putative glycosyl hydrolase). However, as reported above, some of the selected marker genes are involved in the main pathways regulating dormancy progression, including cell wall organization during the early phase of endodormancy (PavCSLG3), ABA (PavPP2C), transmembrane transport (PavUDP-GalT1) and flower primordia development (PavMEE9).

Interestingly, when there are discrepancies between the predicted bud stages and the ones defined by physiological observations, the model always predicts that stages happen earlier than the actual observations. For example, the model predicts dormancy release when endodormancy is observed, or ecodormancy when dormancy release is observed. This could suggest that transcriptional changes happen before we can observe physiological changes. This is indeed consistent with the indirect phenotyping method currently used, based on the observation of the response to growth-inducible conditions after 10 days. Using these seven genes to predict the flower bud stage would thus potentially allow these important transitions to be identified when they actually happen.

We show that the expression level of these seven genes can be used to predict the flower bud stage in other conditions and genotypes by performing RT-qPCR. This independent experiment was carried out over two dormancy cycles (2015/2016 and 2017/2018) and shows that RT-qPCR for these seven marker genes, together with two control genes, is enough to predict the flower bud stage in cherry trees. It shows that performing a full transcriptomic analysis is not necessary if the only aim is to define the dormancy stage of flower buds.

Conclusions

In this work, we have characterized transcriptional changes throughout all stages of sweet cherry flower bud development and dormancy. To our knowledge, no analysis had previously been conducted on this range of dates in temperate trees. Pathways involved at different stages of bud dormancy have been investigated in other species and we confirmed that genes associated with the response to cold, ABA and development processes were also identified during sweet cherry flower bud dormancy. We took advantage of the extended timeframe and we highlighted genes and pathways associated with specific phases of dormancy, including early endodormancy, deep endodormancy and dormancy release. For that reason, our results suggest that commonly used definitions of bud dormancy are too restrictive and transcriptomic states might be useful to redefine the dormancy paradigm, not only for sweet cherry but also for other species that undergo overwintering. We advocate for large transcriptomic studies that take advantage of the wide range of genotypes available in forest and fruit trees, aiming at the mechanistic characterization of dormancy stages. Using this approach of comparing transcriptomes for several cultivars of flower buds from organogenesis to dormancy release, we find that the transcriptional states reflect the bud dormancy stage independently of the chilling requirement of the cultivars. Furthermore, we then went a step beyond the global transcriptomic analysis and we developed a model based on the transcriptional profiles of just seven genes to accurately predict the main dormancy stages. This offers an alternative approach to methods currently used such as assessing the date of dormancy release by using forcing conditions. In addition, this result sets the stage for the development of a fast and cost effective diagnostic tool to molecularly define the dormancy stages in cherry trees. 
This approach, from transcriptomic data to modelling, could be tested and transferred to other fruit tree species, and such a diagnostic tool would be very valuable for researchers working on fruit trees as well as for plant growers, notably to define the best time for the application of dormancy breaking agents, whose efficiency highly depends on the state of dormancy progression.

Methods

Plant material

Branches and flower buds were collected from four different sweet cherry cultivars with contrasting flowering dates: ‘Cristobalina’, ‘Garnet’, ‘Regina’ and ‘Fertard’, which display extra-early, early, late and very late flowering dates, respectively. ‘Cristobalina’, ‘Garnet’ and ‘Regina’ trees were grown in an orchard located at the Fruit Experimental Unit of INRA in Bourran (South West of France, 44° 19′ 56′′ N, 0° 24′ 47′′ E), under the same agricultural practices. ‘Fertard’ trees were grown in an orchard at the Fruit Experimental Unit of INRA in Toulenne, near Bordeaux (48° 51′ 46′′ N, 2° 17′ 15′′ E). During the first sampling season (2015/2016), ten or eleven dates spanning the entire period from flower bud organogenesis (July 2015) to bud break (March 2016) were chosen for RNA sequencing of ‘Cristobalina’, ‘Garnet’ and ‘Regina’ buds (Fig. 1a and Additional file 2: Table S1), while bud tissues from ‘Fertard’ were sampled in 2015/2016 (12 dates) and 2017/2018 (7 dates) for validation by RT-qPCR (Additional file 2: Table S1). For each date, flower buds were sampled from different trees, each tree corresponding to a biological replicate. Upon harvesting, buds were flash frozen in liquid nitrogen and stored at −80 °C prior to RNA extraction.

Measurements of bud break and estimation of the dormancy release date

For the two sampling seasons, 2015/2016 and 2017/2018, three branches bearing floral buds were randomly chosen fortnightly from ‘Cristobalina’, ‘Garnet’, ‘Regina’ and ‘Fertard’ trees, between November and flowering time (March–April). Branches were incubated in water pots placed under forcing conditions in a growth chamber (25 °C, 16 h light/ 8 h dark, 60–70% humidity). The water was replaced every 3–4 days. After 10 days under forcing conditions, the total number of flower buds that reached the BBCH stage 53 [35, 47] was recorded. The date of dormancy release was estimated as the date when the percentage of buds at BBCH stage 53 was above 50% after 10 days under forcing conditions (Fig. 1a).
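The 50% rule described above can be sketched in a few lines of Python. This is only an illustration of the stated criterion, not the authors' script: the function name and the bud counts are hypothetical, and the sketch assumes that the release date is the earliest sampling date exceeding the threshold.

```python
from datetime import date

def dormancy_release_date(observations, threshold=50.0):
    """Return the first sampling date with more than `threshold` % of buds at BBCH 53.

    observations: list of (sampling_date, buds_at_bbch53, total_buds) tuples,
    counted after 10 days under forcing conditions.
    """
    for sampling_date, open_buds, total_buds in sorted(observations):
        percentage = 100.0 * open_buds / total_buds
        if percentage > threshold:
            return sampling_date
    return None  # threshold never exceeded within the sampled dates

# Hypothetical counts, for illustration only
example = [
    (date(2015, 12, 1), 2, 40),
    (date(2015, 12, 15), 12, 40),
    (date(2016, 1, 5), 23, 40),
    (date(2016, 1, 19), 35, 40),
]
print(dormancy_release_date(example))
```

Because branches were sampled fortnightly, a date estimated this way can be no more precise than the sampling interval.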

RNA extraction and library preparation

Total RNA was extracted from 50 to 60 mg of frozen and pulverised flower buds using the RNeasy Plant Mini kit (Qiagen) with a minor modification: 1.5% PVP-40 was added to the extraction buffer RLT. RNA quality was evaluated using Tapestation 4200 (Agilent Genomics). Library preparation was performed on 1 μg of high quality RNA (RNA integrity number equivalent of 8.5 or higher) using the TruSeq Stranded mRNA Library Prep Kit High Throughput (Illumina cat. no. RS-122-2103) for ‘Cristobalina’, ‘Garnet’ and ‘Regina’ cultivars. DNA quality from libraries was evaluated using Tapestation 4200. The libraries were sequenced on a NextSeq500 (Illumina), at the Sainsbury Laboratory Cambridge University (SLCU), using paired-end sequencing of 75 bp in length.

Mapping and differential expression analysis

The raw reads obtained from the sequencing were analysed using several publicly available software tools and in-house scripts. The quality of reads was assessed using FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc/) and possible adaptor contaminations were removed using Trimmomatic [96]. Trimmed reads were mapped to the peach (Prunus persica (L) Batsch) reference genome v.2 [97] (genome sequence and information can be found at the following address: https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Ppersica) using Tophat [38]. Possible optical duplicates were removed using Picard tools (https://github.com/broadinstitute/picard). The total number of mapped reads for each sample is given in Additional file 2: Table S6. For each gene, raw read counts and TPM (Transcripts Per Million) numbers were calculated [98].
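For readers unfamiliar with TPM, the conversion from raw counts can be sketched as follows. This is a minimal illustration of the usual TPM definition, not the pipeline used in the study: the gene names, counts and transcript lengths are hypothetical, and the sketch assumes plain annotated transcript lengths rather than effective lengths.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM for one sample.

    counts: dict mapping gene -> raw read count
    lengths_bp: dict mapping gene -> transcript length in base pairs
    """
    # Reads per kilobase: correct each count for transcript length
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Scale so that the values of one sample sum to one million
    scaling = sum(rpk.values()) / 1e6
    return {gene: value / scaling for gene, value in rpk.items()}

# Hypothetical values, for illustration only
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
tpm = counts_to_tpm(counts, lengths_bp)
print(tpm)
```

Since TPM values sum to the same total in every sample, they can be compared across sampling dates without a further library-size correction.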

We performed a differential expression analysis on data obtained from the ‘Garnet’ samples. First, data were filtered by removing lowly expressed genes (average read count <3), genes not expressed in most samples (read counts = 0 in more than 75% of the samples), and genes presenting little change in expression between samples (coefficient of variation <0.3). Then, differentially expressed genes (DEGs) between non dormant and dormant stages were assessed on the filtered data using the DESeq2 R Bioconductor package [99], in the statistical software R (R Core Team 2018). Genes with an adjusted p-value (padj) < 0.05, using the Benjamini-Hochberg multiple testing correction method, were assigned as DEGs (Additional file 2: Table S2). To enable researchers to access this resource, we have created a graphical web interface to allow easy visualisation of transcriptional profiles throughout flower bud dormancy in the three cultivars for genes of interest (bwenden.shinyapps.io/DorPatterns).
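The three pre-filtering criteria can be expressed as a single predicate per gene. The sketch below is a hedged illustration rather than the authors' code: the function name and example counts are hypothetical, and it assumes that the coefficient of variation is the population standard deviation divided by the mean.

```python
from statistics import mean, pstdev

def passes_filters(read_counts, min_mean=3.0, max_zero_fraction=0.75, min_cv=0.3):
    """Return True if a gene is kept for the differential expression analysis."""
    average = mean(read_counts)
    if average < min_mean:  # lowly expressed gene
        return False
    zero_fraction = sum(1 for c in read_counts if c == 0) / len(read_counts)
    if zero_fraction > max_zero_fraction:  # not expressed in most samples
        return False
    # Assumption: CV computed with the population standard deviation
    cv = pstdev(read_counts) / average
    return cv >= min_cv  # discard genes with little change between samples

# Hypothetical read counts across eight samples
genes = {
    "stable": [10, 11, 10, 9, 10, 11, 9, 10],
    "low": [0, 1, 2, 0, 1, 0, 3, 1],
    "sparse": [0, 0, 0, 0, 0, 0, 0, 40],
    "dynamic": [2, 5, 20, 40, 35, 10, 4, 3],
}
kept = [name for name, values in genes.items() if passes_filters(values)]
print(kept)
```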

Principal component analyses and hierarchical clustering

Distances between the DEGs expression patterns over the time course were calculated based on Pearson’s correlation on ‘Garnet’ TPM values. We applied a hierarchical clustering analysis on the distance matrix to define ten clusters (Additional file 2: Table S2). For expression patterns representation, we normalized the data using z-score for each gene:

$$ z\ {score}_{ij}=\frac{{TPM}_{ij}-{mean}_i}{{standard\ deviation}_i} $$

where TPM_ij is the TPM value of gene i in sample j, and mean_i and standard deviation_i are the mean and standard deviation of the TPM values for gene i over all samples.
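As a small worked example of this normalisation, the sketch below computes z-scores for one gene across samples. The TPM values are hypothetical, and the choice of the population standard deviation (rather than the sample estimate) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev

def z_scores(tpm_values):
    """Z-score normalise the TPM values of one gene across all samples."""
    gene_mean = mean(tpm_values)
    gene_sd = pstdev(tpm_values)  # assumption: population standard deviation
    if gene_sd == 0:
        # A gene with constant expression carries no pattern information
        return [0.0 for _ in tpm_values]
    return [(value - gene_mean) / gene_sd for value in tpm_values]

# Hypothetical TPM values for one gene over eight samples (mean 5, SD 2)
profile = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(profile)
```

After this transformation every gene has a mean of zero and unit variance, so that expression patterns can be compared regardless of absolute expression level.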

Principal component analyses (PCA) were performed on TPM values from different datasets using the prcomp function from R.

For each cluster, using data for ‘Garnet’, ‘Regina’ and ‘Cristobalina’, mean expression pattern was calculated as the mean z-score value for all genes belonging to the cluster. We then calculated the Pearson’s correlation between the z-score values for each gene and the mean z-score for each cluster. We defined the marker genes as genes with the highest correlation values, i.e. genes that represent the best the average pattern of the clusters. Keeping in mind that the marker genes should be easy to handle, we then selected the optimal marker genes displaying high expression levels while not belonging to extended protein families.
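The correlation-based ranking of candidate marker genes within a cluster can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the gene names and z-score profiles are hypothetical, and the additional selection criteria (expression level, gene family size) are not modelled.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)

def rank_marker_candidates(cluster_z):
    """Rank the genes of one cluster by correlation with the cluster mean profile.

    cluster_z: dict mapping gene -> list of z-scores (same sample order for all genes)
    """
    n_samples = len(next(iter(cluster_z.values())))
    mean_profile = [
        sum(profile[j] for profile in cluster_z.values()) / len(cluster_z)
        for j in range(n_samples)
    ]
    scores = {gene: pearson(profile, mean_profile) for gene, profile in cluster_z.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical z-score profiles for three genes of one cluster
cluster = {
    "gene1": [-1.0, 0.0, 1.0, 0.0],
    "gene2": [-1.2, 0.1, 1.1, 0.0],
    "gene3": [1.0, -1.0, 0.5, -0.5],
}
ranking = rank_marker_candidates(cluster)
print(ranking)
```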

Motif and transcription factor targets enrichment analysis

We performed enrichment analyses on the DEGs in the different clusters for transcription factor target genes and target motifs.

Motif discovery on the DEG set was performed using Find Individual Motif Occurrences (FIMO) [39]. The motif list available for peach was obtained from PlantTFDB 4.0 [37]. To calculate the overrepresentation of motifs, DEGs were grouped by motif (grouping several genes and transcripts in which the motif was found). Overrepresentation of motifs was tested with hypergeometric tests using Hypergeometric {stats} available in R. The comparison was performed for the number of appearances of a motif in one cluster against the number of appearances in the overall set of DEGs. Because multiple testing inflates the number of false positives, the p-values obtained were corrected using the False Discovery Rate [100] correction method with the p.adjust {stats} function available in R.
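The logic of the enrichment test and the multiple testing correction can be illustrated in Python. The analysis itself was run in R, so this is a language swap for illustration only; the sketch assumes a one-sided (upper tail) hypergeometric test and the Benjamini-Hochberg procedure for the false discovery rate correction, and all numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, cluster_size, total_with_motif, total_genes):
    """P(X >= k): probability of seeing at least k motif-bearing genes in a cluster
    of cluster_size genes drawn from total_genes DEGs, total_with_motif of which
    carry the motif."""
    upper = min(cluster_size, total_with_motif)
    favourable = sum(
        comb(total_with_motif, i) * comb(total_genes - total_with_motif, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total_genes, cluster_size)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical example: 5 of the 5 genes in a cluster carry a motif found in 5 of 10 DEGs
p_value = hypergeom_upper_tail(5, 5, 5, 10)
print(p_value)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```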

A list of predicted regulation between transcription factors and target genes is available for peach in PlantTFDB [37]. We collected the list and used it to analyse the overrepresentation of genes targeted by TF, using Hypergeometric {stats} available in R, comparing the number of appearances of a gene controlled by one TF in one cluster against the number of appearances on the overall set of DEG. p-values obtained were corrected using a false discovery rate as described above. We only present results obtained for TFs that are themselves DEGs. Predicted gene homology to Arabidopsis thaliana and functions were retrieved from the data files available for Prunus persica (GDR, https://www.rosaceae.org/species/prunus_persica/genome_v2.0.a1).

GO enrichment analysis

The list for the gene ontology (GO) terms was retrieved from the database resource PlantRegMap [37]. Using the topGO package [36], we performed an enrichment analysis on GO terms for biological processes, cellular components and molecular functions based on a classic Fisher algorithm. Enriched GO terms were filtered with a p-value < 0.005 and the ten GO terms with the lowest p-value were selected for representation.

Marker genes selection and RT-qPCR analyses

The seven marker genes were selected based on the following criteria:

  • Their expression presented the best correlation with the average expression profiles of their cluster.

  • They were not members of large families (in order to reduce issues caused by redundancy).

  • We only kept genes for which we could design high efficiency primers for RT-qPCR.

Marker genes were not selected based on modelling fit, nor based on their function.

cDNA was synthesised from 1 μg of total RNA using the iScript Reverse Transcriptase Kit (Bio-rad Cat no 1708891) in 20 μL of final volume. 2 μL of cDNA diluted to a third was used to perform the qPCR in a 20 μL total reaction volume. qPCRs were performed using a Roche LightCycler 480. Three biological replicates for each sample were performed. Primers used in this study for qPCR are available in Additional file 2: Table S7. Primers were tested for non-specific products by separation on 1.5% agarose gel electrophoresis and by sequencing each amplicon. Real-time data were analyzed using custom R scripts. Expression was estimated for each gene in each sample using the relative standard curve method based on cDNA diluted standards. For the visualization of the marker genes’ relative expression, we normalized the RT-qPCR results for each marker gene by the average RT-qPCR data for the reference genes PavRPII and PavEF1.
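The relative standard curve method and the reference gene normalisation can be sketched as below. The custom R scripts used in the study are not reproduced here; the function names, the dilution series and the Cq values are hypothetical, and the sketch assumes that the reference genes are combined by an arithmetic mean.

```python
from math import log10

def fit_standard_curve(relative_quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    x = [log10(q) for q in relative_quantities]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(cq_values) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, cq_values))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def quantity_from_cq(cq, slope, intercept):
    """Invert the standard curve to obtain a relative quantity from a Cq value."""
    return 10 ** ((cq - intercept) / slope)

def normalised_expression(marker_quantity, reference_quantities):
    """Divide the marker quantity by the mean quantity of the reference genes."""
    return marker_quantity / (sum(reference_quantities) / len(reference_quantities))

# Hypothetical tenfold dilution series of a cDNA standard
slope, intercept = fit_standard_curve([1, 0.1, 0.01, 0.001], [20.0, 23.3, 26.6, 29.9])
marker = quantity_from_cq(23.3, slope, intercept)
# Hypothetical relative quantities for the two reference genes
print(normalised_expression(marker, [0.4, 0.6]))
```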

Bud stage predictive modelling

In order to predict the bud stage based on the marker genes’ transcriptomic data, we used TPM values for the marker genes to train and test several models. First, all samples were projected into a two-dimensional space using PCA, to transform potentially correlated data to an orthogonal space. The new coordinates were used to train and test the models to predict the five bud stage categories. In addition, we tested the model on RT-qPCR data for samples harvested from the ‘Fertard’ cultivar. For modelling purposes, expression data for the seven marker genes were normalized by the expression corresponding to the October sample. We chose October as the reference date because it corresponds to the beginning of dormancy and it was available for all cultivars. For each date, the October-normalized expression values of the seven marker genes were projected onto the two-dimensional PCA plane calculated for the RNA-seq data and they were tested against the model trained on ‘Cristobalina’, ‘Garnet’ and ‘Regina’ RNA-seq data.
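The two preprocessing steps, normalisation to the October sample and projection onto an existing PCA plane, can be sketched as follows. This is an illustration under explicit assumptions: normalisation is taken to be a gene-wise ratio to the October value, the projection is the usual centring followed by a dot product with each principal axis, and the expression values, feature means and axes are hypothetical (three genes are shown instead of seven for brevity).

```python
def normalise_to_reference(expression_by_date, reference_date):
    """Divide each gene's expression by its value at the reference date."""
    reference = expression_by_date[reference_date]
    return {
        sampling_date: [value / ref for value, ref in zip(values, reference)]
        for sampling_date, values in expression_by_date.items()
    }

def project(sample, feature_means, components):
    """Project one sample onto principal axes fitted on another dataset."""
    centred = [value - m for value, m in zip(sample, feature_means)]
    return [sum(c * w for c, w in zip(centred, axis)) for axis in components]

# Hypothetical expression of three marker genes at three dates
expression = {
    "October": [2.0, 4.0, 1.0],
    "December": [4.0, 2.0, 1.0],
    "February": [1.0, 1.0, 3.0],
}
normalised = normalise_to_reference(expression, "October")

# Hypothetical feature means and axes, standing in for a PCA fitted on RNA-seq data
feature_means = [1.0, 1.0, 1.0]
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
coordinates = {d: project(v, feature_means, components) for d, v in normalised.items()}
print(coordinates)
```

Reusing the axes fitted on the RNA-seq data, rather than refitting a PCA on the RT-qPCR samples, keeps both datasets in the same coordinate system for the classifier.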

We tested five different models (Multinomial logistic regression – LR, Random forest classifier – RF, k-nearest neighbour classifier – KNN, multi-layer perceptron – MLP, and support vector machine classifier – SVM) for 500 different combinations of training/testing RNA-seq datasets, all implemented using the scikit-learn Python package [101] (see Additional file 3 for details on the parameters used). The models were 5-fold cross-validated to ensure the robustness of the coefficients and to reduce overfitting. The models’ F1-scores, which in the multi-class case are calculated as the weighted average across classes of the harmonic mean of precision and recall, were calculated for the RNA-seq testing sets and the RT-qPCR datasets. Results presented in Additional file 1: Figure S10 show that, although the highest model F1-scores were obtained for the RF and MLP when considering only the RNA-seq training dataset, the best results based on the RT-qPCR dataset were obtained for the SVM and the LR models. We selected the LR model for this study because the coefficients are more easily described, with two coefficients for each dormancy stage (Additional file 1: Figure S9b). The LR model used in this study was optimised using the LogisticRegressionCV function with default parameters, multi_class: ‘multinomial’, max_iter: 1000 and the ‘lbfgs’ solver for the optimization.
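To make the evaluation metric concrete, the sketch below computes a weighted multi-class F1-score in plain Python. It is not the scikit-learn implementation used in the study; it assumes that each class is weighted by its number of true samples, and the stage labels are hypothetical.

```python
def weighted_f1(true_labels, predicted_labels):
    """Weighted F1-score: per-class F1 averaged with weights equal to class support."""
    total = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    score = 0.0
    for cls in sorted(set(true_labels)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * (tp + fn) / total  # weight by the number of true samples
    return score

# Hypothetical labels: one endodormant sample is predicted one stage too early
observed = ["endo", "endo", "release", "release", "eco", "eco"]
predicted = ["endo", "release", "release", "release", "eco", "eco"]
print(weighted_f1(observed, predicted))
```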