Introduction

Advances in sequencing have revolutionised the field of gene expression, particularly in non-model organisms1,2,3,4. The onset of next-generation technologies and the rapid decrease in per-base sequencing costs has boosted the use of massively parallel cDNA sequencing (RNA-seq)5,6. RNA-seq is now a cost-effective technique, enabling the acquisition of large amounts of transcriptomic data from different sources. RNA-seq can be used to determine the amount of differentially expressed transcripts, novel genes, transcription factors, alternatively spliced variants and Single Nucleotide Polymorphisms (SNPs), even in species that lack a reference genome7. De novo transcriptome assembly is particularly advantageous when studying ecological, biological, cellular and molecular processes in distantly related non-model plant species4,8,9,10,11,12.

However, the data generated in RNA-seq or microarray analysis requires an authentic tool for validation13,14,15,16. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is the gold-standard used by researchers to validate data obtained from protocols such as RNA-seq17. The sensitivity, precision, reproducibility and real-time progression of the PCR reaction allows it to be the most accurate and reliable resource for confirming expression analysis data obtained from sources such as microarrays and RNA-seq18,19. It allows monitoring of expression profiles and mRNA abundance levels across different samples concurrently20. Moreover, RT-qPCR can be used for the detection of low expressed transcripts which cannot be determined through high-throughput sequencing analysis21. It can provide either an absolute number of cDNA copies or a relative quantification of the desired transcript across samples22,23. In the latter case, the amount of expression of the target gene is compared with the expression of an internal reference gene to normalise the copy numbers in different samples24,25.

Strategies have been employed to accurately normalise the variation in different samples for reproducible and precise measurements including starting with an optimum amount of material, having an adequate number of biological replicates as well as appropriate technical replication. Good quality RNA, cDNA quality checks, no template and negative RT controls, an amplification factor of 2.0, qPCR efficiency ranging between 90–110%, proper controls for normalisation and statistical analysis are all also necessary26. To date, normalisation of the target genes across different samples has been a key step in minimising the variations produced in qPCR27,28. Normalisation involves the selection of appropriate control/reference genes, whose expression is compared to the gene of interest in the test and control samples. The reference genes, previously referred to as housekeeping genes, are genes selected because they are constitutively expressed among different tissues and cells in different conditions. The expression of an ideal reference gene is stable across tissues and remains unaffected by experimental treatments29. Such genes generally belong to basic cellular processes, primary metabolism, or are cell structure determinants. Thus, many traditional reference genes used in plant-based research include actin (ACT), elongation factor 1α (EF1α), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), eukaryotic initiation factor 4ε (eIF4ε), ubiquitin (UBQ), and 18S ribosomal RNA (U18S).

There are numerous reports of selection and validation of reference genes in model plant species including Arabidopsis thaliana20, Oryza sativa30, Helianthus annuus31, Triticum aestivum32,33, Zea mays34 and Glycine max35. The list also includes diverse non-model plant species including flax36, white clover37, bamboo38, peach39, Mimulus40, watermelon41, grape vine42, and lettuce43. However, in most of these cases the selection of suitable reference gene(s) was based on tightly controlled experimental conditions, whereas it has been reported that the expression of these gene(s) can vary in different tissues under different experimentally controlled conditions26,44,45,46,47,48. Selection of such biased reference genes can result in misinterpretation of the qPCR data and, consequently, output of misleading expression data. In recent years, researchers have stressed the importance of selection and proper validation of reference genes under natural environmental conditions as a mandatory step before normalisation against the expression of the gene of interest27,42. The necessity of proper validation increases in research involving non-model plant species where genomic data is unavailable.

This report deals with identification, selection and validation of suitable candidate reference genes for our study of the ecological phenomenon of masting in two non-model plant species, Chionochloa pallens (Poaceae) and Celmisia lyallii (Asteraceae)49. Masting is synchronised intermittent flowering and production of seeds by a perennial plant population over a wide geographical area50,51. Chionochloa pallens and C. lyallii are endemic to New Zealand and are the two strongest masting alpine plants (Fig. 1)52. Masting is a major problem in New Zealand. For example, Nopthofagus sp (beech) trees flower irregularly, typically once in every 2 to 6 years, resulting in the intermittent production of large quantities of seed. During the masting event, populations of seed predators, such as introduced rodents and stoats, increase dramatically. Due to the abundance of food, these invasive predators reproduce more frequently and their numbers rise rapidly. When the large populations of rodents and stoats have consumed the seeds, they turn to preying on indigenous species including kiwi, kaka, kakapo, kea and native bats. Rapid expansion of the rodent population can lead to an 80–90% decline in the population of mohua birds53. Such attacks are threatening the survival of endangered birds. Moreover, climate change may have effects at both an individual and at a population level in terms of the masting phenomenon. Consequently, it is crucial to identify the molecular markers regulating flowering-time control, so they can be used as predictive markers of masting in order to launch pre-emptive conservation measures.

Figure 1
figure 1

Masting in Celmisia lyallii and Chionochloa pallens using data collected from the Mt Hutt field site over the past 26 years. The y-axis corresponds to the number of flowers per plant, and the x-axis corresponds to the timescale represented in years.

It has been shown that the temperature difference between the two previous summers (ΔT) can be used to predict a masting season. The ΔT model is based on the seed fall measured in 40 different masting species over 26 years52. To induce mast flowering, we carried out translocation experiments in the field to provide, or not, an appropriate ΔT. We gathered samples for analysis of gene expression in order to gain an understanding of the molecular controls of mast flowering. Consequently, it became crucial to identify highly stable reference genes which remained unaffected in the diverse environmental conditions of our experiment. Improper identification and validation of the reference genes would lead to errors and misinterpretation of the gene expression data. Thus, it was critical to test and validate various reference genes prior to determining the molecular basis of flowering in these non-model plant species.

In this study, novel and potentially stable genes were identified from the de novo transcriptome assembly54 of Chionochloa and Celmisia. These genes were analysed based on the statistical programs geNorm55, Normfinder56, BestKeeper57, ΔCt58 and RefFinder59 in order to identify and rank the best possible reference gene(s) or combination of genes for the normalisation of the qPCR data obtained from the field experiments. Potential reference genes were tested against altitudinal translocations, and changes over season and development in leaves from vegetative and flowering plants. These parameters acted as strong controls for an unbiased identification of appropriate reference genes for further gene expression analysis.

Results

A list of reference gene sequences was prepared based on previous reports of reference gene selection for plants in either the Asteraceae or Poaceae. Gene sequences encoding elongation factor 1-α (EF-)60, glyceraldehyde-6-phosphate dehydrogenase (GAPDH)20, eukaryotic initiation factor 4ε (eIF-)34, phosphoglycerate kinase 1 (PGK1)61, protein phosphatase 2a (PP2a)61, metalloprotease (MTP)61, SAND61, actin (ACT)62, β-tubulin (TUB)30, expressed protein (ExP)63, nuclease-binding protein (NBP)63, and tumour homolog protein (THP)63 were downloaded from the Genbank database and used to search for the corresponding homologous sequences in the Celmisia and Chionochloa transcriptomes (Table 1). Candidate reference gene sequences were identified from the transcriptomic data (unpublished) using TBLASTN64. Since most of the gene sequences belonged to a protein family, the identified candidate reference gene sequences were subjected to a phylogenetic analysis using MEGA 7.065 to confirm their identity (Supplementary Data S1). All the putative reference gene sequences clustered with their corresponding protein family.

Table 1 Identification of the candidate reference genes from the de novo transcriptome assembly of Celmisia and Chionochloa.

Primer specificity and amplification efficiency

All the designed primers for the corresponding selected reference genes with their abbreviations, primer melting temperature (Tm), amplicon length, amplification efficiency and correlation coefficient are provided in Table 2. For 18S ribosomal RNA (U18S), universal primers were used for ribosomal gene quantification. All the primers were tested for amplification efficiency using the LinRegPCR software. The software allows estimation of amplification efficiencies for individual samples for a single gene primer pair. The mean amplification efficiency ranged from 95.4 to 110.6. The coefficient of correlation for the primers ranged from 0.998 to 0.9999. Additionally, the designed primers were validated for specificity from the melt curve analysis. All the primers showed a single peak in the dissociation curve indicating the presence of a single product (Supplementary Fig. S2a,b). The amplified PCR products showed single bands when run on 1.5% agarose gel (Supplementary Fig. S3). The sequences of the amplified PCR products confirmed the specificity of the primers.

Table 2 Primer efficiency for housekeeping genes in Celmisia and Chionochloa.

Expression analysis of the candidate reference genes

The raw quantification cycle (Cq) values for each of the eight candidate reference genes in Celmisia and Chionochloa are shown in Fig. 2 (Supplementary Data S4). In Celmisia, the Cq values varied from 8.44 to 25.10 while the mean Cq value ranged from 10.63 to 22.44. 18S showed the greatest variance amongst the eight candidate reference genes. In Chionochloa, the Cq values ranged from 8.15 to 23.15 with an average Cq value ranging from 10.02 to 22.87. In Chionochloa, 18S had the highest variability amongst the eight candidate reference genes. The stability was analysed in comparison with developmental stage (vegetative and flowering), time-course (January, May, September, and March), and attitudinal changes (1520, 1350, 1070 m at Mt Hutt and near sea level at the University of Canterbury).

Figure 2
figure 2

Cycle threshold values for candidate reference genes in Celmisia and Chionochloa are shown in box and whisker plots. The bars represent maximum and minimum Cq value.

Stability of the candidate reference genes in Celmisia

A potential reference gene should have a stable expression across the developmental, altitudinal and seasonal changes. GeNorm analysis showed that eIF and PP2a were the more stable genes across all the different comparisons (Fig. 3). The optimal number of reference gene(s) required for normalisation in all the experimental manipulations was calculated using geNorm. The calculated pairwise variation for all the different experimental groups indicated two genes were sufficient for normalisation as the cut-off value was below 0.15 even when additional reference genes were added (V2/3 = 0.000 and V3/4 = 0.000). NormFinder analysis showed SAND and PP2a as the higher stability genes compared to other genes in the collected samples (Table 3), whereas Bestkeeper analysis pointed to PP2a as the most stable gene (Fig. 4). A similar ranking was seen when all the reference genes were analysed by the ΔCt method (Table 4).

Figure 3
figure 3

GeNorm analysis of the candidate reference genes in 48 collected Celmisia samples across different experimental groups (Total, time-course, altitude and developmental stage). The gene with the lowest stability value is the most stable gene.

Table 3 NormFinder analysis for selection of candidate reference gene(s) in Celmisia and Chionochloa.
Figure 4
figure 4

BestKeeper analysis of the candidate reference genes in 48 Celmisia samples tested across different experimental groups (Total, time-course, altitude and developmental stage). The gene with the lowest standard deviation value is the most stable gene.

Table 4 ΔCt analysis for selection of candidate reference gene(s) in Celmisia and Chionochloa.

RefFinder generated a comprehensive list of ranking order of the candidate reference genes. In the comparative analysis for leaves from vegetative and flowering samples, PP2a and GAPDH were found to be the two most stable reference genes. Under altitudinal translocations, PP2a, eIF and GAPDH were the most stable candidate reference genes. However, the time-course analysis of the candidate reference genes showed eIF, SAND and GAPDH as the better reference genes for data normalisation. From the analysed data, PP2a and GAPDH were found to be best for gene expression data normalisation in Celmisia (Table 5).

Table 5 Comprehensive ranking of candidate reference genes in Celmisia and Chionochloa.

Stability of reference genes in Chionochloa

The geNorm analysis showed ExP and THP to be the most stable reference genes in most of the experimental groups (Fig. 5). TUB and GAPDH were found to be more stable in the January samples collected from the University of Canterbury and 1070 m base of Mt Hutt, while EF was also found to be an equally stable reference gene in the samples collected in March and September. Similar to Celmisia, the pairwise variation for an optimal number of reference genes was found to be less than 0.15 for adding additional genes more than two. This suggests that two reference genes are sufficient to normalise the gene expression data from Chionochloa samples. NormFinder identified EF as the most stable gene across all the samples followed by TUB except for samples collected from 1070 and January time point. ExP and ACT were more stable in the January and February samples, respectively (Table 3). According to the Best Keeper analysis based on standard deviation (std dev) values, ExP was the most stable gene for all the samples (Fig. 6). The coefficient of variation was lowest for NBP in most of the samples across different experimental groups. However, the ranking of NBP was different in vegetative and flowering samples. Interestingly, U18S was ranked highest for samples collected from 1520 m base of Mt Hutt.

Figure 5
figure 5

GeNorm analysis of the candidate reference genes in 54 collected Chionochloa samples across different experimental groups (Total, time-course, altitude and developmental stage). The gene with the lowest stability value is the most stable gene.

Figure 6
figure 6

BestKeeper analysis of the candidate reference genes in 54 Chionochloa samples tested across different experimental groups (Total, time-course, altitude and developmental stage). The gene with the lowest standard deviation value is the most stable gene.

During the time course experimental group, THP, EF, and GAPDH were found to be more stable in the March, September, and May samples. The ranking for the candidate reference genes using the ∆Ct method was variable in different experimental groups as shown in Table 4. EF was consistently a stable gene in most of the experimental groups. Similar variability can be seen in the comprehensive ranking order generated by RefFinder across all the different experimental groups. ExP and THP, or either of them were always present in the top two positions in terms of expression stability. Based on the relative fold change in the expression levels and outputs from the statistical analysis, ExP and THP were selected as the best gene combination for normalisation of the gene expression analysis (Table 5).

Validation of the selected reference gene

The reliability of the selected reference genes was further verified by analysing the expression profile of CONSTANS (CO) and Heading date 1 (Hd1). Heading date 1 is an orthologue of CO present in monocots66. Relative expression levels of CO and Hd1 were normalised using the normalisation factor derived from the two most stable reference genes (PP2a and GAPDH in Celmisia, and ExP and THP in Chionochloa) (Fig. 7), and the least stable-reference gene (U18S) (Fig. 8).

Figure 7
figure 7

Validation of the selected reference genes in Celmisia and Chionochloa. (A) Expression analysis of CO in samples collected from Mt Hutt using PP2a and GAPDH. (B) Expression analysis of CO in vegetative and flowering samples collected from the University of Canterbury using PP2a and GAPDH. (C) Expression analysis of Hd1 in vegetative and flowering samples collected from Mt Hutt using ExP and THP. (D) Expression analysis of Hd1 in vegetative and flowering samples collected from the University of Canterbury using ExP and THP. The data represent the mean ± S.D. of two independent biological replicates, each with three technical replicates. Legends ending with flw or veg corresponds to the fate of the leaves used in the analysis as flowering or vegetative respectively.

Figure 8
figure 8

Expression analysis of CO and Hd1 in Celmisia and Chionochloa using the least stable reference gene, 18S ribosomal gene. (A) CO expression in samples collected from Mt Hutt. (B) CO expression in vegetative and flowering samples collected from the University of Canterbury. (C) Hd1 expression in vegetative and flowering samples collected from Mt Hutt. (D) Hd1 expression in vegetative and flowering samples collected from the University of Canterbury. The data represent the mean ± S.D. of two independent biological replicates, each with three technical replicates. Legends ending with flw or veg corresponds to the fate of the leaves used in the analysis as flowering or vegetative respectively.

The expression pattern of CO in Celmisia was generally found to be similar to its ortholog of Hd1 in Chionochloa when normalised with selected reference genes in the respective species (Fig. 7). The expression patterns of CO and Hd1 varied across the season (Fig. 7B,D) (P-value < 0.0001). The expression of Hd1 was significantly greater in the samples collected in January compared to other time points from the plants at the University of Canterbury (UC) and Mt Hutt (P-value < 0.0001) (Fig. 7C and D). Irrespective of whether the plants subsequently flowered or not, the patterns for CO and Hd1 expression were the same in the leaves collected over time from UC in Celmisia (Fig. 7B) (P-value < 0.0001, F = 349.6). Expression of Hd1 was much greater in leaves from the Chionochloa plants that flowered in the next season compared to leaves from plants that remained vegetative.

The expression data was also normalised using U18S, the least stable candidate gene in Chionochloa and Celmisia. Because the expression of U18S was itself highly variable, this is reflected in the variable expression of CO and Hd1 in both the species in all the experimental groups as seen in the relative fold changes when comparing the y-axes in Figs 7 and 8. There were no significant differences in the expression of CO or Hd1 at different time points in either species. Consequently, normalising the data using U18S would lead to inaccurate conclusions.

Discussion

RT-qPCR is a widely used technique to study transcript abundance of a particular gene in distinct biological samples. Due to its high sensitivity and reliability, it is considered to be the gold standard for detecting and quantifying the expression pattern of gene(s) of interest. The choice of an appropriate normalisation factor is a key step in the RT-qPCR analysis to obtain accurate interpretation of the results. The normalisation factor is based on reference gene(s). Reference gene(s) are selected based on the prerequisite that they will be stably expressed in all the tissues/cells, and unaffected by experimental manipulation. Thus, the majority of the traditionally used reference genes are metabolic genes involved with basal cellular activities such as carbon metabolism, cellular structure maintenance and protein translation. Actin, elongation factor 1α, β-tubulin, glyceraldehyde-3-phosphate dehydrogenase and ubiquitin have been repeatedly used for RT-qPCR analysis in plants, as they are expected to be uniformly expressed in different tissues and organs. Consequently, variability in the expression of a reference gene can demonstrate the variability introduced to samples due to imperfections of the technology and sample preparation steps.

There are several steps during an RT-qPCR analysis where it is possible to introduce variations in the final results including mRNA extractions, cDNA synthesis, the PCR procedure, and design of primers67. Most researchers would agree that these potential variations can be reduced through the use of appropriate reference genes. These reference genes, with proper validation, can be used to correct such methodological errors in an effective and convenient manner. This ensures that any variation is covered to a similar extent in controls and experimental treatments.

However, there is no single gene known that remains stable in diverse experimental conditions, and use of a single reference gene as the normalisation factor can lead to accumulation of relatively large errors in the interpretation of the gene expression data47. Unfortunately, an awareness of the importance of systematic validation of appropriate reference genes has still not completely permeated the scientific community. In one study in 2008, only 3.2% of 188 papers published in plant science journals from July 2007 to December 2007 used a validated reference gene27. Even today, many of the RT-qPCR analyses published in Plant Physiology and The Plant Cell in the past six months have used only a single reference gene, many of which were not reported to have been validated (Supplementary Fig. S5).

There are already reports of traditional reference genes that, while shown to be stably expressed under various biotic and abiotic stresses, have differential expression under other experimental conditions35,43,47. Studies using Arabidopsis have demonstrated variations in reference gene stability under different experimental conditions20. Similar studies have also been published in rice30, citrus68, pea69, ryegrass70, carrot71, soybean45,46 and maize34 suggesting variation in the ranking of the reference gene stability can be attributed to different stresses applied to the plants of interest. In one such study in citrus, different pairs of reference genes provided better normalisation factors under different biotic stresses68. Such studies show that the use of non-statistically validated reference genes can negatively impact the normalisation procedure and introduce errors in the expression patterns of the genes of interest. The most appropriate reference gene(s) should be properly identified and validated in all biological samples across the different experimental groups.

However, field conditions differ from controlled environment conditions, and this adds another level of complexity to the selection of appropriate reference genes. Systematic validation of the reference genes in real-time PCR analyses in non-model plant species in the field is required as pointed out in Tashiro et al.42. Tashiro et al., identified stably expressed genes in different grapevine cultivars emphasising the importance of validation of ranking stability of reference genes in a heterologous plant population. There are limited reports of selection of reference genes for normalisation in experimentally manipulated plants in the field42,70,72, and no such study has been published for masting plants.

We aimed to identify stably expressed genes to avoid the limitations imposed by (a) inappropriate selection, (b) improper validation, and (c) field conditions in order to study gene expression in the non-model masting plants under prevailing environmental conditions. Plants used in this study, Chionochloa and Celmisia, were translocated to different altitudes to provide conditions predicted to promote flowering52. It was interesting to note that only the Celmisia and Chionochloa plants translocated to sea level flowered heavily, along with a few of the Chionochloa plants translocated to the 1520 m site at Mt Hutt. Candidate reference genes were selected and analysed in different temporal-spatial manipulations for both of the species using four different statistical algorithms - geNorm, NormFinder, Bestkeeper, and ∆Ct. Since the ranking of stably expressed genes can differ across these four algorithms due to varied strengths and application conditions of each of the software55,58, it is often difficult for researchers to identify the most stable reference genes. In cases where multiple algorithms are used, a comprehensive ranking is required for the selection of appropriate reference genes. RefFinder is a web-based analysis tool that generates a final comprehensive ranking of reference genes based on their stability output as evaluated by the four independent algorithms. The global ranking is based on the integration of the geometric mean (GM) of the ranking values obtained from the four statistical tools. Importantly, RefFinder evaluates the ranking independent of the unrelated cut-offs and appropriate weights of the GM of the ranking values28,73.

The candidate reference genes were selected based on their previous validation in related species specifically in response to temperature shifts. Even then, the stability ranking of the candidate reference genes in Celmisia and Chionochloa, as determined by geNorm, NormFinder, BestKeeper and ∆Ct, was found to be variable in different experimental groups. This variation may arise due to the different limitations of the statistical algorithms used in each program.

In Chionochloa, the geNorm analysis suggests that all the selected reference genes could be used as a potential normalisation factor as they have stability values of less than 1.5, an indicator of stably expressed reference genes. Reference genes showed a similar range of stability values during altitudinal and developmental shifts (Fig. 3). During the time-course analysis, an increase in the stability of reference genes was observed from January to February. Such change can be attributed to post-translocation effects. Since most of the selected reference genes regulate internal metabolism of the plants, the increase in the stability of the reference genes may then correspond to a plant’s ability to adapt to the new environment. Furthermore, either ExP or THP regularly showed up as the best normalisation factor for all the collected Chionochloa samples with other statistical tests as well. Two reference genes, ExP and THP, were found to be the most stably transcribed among all the samples, as previously reported for rice as well63 – both rice and Chionochloa are in the Poaceae. Since the pairwise variation for V3/4 was less than 0.15, ExP and THP were selected as the best candidate reference genes to be used as the normalisation factor for gene expression analysis in Chionochloa. ExP and THP code for an esterase and a mercury-dependent metal transporter, respectively. Esterases are generally associated with fatty acid degradation and modification. THP, a metal transporter protein, should have a stable expression under our experimental conditions. In our study, neither of these factors changed significantly in response to temperature changes.

In Celmisia, ranking of the candidate genes in all the experimental groups (Total) was found to be similar when analysed by the four algorithms. GeNorm identified eIF as the most stable gene in all the experimental samples. The stability ranking was found to be more variable in the NormFinder analysis. PP2a and SAND showed more stable expression across altitudinal and time-course change experiments. The BestKeeper and ΔCt analysis identified PP2A and GAPDH as having the most stable expression pattern. Moreover, either GAPDH and/or PP2a always showed up in the top three positions in the comprehensive ranking, as the most stable reference genes across all the experimental groups. Based on the calculated pairwise variation output, PP2a and GAPDH were selected to validate the gene expression data of the flowering promoter CONSTANS. PP2a has already been established and validated to be one of the best candidate reference genes for normalisation in many plant species74,75,76. The protein is an essential component of cellular signalling pathways to achieve coordinated functioning between different cell types against oxidative stress. On the other hand, GAPDH is a classical reference gene used by many researchers, although its expression has been shown to vary under different environmental conditions77. Expression of GAPDH in coffee was found to be the most stable among all the tested candidates between different tissues78. In flax, GAPDH was found to be highly stable during different stages of plant development36. Expression of GAPDH may increase in conditions of abiotic and biotic stress where a plant requires more sugar utilisation and energy to stabilise its growth condition79. Since the sequence of GAPDH is highly conserved, it may be considered a good reference gene in many non-model plant species80. However, we highly recommend that GAPDH also be validated under experimental conditions in the field, as it was not the most stable gene in Chionochloa.

CONSTANS, was selected as a target gene to validate the credibility of the selected reference genes. CONSTANS is a CAAT-box transcription factor that has been shown to regulate photoperiodic control of flowering in Arabidopsis and in many other plant species56. The expression analysis of CO in Celmisia and Chionochloa was found to be similar to that in published reports when normalised using the two pairs of reference genes81. The two-way anova analysis suggests that the expression of CO was significantly different in all the seasons in distinct altitudes (P-value < 0.0001, F = 20.5). This is as expected as the transcription of the CO gene is regulated by photoperiodic signals. The expression was not significantly different between distinct altitudes of Mt Hutt. But the interaction between different seasons and altitude showed a positive response to the relative expression of CO in both species. The flowering genetic network is different between monocots and dicots even though the core genes remain the same82. This could explain the variation in the expression pattern of Hd1 in the January leaf samples with different fates in Chionochloa, compared to Celmisia (Fig. 7D).

Even though the ranking order of the candidate reference genes may have differed, all the statistical programs consistently excluded U18S, ranking it as the most unstable gene among all the candidates. Expression analysis using the least stable gene (U18S) as the normalisation factor for both Celmisia and Chionochloa, showed significantly different results to the expression pattern using the selected reference gene pairs described above (comparing Figs 7 and 8). The expression values differed up to 14 log2 fold change in Celmisia and 12 log2 fold change in Chionochloa when analysed with U18S as the normalisation factor (Supplementary Fig. S7). These results clearly suggest that inappropriate selection of reference genes can introduce significant bias in the expression analysis and lead to misinterpretation of results. The use of U18S has also been discouraged among plant scientists due to its greater abundance in a sample compared to the target gene which can introduce greater errors in the analysis26.

Conclusion: A Systematic Validation for Plants in Field Experiments

RT-qPCR is an extremely sensitive and important technique for real time amplification of transcripts. The selection of appropriate reference genes to use as the normalisation factor is equally as important as the technique. Since conditions in the field add a layer of complexity over those in controlled environments, it is crucial to verify the reliability of all potential reference genes in order to avoid misinterpretation of the genetic analysis. Discrepancies in the ranking order of candidate reference genes emphasises how crucial it is to validate every reference gene for any gene expression output in the natural environment. This work constitutes the first systematic analysis to identify, select and validate appropriate reference genes for genetic analysis in masting plants growing under prevailing environmental conditions. We have identified stably expressed reference genes from the draft transcriptomes of two masting plants, Chionochloa and Celmisia. These genes showed constant expression levels across seasonal, altitudinal and developmental changes and can be used to validate gene expression analysis. Our study will benefit both molecular and ecological research towards forecasting mast flowering in non-model plants under global climate change. Additionally, it will enable the New Zealand government to design efficient conservation measures aimed at the protection of endangered, endemic species.

Methods

Experimental design and plant material

Experiments were set up at the field site at Mt Hutt (43°32′S, 171°33′E). The control plots for Celmisia and Chionochloa were present at the 1350 m and 1070 m base of Mt Hutt, respectively. A set of 20 plants, randomly selected from the control plots, were moved to sea level (University of Canterbury, (43°31′S, 172°35′E). A similar set of Chionochloa was also translocated to an altitude of 1520 m on Mt Hutt, and a set of Celmisia plants was translocated to 1070 m and 1520 m. Leaf samples were collected four times throughout the year from each of the plants that were then segregated into two sets depending on whether the plants subsequently flowered (flowering sample) or remained vegetative (vegetative sample). Furthermore, an additional set of samples were also collected in the late summer for Chionochloa plants translocated to sea level to study the effect of a complete summer season on the flowering response. All of the samples were harvested and frozen using dry ice. The collected material was stored in −80 °C until subsequent analysis. Two independent biological replicates were used in the study, each consisting of a pool of three independent plants.

Total RNA extraction

All the independent replicates were ground in mortar and pestle using liquid nitrogen. RNAzap was sprayed to clean the desk and types of equipment prior to use to inactivate the RNases present in the surroundings. Total RNA was extracted using the Trizol-chloroform method: 1 ml of Trizol along with 60 µl of 20% (w/v) Sarkosyl was added to the ground sample (~100 mg). The sample was centrifuged at 12000 g for 5 mins. The supernatant was transferred to a clean tube followed by addition of 200 µl of chloroform for phase separation. The mixture was allowed to stand for 3–4 mins followed by centrifugation at 12000 g for 15 mins. The aqueous phase from the above step was collected into a new tube and 500 µl of isopropanol was added to precipitate the RNA. The mixture was again centrifuged for 5 mins at 12000 g. The supernatant was discarded and the pellet was washed twice with 80% ethanol at 12000 rpm for 2 min. The clean, translucent pellet was air dried for 5–6 minutes. The RNA pellet was dissolved in RNase-free water in a dry bath at 65 °C and stored at −80 °C. The concentration and purity of the extracted RNA was assessed using a Nanodrop 2000c Spectrophotometer (Thermo Scientific, US). The absorbance ratios of A260/280 and A260/230 were the parameter used to check the quality of the extracted RNA. The values are required to be in between 2.0–2.1 for A260/280 and 2.0–2.2 for A260/230 for a pure RNA sample. The integrity of RNA was assessed using 1.2% (w/v) agarose gel electrophoresis.

Reverse transcription-quantitative PCR (RT-qPCR)

Purified RNA (1 µg) was treated with DNase I to remove any genomic traces from the sample. 1 µl of 25 mM EDTA was added to the treated RNA followed by an incubation of 10 min at 65 °C to inactivate the DNase I. For reverse transcription, 1 µl of 50 µM oligodT, 1 µl of 50 µM random hexamers and 10 mM dNTPs were added to the treated RNA83. The sample was incubated at 65 °C for 5 min and then on ice for 2 min. 7 µl of reverse transcription master mix, comprising 5X first strand buffer, 0.1 M DTT, 40 U/µl RNaseOUT, and 200 U/µL Superscript III reverse transcriptase were added to the above RNA mix. The sample was then incubated for the following conditions to synthesize the first strand of cDNA: 50 °C for 60 min followed by 70 °C for 15 min. The cDNA product was diluted five-fold and stored at −20 °C until quantitative PCR (qPCR).

RT-qPCR reactions were performed in a 15-µl final volume with 7.5 µl of a SYBR master mix (Kappa, Sigma), 4.5 µl of water, 10 µM of each forward and reverse primer. Cycling conditions were 95 °C for 10 min hold, followed by 40 cycles of 95 °C for 10 s, 60 °C for 15 s and 72 °C for 20 s. A melt curve analysis was performed to determine the primer specificity with the following conditions: Ramping of temperature from 72–95 °C by one degree Celsius at each step, with a waiting time of 5 s after each ramping step. All the RT-qPCR reactions were run in triplicate for all the samples across the eight candidate reference genes with negative water and reverse transcriptase control using the RotorGene-Q (Qiagen, Germany).

In silico identification, gene sequence search and primer design

Based on a literature search, eight candidate reference genes for each of Chionochloa and Celmisia were selected for the study. Reference gene sequences present in sunflower, Chrysanthemum, rice, and maize were used as query sequences to search for the orthologous sequences in Celmisia and Chionochloa. These query sequences were blasted against the transcriptome assembly of Celmisia and Chionochloa (unpublished data) to identify the candidate reference genes. The putative sequences were further analysed using multiple sequence alignment and phylogenetic analysis. Both the forward and reverse primers for each of the candidate reference genes were designed using Primer premier 6.084 with the following conditions: Tm values ranging between 58–62 °C, GC content between 40–60%, primer lengths of 18–30 base-pair (bp), and product length of 100–300 bp. The primers were synthesized by the Macrogen Company (South Korea). All the primers were tested for amplification efficiency and coefficient of determination across all the samples using LinReg PCR software85. This software calculates the amplification efficiency based on the data points measured from the amplification curve when plotted against the log-florescence axis. The amplified products were run on 1.5% (w/v) agarose gel to verify the specificity of the primers along with dissociation curve analysis. The amplified products were cleaned up using a DNA agarose extraction kits (New England Biolabs) as per the manufacturer’s instructions for sequencing. These products were again sequenced using Sanger sequencing86 at the University of Canterbury, to confirm the identity of the amplified PCR product. The sequences have been lodged in GenBank. The Accession numbers are provided in Supplementary File 6.

Determination of reference gene expression stability

The stability of expression of the reference genes across distinct temporal-spatially collected samples was assessed using five different statistical packages: geNorm, Normfinder, BestKeeper, ΔCt and RefFinder. All the packages were used according to the manufacturer’s instructions. Averaged Cq values from the triplicate reactions for each sample and for each gene combination were used for the analysis. For geNorm and Normfinder, the raw Cq values were transformed to relative quantities (RQ) using the formula \({\rm{RQ}}={2}^{({{\rm{C}}}_{q}{\rm{\min }}-{{\rm{C}}}_{q}{\rm{sample}})}\), where Cq min is the lowest Cq value across the sample pool. The geNorm algorithm calculates the expression stability value (M) and pairwise variation (V). All the genes across different conditions were ranked by their stability value, the lower the value of M, the more stable the gene expression. The computed pairwise variation, based on the normalisation factor, allows for the determination of the minimum number of reference genes required for normalisation of the data. A value of V less than 0.15 indicates the appropriate number of reference genes required for analysis55. Normfinder evaluates the expression stability of candidate reference genes at inter-group and intra-group levels. Ideally, the two genes with the lowest stability values are the most appropriate genes to be used for normalisation56. BestKeeper analyses the candidate genes based on the standard deviation and coefficient of variation (CV) of the Cq values in a sample pool. The smaller the standard deviation and CV, and the higher the coefficient of regression, the more stable is the gene57. The ΔCt method compares the relative expression of the pair of genes in a sample pool, suggesting a correlation between the variations in the Cq values of a candidate reference gene across the samples with their stability. RefFinder is another tool that determines a comprehensive ranking of the stability of the candidate reference genes based on the inputs from geNorm, Normfinder, Bestkeeper and ∆Ct methods. The tool analyses the output from all the statistical software and generates a comprehensive ranking list of reference genes based on their stability59.

Validation of the reference gene analysis

Cp Heading Date1 (Hd1) and Cl CONSTANS (CO) were used as the targets to validate the selected reference genes for data normalisation in Chionochloa and Celmisia, respectively. CO and Hd1 are two orthologous proteins found in Arabidopsis and rice, respectively66,87. These proteins belong to the zinc-finger transcription factor family that upregulates the expression of florigen (FT) that codes for the flowering hormone56. These genes physically bind to the promoter of FT to induce flowering in plants. The expression analysis was done as described above and the results statistically tested using two-way analysis of variance (ANOVA).

Ethics approval and consent to participate

All plant material was collected with appropriate permissions and consultation. Department of Conservation permit number 40225-FLO.