Primer design and amplification efficiencies are crucial for reliability of quantitative PCR studies of caffeine biosynthetic N-methyltransferases in coffee

Primers having suboptimal amplification efficiencies were shown to falsely represent fold change expression of the N-methyltransferases gene family involved in caffeine biosynthesis in Coffea canephora. To study this phenomenon, the role of stability of the internal reference gene, as well as the amplification efficiency correction of the primers was investigated. GAPDH and Ubiquitin exhibited a good stability for studying the ontogeny of endosperm tissue, as well as the leaf transcriptome during stress from salicylic acid, methyl jasmonate, PEG-mediated drought and sudden exposure to light. Ubiquitin manifested low variation in Cq under all these stress regimes and in endosperm ontogeny with 30.1–30.9 in the best dataset and 28.8–30.9 in the most deviating dataset. It was observed that problems arising due to improper amplification efficiency of the target or reference genes or both could lead to misinterpretation of gene expression levels. Quantitative RT-PCR performed at a sub-optimal efficiency of GAPDH reference gene at 1.68 led to the faulty interpretation of 2.007 folds upregulation by the 2−ΔΔCt method and 1.705 folds upregulation by Efficiency method for the first NMT (Xanthosine methyltransferase), which actually is repressed during dark acclimatization of coffee plants. Efficiency correction improved the reliability of the expression data and also indicated a downregulation of this gene by 0.485 folds and 0.474 folds using 2−ΔΔCt and E method, respectively, in concordance to earlier reports. Hence, efficiency correction of the primers having suboptimal efficiencies is an absolute prerequisite for the accurate calculation of fold change using quantitative RT-PCR.


Introduction
Caffeine is the most researched molecule from coffee and is designated to a functional role under the 'Chemical Defense Theory' (Frischknecht et al. 1985). Nevertheless, there has been much debate about the actual role and the evolution of caffeine in different plant systems. Caffeine accounts for the major purine alkaloid in coffee and is synthesized from the ubiquitous committed precursor, xanthosine, by the Salicylic Acid Benzoic Acid THeobromine Synthase, (SABATH) superfamily of methyltransferases (Kato and Mizuno 2004). With the availability of large EST database and complete sequencing of C. canephora genome (Denouend et al. 2014) much research needs to now focus on transcriptomics and functional characterization of the coffee genome. Caffeine biosynthetic N-methyltransferases are upregulated under influence from light (Kumar et al. 2015a), developmental stage and genotype (Perrios et al. 2015), salicylic acid and methyl jasmonate (Kumar et al. 2015b(Kumar et al. , 2017. In addition, salinity and drought negatively regulate caffeine content (Kumar et al. 2015c). Future studies on the regulation of caffeine biosynthesis require the assignment of a suitable reference gene for normalization of quantitative PCR.
Quantitative RT-PCR remains a popular tool for transcriptomics due to its ease of application and economic feasibility. Search for appropriate reference gene is indispensable for normalization of qPCR quantification (Kozera and Rapacz 2013). Housekeeping genes like actin, tubulin, GAPDH, ubiquitin, rpl39, and rRNA are routinely used as internal reference genes in different systems (Joseph et al. 2018 (Bustin 2009) recommends the use of most stable reference gene or their combinations for the quantitative analysis of the experimental sets. Technically, it is not feasible to identify a single reference control that can be used for all the experimental sets. Nevertheless, using the statistical programs most suitable combination of reference genes can be selected for each biological experiment sets. The most popular algorithms used for identifying stability of the reference genes include GeNorm (Vandesompele et al. 2002), NormFinder (Andersen et al. 2004), BestKeeper (Pfaffl et al. 2004) and comparative delta-CT (Silver et al. 2006) method, all of which provide a value of stability or normalization factor based on expression of the reference gene in different test and control samples. Conceptually, relative gene expression by qPCR is calculated by the efficiency method (E method) (Pfaffl 2001), which account for the differences in PCR efficiency of the internal reference gene and target gene and 2 −ΔΔCt method that assumes 100% efficiency for both target and reference (Livak 2001). PCR efficiency between samples varies due to dissimilarities in the quantity and quality of cDNA, primer quality, the copy number of transcripts and annealing temperatures. Suboptimal quality of template and primer contribute to errors in calculated fold change due to an appearance of non-stochastic Cq values in the standard curve (Ruijter et al. 2012). PCR efficiency is a critical indicator for the performance of qPCR analysis of multigene families especially involved in secondary metabolism (Arunraj and Samuel 2018). Specificities of primers are not always guaranteed while working with multigene families and the quality of template may vary between samples depending on the impurities in prepared template due to changes in secondary metabolism. Hence, neither 2 −ΔΔCT method nor Pfaffl's E method are necessarily the most accurate description of the actual fold change until efficiencies are optimally corrected. Large-scale qPCR analysis of multigene families requires proper optimization considering all these factors for the generation of reliable data. The present study establishes such errors arising in the quantitation of caffeine biosynthetic NMT gene family in Coffee under light stress and the effect of efficiency correction on the reliability of expression data.

Plant materials and sample preparation
Total RNA was isolated from Coffea canephora Pierre ex. Froehner var. robusta cv. S274 leaves of stress treatments (Kumar et al. 2015a(Kumar et al. , b, 2017 and developing endosperms (Giridhar et al. 2012) and all the primers were adopted from the same studies. Only plant samples from 50 μM salicylic acid, 10 µl methyl jasmonate, 200 mM sodium chloride and 15% (w/v) polyethylene glycol treatments were considered for the present study. Light and endosperm sampling was exactly similar to the method used previously (Kumar et al. 2015a, b). First-strand cDNA synthesis with 1 µg of total RNA was carried out using iScript cDNA synthesis kit (BioRad) with pre-mixed cocktail of Oligo-dT and random hexamer primers. The cDNA preparations from circadian undisturbed and dark acclimatized samples alone were used for the study of fold change.

Test of stability of internal reference
Stability of the reference genes, GAPDH and Ubiquitin, were carried out by comparing Cq values in control and treated samples of stressed plants and in the ten different growth phases of endosperms. Twofold dilutions of control leaf and endosperm cDNA (1, 1:2, 1:4, 1:8, 1:16, 1:32 and 1:64 dilutions) was used to generate standard plots of the reference genes and the Cq was corrected using the formula c2 = c1*log(a1)/log(a2) where c2 is the corrected Cq value, if the sample would have amplified with the efficiency equaling a2. For leaf samples, the Cq of Ubiquitin was corrected according to the efficiency of GAPDH and in ontogeny the Cq of GAPDH was corrected according to that of Ubiquitin. Amplification curves obtained for the experimental setup using 1:15 diluted cDNA as a template was analyzed using RefFinder program (http://leonx ie.esy.es/RefFi nder/). qPCR was performed in Applied Biosystems Quantstudio5 instrument using Ssofast Evagreen master mix (BioRad) supplemented with ROXII (Takara biosciences) with following parameters: initial denaturation of 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing/ extension at (58 °C) for 30 s.

Standard curve for efficiency calculation
Standard plot was made from a dilution series of 1:5, 1:10, 1:20 and 1:40 dilutions of mixed cDNA pooled from all the stress samples. Reactions were repeated in three different annealing temperatures (55 °C, 57 °C and 59 °C). qPCR reactions were carried out in Applied Biosystems Quantstudio 5 instrument with Ssofast Evagreen master mix (BioRad) supplemented with ROXII (Takara biosciences). The cycle parameters included initial denaturation of 95 °C for 30 s, cycle denaturation at 95 °C for 5 s and annealing/extension at (55-59 °C) for 30 s (40 cycle). Efficiency of reaction was calculated from the slope using the formula E = 10 −1/slope . Fold change between cDNA of circadian undisturbed and dark acclimatized plant samples for XMT and MXMT genes was calculated using the method of Pfaffl (2001), and 2 −ΔΔCt method (Livak 2001).

Stability of Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and Ubiquitin internal reference genes
Most suitable internal reference gene for qPCR by SYBR chemistry has been identified for coffee to study rust infection (Vieira et al. 2011), coffee berry disease (Figueiredo et al. 2013), embryogenic tissue (Freitas et al. 2017 Cruz et al. 2009). In the present study, the stability of both the internal reference gene was tested under conditions of light exposure, salicylic acid, methyl jasmonate, salinity and drought stress on leaf cDNA and during development in endosperm tissues. A box plot graph indicates the higher power of Ubiquitin as a reference gene in the analysis between control and stress treated samples from leaves of salicylic acid, methyl jasmonate, PEG-mediated drought and light exposed plants as well as in developing endosperms (Fig. 1a, b). Neither GAPDH nor Ubiquitin were suitable for studying salinity stress. Strictly essential genes like EF1 and EF1α were predicted to be suitable reference genes for salinity stress in another study on C. arabica (Carvalho et al. 2013). The frequencies of Cq values for control and treated samples of leaf and for developing endosperms distributed normally (Fig. 1c, d). Standard curves also indicated that Ubiquitin has more power in studying the difference in gene expression between leaf and endosperm tissues (Fig. 1e, f). The amplification curves could be most suitably used at dilutions of 1:4 to 1:32 for both GAPDH and Ubiquitin. Stability values were obtained for GAPDH and Ubiquitin in intra and inter-group comparisons using comparative delta CT, Bestkeeper, NormFinder and GeNorm encompassed in the RefFinder package. The data described in Table 1 indicated that stability values improve by use of efficiency corrected Cq. This difference in efficiency could affect the range of fold change while using these two reference genes or while comparing a reference with the target. M-value of the stress subset was 0.760 in GeNorm analysis of the uncorrected Cq values. However, the M-value improved to 0.039 after efficiency correction of Cq indicating GAPDH and Ubiquitin to be a stable combination of reference for fold change calculation involving stress experiments. Addition of the endosperm dataset to the consolidated data, disrupted the stability value (M = 0.856 for uncorrected and M = 3.240 for corrected). Stability values for GAPDH and Ubiquitin using NormFinder was lowest for salicylic acid treatment (0.009) and highest for endosperm ontogeny (0.053). RefFinder analysis also indicated that the average standard deviation with delta CT method reduced upon Cq correction in individual subsets as well as 'treatment alone' consolidated data, whereas, an opposite was observed on the total combined stress and endosperm dataset. Moreover, the uncorrected Cq values for certain subsets varied at > 1.0 values. Hence, 2 −ΔΔCT would not be an appropriate method to calculate fold change using these reference genes. Also, correcting Cq for reference genes reduced the standard deviations of ΔCt observed for different datasets.

Effect of efficiency correction on data analysis of fold change
Quantitative RT-PCR analysis of transcripts of multigene families, especially genes involved in secondary metabolism, suffer due to issues relating to the specificity of primer and purity of isolated RNA. C. canephora NMTs share greater than 80% similarity (Denoued et al. 2014), an example of which is depicted in Fig. 2a. The primers used in this study were designed based on EST sequences prior to the publication of the coffee genome (Fig. 2b). Though the primers amplified specific genes, the CcMXMT1 forward and CcDXMT reverse primer has a base substitution at 10th /24 and 18th /24 nucleotide position from the 5´end of the primer, respectively. Also, the MXMT reverse primer shows cross specificity to CcMTL1 and the CcDXMT reverse primer to CcMTL and CcMXMT-both of which could effectively alter their PCR efficiencies and its dynamics under different experimental datasets. Additionally, the primer binding site on the genome are slightly prone to contain SNPs and hence may have single mismatches depending of genotypes. Also, the primer pairs had varying amplification efficiencies at different temperatures. For GAPDH, efficiencies are 1.68 (55 °C), 2.01 (57 °C) and 1.84 (59 °C). Ubiquitin amplified with more consistent efficiencies of 2.01, 1.98, 1.98, respectively at these temperatures. XMT primers amplified at efficiencies of 1.93 (55 °C) and 2.10 (57 °C) whereas, MXMT primers at 1.68 and 1.97 and DXMT primers at 1.96 and 2.39, respectively. MXMT1 primers worked only at 55 °C with 1.67 efficiency. It was noted that primers that had optimal design according to Coffee genomic sequence, for example, XMT and Ubiquitin, did not vary much in efficiency at different annealing temperatures. However, the sub-optimally designed GAPDH and MXMT exhibited higher difference in efficiency at a slight change of annealing temperature.
A minute repression of caffeine and NMT expression is observed in the leaves of dark acclimatized plants of coffee when compared to circadian undisturbed plants (Kumar et al. 2015a). Hence, these samples were used to study the effect of amplification efficiencies on fold change calculation. The calibrator DNA was prepared from pooled samples of a larger dataset as mentioned in material and methods. The fold change was calculated by the 2 −ΔΔCt method (Livak 2001) and the E method (Pfaffl 2001), with consideration of correction under optimal PCR annealing temperatures. XMT and Ubiquitin genes amplify with good efficiency at both 55 °C and 57 °C. Fold change in XMT using Ubiquitin reference gave comparable results using 2 −ΔΔCt method (0.915 fold) and E method (0.913 fold) at 55 °C and efficiencies of 1.89 and 1.99. Fold change calculation by efficiency correction using standard plot at annealing temperature of 57 °C was also comparable between 2 −ΔΔCt method (0.680 fold) and E method (0.668 fold). However, at suboptimal    (Kumar et al. 2015a). It is interpreted that at suboptimal efficiencies of reference gene, neither the 2 −ΔΔCt method nor E method were able to correct an erroneous upregulation plotted for XMT target gene. However, the efficiency of GAPDH and XMT corrected by standard curve plot at 57 °C, markedly reduced the fold change to 0.485 folds and 0.474 folds by 2 −ΔΔCt and E method, respectively. Upon correction of efficiency, the fold change calculated for XMT with GAPDH became comparable to the fold change calculated using Ubiquitin. MXMT amplification is prone to relaxed specificity from the reverse primer and also the primer set work at low efficiency at 55 °C as similar to GAPDH. Performing efficiency corrections at 57 °C led to more consistency in fold change comparisons using the different methods of 2 −ΔΔCt and E method as for XMT. However, the dynamics of fold change were different when comparing between different reference genes.

Conclusions
GAPDH and Ubiquitin genes have low variability between control samples and treatments involving salicylic acid, methyl jasmonate, light exposure, PEG and ontogeny of endosperms with correction of their efficiencies. Primer efficiency is crucial for reduction of errors in fold change calculations of caffeine biosynthetic NMTs. Efficiency correction by qPCR standardization overcomes the errors in the range of fold changes. Hence, we reiterate the importance of efficiency calculations of individual primers as inevitable prior to expression studies. In addition, this could reduce the chances of falsely interpreting overexpression for genes that may actually be repressed or to prevent exaggeration of fold changes.