Reference Gene Validation via RT–qPCR for Human iPSC-Derived Neural Stem Cells and Neural Progenitors

Correct selection of the reference gene(s) is the most important step in gene expression analysis. The aims of this study were to identify and evaluate the panel of possible reference genes in neural stem cells (NSC), early neural progenitors (eNP) and neural progenitors (NP) obtained from human-induced pluripotent stem cells (hiPSC). The stability of expression of genes commonly used as the reference in cells during neural differentiation is variable and does not meet the criteria for reference genes. In the present work, we evaluated the stability of expression of 16 candidate reference genes using the four most popular algorithms: the ΔCt method, BestKeeper, geNorm and NormFinder. All data were analysed using the online tool RefFinder to obtain a comprehensive ranking. Our results indicate that NormFinder is the best tool for reference gene selection in early stages of hiPSC neural differentiation. None of the 16 tested genes is suitable as reference gene for all three stages of development. We recommend using different genes (panel of genes) to normalise RT–qPCR data for each of the neural differentiation stages. Electronic supplementary material The online version of this article (10.1007/s12035-019-1538-x) contains supplementary material, which is available to authorized users.


Introduction
In order to fully understand the processes of neural differentiation, it is necessary to use quantitative real-time polymerase chain reaction (RT-qPCR) for the accurate determination of the relative levels of transcripts of interest. This requires normalisation using a reference gene(s), which show constant expression under the experimental conditions. Suitable reference genes have been identified for many cell types and specific experimental conditions, but the panel of reference genes for human-induced pluripotent stem cells (hiPSC)-derived NSC, eNP and NP populations, has not been described so far. The rationale for selection of putative reference genes in this report was to include the most commonly used and found in the publications referring to different stages of human neural development [1][2][3].
Human induced Pluripotent Stem Cells (hiPSC) are generated from somatic cells by the introduction of a combination of pluripotency-associated genes such as POU5F1, SOX2 and NANOG. hiPSC have the potential to differentiate into any desired cell type, including neurons [4,5]. The process of reprogramming and differentiation are accompanied by many changes (proteomic, genetic, epigenetic and metabolic) [5][6][7][8].
During the neural differentiation process of human hiPSC, different stages of neural development distinguished by specific gene expression profile are generated. In first steps, neuroectoderm-like structures resembling "rosetts" appear, which in suspension culture can form aggregates called neurospheres. At this stage of development neural stem cell (NSC) markers: PAX6, SOX1 and Nestin (NES) are expressed [9]. During further steps of differentiation, iPSCderived neural precursor cells and neurons thoroughly change their gene expression patterns: some genes are activated, while the expression of others decreases, or are completely switched off. Early neural progenitors (eNP) which are still expressing NES, but also TUBB3 and other early neural markers represent the developmental stage between NSC and neural progenitors (NP). The NP population is characterised by the increased expression of neuronal markers (e.g. TUBB3 and MAP2) and appearance of glial markers [10]. These changes are accompanied by the neurite outgrowth and a proliferation rate decrease [8,[10][11][12][13], while in the further differentiation process in the defined media-mature, functional neurons are obtained [9].Thus, eNP and NP are the intermediate stages of neural development between NSC and mature neurons.
Therefore, the aims of this study were to assess the stability of expression for ACTB, CAPN10, CCNG1, EEF1A1, EID2, GAPDH, HPRT1, MYC, NAT1, PHB, RABEP2, RPLP0, TBP, TUBB3, UBC and ZNF324B in hiPSC during neural differentiation to NSC, eNP and NP populations. The stability of expression of 16 candidate genes was evaluated by use of 4 statistical algorithms: geNorm [14], NormFinder [15], BestKeeper [16] and the comparative ΔCt method [17]. In our analysis, two approaches have been used; at first, we searched for the most stable reference genes for all developmental stages (NSC, eNP, NP) separately, in the second-for all stages together.

Cell Culture
The induced pluripotent stem cells (hiPSC) feeder-free cell line was derived from the CD34+ fraction of human cord blood cells by transfection with the EBNA-based episomal system composed of seven factors: SOX2, OCT4, KLF4, MYC, NANOG, LIN28 and SV40L T (The Gibco® Human Episomal iPSC Line, Life Technologies). The process of hiPSC culture and neural differentiation has been described in detail previously [10]. The phenotype of hiPSC-derived NSC, eNP and NP was confirmed qualitatively and quantitatively by immunocytochemistry staining, RT-PCR and RNA-seq [18].

Gene Ontology Enrichment Analysis
To further explore the functional properties of the analysed genes (ACTB, GAPDH, HPRT1, TUBB3, EID2, CAPN10,  RABEP2, ZNF324B, NAT1, TBP, PHB, UBC, CCNG1,  MYC, EEF1A1, RPLP0): functional protein association networks and Gene Ontology (GO) enrichment analysis were prepared in the STRING: functional protein association networks software (https://string-db.org/). Candidates for reference genes have been classified (FDR < 0.01) in accordance with the following: biological process (GO); molecular function (GO), cellular component (GO); and KEGG pathway. A summary of the description of the tested genes was prepared directly from GeneCards®: The Human Gene Database, UniProtKB/Swiss-Prot Database (https://www. genecards.org/), and STRING: functional protein association networks (https://string-db.org/). Detailed analysis was presented in the Supplementary data_1.

Design and Specificity of the Primers
All primers were designed using Primer3 software (http:// bioinfo.ut.ee/primer3-0.4.0/) based on RNA or DNA sequences found in GenBank (https://www.ncbi.nlm.nih. gov/genbank/). If possible, only primers spanning an exonexon junction were selected. All of the primers for the candidate genes were chosen according to the general rules of RT-qPCR primer design. In addition, all designed primer pairs were checked for nonspecific amplification by in-silico PCR (UCSC, http://genome.ucsc.edu) and by performing a Primer-BLAST search (http://www.ncbi.nlm.nih.gov/tools/primerblast/). uMELT (Melting Curve Predictions Software) was used for melting curve prediction (https://www.dna.utah.edu/ umelt/umelt.html). Each pair of primers was confirmed to identify only one specific PCR amplicon with a known expected length. Melting curve analysis showed a single peak for all of the primer pairs, which further confirmed their specificity. All primers were synthesised in the Laboratory of DNA Sequencing and Oligonucleotide Synthesis, Institute of Biochemistry and Biophysics, Polish Academy of Sciences (http://oligo.pl/). The primer details for all 16 candidate reference genes are shown in Table 1 [10][11][12].

RNA Extraction and cDNA Synthesis
Total RNA extraction including DNase treatment (Clean-Up RNA Concentrator kit, A&A BIOTECHNOLOGY, Gdynia, Poland) was carried out using the Total RNA Mini Kit (A&A BIOTECHNOLOGY, Gdynia, Poland) according to the manufacturer's instructions. Extracted RNAs were quantified by NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Waltham, USA) and the absorbance ratios at 260/ 280 nm and 260/230 nm were measured to control RNA purity. The RNA integrity was checked by electrophoresis in 1.5% agarose gels. Total RNA (2 μg) was reverse transcribed using the High-Capacity RNA-to-cDNA™ Kit (Thermo Fisher Scientific) in a volume of 20 μL, according to the manufacturer's instructions.

Quantitative Polymerase Chain Reaction
The RT-qPCR reactions were performed in the LightCycler® 96 (Roche Diagnostics GmbH, Mannheim, Germany). Each reaction was performed in triplicate in 96-well plates (FrameStar® 480/96 for Roche LightCyler, 4titude® Ltd., Wotton, UK) in a reaction volume of 20 μL. All reactions contained 5 μL of cDNA (10 ng), 10 μL of 3color 2× HS-qPCR Master Mix SYBR (A&A BIOTECHNOLOGY, Gdynia, Poland), 1 μL of 10 mM of each primer and 3 μL of DEPC-treated water. The reaction protocol starts with a 3-min initial denaturation step at 95°C, 40 cycles of 95°C for 15 s and 60°C for 60 s. Subsequently, the melting curve was verified by amplification of a single product, which was generated starting at 65°C and increasing to 99°C by 1°C for each 30-s cycle. Each experiment included a no template control. All experiments described in this article were carried out following strict MIQE guidelines [19] with different types of negative controls. To monitor genomic DNA contamination, no reverse transcriptase control (-RT) was used. To avoid sample contamination and primer-dimer formation that could produce false positive results, no template control (-NTC) was used.

Data Analysis
The quantification cycle (Cq) values were automatically calculated by the qPCR instrument software (LightCycler® 96 Software, Roche Diagnostics GmbH, Mannheim, Germany). The data were analysed using GeneEx 6.1 software (MultiD Analyses AB, Göteborg, Sweden). Different statistical algorithms such as geNorm [14], NormFinder [15], BestKeeper [16] and ΔCt [17] were used to evaluate expression stability from the Cq values of each candidate for reference gene. The RefFinder online tool (http://fulxie.0fees.us/?i=1) was chosen to calculate comprehensive ranking based on the data from abovementioned algorithms [19]. In BestKeeper and ΔCt method analysis, stability of candidate reference genes was calculated directly from raw Cq value, while Cq values were imported to the geNorm and NormFinder software raw after conversion into relative quantities, according to the formula Q = E −ΔCq , in which E = amplification e f f i c i e n c y a n d Δ C q = t h e c o r r e s p o n d i n g C q value-minimum Cq. The genes with the lowest standard deviation (SD) and coefficient of variance (CV) were treated as the most stable reference genes. Measures of expression stability (M value) for candidate reference genes in geNorm were based on the ratio: the level of pairwise variation for each reference gene with all other control genes and the standard deviation (SD) of the logarithmically transformed expression [14].

Selection of Putative Reference Genes
A total of 16 candidate reference genes were selected from the relevant literature related to human neural development [1,2]. The selection included some of the most frequently used reference genes: GAPDH, TUBB3 and ACTB. However, a previous study has shown that these commonly used housekeeping genes (HKGs) are inappropriate in hiPSC due to the substantial variability of their expression during differentiation [3]. The characteristics of candidate reference genes including their function are presented in Table 1 and Supplementary data_1.

Expression Profiles of Candidate Reference Genes
hiPSC cells are derived from one clone and are a very homogeneous population. In the studies described in this manuscript, we used six biological repetitions, and each experiment was performed in triplicate. That means each Cq value is the average of 6 × 3 = 18 repetitions. The absolute Cq values expressed as the mean of average of the replicates (3 independent experiments in 3 replicates and each replicate in 3 RT-qPCR runs). The 25th and 75th percentiles for each candidate reference gene for all studied developmental stages (NSC, eNP, NP) were represented in the form of box plots (Fig. 1). The same data set was analysed separately for each developmental stage (Fig. 1a-c) or for all stages combined (Fig. 1d).
The expression levels of 16 tested candidate reference genes were determined using the quantification cycle (Cq) values. The results showed that ACTB (Cq = 17.066), RPLP0 (Cq = 17.499) and GAPDH (Cq = 18.212) were the highest expressed genes in the NSC stage. The same set of genes was also expressed on the highest level in eNP and NP stages and when combined data for all three stages were taken into account simultaneously. The lowest expressed genes, with the highest Cq values, were NAT1 and UBC. Each candidate reference gene showed expression variation, but this variation is particularly high at the eNP stage of development (Fig. 1b). However, even at this stage, it is possible to find genes that display stable expression. Box plots are useful for comparing distributions between several groups but without making any assumptions about the underlying statistical distribution or any other analysis. To find genes with the most stable expression, it is necessary to use specialised statistical tools. In our study, we used the most popular computational programs: ΔCt method, BestKeeper, geNorm and NormFinder.

Reference Gene Ranking Based on the ΔCt Method
The ΔCt method compares relative expression of "pairs of genes" within each sample and takes into account all possible gene combinations [17]. This method shows a different pair of reference genes for each developmental stage, and yet another one for all stages analysed together. The genes with the most stable expression (with the lowest average standard deviation) were RPLP0, TUBB3 for NSC; RABEP2, UBC for eNP; CAPN10, ZNF324B for NP and TBP, UBC for all stages together (Fig. 2). In general, the expression of selected genes appears to be much more stable at NSC and NP stages of development compared to the eNP stage. Due to high variability at the eNP stage, high variability is also observed when

Reference Gene Ranking Based on BestKeeper
BestKeeper estimates gene expression stability using two parameters: standard deviation (SD) and coefficient of variation (CP) of each gene in all samples [16]. As in the case of the ΔCt method, BestKeeper found different genes with the most stable expression for each stage of development (Fig. 3). TUBB3 and RPLP0 genes were found as the most stably expressed for the NSC stage (SD ( ± CP)) values for these two genes are exactly 0.000, so they are "ideal" reference genes), MYC-for eNP and NP stages and the UBC gene for all three stages of development. In the case of this algorithm, the order of genes is very interesting. The most stable expression, and therefore the best candidate for the reference gene for the eNP and NP stages (MYC), is the worst (with the most unstable expression level) gene in the NSC stage. It is also penultimate in the ranking of all data taken together. The results obtained from BestKeeper did not completely agree with those obtained from the ΔCt method. The reference genes with the last stable expression ACTB for the eNP stage and CCNG1 for the NP stage were ordered exactly like in the ΔCt method.
Reference Gene Ranking Based on geNorm geNorm calculates the stability expression value (M) which is the mean pairwise variation for expression of a gene compared with all other tested potential reference genes. The final result of the stepwise exclusion of genes with unstable expression levels is to select two genes with the most stable expression [14]. geNorm analysis of candidate reference genes in the NSC developmental stage showed that RPLP0 and TUBB3 had the lowest M value of 0.000, suggesting them to be the best and even ideal potential reference genes. In the eNP stage, UBC and TBP had the M value of 0.583 and displayed stable expression. In the NP stage, RPLP0 and UBC presented stable expression (M = 0.323), and finally for all three stages together the best reference genes were UBC and RPLP0 with the stability value of 0.584 (Fig. 4). When using geNorm, it should be kept in mind that it does not correct for co-

Reference Gene Ranking Based on NormFinder
NormFinder, in contrast to other algorithms, takes into account the intra-and the inter-group expression variation. It ranks the set of candidate reference genes according to their expression stability in a given sample set and given experimental design. A lower stability value indicates higher expression stability.

Comprehensive Ranking
Taking into consideration advantages and disadvantages of the four different statistical methods described above, we used the RefFinder (a web-based tool) to calculate an overall comprehensive ranking for the best RT-qPCR quantification comparator from all tested 16 potentially reference genes. The algorithm assigns an appropriate weight to an individual gene and calculates the geometric means of their weights for the overall ranking numbers produced by the ΔCt method, BestKeeper, geNorm and NormFinder. Genes with the smallest geometric means were considered to have the most stable level [20]. As shown in Fig. 6, the gene with the most stable expression for the NSC stage was RPL0, for the eNP-UBC, for the NP-CAPN10 and for all stages together-TBP. The genes with the worst expression stability appeared as MYC, ACTB, CCNG1 and again MYC, respectively.

Optimal Reference Gene Numbers
To determine the optimal number of genes required for normalisation, NormFinder software calculates the standard deviation (SD) for each gene as well as the accumulated standard deviation (Acc. SD). The accumulated standard deviation is an excellent indicator of the optimal number of reference genes. Acc. SD reached its lowest value 0.00651 with up to 13 reference genes for the NSC developmental stage. In this case, using only four reference genes, it increases Acc. SD to the level 0.0134. The use of only two reference genes (Acc. SD = 0.0947) should be sufficient in the eNP developmental stage. For the NP developmental stage and for all stages together, the optimal number of reference genes is 14 (Acc. SD = 0.1069) and 10 (Acc. SD = 0.3235), respectively (Supplementary data_2). For the NSC and NP populations, many genes show low variability and therefore they are

Discussion
The aim of this report was to generate the panel of putative reference genes for NSC, eNP and NP according to gene ontology (GO) in sillico and validate these genes in vitro. So far, reference gene validation studies for these stages of human neural development have not been described.
The RT-qPCR method is considered to be the gold standard in gene expression analysis. Correct determination of the reference genes which are used as a comparator is the most important step of gene expression analysis since wrong selection of the reference gene(s) leads to incorrect data analysis. The finding of a reference gene(s) for hiPS cells during neural differentiation is a particularly difficult task. At the molecular level, the differentiation of iPSC towards neural stem cells and neural progenitors is orchestrated by the sequential expression of distinct sets of genes in specific stages of development. However, to generate valid data from such studies, the correct endogenous control reference gene(s) for normalisation of data must be found.
The selected putative reference genes included the most commonly used (ACTB, GAPDH and HPRT1) and found in the publications referring to different stages of human neural development [1][2][3]. Candidates for reference genes were classified with functional protein association networks software (STRING, https://string-db.org/) by taking into account the following: biological process (GO); molecular function (GO) and cellular component (GO) (Supplementary data_1).
The hiPSC-derived NSC, eNP and NP showed wide variations in housekeeping gene Cq values ranging from 17.00 to 31.50. These wide variations confirm that during neural differentiation, the expression of genes essential for cell metabolism, and therefore used as reference genes, changes dramatically. The arbitrary selection of a single reference gene should be avoided, because these reference genes may be differentially regulated and thus increase the probability of producing false data. The four commonly used algorithms (comparative ΔCt method, BestKeeper, geNorm and NormFinder) to check the potential reference gene expression stability have been employed in this study. Each of the applied methods has led to the selection of different gene as having the most stable expression level. This confirms once again that differentiating cells are a very demanding research model. In simple models, different algorithms give very similar and sometimes even identical results. Table 3 and Supplementary data_4 summarise the gene expression stability values calculated by all methods for all stages of neural development (separately and together). Regardless of the method used, the lowest value attests to grater reference gene expression stability. The numerical values of gene expression stability are considerably lower for NSC and NP developmental stages than for eNP and NSC + eNP + NP. The other general conclusion can be drawn from Table 3: expression of none of the tested genes is the most stable in all differentiation stages. Gene expression in NSC and NP developmental stages is stable; the difference in stability between the most stable and the most unstable gene expression is very small and is observed regardless of the program used (Fig. 2a, c; Fig. 3a, c; Fig. 4a, c; Fig. 5a, c). At present, there is no standard method for the selection of reference genes. Some researchers, with respect to the geNorm algorithm, suggest that genes with M values below 1.5 indicate a good measure of gene level stability [21]. Taking into account this criterion, all examined reference genes at the NSC stage and at the NP stage are good reference genes. Twelve genes at the eNP stage have an M value lower than 1.5. BestKeeper calculates the SD and CV based on the Cq values of each reference gene. Genes with an SD value of < 1.0 have a stable expression, and the gene with the lowest SD and CV values was identified as having the most stable expression level [22]. All potential reference genes at the NSC stage of development meet this criterion, as do all with the exception of one at the NP stage. However, only three genes meet such criteria (of the lowest SD and CV values) in the case  Comparison of the positions of individual genes in the ranking ( Table 2, Supplementary data_3) shows complete compatibility between NormFinder and Comprehensive Ranking. NormFinder [15] combines the advantages of the three other approaches by estimation of both intra-and inter-group expression variations. It directly and robustly evaluates not only gene expression stability but also calculates the optimal number of reference genes required for normalisation, so it should be preferred to the other methods.
Different reference genes were validated for each developmental stage (RPLP0 for NSC, UBC for eNP, CAPN10 for NP). Moreover, the most stable expressed gene for the NSC stage (RPLP0 gene) belongs to the most unstably expressed (13 in the ranking) at the NP stage. None of the genes, RPL0   In this case, the TBP gene is the most stable. In turn, this gene is only eighth in the ranking for the NSC stage (Table 2). Considering all of this data, we suggest using separate reference gene(s) for each developmental stage. Using one for all three stages can lead to false results and misinterpretations. A similar topic was addressed by the recently published article of Artyukhov and colleagues (2017) [23]. This study cannot be compared directly to our results for several reasons. Artyukhov and colleges (2017) [23] analysed candidates for reference genes for different stages of hiPSC neural differentiation as compared to the analysis performed by our group: they used hiPSC, NSC and neurons (terminally differentiated), while our group analysed populations typical for early neural development: NSC, eNP and NP. The other reason is that only four genes (ACTB, GAPDH, HPRT1, UBC) recur in both panels of candidates for reference genes. In addition to different stages of neural development, in both experiments, stem cells were derived from different sources and were cultured under diverse conditions (Table 3).
Nevertheless, the general conclusions of the work of Artyukhov and colleagues (2017) [23] and our results are consistent to show that cell populations at different developmental stages require their own, customised set of reference genes.
In another study, Holmgren and colleagues (2015) [1] analysed candidates of reference genes for human embryonic stem cells (hESC) and human-induced pluripotent stem cells (hiPSCs) differentiated to ecto-, meso-and endoderm; thus, they used different developmental stages of hiPSCs as compared to our study. The most stable genes indicated by Holmgren et al. (2015) (EID2, ZNF324B, CAPN10, RABEP2) were included in our panel of candidate reference genes. Vossaert and colleagues s (2013) [22] analysed candidates for reference genes for hESC after induction of ectodermal differentiation. Some genes used in the study (TBP, GAPDH, HPRT1, ACTB, UBC) were included to our panel of our reference gene candidates because of the early developmental aspect of the investigation; however, this report also did not include stages of hiPSC neural differentiation that were analysed in our study [24]. Finally, our group has concentrated only on the early steps of neural development (NSC, eNP, NP), which are difficult to distinguish and have not been compared in the reports of other groups discussed above. The articles discussed above and our results provide the useful method for the identification of genes appropriate for test panels in many different experimental set-ups.
In summary, the panel of 16 candidates for reference genes proposed here can be successfully used for the prediction of reference genes for three different hiPSC-derived stages of neural development. We found that the NormFinder software package was the most robust method of evaluating reference gene expression stability, because it takes into consideration both the interand intra-variability during stabilisation assessment. Our results also demonstrate that no single reference gene or reference gene combination is suitable for all developmental stages analysed. Therefore each stage of development requires its own panel for optimal normalisation of RT-qPCR data. Furthermore, the results point to the importance of using different algorithms in this type of analysis to guarantee strong confidence in the correct choice of reference genes.
In conclusion, the results of our work emphasise the importance of proper selection of reference genes and the need for the customised validation of their stability in studies in stem cell research.
Open Access This article is distributed under the terms of the Creative Comm ons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.