Gene expression analysis is widely used to study various biological processes. Different transcript quantification methodologies exist, all of which rely on utilization of proper normalization techniques. Such analysis requires several variables need to be accounted for, including amount and quality of RNA, enzymatic efficiencies, and differences between tissues or cells in overall transcriptional activity. The current, most universally utilized normalization method for PCR and Northern blotting relies on the use of constitutively expressed endogenous reference genes, often termed ''house-keeping genes.'' It has been shown, however, that the expression of such reference genes can vary significantly between different tissues or conditions, thus making it impossible to use the same reference genes for various tissues [15]. Moreover, the identification of appropriate control genes presents a circular problem, because in order to find suitable genes for normalization, prior knowledge of their stable gene expression is needed, which also relies on properly normalized data.

The conventional use of a single gene for normalization can introduce a significant error in the quantification of transcript levels [6]. It is recommended that an optimal set of reference genes be identified for each individual experimental setting or tissue. In this study, we used whole-genome human microarrays and surveyed the expression of over 39,500 genes in 526 whole-blood samples from control subjects and patients with Tourette syndrome, stroke, migraine, muscular dystrophy, and autism to identify the best candidate control genes. The 10 best candidate control genes, as well as commonly used house-keeping controls and PPIB, were validated with independent samples of whole-blood RNA from both patients with multiple sclerosis (MS) and age- and gender-matched controls by qRT-PCR. GeNorm was used to identify an optimal set of the most stably expressed genes for normalization in whole-blood expression studies [6].


Patients and samples

We used a large cohort of available human Affymetrx array data that we have generated over the last several years from previous and on-going studies (Table 1). Many, but not all, of the subjects included in our analyses have previously been reported in the following studies: stroke [710], Tourette syndrome [1113], controls [1418], migraine [19], muscular dystrophy [20, 21] and autism [22, 23].

Table 1 Characteristics of subjects who had gene expression assessed in whole blood on Affymetrix Human microarrarys.

RNA Isolation and Quality Control

Whole blood was collected and total RNA was isolated using PAXgene tubes and kits (Qiagen, Valencia, CA, USA). RNA samples were examined for concentration and purity using a Nanodrop ND-1000 spectrophotometer, and integrity was checked using the Agilent 2100 Bioanalyzer. High quality RNA with a RIN number above 8.0 was used for the qRT-PCR experiment. The RIN number takes into consideration not only the conventional ratio of 28 S to 18 S ribosomal RNA, but also additional critical regions of the entire RNA electrophoregram.

Affymetrix experiments

Affymetrix Human 2.0 Plus arrays (Affymetrix, Santa Clara, CA, USA), surveying over 54,000 probe-sets, were used in this study. The 54,000 probe-sets represent over 39,500 potential human genes (Affymetrix Manual). The standard Affymetrix protocol was followed for the sample labeling, hybridization, and image scanning [23].

qRT-PCR cDNA synthesis

The cDNA synthesis and real-time PCR was performed at the Lucy Whittier Molecular and Diagnostic Core Facility (University of California, Davis, U.S.A.). The following cDNA synthesis protocol was validated through the Lucy Whittier Molecular and Diagnostic Core Facility using a variety of sample types and species. 10 μl (750 ng) of template RNA per sample, 1 μl gDNA WipeOut Buffer, 1 μl RNase-free water was incubated at 42°C for two minutes and briefly centrifuged. A 1 μL aliquot was TaqMan® analyzed with human GAPDH to confirm all gDNA had been digested. Then 8 μl from the reverse transcription reaction mix (0.5 μl Quantitect Reverse Transcriptase, 2 μl 5× Quantitect RT buffer, 0.5 μl RT Primer Mix, 0.5 μl 20 pmol Random Primers, 4.5 μl RNase free water) was added to each well, incubated at 42°C for 40 minutes, inactivated at 95°C for 3 minutes and briefly centrifuged. Finally, 100 μl of water was added to each well and mixed thoroughly.

Selection of TaqMan Assays

TaqMan assays were selected based on several criteria: 1) only pre-developed and pre-validated gene expression assays were selected from Applied Biosystems (ABI, Foster City, CA, U.S.A.), for which the company guarantees amplification efficiency over 96%. 2) Only _m1 assays (spanning exon -exon boundaries) were selected to ensure no genomic DNA will be amplified. 3) If available, the assays were preferentially selected to amplify the same target sequence where the Affymetrix probe-set was located. 4) If a gene selected for qRT-PCR validation was represented by multiple probe-sets on the Affymetrix array, the standard deviation (SD) of the expression of all probe-sets was inspected and if some were found to have a high standard deviation, their position in the reference sequence was avoided.

Real-Time TaqMan®PCR

Pre-validated gene expression assays were purchased from Applied Biosystems. See Table 1 for the Assay and Gene IDs. An aliquot of the cDNA from a subset of samples was TaqMan® analyzed with human GAPDH prior to profiling. TaqMan® analysis was done in duplicate reactions using the low density array format as described by Osman et al [24]. This was validated and processed through the Lucy Whittier Molecular and Diagnostic Core Facility (1). The samples were placed in a 384 well plate and amplified in an automated fluorometer (ABI PRISM 7900 HTA FAST, ABI). ABI's standard amplification conditions were used: 2 min at 50°C, 10 minutes at 95°C, 40 cycles of 15 seconds at 95°C and 60 seconds at 60°C. Fluorescent signals were obtained during the annealing temperature and Ct values exported with a threshold of 0.1 and a baseline of 3–10.

Data Analysis

Affymetrix Chips

All 526 arrays were summarized using the GC-RMA algorithm [25]. They were normalized within GeneSpring 7.3 software (Agilent Technologies, Palo Alto, CA) using a three-step normalization procedure, including data transformation, per chip and per gene normalization (GeneSpring Manual).

qRT-PCR GeNorm Analysis

All samples for which the standard deviation (SD) of the technical duplicates was ≥ 1.41 (or higher than 2-fold change) were considered missing data points and were not included in the analysis. The GeNorm tool was used to calculate the internal control genes stability measures (M) [6]. In short, input data was generated using the comparative Ct-method [26], where relative expression levels were calculated by setting the lowest Ct value of a particular gene to 1, and the relative expression levels for the same gene in the rest of the samples were calculated. Then for every control gene, the pair-wise variation with all other control genes is estimated in GeNorm as the SD of the log-transformed expression ratios. The internal control gene stability measure M is defined as the average pair-wise variation of a particular control gene with all other control genes. Stepwise exclusion of the gene with the highest M value is performed. Because a pair of control genes is needed to calculate M, the two most stably expressed genes cannot be ranked any further. The optimal number of control genes for normalization is determined by calculating the pair-wise variation Vn/Vn+1 between each set of two sequential normalization factors [6]. Consistent with the Vandesompele recommendations, the minimum number of genes for which the V was smaller than 0.15, was used to define the optimal set of control genes for normalization. A normalization factor is calculated based on the geometric mean of the optimal set of control genes.

qRT-PCR NormFinder Analysis

The NormFinder Excel plug-in [27] was used as an alternative algorithm to the GeNorm algorithm for ranking the expression stability of the control genes. It ranks the candidate control genes by estimating their expression stability while also taking into account the experimental design (control and disease groups). The input qRT-PCR results were the linearized quantities (2-ΔCt). The main differences between the GeNorm algorithm [6] and the model-based approach in NormFinder [27] is that the latter takes into account the inter- and intra-group variation, as well as the estimation of variances.

Analysis of Variance (ANOVA)

Partek Genomics Suite (St. Louis, MI, USA) was used to perform an ANOVA using the Diagnosis as fixed and the Plate as a random factor (μ + Diagnosis + Plate + (Diagnosis × Plate) + ε). The input data were the normalized data to the geometric average of the optimal set of control genes (TRAP1, DECR1, PPIB, and HMBS) using the comparative Ct method.

ΔΔCt Method

The ΔΔCt method [28] was used to calculate the normalized calibrated data. TRAP1, identified by GeNorm as one of the two most stably expressed genes was used as a normalizer. The average of the healthy controls was used as the calibrator. In short, the Ct value of the normalizer was subtracted from the Ct value of each sample (ΔCt). Then the value of the calibrator was subtracted (ΔΔCt) and the linearized values were calculated (2-(ΔΔCt)). A prerequisite for using this method of analysis is that the normalizer and the gene of interest have very similar amplification efficiency. The ABI assays used in this study have been pre-validated by the company to have over 96% amplification efficiency.


Patients and samples

Of the 526 samples, the mean ages of the subjects for each study varied from age 4 (autism study patients and controls) to age 68 (stroke study patients) with a range of 2 to 78 years of age (Table 1). There was a predominance of males (387) to females (139). Control subjects had no known medical, surgical, or psychiatric illnesses. Study subjects had a variety of medical illnesses with varying pathophysiologies, including neurodevelopmental and neurological disorders (autism, mental retardation, Tourette syndrome), stroke (ischemic brain injury), muscular dystrophy (genetic muscle disease), migraine headaches, as well as a presumed autoimmune disease, multiple sclerosis (Table 1).

Selection of Reference Genes from Whole-Genome Affymetrix Arrays

Our goal was to identify stably expressed genes with a broad range of expression levels from our Affymetrix data on the 526 samples (Table 1). We chose the 20 most stably expressed genes from each of five intensity intervals (n = 100 genes): log2-transformed intensity values between 5–6, 6–7, 7–8, 8–9 and 9–10 (Figure 1). For a list of the most stable genes over all of the expression levels, see Additional file 1. Based on our selection criteria, these 100 most stably expressed genes covered a wide range of raw expression values from 32 to 1024. Functional categorization using the Database for Annotation and Visualization, and Integrated Discovery (DAVID, showed that 50% of the annotated genes are involved in primary metabolic processes. For the list of the 100 most stably expressed genes on the 526 microarrays see Additional file 1.

Figure 1
figure 1

The Average log 2 - intensity (x-axis) of 39,500 genes from a total of 526 Affymetrix Human 2.0 Plus microarrays is plotted against the standard deviation of the log 2 - intensity (y-axis) of those genes. Probe-sets highlighted in red are the 100 probe-sets with the lowest standard deviation within an interval of the log2-transformed intensity values between 5 and 10.

Selection of Candidate Control Genes for Validation by qRT-PCR

The set of 100 stably expressed genes on microarrays was filtered down to probe-sets with available gene annotation, as well as to probe-sets detecting only a single known gene (Affymetrix annotation *_at). Based on these criteria, 23 probe-sets were identified (data not shown). From them, 10 candidate genes were selected for validation with, on average, two genes per intensity interval. These 10 candidate genes included TRAP1, DECR1, FPGS, FARP1, MAPRE2, PEX16, GINS2, CRY2, CSNK1G2 and A4GALT. In the validation, we also included four commonly employed reference genes, ACTB, GAPDH, B2M and HMBS. In addition, PPIB was included because it was previously reported to be stably expressed in blood [3]. Each gene and ABI assay information is presented in Table 2. The average expression values from the Affymetrix array data of the genes selected for validation by qRT-PCR are shown in Figure 2. The range of the SDs of the the log2-transformed intensity values on the microarrays was between 0.10 and 0.42 for the 10 candidate reference genes on the microarrays (solid bars, Figure 2). The SDs for the remainder genes (open bars, B2M, ACTB, GAPDH, PPIB and HMBS) ranged between 0.47 and 0.72 on the microarrays (Figure 2). Based on the Affymetrix array data alone, FARP1 was the most stably expressed. A general trend of increasing SD with increasing intensity was noted for the candidate reference genes (Figure 2).

Figure 2
figure 2

Average expression calculated from the 526 Affymetrix Human 2.0 Plus microarrays for the candidate control genes selected for qRT-PCR validation. Average of the log2-transformed intensity values ± standard deviation (y-axis) is plotted for each gene. Gray columns represent the candidate control genes selected from Figure 1. White columns represent commonly used house-keeping genes and PPIB.

Table 2 The panel of candidate reference genes chosen for validation by qRT-PCR.

qRT-PCR Analysis and Expression of Candidate Control Genes

The analysis flowchart presented in Figure 3 describes the steps used to analyze the qRT-PCR data. The subset of 15 candidate reference genes (Table 2), validated on qRT-PCR, exhibited a broad range of expression levels from 17.93 to 36.66 cycle threshold (Ct) values (Table 3, Figure 4). Samples for which the SD on the technical replicates was higher than 1.4 (a fold-change higher than 2) were considered unreliable data and were not included in the downstream analysis, thus reducing the sample size for some assays to lower than 40 (Table 3). Genes that had SD ≤ 1.0 on the average raw Ct values of all samples were considered to be stably expressed. Based on that criteria, TRAP1, B2M, DECR1, PPIB, FPGS, HMBS and FARP1 (Table 3, Figure 4, below the dotted line) showed stable expression over all 40 samples. In contrast, MAPRE2, PEX16, GINS2, ACTB, GAPDH, CRY2, CSNK1G2 and A4GALT did not show stable expression over the samples (Table 3, Figure 4 above the dotted line). The stably expressed genes covered a wide range of expression levels, from a high level of expression of 17.9 Ct for B2M, to low levels of 35.71 Ct for FARP1 (Table 3, Figure 4).

Figure 3
figure 3

Flow chart of the data analysis performed for the qRT-PCR studies.

Figure 4
figure 4

Expression (average raw Ct, x axis) is plotted versus Standard Deviation of average raw Ct (y axis) for all of the candidate control genes. qRT-PCR was performed on RNA samples from 20 multiple sclerosis (MS) patients and 20 age and gender matched controls. The average expression and its standard deviation was calculated for each gene for all of the patients combined (n = 40). (Ct = Cycle threshold).

Table 3 qRT-PCR on multiple sclerosis patients and normal controls

Expression Stability and Ranking Analysis

To identify the most suitable set of genes for normalization in whole-blood, the average expression stability (M) for those reference genes which showed SD of raw Ct-values ≤ 1.0 on all samples was calculated using the GeNorm tool (see Methods). Those genes included TRAP1, FPGS, B2M, DECR1, PPIB and HMBS (Figure 4). All of these genes also had a SD of the Ct-values for the technical duplicates of ≤ 1.41 and a SD of the raw Ct values of all samples ≤ 1.0. The FARP1 gene was omitted from the analysis of the expression stability because GeNorm does not allow missing values. However, re-analysis where FARP1 was included with the rest of the five stably expressed genes (with n = 35) did not change the final results (data not shown).

An important pre-requisite for estimating the Expression Stability Measure is that the genes are not coordinately regulated. The Gene Ontology terms of the six genes, analyzed in GeNorm, are presented in Table 4. These genes are involved in different biological process, molecular functions and/or are located in different cellular compartments.

Table 4 Gene Ontology (GO) Annotation of genes analyzed in GeNorm

The GeNorm algorithm calculates the M value of a gene based on the average pair-wise variation between all genes in the analysis. Figure 5A plots the average expression stability of the six reference genes. This curve was generated by a step-wise exclusion of the least stable reference gene. It shows that B2M was the least stable, while TRAP1 and FPGS were the two most stably expressed genes in whole blood of patients with MS and gender and age matched controls.

Figure 5
figure 5

Identification of the most stably expressed reference genes using GeNorm analysis. RNA samples from whole-blood from 20 MS patients and 20 matched controls were analyzed using qRT-PCR. A. Average expression stability values (M) of the reference genes during step-wise exclusion of the least stable control gene. The x- axis shows the gene with the highest M value (the lowest stability) for this set of genes. The least stable gene was excluded after each iteration. B. Determination ofthe optimal number of reference genes for normalization. V is the pair-wise variation of two sequential normalization factors. The least number of genes for each V<0.15 was selected as the optimal set of genes for normalization [6]. In this study, the four most stably expressed genes (TRAP1, FPGS, DECR1 and PPIB) represent the optimal set of genes for normalization.

Determination of the optimal number of reference genes for normalization of the dataset uses a metric called V. It is the pair-wise variation of two sequential normalization factors. For consistency with the recommendations of Vandesompele et al [6], the least number of genes for which V<0.15 should be selected as the optimal set of genes for normalization. In the study, the four most stably expressed genes, TRAP1, FPGS, DECR1 and PPIB, represent the optimal set of genes for normalization (Figure 5B). We estimated the gene expression stability and ranking of the control genes using an alternative algorithm (NormFinder). The relative ranking of the genes was essentially the same as the one obtained using GeNorm. The most stable gene was TRAP1 (stability value 0.066), followed by FPGS (0.067), DECR1 (0.078), PPIB (0.079), HMBS (0.095) and B2M (0.108). The main differences between the GeNorm algorithm (reference) and the model-based variance estimation approach in NormFinder [27] is that the latter takes into account the inter- and intra-group variation and uses direct estimation of variances as opposed to pair-wise variation.

Analysis of Variance (ANOVA)

To identify genes that showed differential expression between the MS subjects and the healthy controls, an ANOVA was performed on the qRT-PCR data. Diagnosis and each PCR-plate were included as factors. CSNK1G2 showed significant differences in expression between MS and control samples (Table 5, p < 0.05). However, if multiple-comparison correction is applied to this p-value, it is not statistically significant. Expression of the remaining genes did not differ significantly between the matched pairs (Table 5). No significant p-values (p ≤ 0.05) for plate, nor for the Diagnosis-versus-plate interaction were found, suggesting no significant technical bias in the study (data not shown). The fold-change of expression between the MS and the healthy controls for the stably expressed genes (Figure 4, below the dotted line) was closest to 1.0 for FPGS (Table 5). ACTB and A4GALT fold-changes could not be estimated in Partek due to multiple missing values. The greatest deviation from the 1.0 ratio for all the genes was observed for CSNK1G2. We also used the ΔΔCt method [28], as described in the Methods section, to calculate fold-change. B2M showed the smallest deviation from the 1.0 ratio for the most stably expressed genes (Table 5). TRAP1, identified by GeNorm as one of the most stably expressed genes, was used as the Normalizer; therefore, its ratio is not relevant. The highest deviation from the 1.0 ratio for all the genes was observed for CSNK1G2, as expected.

Table 5 Diagnosis effect on the candidate control genes.

Data on the four genes, which constitute the optimal set of normalizers (TRAP1, FPGS, DECR1 and PPIB) on the MS and healthy control samples are shown separately as box-and-whisker plots (Figure 6). Data for both the ΔΔCt and Comparative Ct methods for MS versus control patients are shown in Table 5, with similar results being obtained with both methods.

Figure 6
figure 6

RNA expression values are shown for the four most stably expressed genes (TRAP1, FPGS, DECR1 and PPIB) for multiple sclerosis (MS) and controls subjects. Circles represent individual qRT-PCR Ct average raw values for MS samples (blue, n = 20) and age and gender matched controls (red, n = 20). Boxes represent the 25th and 75th quartile and lines represent the median. Whiskers represent the range of data for the 10th and 90th quartile.


Gene expression techniques such as qRT-PCR, microarrays and Northern blotting require accurate normalization methods in order to obtain reliable, quantitative data. A common approach is to use an endogenous reference gene. The purpose of the reference gene(s) is to remove differences not attributable to real biological variation. However, numerous reports indicate that differences in the expression levels of commonly used endogenous reference genes vary considerably between different tissues and between different experimental conditions. Thus, there are no universal reference genes for all tissues or experimental conditions [2, 3, 2935].

To address this problem, the best reference genes must be determined for each individual tissue or experimental setting. Moreover, a combination of several endogenous control genes produces more reliable normalization than any single control gene [6]. We therefore examined the expression of over 39,000 genes in 526 samples of whole blood from men, women and children that included healthy controls and individuals with a variety of different diseases. The 100 most stably expressed probe-sets were identified based upon having the least variance across all of these samples over a broad range of expression values. These 100 genes could also be useful as endogenous internal controls for microarray studies – an approach not commonly used at present for microarray studies. Fifty percent of the annotated genes were involved in primary metabolic processes. Thus these genes appear to be good ''housekeeping genes'' for human blood because of their stable expression across various ages, genders, and diseases.

qRT-PCR validation was then performed on 10 candidate reference genes, four commonly used reference genes (ACTB, GAPDH, B2M, and HMBS) and PPIB, which is reported to be stably expressed in blood [3]. The two frequently used control reference genes in nervous tissue, ACTB and GAPDH, were not stably expressed under our experimental conditions and are therefore not suitable reference genes for human whole blood. They have also been shown to vary in a variety of experimental settings and tissues [2, 33, 3639], including in whole blood [3]. B2M and PPIB showed stable expression in our study. They were not identified as stably expressed in our Affymetrix data because their expression was higher than our upper limit for selection. PPIB was also reported to be stably expressed in whole blood of patients with local and systemic inflammatory syndromes and healthy controls [3]. HMBS had a high SD based on our Affymetrix arrays on the 526 samples (SD of log-2-transformed values = 0.72). For comparison, the most stably expressed genes in the same intensity interval had a standard deviation (SD) of the log-2-transformed values around 0.2. HMBS, however, showed stable expression when using qRT-PCR in the MS versus healthy controls experiment. Closer inspection of the expression stability on the 526 samples from the Affymetrix chips showed that the HMBS SD of the log-2-transformed values for most of the studies was around 0.5, but it was 0.94 for the stroke study. This result suggests that there were study-specific differences for stroke versus the other diseases, leading to its unstable expression. Our conclusion that HMBS was unstably expressed across our 526 samples is consistent with a previous study reporting that HMBS was unstably expressed in leukocytes [6]. Among the 10 candidate reference genes, TRAP1, DECR1, FPGS, and FARP1 showed the most stable expression.

In most research studies, the target genes or genes regulated within the study are expressed at different levels. In such situations, it is preferable if the comparable endogenous or "housekeeping genes" are expressed at comparable levels as the target gene. This is essential if the microarray, RT-PCR or Northern blotting platform is non-linear at very low or at very high levels. Thus, having a set of endogenous reference genes that are stably expressed at different levels makes it possible to help select those whose expression is comparable to the target genes.

An optimal set of reference genes was identified based on the expression stability measure M as utilized in the GeNorm algorithm. It is based on the concept that the ratio between two ideal reference genes should always be constant regardless of the experimental conditions used. An important assumption here is that the genes in the analysis are not coordinately regulated and have independent functions. The six genes analyzed in GeNorm, seem to be involved in different biological processes, molecular functions and/or are located in different cellular compartments (Table 4). TRAP1 and PPIB are involved in the same biological process however their cellular compartmentalization is different. TRAP1 is a mitochondrial molecular chaperone also known as heat shock protein 75 (Hsp75). PPIB, encoding the cyclophillin B protein, is a cytoplasmic peptide isomerase. The broad spectrum of functions of the stably expressed genes makes it less likely that they are coordinately regulated.

An advantage of the approach taken in this study is that we performed a whole-genome survey of a large number of subjects of different ages, genders and diseases to identify the most stably expressed genes in whole blood. Drawbacks to this approach include the fact that it is an array-based discovery method that is less likely to reliably detect very low expressing genes. In addition, the subjects chosen do reflect some bias related to the diseases studied, and reflect other biases in the subject selection – such as predominantly hospitalied and male patients.

It should be emphasized that the ultimate usefulness of the proposed endogenous reference genes for whole blood has not been demonstrated in this study. To do this, future studies must show that after correcting technical and individual variability according to the control genes, the discriminatory power of the diagnostic genes should be substantially improved.

It is important to point out the utility of the reference genes identified here for use in clinical human testing. Should the reference genes identified in this study provide sufficiently accurate normalization, then genes of interest could be normalized to these reference genes in healthy individuals and in individuals with a given disease or physiological condition. Once reference expression levels and standard deviations are derived in a large group of healthy individuals, deviations could be identified in target individuals using these endogenous genes for normalization without the need for repeated samples from normal, healthy individuals. For non-clinical research studies, such normalization to the same endogenous reference genes would allow for comparison across studies.


We took advantage of the large amount of expression data on 33,000 genes from 526 individuals to select genes that had stable expression in whole blood. This gave us an added advantage to assess not only the gene expression of commonly used housekeeping genes but to perform a whole genome survey for stably expressed genes over multiple ages, disease conditions and different expression levels. Ranking based on the expression stability from qRT-PCR data of the candidate control genes and estimation of the minimum number of genes needed for nomalization showed that the fewest, most stably expressed genes needed for acurate normalization in RNA expression studies of human whole blood is a combination of TRAP1, FPGS, DECR1 and PPIB. These genes should be useful as normalization genes for a wide-range of whole blood expression studies in humans.