Background

Methamphetamine (METH), an N-methyl derivative of amphetamine commonly abused recreationally, is a powerfully addictive psychostimulant that affects the central nervous system (CNS) dramatically [1]. Dependence on the drug has risen to an epidemic level worldwide [2], and in 2007, 529,000 Americans (ca. 0.2% of the US population) were METH users [3].

METH induces long-term changes in behavior, including sensitization and dependence [4],[5], as well as deficits in cognitive function [6]-[8], and causes psychiatric symptoms such as hallucinations and delusions [9]. Its use and abuse have been associated with several significant health risks, including cardiac dysrhythmia, stroke, high blood pressure, hyperthermia, and CNS abnormalities [10],[11] that are thought to reflect changes in the signaling and metabolism of neurotransmitters such as dopamine, serotonin, and glutamate [10],[12]-[16].

Unfortunately, no efficacious medication for METH dependence has been developed to date [17]. There is a great need not only for novel treatments but also for an understanding of the molecular mechanisms of METH dependence. Topiramate (TPM), a sulfamate-substituted derivative of the monosaccharide D-fructose [18], has been efficacious in the treatment of alcohol dependence [19] and in promoting smoking cessation among alcohol-dependent smokers [20]. A preliminary study suggests that it may also be useful for treating cocaine dependence [21]. These therapeutic effects have been attributed to TPM’s hypothesized ability to reduce the release of cortico-mesolimbic dopamine, the neurotransmitter primarily responsible for the acquisition and maintenance of drug-seeking behaviors for most drugs of abuse, including amphetamines. Thus, TPM might be efficacious for treating METH dependence [22]. However, the effects of TPM in METH-dependent subjects appear to be complex. Whereas earlier studies did not uncover any deleterious interactions between TPM and METH with respect to cognitive performance, attention, or concentration, TPM tends to enhance METH-induced increases in attention and to decrease perceptual-motor function [22]. TPM also markedly accentuates the positive subjective effects of METH, although not craving or reinforcement [23]. Although several hypotheses based on clinical laboratory studies have been offered to explain the effects of TPM on METH dependence [22]-[25], the molecular mechanisms remain unclear.

In a recently completed double-blind, multi-center, placebo-controlled clinical trial of TPM for the treatment of METH dependence, mixed results were obtained [26]: although TPM did not increase abstinence from METH use, it significantly reduced urine METH concentrations and observer-rated severity of dependence [26]. As part of this trial, a genome-wide expression analysis was conducted on RNA extracted from participants’ blood, with the goal of identifying genes and pathways differentially expressed between responders and non-responders. Such a global gene expression investigation may provide molecular-level evidence on the interaction of TPM and METH and may also help to evaluate the pharmacological effects of TPM on METH dependence.

Results

Grouping study participants used for transcriptome analysis

On the basis of the 209 chips that passed quality control, 49 participants in the placebo group and 50 in the TPM group were included. These participants were classified as responders or non-responders according to the criteria for the primary efficacy outcome [26] (see also Methods). Of these participants, only 43 had gene expression data at all three time points, 27 and 24 of whom could be classified as responders or non-responders, respectively. For the other 16 participants, either no valid urine samples were tested or the participants were excluded for other reasons at Weeks 8 and 12 (see Additional file 1: Figure S1). To increase the sample size, we also included among the Week 8 samples some participants who had valid gene expression data at Week 8 but not at Week 0 (baseline), as well as participants with valid gene expression data at both Weeks 8 and 12 but not at baseline. In total, we identified 5 responders and 17 non-responders in the Week 8 TPM group, 4 responders and 17 non-responders in the Week 8 placebo group (see Additional file 1: Figure S1A), 6 responders and 11 non-responders in the Week 12 TPM group, and 2 responders and 13 non-responders in the Week 12 placebo group (see Additional file 1: Figure S1B).

Identification of genes differentially expressed in responders and non-responders at Weeks 8 and 12

At a significance level of 0.01, we identified 1,054 (FDR: 0.009 ± 0.010; range: <1 × 10⁻⁵ to 0.035), 502 (FDR: 0.027 ± 0.021; range: <1 × 10⁻⁵ to 0.070), 204 (FDR: 0.113 ± 0.034; range: 0.003 to 0.160), and 404 (FDR: 0.033 ± 0.024; range: <1 × 10⁻⁵ to 0.084) genes differentially expressed between responders and non-responders in the Week 8 TPM, Week 8 placebo, Week 12 TPM, and Week 12 placebo groups, respectively (see Additional file 2: Tables S1-S4 for details). Of these four groups, the Week 8 TPM group had the lowest FDR. After Bonferroni correction for the number of genes tested in each group, 159, 38, 2, and 21 genes, respectively, remained significant (corrected P < 0.05).
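The FDR ranges above can be sanity-checked with simple arithmetic. Under the Benjamini-Hochberg (BH) procedure, when k genes have raw P < 0.01 out of m genes tested, the adjusted q value of the least significant of those k genes cannot exceed 0.01 × m / k, and the Bonferroni cutoff on raw P values is 0.05 / m. The sketch below is our own illustrative check, not the authors’ analysis code; it takes m from the per-group gene counts given in the Figure 3 legend (3,698, 3,532, 3,328, and 3,405) and k from the counts above.

```python
# Per-group counts taken from the text: m = genes tested (Figure 3 legend),
# k = genes with nominal P < 0.01 between responders and non-responders.
GROUPS = {
    "Week 8 TPM": (3698, 1054),
    "Week 8 placebo": (3532, 502),
    "Week 12 TPM": (3328, 204),
    "Week 12 placebo": (3405, 404),
}


def bh_q_upper_bound(m: int, k: int, alpha: float = 0.01) -> float:
    """Upper bound on the BH-adjusted q value of the least significant of the
    k genes whose raw P value is below alpha, out of m genes tested."""
    return alpha * m / k


def bonferroni_cutoff(m: int, family_alpha: float = 0.05) -> float:
    """Raw P value a gene must fall below to reach Bonferroni-corrected P < family_alpha."""
    return family_alpha / m


for name, (m, k) in GROUPS.items():
    print(f"{name}: max q <= {bh_q_upper_bound(m, k):.3f}; "
          f"Bonferroni raw-P cutoff = {bonferroni_cutoff(m):.2e}")
```

The bounds (about 0.035, 0.070, 0.163, and 0.084) sit at or just above the reported upper ends of the FDR ranges, as expected for an upper bound.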

In the Week 8 TPM group, 159 genes were significantly changed (Bonferroni-corrected P < 0.05) when responders were compared with non-responders: 97 were up-regulated and 62 down-regulated. Importantly, none of these 159 genes overlapped with the 38 genes detected in the Week 8 placebo group at a Bonferroni-corrected P < 0.05 (Additional file 2: Table S2). Tables 1 and 2 show the representative up-regulated and down-regulated genes, respectively, whose functions fall into the following categories: cell adhesion/motion, nervous system development and function/synaptic plasticity, signal transduction, ubiquitination/intracellular protein transport, mitochondrial function/metabolism and energy pathways, and immune system function.

Table 1 A list of 48 representative genes significantly up-regulated in the Week 8 topiramate groupᵃ
Table 2 A list of 45 representative genes significantly down-regulated in the Week 8 topiramate groupᵃ

In the Week 12 TPM group, only two genes, ITCH and MKNK2, remained significant after Bonferroni correction for multiple testing (Additional file 2: Table S3). Both were down-regulated in TPM responders and belong to the nervous system development and function/synaptic plasticity category. Although the reason for so few significant genes is unknown, we suspect that the small size of the Week 12 TPM group contributed. Neither gene overlapped with the 21 genes changed in the Week 12 placebo group at a Bonferroni-corrected P < 0.05 (Additional file 2: Table S4).

Pathways identified by IPA

The differentially expressed genes were subjected to pathway analysis using IPA. A total of 114, 41, 54, and 25 pathways with at least three overexpressed genes were enriched at a nominal P < 0.05 between responders and non-responders for the Week 8 TPM, Week 8 placebo, Week 12 TPM, and Week 12 placebo groups, respectively. Among these, 21 significantly enriched pathways with an FDR < 0.05 at either time point or an FDR < 0.10 at both time points were shared exclusively by the Week 8 and Week 12 TPM groups (Table 3), suggesting that they are more likely to be related to the treatment effect of TPM in METH-dependent subjects. No significantly enriched pathways with FDRs < 0.10 at both time points were shared exclusively by the Week 8 and Week 12 placebo groups. Although 163, 149, 137, and 120 pathways were detected for the Week 8 TPM, Week 8 placebo, Week 12 TPM, and Week 12 placebo groups, respectively, at a significance level of 0.05, only 46, 5, 6, and 0 pathways, respectively, remained significant after Bonferroni correction for multiple testing. A comparison of these Bonferroni-significant pathways revealed that only two (B-cell receptor signaling and renin-angiotensin signaling) were shared exclusively by the Week 8 and Week 12 TPM groups, and none were shared by the Week 8 and Week 12 placebo groups.
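The selection rule used for Table 3 combines nominal enrichment with two alternative FDR criteria. A minimal Python sketch follows, assuming that “shared exclusively by the TPM groups” means nominally enriched (P < 0.05) in both TPM groups and in neither placebo group; the `PathwayResult` type, the function names, and the pathway labels and values in the usage example are hypothetical illustrations, not output from IPA.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PathwayResult:
    p: float    # nominal enrichment P value
    fdr: float  # FDR for the same pathway


def passes_fdr_rule(r8: PathwayResult, r12: PathwayResult) -> bool:
    """FDR < 0.05 at either time point, or FDR < 0.10 at both."""
    either_strict = r8.fdr < 0.05 or r12.fdr < 0.05
    both_lenient = r8.fdr < 0.10 and r12.fdr < 0.10
    return either_strict or both_lenient


def tpm_specific_pathways(tpm8: Dict[str, PathwayResult],
                          tpm12: Dict[str, PathwayResult],
                          pbo8: Dict[str, PathwayResult],
                          pbo12: Dict[str, PathwayResult],
                          alpha: float = 0.05) -> List[str]:
    """Pathways enriched at nominal P < alpha in both TPM groups, in neither
    placebo group (assumed reading of 'shared exclusively'), and passing the FDR rule."""
    selected = []
    for name in sorted(tpm8.keys() & tpm12.keys()):
        r8, r12 = tpm8[name], tpm12[name]
        if r8.p >= alpha or r12.p >= alpha:
            continue
        if any(name in g and g[name].p < alpha for g in (pbo8, pbo12)):
            continue
        if passes_fdr_rule(r8, r12):
            selected.append(name)
    return selected


# Hypothetical inputs for illustration only.
tpm8 = {"A": PathwayResult(0.001, 0.03), "B": PathwayResult(0.01, 0.08),
        "C": PathwayResult(0.002, 0.04), "D": PathwayResult(0.02, 0.20)}
tpm12 = {"A": PathwayResult(0.01, 0.20), "B": PathwayResult(0.02, 0.09),
         "C": PathwayResult(0.003, 0.01), "D": PathwayResult(0.03, 0.30)}
pbo8 = {"C": PathwayResult(0.01, 0.10)}
pbo12: Dict[str, PathwayResult] = {}
print(tpm_specific_pathways(tpm8, tpm12, pbo8, pbo12))  # ['A', 'B']
```

In this toy input, pathway A qualifies through the strict criterion at Week 8, B through the lenient criterion at both weeks, C is dropped because it is also enriched in a placebo group, and D fails both FDR criteria.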

Table 3 Significantly enriched pathways detected exclusively in the Week 8 and Week 12 topiramate groups (n = 27)ᵃ

Pathways identified by Onto-Tools Pathway-Express

Next, we performed pathway analysis on the nominally significant differentially expressed genes using Onto-Tools Pathway-Express. A total of 47, 21, 32, and 25 KEGG pathways with at least three overexpressed genes were enriched at nominal P < 0.05 between responders and non-responders for the Week 8 TPM, Week 8 placebo, Week 12 TPM, and Week 12 placebo groups, respectively. Among them, eight significantly enriched KEGG pathways with FDRs < 0.05 at either time point or FDRs < 0.10 at both time points were shared exclusively by the Week 8 and Week 12 TPM groups (Table 3). A comparison of the pathways detected by Onto-Tools with those detected by IPA revealed three in common: synaptic long-term potentiation, Fc epsilon RI signaling, and natural killer cell signaling. In contrast, no significantly enriched KEGG pathways with FDRs < 0.10 at both time points were shared by the Week 8 and Week 12 placebo groups. Again, although 81, 64, 60, and 65 pathways were detected for the Week 8 TPM, Week 8 placebo, Week 12 TPM, and Week 12 placebo groups, respectively, only 19, 5, 3, and 5 pathways remained significant after Bonferroni correction. Of these, only two (MAPK signaling and T-cell receptor signaling) were shared exclusively by the Week 8 and Week 12 TPM groups, and none were shared by the Week 8 and Week 12 placebo groups.

Combining the results of IPA and Onto-Tools Pathway-Express at nominal P < 0.05, and further restricting to pathways with FDRs < 0.05 at either time point or < 0.10 at both, we identified a total of 27 pathways (see Table 3). These pathways are involved in a spectrum of physiological functions: some are associated mainly with signal transduction (Fc epsilon RI signaling, LPS-stimulated MAPK signaling, p38 MAPK signaling, and SAPK/JNK signaling), whereas others are related to cardiovascular function (cardiac hypertrophy signaling and renin-angiotensin signaling) or to inflammation/immune function (B-cell activating-factor signaling, CCR3 signaling in eosinophils, CCR5 signaling in macrophages, chemokine signaling, CXCR4 signaling, epithelial cell signaling in Helicobacter pylori infection, natural killer cell signaling, and role of PKR in interferon induction and antiviral response).

The pathways related to neuronal function/synaptic plasticity include alpha-adrenergic signaling, ephrin receptor signaling, ErbB signaling, FGF signaling, GnRH signaling, mTOR signaling, neurotrophin/TRK signaling, and synaptic long-term potentiation. The genes in the synaptic long-term potentiation pathway that were changed by TPM at Weeks 8 and 12 are depicted in Figure 1.

Figure 1

Enriched synaptic long-term potentiation canonical pathway, identified by Ingenuity Pathway Analysis on the basis of differentially expressed genes (P < 0.05, ordinary Student’s t-test). The pathway was also detected by Onto-Tools Pathway-Express. (A) Week 8 TPM group (29 genes: ATF2, CAMK2D, CAMK2G, CREB1, EP300, GNAQ, GRINA, MAP2K1, MAPK1, MAPK3, PLCB2, PPP1CA, PPP1CB, PPP1CC, PPP1R10, PPP1R12A, PPP1R14B, PPP1R7, PPP3CB, PPP3CC, PRKACA, PRKACB, PRKAR1A, PRKCD, PRKCH, PRKCI, PRKCQ, PRKCZ, and RRAS); (B) Week 12 TPM group (10 genes: ATF4, CREB5, EP300, GNAQ, KRAS, PPP1R10, PRKACB, PRKAR2A, PRKCB, and PRKCQ). Symbols with a single border represent single genes; those with a double border represent gene complexes or the possibility that alternative genes might act in the pathway. Red symbols represent up-regulated gene clusters, and green symbols represent down-regulated clusters.

Discussion

The current study is the first genome-wide expression investigation of the effects of TPM in the treatment of METH dependence. By profiling genome-wide expression patterns in human white blood cells from METH-dependent subjects who received either oral TPM or placebo, we identified varying numbers of genes differentially expressed between responders and non-responders in the TPM-treated and placebo control groups. Functional clustering of these altered genes revealed significantly enriched pathways governing neuroplasticity and neurotoxicity/neurodegeneration (see Figure 2). Given the primary purpose of this clinical trial, this discussion focuses on how TPM may regulate the molecular pathways of synaptic plasticity that underlie METH’s rewarding and reinforcing effects and thereby influence abstinence.

Figure 2

Integrated model of the biological pathways related to TPM treatment for methamphetamine addiction. The joint effects of TPM and methamphetamine act on multiple molecular pathways that eventually result in modulations of neuroplasticity and neurotoxicity/neurodegeneration, which have a combined effect on cognitive/behavioral function. Pathways enriched exclusively in the TPM responder groups at Weeks 8 and 12 are highlighted in gray.

Exposure to drugs of abuse triggers various gene expression changes, resulting in complex neural adaptations that determine the addictive properties of these drugs [27]. Among these changes, modifications of long-term potentiation (LTP), a form of synaptic plasticity, are fundamental to establishing drug reward and reinforcement [28]. Decades of molecular research have established that METH alters LTP by activating dopamine and/or glutamate surface receptors [29] that are linked to the intracellular extracellular-signal-regulated kinase (ERK) signal transduction pathway. Among the surface neurotransmitter receptors regulating this pathway, the only receptor-related gene we found to be differentially expressed between responders and non-responders to TPM was ionotropic glutamate receptor N-methyl D-aspartate-associated protein 1 (GRINA). This protein is associated with N-methyl D-aspartate (NMDA) receptors, which are antagonized by TPM [30],[31] and activated by METH [32],[33]. In the TPM responder group, GRINA was down-regulated at Week 8, possibly implying fewer NMDA receptors at the synapse. Although TPM did not alter the expression of its other primary target genes (such as GABAA and AMPA/kainate glutamate receptors), three genes coding for membrane trafficking proteins associated with these receptors (DLG1, SNAP23, and TRAK2) were up-regulated in TPM responders compared with non-responders and placebo-treated subjects. DLG1 encodes a synaptic scaffolding protein (also known as synapse-associated protein 97) involved in synapse formation [34] and in the trafficking of AMPA [35],[36], kainate [37], and NMDA [38],[39] glutamate receptors. SNAP23 encodes a protein that aids in stabilizing NMDA receptors at the neuronal surface [40]. The TRAK2 product is involved in GABAB-receptor trafficking [41]. In rat neocortex, DLG1 mRNA is up-regulated by the NMDA antagonist phencyclidine but not by METH [42]. Considering these factors, TPM-associated alterations in the surface expression of AMPA, kainate, and GABAA receptors may be governed more by post-translational mechanisms, such as receptor phosphorylation and trafficking to and from the synaptic membrane, than by alterations in their transcription.

Once METH activates its neural receptors, these receptors activate ERK via upstream cytoplasmic regulators of the ERK pathway. Activated ERK translocates from the cytoplasm to the nucleus and phosphorylates cAMP response element binding protein (CREB) [27] to facilitate METH-induced gene expression, thereby serving as the mediator between the nucleus and METH’s target receptors at the neuronal surface [43]. METH might activate the ERK pathway through up-regulation of gene transcription, post-translational activation by protein phosphorylation, or both. In the present study, the expression of several ERK pathway genes was down-regulated in responders at Week 8 of TPM treatment compared with non-responders and placebo-treated subjects (see Figure 1A), suggesting a “reversal” of METH-induced up-regulation of ERK pathway genes. At Week 8, these down-regulated genes included ERK-1 (MAPK3) and its upstream regulators protein kinase A (PRKACA), protein kinase C delta and zeta (PRKCD and PRKCZ), and Ras-related genes (ARHGEF2, RHOT2, and RRAS), as well as EP300, which encodes a transcriptional co-activator that forms a complex with CREB-binding protein (CBP). By Week 12, expression of EP300 and of the CREB family transcription factor gene CREB5 was down-regulated in TPM responders.

Given these findings, it is reasonable to hypothesize that (1) reversal of ERK and CREB over-expression blocks METH-dependent transcriptional activity and consequently disrupts METH-induced LTP, and (2) non-responders may harbor genetic variants that affect the expression of genes that are down-regulated in TPM responders. Several lines of evidence reported by other investigators support these possibilities. For example, Narita et al. [29] demonstrated that blockade of protein kinase C (PKC) abolishes behavioral sensitization to METH. Human laboratory studies have shown a partial inhibition of METH’s reinforcing effects by TPM at the same dosage used in the current study [23]. More importantly, our findings are consistent with the concept that TPM could treat METH addiction by facilitating the inhibitory effects of GABA and blocking the excitatory effects of glutamate on dopamine neurons [22],[23].

Apart from pathways governing neural plasticity, the functional category with the largest number of affected pathways was immune function (see Table 3). Data on TPM’s effects on immune mediators are sparse, although a few studies have emerged recently. Among the ten immune function-related pathways detected in the current study, only the T-cell receptor signaling pathway has previously been reported to be regulated by TPM [44]. A common feature of all these pathways and of the pathways governing neuroplasticity is that they use the mitogen-activated protein kinase (MAPK) pathway as a central component. Because this is not the primary focus of this report, a detailed discussion of these immune-related pathways is not provided here.

The reliability of our findings is strengthened by several aspects of our study design. First, the present study included both a positive (TPM non-responders) and a negative (placebo) control group. Inclusion of a placebo group provided the reference necessary to explore gene expression alterations induced specifically by TPM rather than by the absence or reduction of METH use or other non-specific factors. For example, EP300, which encodes a transcriptional co-activator of CREB, was down-regulated in the Week 8 and Week 12 TPM responder groups as well as in the Week 12 placebo responder group, suggesting that its regulation is not specific to TPM, whereas CREB5, discussed above, was down-regulated only in Week 12 TPM responders, suggesting a TPM-specific effect. Further, the inclusion of a positive control group helped us identify genes and pathways associated with METH abstinence, the primary outcome of this clinical study. Second, we analyzed expression data from three time points for each individual: baseline (prior to the start of TPM treatment) and Weeks 8 and 12. Normalizing each participant’s Week 8 and Week 12 expression to their own baseline allowed us to correct for confounding caused by individual differences in gene expression at baseline, and using Week 12 expression patterns to confirm those observed at Week 8 increased the reliability of the findings. Third, the dose of TPM administered throughout the treatment period was well within the drug’s therapeutic range [22],[23]; therefore, the TPM-dependent expression alterations we detected most likely reflect TPM’s therapeutic rather than toxic effects. Finally, given the rigor of the clinical and statistical criteria used to define treatment responders and significantly altered genes and pathways, we believe the chance that our findings are false positives is minimal.

However, this study is limited by several factors, the most notable of which is the small sample size of some comparison groups. In particular, the numbers of responders in the TPM and placebo groups were not balanced at either Week 8 or Week 12 (4 and 2 subjects at Weeks 8 and 12, respectively, in the placebo group vs. 5 and 6 subjects in the TPM group). However, these numbers are not markedly smaller than those in other pharmacogenomic/expression studies published in the literature [45], and they provided 85% statistical power to draw conclusions about individual genes and pathways [46]. It could also be argued that the imbalance in the number of responders between the two groups was attributable in part to the weaker effect of placebo compared with TPM in promoting abstinence. Because of the small samples, we did not consider covariate effects such as age, sex, and ethnicity in assessing single-gene effects. Although we believe the results obtained from these samples are reliable, the expression patterns of single genes, especially those identified in the placebo groups, should be interpreted with extra caution. Another major limitation of our study is that we used peripheral white blood cells to study gene expression alterations associated with neuronal functions. Peripheral blood is an easily accessible source of RNA for analysis of environmental exposures and disease conditions [47]-[49], and circulating leukocytes can be used to infer gene expression in other tissues [50]. Indeed, constituents of blood maintain homeostasis, modulate immunity or inflammation, partake in stress signaling, and facilitate cellular communication in vascular-associated tissues, including those of the CNS [51]. Sullivan et al. [52] conducted a secondary analysis of transcriptional profiles of 79 diverse human tissues and found that whole blood shared substantial gene expression similarity with multiple brain tissues, such as the amygdala, caudate nucleus, prefrontal cortex, and whole brain (median Spearman correlation coefficient for the group, 0.52), indicating that gene expression in whole blood can be a robust and valid surrogate for gene expression in the brain [52]. However, another recent study detected only a weak correlation between gene expression in brain and blood samples [53]. Thus, although gene expression data from whole blood may provide useful information about the biological processes underlying the interaction of TPM and METH in the nervous system, more direct evidence from brain tissue is needed to verify the findings reported in this study.
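The 85% power figure depends on the effect size and significance level assumed in [46], which are not restated here. To illustrate how power for a single gene depends on group sizes such as those above, the sketch below computes the power of a two-sided, equal-variance two-sample t-test from the noncentral t distribution with SciPy. The effect sizes in the usage example are arbitrary assumptions, and this is not the calculation used in [46].

```python
import math

from scipy import stats


def two_sample_t_power(n1: int, n2: int, effect_size: float, alpha: float = 0.05) -> float:
    """Power of a two-sided Student's t-test (equal variances).

    effect_size is Cohen's d: the difference in group means divided by the pooled SD.
    """
    df = n1 + n2 - 2
    ncp = effect_size * math.sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that |T| exceeds the critical value when T follows a noncentral t.
    return float(stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp))


# Week 8 TPM group sizes from the Results (5 responders vs. 17 non-responders);
# the effect sizes below are assumed values for illustration.
for d in (1.0, 1.5, 2.0):
    print(f"d = {d}: power = {two_sample_t_power(5, 17, d):.2f}")
```

At these group sizes, power changes substantially with the assumed effect size, so any stated power figure should be read together with the effect size behind it.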

Conclusions

In summary, using rigorous clinical and statistical criteria, we found evidence that TPM mitigates METH’s reinforcing effects, possibly by returning some dysregulated genes in pathways governing synaptic plasticity to their normal state. Further studies are needed to replicate these findings and to identify genetic variations that may underlie the regulatory differences observed between TPM responders and non-responders. Identification of such molecular mechanisms should greatly help the development of efficacious medications for the treatment of METH dependence.

Methods

Study design and blood sample collection

This was a double-blind, multi-center, placebo-controlled, randomized, parallel-group study for METH-dependent outpatients [26]. Under the inter-agency agreement between the National Institute on Drug Abuse and the Veterans Affairs (VA) Cooperative Programs, eight medical centers participated. The sites’ Institutional Review Boards and the VA Human Rights Committee approved the protocol for and conduct of the study.

Subjects meeting the eligibility criteria after a 14-day screening period and a baseline assessment were randomized into equal-size groups for daily oral treatment with TPM or placebo for 91 days. Treatment comprised a dose titration phase (Days 1 to 35) to a maximum tolerated TPM dose not exceeding 200 mg/day, a maintenance phase (Days 36 to 84), and a taper phase (Days 85 to 91). To continue in the study, subjects had to maintain a minimum daily dose of 50 mg. Blood samples were collected on Day 1 (baseline) and at the end of Weeks 8 and 12 from every participant who consented to participate in the genetics/expression study. Weeks 8 and 12 were chosen for the genetics/expression study because they fall in the middle of the maintenance phase at each patient’s maximum dose and at the end of treatment, respectively. Because each patient’s TPM dose was relatively stable at these two time points, variability in drug exposure among patients was reduced, which was expected to increase the statistical power to identify differentially expressed genes and pathways. All blood samples for this study were collected in PAXgene™ blood tubes using standard phlebotomy technique.

Primary efficacy outcome measure

The primary efficacy outcome measure was METH use or non-use during each week from Week 1 to Week 12. Urine samples were collected from each participant three times per week. A positive use week was defined as any week in which at least one urine test was positive for METH, and a negative use week as one in which all three tests were negative. The value was considered missing if no urine sample was collected. On the basis of the primary efficacy outcome measures, each participant in either the TPM or the placebo group was classified as a positive or negative responder to treatment, referred to in this study as a responder or non-responder, respectively. For example, a TPM responder at Week 8 is a TPM-treated participant in whom no METH was detected in any of the three urine samples for Week 8 (a negative use week), whereas a TPM non-responder is one in whom METH was detected in one or more of those samples. Our aim was to determine which genes were differentially expressed between responders and non-responders in the TPM or placebo group during a given week. Because blood samples were collected from each participant at baseline and at Weeks 8 and 12, we formed four analysis groups (Week 8 TPM, Week 8 placebo, Week 12 TPM, and Week 12 placebo) according to the positive or negative use information at Weeks 8 and 12, respectively. Because not all participants contributed blood samples at both time points, the final sample size differed across groups.
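The weekly classification rule can be written as a small function. This is a sketch only: the text does not say how a week with one or two collected samples, all negative, was handled, so the sketch treats such a week as unclassifiable (an assumption), and the function name is hypothetical.

```python
from typing import Optional, Sequence


def classify_week(urine_positive: Sequence[bool]) -> Optional[str]:
    """Classify one participant-week from its urine METH tests.

    urine_positive holds one flag per collected sample (True = METH detected);
    three samples are scheduled per week.
    """
    if len(urine_positive) == 0:
        return None  # no sample collected: value is missing
    if any(urine_positive):
        return "non-responder"  # positive use week
    if len(urine_positive) == 3:
        return "responder"  # negative use week: all three tests negative
    return None  # assumption: fewer than three all-negative tests is not classifiable


print(classify_week([False, False, False]))  # responder
print(classify_week([False, True]))          # non-responder
```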

RNA isolation and gene expression analysis

Blood samples were collected at approximately the same time of day at all three time points for each participant to control for potential circadian effects on gene expression. Total RNA was extracted using the PAXgene™ Blood RNA Isolation Kit (Qiagen, Valencia, CA, USA). Genome-wide expression of each sample was assessed with a Human Genome U-133 Plus 2.0 array (Affymetrix Inc., Santa Clara, CA, USA) by Expression Analysis Inc. (Durham, NC). Briefly, double-stranded cDNA synthesized from the total RNA was used as template in a T7 RNA polymerase in vitro transcription reaction (Ambion, Austin, TX, USA) containing the biotin-labeled ribonucleotides CTP and UTP. The resulting labeled cRNAs were then hybridized to HG-U133plus2.0 arrays.

Quality control and bioinformatics analysis of array data

Outlier array detection and quality assessment: In total, there were 212 HG-U133plus2.0 arrays from 99 study participants, which included 91 arrays at baseline, 65 at Week 8, and 56 at Week 12. The 212 “.CEL” files generated by the Microarray Suite (MAS 5.0; Affymetrix) were converted into “.DCP” files using dChip 2008 software (http://biosun1.harvard.edu/~cli/dchip_2008_05.exe). We used the “% array outlier” diagnostic metric to detect outlier arrays, defined as the percentage of outlier probe sets in one array [54]. If this percentage exceeded 5%, the array was called an “outlier.” Three arrays at baseline were found to have a “% array outlier” metric > 5% and were excluded from further analysis. For quality assessment of the remaining 209 chips, the distributions of log2-transformed raw probe-level intensities were visualized by boxplots, and no anomalies were found (data not shown).
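The “% array outlier” rule reduces to a percentage and a threshold once per-probe-set outlier flags are available. In dChip those flags come from its model-based fit, which this sketch does not reproduce; the flags are taken as given input, and the function and array names are hypothetical.

```python
from typing import Dict, List, Sequence


def percent_array_outlier(probe_set_flags: Sequence[bool]) -> float:
    """Percentage of probe sets flagged as outliers on one array."""
    if not probe_set_flags:
        raise ValueError("array has no probe sets")
    return 100.0 * sum(probe_set_flags) / len(probe_set_flags)


def outlier_arrays(flags_by_array: Dict[str, Sequence[bool]], threshold: float = 5.0) -> List[str]:
    """Names of arrays whose '% array outlier' exceeds the threshold (5% in the text)."""
    return [name for name, flags in flags_by_array.items()
            if percent_array_outlier(flags) > threshold]


# Hypothetical arrays with 100 probe sets each.
example = {
    "array_1": [True] * 6 + [False] * 94,  # 6% flagged -> outlier
    "array_2": [True] * 5 + [False] * 95,  # 5% flagged -> kept (must exceed 5%)
}
print(outlier_arrays(example))  # ['array_1']
```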

Data pre-processing and normalization: Data quality assessment was followed by pre-processing and normalization with the Robust Multi-Array Average (RMA) algorithm [55], as implemented in the rma function of the Bioconductor affy package [56]. RMA comprises three procedures: (i) convolution background correction; (ii) probe-level quantile normalization; and (iii) median polish summarization of each probe set to estimate log2-scale expression values. A matrix of expression values was computed from the 209 “.CEL” files. After normalization, the distributions of expression values were similar across arrays.
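Steps (ii) and (iii) of RMA can be sketched in NumPy as follows. This is an illustrative re-implementation, not the affy package code: it omits the convolution background correction of step (i), does not treat ties specially in quantile normalization, and follows the classic Tukey median polish iteration (as in R’s medpolish); the function names are our own.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every column (array) of a probes x arrays matrix the same distribution:
    the mean, across arrays, of the sorted columns. Ties are not handled specially."""
    order = np.argsort(x, axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out


def median_polish(y: np.ndarray, max_iter: int = 10, eps: float = 0.01) -> np.ndarray:
    """Fit y[i, j] = overall + row[i] + col[j] + resid[i, j] for one probe set
    (probes x arrays, log2 scale); return overall + col, one value per array."""
    resid = np.array(y, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    old_sum = 0.0
    for _ in range(max_iter):
        r_delta = np.median(resid, axis=1)  # sweep out probe (row) medians
        resid -= r_delta[:, None]
        row += r_delta
        delta = np.median(col)
        col -= delta
        overall += delta
        c_delta = np.median(resid, axis=0)  # sweep out array (column) medians
        resid -= c_delta[None, :]
        col += c_delta
        delta = np.median(row)
        row -= delta
        overall += delta
        new_sum = np.abs(resid).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall + col


# Hypothetical probe set: 11 probes x 4 arrays of log2 intensities.
rng = np.random.default_rng(0)
log2_probes = rng.normal(8.0, 1.0, size=(11, 4))
print(median_polish(quantile_normalize(log2_probes)))
```

Median polish is used here, as in RMA, because medians make the per-array estimate robust to a few aberrant probes within a probe set.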

Probe set filtering: The HG-U133plus2.0 array contains 54,675 oligonucleotide-based probe sets, not all of which correspond to well-defined genes. Using the Affymetrix annotation file available at the time (dated November 30, 2008), we found that 33,752 (61.73%) probe sets correspond to unique genes; the remaining probe sets do not and were therefore excluded from statistical analysis. We then applied a series of filtering procedures to reduce the number of probe sets to be tested, summarized as follows: (i) Filtering “absence call” probe sets: We applied the Bioconductor package “Presence-Absence Calls with Negative Probesets” (PANP), which uses Affymetrix-reported probe sets with no known hybridization partners. PANP uses a simple, empirically derived approach to generate P values and thresholds that define “presence/absence” calls. The calls and P values are returned as two matrices, “Pcalls” and “Pvals,” respectively. Probe sets with < 50% present calls among all arrays within each group were removed, a threshold considered restrictive [57],[58], leaving ~15,000 probe sets for further analysis. (ii) Filtering biologically irrelevant genes and duplicate probe sets: Among the ~15,000 probe sets, control sets of various housekeeping genes (e.g., GAPDH) and spiked-in controls (e.g., Ec-bioB, Ec-bioC, Ec-bioD), as well as genes that are not well defined or have unknown functions, were removed. Duplicate probe sets for the same gene were then removed, keeping only the probe set with the smallest test statistic for each gene [59], which left about 7,500 genes. (iii) Filtering out genes with low fold changes (FCs): Genes whose log2(FC) lay within 0.67 standard deviations (SD) of the group mean (i.e., between the first and third quartiles, assuming that log2(FC) follows a normal distribution) were removed.
After these sequential steps of filtering, about 3,500 genes were left for downstream statistical analyses for each group. A schematic diagram of the detailed data mining and analysis plan is shown in Figure 3.
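Filters (i) and (iii) can be sketched in NumPy. This is a minimal illustration under stated assumptions: the PANP calls are taken as already converted to a Boolean matrix (True = present), each gene’s log2(FC) for the group is assumed to be precomputed, and the mean and SD are taken across genes; the function names are hypothetical. The sketch also shows where the constant 0.67 comes from: it is, to two decimals, the 75th percentile of the standard normal distribution, so dropping genes within 0.67 SD of the mean drops roughly the middle half of a normal distribution.

```python
from statistics import NormalDist

import numpy as np

# 0.67 in the text is the upper quartile of the standard normal (about 0.674).
Z_QUARTILE = NormalDist().inv_cdf(0.75)


def presence_filter(present: np.ndarray, min_fraction: float = 0.5) -> np.ndarray:
    """present: Boolean probe sets x arrays matrix for one group (True = present call).
    Keep probe sets called present on at least min_fraction of the arrays."""
    return present.mean(axis=1) >= min_fraction


def fold_change_filter(log2_fc: np.ndarray, k: float = 0.67) -> np.ndarray:
    """Keep genes whose log2(FC) lies at least k standard deviations from the
    mean log2(FC) across genes; genes inside that band are dropped."""
    z = (log2_fc - log2_fc.mean()) / log2_fc.std(ddof=1)
    return np.abs(z) >= k


present = np.array([[True, True, False, False],
                    [True, False, False, False]])
print(presence_filter(present))        # [ True False]
log2_fc = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(fold_change_filter(log2_fc))     # [ True False False False  True]
print(round(Z_QUARTILE, 2))            # 0.67
```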

Figure 3

Schematic diagram of the study workflow, including probe set filtering steps and statistical testing strategies for detecting significant single genes and pathways. Probe intensities measured on 209 hybridized Affymetrix HG-U133 Plus 2.0 arrays were normalized by Robust Multi-Array Average, followed by a baseline correction step. Probes marked ‘Presence’ in fewer than four arrays in each group were removed (for the Week 12 placebo group, which included only two positive responders, probes with two valid measurements were kept). Probes corresponding to control or less well-defined genes, as well as duplicated probes, were removed. Genes with low FCs, i.e., within 1 standard deviation (σ) among a total of L (~7,500) genes, were also removed, as most of them were unlikely to be differentially expressed to a statistically significant extent. The remaining genes were tested by the ordinary Student’s t-test, and genes with P values < 0.05 were used for pathway analysis. In total, 3,698, 3,532, 3,328, and 3,405 genes were tested for the Week 8 TPM, Week 8 placebo, Week 12 TPM, and Week 12 placebo groups, respectively.

Statistical analysis to identify differentially expressed genes and pathways

After data quality checking, pre-processing, normalization, and probe set filtering, we analyzed the microarray data at both the single-gene level (determining whether each gene is expressed differently under different conditions) and the pathway level (determining whether a biological pathway shows a different expression pattern under different conditions). To account for individual variation at baseline, we normalized each individual's Week 8 and Week 12 expression values by the corresponding baseline values before identifying differentially expressed genes and biological pathways.
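The text does not state the exact form of the baseline normalization. A common choice for RMA output, which is already on the log2 scale, is to subtract each subject's baseline value, giving a log ratio of follow-up to baseline. The sketch below assumes that convention; the function name and data layout are hypothetical.

```python
# Assumption: expression values are RMA log2 intensities, so "normalizing by
# baseline" is implemented as subtraction (a log2 ratio follow-up / baseline).
# Layout (hypothetical): subject_id -> {gene: log2 expression}.
def normalize_to_baseline(followup, baseline):
    """Return per-subject, per-gene log2 changes relative to that subject's baseline."""
    return {
        subject: {gene: value - baseline[subject][gene] for gene, value in genes.items()}
        for subject, genes in followup.items()
    }

week8 = {"s1": {"GENE_A": 8.0, "GENE_B": 5.0}}
week0 = {"s1": {"GENE_A": 7.5, "GENE_B": 6.0}}
print(normalize_to_baseline(week8, week0))  # {'s1': {'GENE_A': 0.5, 'GENE_B': -1.0}}
```

Working with within-subject differences removes each individual's baseline level, so the subsequent group comparisons reflect change over treatment rather than between-person variation at baseline.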

Single-gene analysis

The primary goal of this step is to detect genes whose expression differs between two comparison groups to an extent that cannot be ascribed to chance or natural variability [60]. The ordinary Student's t-test, implemented in MATLAB (MathWorks, Natick, MA), was used to test for differential expression gene by gene. To correct for multiple testing, we applied both the Bonferroni correction and the false discovery rate (FDR, i.e., the expected proportion of falsely rejected null hypotheses among all rejected hypotheses), estimated by the Benjamini-Hochberg (BH) procedure [61].
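The gene-by-gene test and both multiple-testing corrections can be sketched in plain Python. The study used MATLAB; this is an independent, standard-library-only illustration, and all function names are hypothetical. The two-sided P value of the pooled-variance Student's t-test is obtained from the regularized incomplete beta function, evaluated here with the textbook continued-fraction (modified Lentz) method.

```python
import math
import statistics

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300

    def step(aa, c, d):
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        return c, d

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        c, d = step(m * (b - m) * x / ((qam + m2) * (a + m2)), c, d)   # even term
        h *= d * c
        c, d = step(-(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)), c, d)  # odd term
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def two_sided_t_pvalue(t, df):
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return regularized_beta(df / 2.0, 0.5, df / (df + t * t))

def student_t_test(x, y):
    """Pooled-variance two-sample t-test; each group needs >= 2 values and nonzero spread."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, two_sided_t_pvalue(t, df)

def bonferroni(pvals):
    """Bonferroni-adjusted P values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """BH-adjusted P values (q values), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Applied to a group comparison, `student_t_test` would be called once per retained gene, and the resulting list of P values passed to `bonferroni` and `benjamini_hochberg`. Bonferroni controls the family-wise error rate and is therefore much stricter than BH, which controls only the expected proportion of false discoveries.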

Pathway analysis

Because gene expression is a highly coordinated process, the expression levels of different genes are generally not independent. Pathway analysis can reduce the number of hypotheses to a more manageable set that directly addresses questions of biological interest. During the past few years, various bioinformatics tools have been developed for pathway analysis, although none has gained widespread acceptance [60]. Therefore, in the current study, we detected pathways significantly enriched in differentially expressed genes using the following two bioinformatics tools:

Ingenuity Pathway Analysis (IPA) (http://www.ingenuity.com/)

IPA is a web-based bioinformatics tool [62]. Each set of input genes was mapped to molecular networks on the basis of gene connectivity in the Ingenuity Pathways Knowledge Base. Fisher's exact test was used to determine the probability that each biological function assigned to the data set was attributable to chance alone [63].
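The enrichment test behind this kind of function assignment can be illustrated with a right-tailed Fisher's exact (hypergeometric) test. IPA's internal implementation, reference gene universe, and tail choice are not described here, so the right-tailed form and the parameter names below are assumptions made for illustration only.

```python
from math import comb

def enrichment_p_value(n_universe, n_annotated, n_input, n_overlap):
    """Right-tailed Fisher's exact (hypergeometric) P value.

    n_universe:  genes in the reference set (hypothetical parameter)
    n_annotated: reference genes annotated to the biological function
    n_input:     genes in the input list
    n_overlap:   input genes annotated to the function
    Returns P(X >= n_overlap) when n_input genes are drawn at random.
    """
    total = comb(n_universe, n_input)
    upper = min(n_annotated, n_input)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

print(enrichment_p_value(10, 3, 3, 3))  # 1/120, i.e. about 0.00833
```

A small P value indicates that more input genes fall in the function than random draws from the universe would typically produce; with many functions tested, these P values would themselves need multiple-testing adjustment.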

Onto-Tools Pathway-Express (http://vortex.cs.wayne.edu/projects.htm)

The Onto-Tools Pathway-Express [64],[65] implements an "Impact Factor Analysis" based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Unlike "Over-Representation Analysis" (ORA) or "Gene Set Enrichment Analysis" (GSEA), Onto-Tools Pathway-Express uses a systems biology approach to identify pathways that are significantly impacted in any condition monitored by high-throughput gene expression technology. Impact Factor Analysis incorporates not only the classical probabilistic component but also important biological factors that existing techniques do not capture, e.g., the magnitude of each gene's expression change, the positions of the differentially expressed genes on a given pathway, the pathway topology that describes how genes interact, and the type of signaling interaction between them [64]. For each pathway detected from a given set of input genes, a perturbation-factor gamma P value and a corresponding FDR were calculated, taking into account each gene's normalized FC and the number of genes upstream of its position in the pathway. Because IPA and Onto-Tools Pathway-Express apply distinct statistical algorithms to independent knowledge bases, the two tools are complementary, and we therefore combined their results.
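To make the role of pathway topology concrete, the following is a simplified sketch of a perturbation factor in the spirit of the Pathway-Express literature: a gene's perturbation is its own normalized expression change plus the signed perturbation of each upstream regulator, divided by the number of genes that regulator acts on. This is not the tool's implementation. The acyclic-pathway restriction, the +1/-1 edge weights for activation/inhibition, the use of direct downstream neighbors as the divisor, and all names are assumptions, and the gamma-based P value is not reproduced.

```python
from graphlib import TopologicalSorter

def perturbation_factors(delta_e, edges):
    """Propagate perturbation through an acyclic pathway (simplified sketch).

    delta_e: gene -> normalized log2 FC (use 0.0 for genes not differentially expressed)
    edges:   list of (upstream, downstream, beta), beta = +1 activation, -1 inhibition
    Every gene named in edges must also appear in delta_e.
    """
    predecessors = {g: set() for g in delta_e}
    n_downstream = {g: 0 for g in delta_e}
    incoming = {g: [] for g in delta_e}
    for up, down, beta in edges:
        predecessors[down].add(up)
        n_downstream[up] += 1
        incoming[down].append((up, beta))

    pf = {}
    # static_order() yields each gene only after all of its upstream regulators.
    for gene in TopologicalSorter(predecessors).static_order():
        pf[gene] = delta_e[gene] + sum(
            beta * pf[up] / n_downstream[up] for up, beta in incoming[gene]
        )
    return pf

# A activates B and C; B inhibits C. Only A and C change in expression.
pf = perturbation_factors(
    {"A": 2.0, "B": 0.0, "C": 1.0},
    [("A", "B", 1), ("A", "C", 1), ("B", "C", -1)],
)
print(pf)
```

In this toy pathway, B is perturbed (PF = 1.0) even though its own expression is unchanged, because its activator A changed. This is the kind of topological effect that purely over-representation-based methods such as ORA ignore.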

Additional files