Detection of different transcriptional responses to the DNT model compounds, valproic acid and methylmercury
To explore the dynamics and specificity of the transcriptional response of novel hESC-based in vitro systems (Fig. 1), we chose VPA and MeHg as two positive control toxicants with described effects on DNT and D-mannitol as the negative control compound. The three test compounds were initially evaluated in three of the test systems (UKK, UKN1 and JRC) at the ‘maximum tolerated concentration’. This benchmark concentration (BMC) was determined experimentally for each of the test systems as the highest concentration that reduced overall cell viability by not more than 10 % (Fig. S1). In the case of mannitol, a large range of concentrations, from 1 μM to 100 mM, was used and no cytotoxicity was detected (data not shown). For the UKN1 system, the response to mannitol was tested by quantitative PCR for three toxicant-responsive genes (OCT4, Pax6 and OTX2) (data not shown). As no changes were observed for concentrations up to 40 mM, and data on this compound were provided by the other test systems, DMSO (28 µM) was chosen as the DMA-negative control for UKN1. The transcriptional alterations triggered by the BMC of the two toxicants (VPA/MeHg) or by the two negative controls (mannitol/DMSO) were measured in 4–5 independent experiments on Affymetrix DMA, and the genes that were differentially expressed between culture medium-only controls and test compounds were determined by modern stringent statistical methods (Limma t test, Benjamini–Yekutieli FDR correction). The complete set of data is displayed in supplementary Table S1.
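The selection rule for the test concentration described above can be illustrated with a minimal sketch. It assumes that viability is expressed as a percentage of the untreated control and that the BMC is taken as the highest tested concentration with a viability loss of at most 10 %. The function name and the example values are hypothetical and are not taken from Fig. S1.

```python
def highest_tolerated_concentration(concentrations, viability_percent, max_loss=10.0):
    """Return the highest tested concentration whose viability loss is <= max_loss (%).

    Assumption: viability is given in percent of the untreated control.
    Returns None if no tested concentration satisfies the criterion.
    """
    tolerated = [
        conc
        for conc, viab in zip(concentrations, viability_percent)
        if 100.0 - viab <= max_loss
    ]
    return max(tolerated) if tolerated else None


# Hypothetical dose-response data (arbitrary concentration units)
tested = [10, 50, 100, 200, 400]
viability = [100.0, 98.0, 95.0, 91.0, 70.0]

bmc = highest_tolerated_concentration(tested, viability)
```

With these made-up values, the 200-unit concentration (9 % loss) is selected, whereas the 400-unit concentration (30 % loss) is rejected.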
For a visual monitoring of the different compound effects, the hundred most regulated (defined by the lowest FDR-corrected p values) genes (top 50 for VPA and top 50 for MeHg) were selected for each test system (Table S1), and their relative expression levels were displayed as heat maps. For all test systems, striking differences were observed between the regulation patterns of VPA and MeHg. Clustering analysis showed that VPA samples were clearly separated from the MeHg samples (Fig. 2). This effect was even more pronounced when clustering was performed with the 100 top genes regulated by VPA (Fig. S2A). Under these conditions, the differences between MeHg and negative controls were small or not apparent. Therefore, clustering was also performed with the top 100 genes regulated by MeHg. Under these conditions, MeHg samples were clearly separated from those treated with D-mannitol/DMSO (Fig. S2B).
The number of significantly altered Affymetrix DMA probe sets (PS) was much higher for VPA compared to MeHg. The sum of all PS changed by VPA in the test systems UKK, UKN1 and JRC was 15386; for MeHg, the sum was 1246 PS (Table S1, Fig. 3). This striking difference was observed, although both compounds were used at their respective BMC in each test system. Exposure to the negative controls did not result in any significant changes (Fig. 3). Thus, the extent of the responses of the neurally differentiating hESC to the different developmental neurotoxicants appears to be compound-specific. Moreover, the responses to the two model toxicants differed qualitatively (Fig. 2; Fig. S2). The ability to clearly distinguish known toxicants suggests that the test systems would distinguish unknown classes of potential toxicants. It may be speculated that safety liabilities of unknown chemicals for humans may be predicted by comparing their effects in the test systems with those of known toxicants and non-toxicants. The technical and statistical basis of the above initial findings, together with their potential biological and toxicological implications was explored further in the following extended test battery.
Differential constitutive and toxicant-induced responses of the test battery
One may hypothesise that MeHg showed only relatively weak effects in the initial testing (UKK, UKN1 and JRC) as all these systems only generate immature cells, and such cells may be relatively resistant to MeHg. Alternatively, such test systems may lack key targets of mercury toxicity. Such an assumption would be in agreement with findings in neuronally differentiating murine ESC, which were highly sensitive to MeHg during the late neuronal maturation phase, but relatively insensitive during the initial phase of neural precursor formation (Zimmer et al. 2011b). For a broader coverage of effects during later phases of neurogenesis, two additional test systems were used (Fig. 1, UNIGE and UKN4). The UNIGE hESC-based test system covers the developmental phase after neural stem cell formation. The UKN4 test system was used as reference, as this system is well characterised not only for transcriptome changes, but in particular for functional and phenotypic effects (Stiegler et al. 2011). From the literature, it is known that MeHg inhibits neurite outgrowth in this system, and transcriptome analysis was performed at a concentration known from previous studies to affect neurites (Stiegler et al. 2011).
The extended test battery (UKK, JRC, UKN1, UKN4 and UNIGE) was used for additional testing. The effects of MeHg were examined in all systems at the respective BMC, in addition to one lower concentration (LOW). The latter was determined by dividing the BMC by a factor of four (Fig. S1). Additional experiments were also performed with VPA. The compound was tested at two relatively similar concentrations in JRC (to test the reproducibility of the response). It was also examined at fourfold different concentrations in UKK (to test potential concentration dependencies of the response). The number of differentially expressed PS for each condition is summarised in Fig. 3. This broad experimental approach showed that the transcriptional response of differentiating hESC to MeHg is indeed very limited. Also, the test systems using more mature cells (UKN4 and UNIGE) did not show any significant response when stringent FDR corrections were used.
Comparison of the results before and after FDR correction showed the unmistakeable need for appropriate statistical treatment of the data. Although the choice of a 5 % significance level will generate on average 2734 false positives when 54675 PS are analysed (as in this study), it can at times still be counter-intuitive for toxicologists when none of the more than 2000 identified genes is significant after FDR correction. The effect of FDR correction in the present study is visualized in the form of volcano plots. This form of display orthogonally separates the two parameters usually considered important in gene expression analysis: the fold change and the significance level. As the FDR correction only affects the significance level, one can see the ‘volcano’ heights being compressed, while the width remains the same; for instance, in the case of JRC incubated with 273 nM MeHg (BMC), all apparently significant PS dropped below the usual significance level (p < 0.05). Also, with UKK exposed to 500 μM VPA (20 % of the BMC), the number of 2524 PS that appeared to be significantly up-regulated before FDR correction dropped down to four really significant PS after FDR correction. Notably, the apparent significances were ‘lost’, although several PS appeared to be ‘regulated’ more than twofold, at times even up to fourfold (Fig. 4, Fig. S3). It should be noted that the gene expression response occurred within a narrow range of concentrations. The FDR-corrected data sets showed that the number of regulated probe sets can change from several thousands to zero within a fourfold concentration range. Even a lowering of the test concentration by only 20 % (relative to the BMC) resulted in a reduction of the identified PS, at least in one system in which this was tested (JRC). However, more than 90 % of the PS identified at the low concentration in this assay were also identified at the high concentration (Fig. 5). 
This good overlap confirmed a robust and reproducible test system response. When more stringent conditions were used for filtering, such as the requirement for a ≥4-fold change or for a lower p value, the good overlap between the two concentrations was maintained (Fig. 5). Altogether, these data suggest that the most pronounced and robust transcriptional responses can be measured at toxicant concentrations, which are close to or at the BMC.
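The effect of the two FDR corrections on the number of apparently significant PS can be illustrated with a short sketch. It is not the analysis pipeline used in this study (which relied on Limma); it implements the standard Benjamini–Hochberg and Benjamini–Yekutieli step-up adjustments with NumPy and applies them to simulated p values drawn under the assumption that no PS is truly regulated. The function and variable names are illustrative.

```python
import numpy as np


def fdr_adjust(pvalues, method="BH"):
    """Return FDR-adjusted p values.

    method="BH": Benjamini-Hochberg; method="BY": Benjamini-Yekutieli,
    which additionally multiplies by the harmonic sum c(m) = sum_{i=1..m} 1/i.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks) if method == "BY" else 1.0
    scaled = p[order] * m * c_m / ranks
    # Enforce monotonicity, starting from the largest p value
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted_sorted = np.clip(adjusted_sorted, 0.0, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


n_probe_sets = 54675
# Expected false positives at an uncorrected 5 % level: 0.05 * 54675 = 2733.75
expected_false_positives = 0.05 * n_probe_sets

# Simulated null data set: uniformly distributed p values, no true effects
rng = np.random.default_rng(0)
null_p = rng.uniform(size=n_probe_sets)

raw_hits = int(np.sum(null_p < 0.05))
bh_hits = int(np.sum(fdr_adjust(null_p, "BH") < 0.05))
by_hits = int(np.sum(fdr_adjust(null_p, "BY") < 0.05))
```

In such a null simulation, the uncorrected count is close to the expected 2734 false positives, and the corrected counts can never exceed it, because the adjusted p values are always at least as large as the raw ones. The Benjamini–Yekutieli adjustment is the more conservative of the two.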
To obtain a better overview of how the different test systems are related to one another, we performed a principal component analysis (PCA) encompassing untreated controls and non-differentiated H9 hESC, in addition to all treated samples. This approach allowed the visualization of the overall transcript patterns measured by 190 DMA on a 2-dimensional PCA space (Fig. 6a). Several conclusions can be drawn from a qualitative analysis of the PCA presentation: First, all test systems clearly differed from non-differentiated hESC. Second, all test systems differed from one another, that is, the variance between the different test systems was larger than the variance of individual samples within a given test system. Third, samples from one test system clustered together, whether they had been treated with VPA, MeHg or solvent. On the other hand, samples treated, for example, with MeHg in different test systems did not cluster together in this form of data presentation. It is noteworthy, that presentation of data in the form of such a comprehensive PCA does not allow the identification of compound effects, although large, statistically significant transcriptome changes occurred (e.g. VPA vs solvent control). To better visualise compound effects, a different statistical treatment is required before the data are presented; for instance, the large influence of the different test systems can be attenuated by the subtraction of the corresponding controls before display (see below and Fig. 7).
The distinct clustering of all test systems to a different area of the PCA plot suggests that the test battery is not redundant. Each individual test system seems to react with different transcriptome changes, and the combination of the tests may thus provide richer data than any individual test. This would imply that the different systems would be able to identify different toxicant effects and thus be complementary in their toxicological information. The test battery may thus constitute an important step towards the replacement of animal tests by information-rich human cell-based models (Hartung and Leist 2008; Leist et al. 2008b). This will, however, require further testing and validation (Leist et al. 2012). A second important observation was the presence of outliers in some samples, which will be investigated in greater detail in the following section (Fig. 6a).
Control of intra-group variability and batch effects
The PCA indicated that eight of the DMA of UKN1 clustered separately from all other UKN1 samples. The commonality amongst the eight DMA was that they were measured on a different day compared to the other samples. Four corresponded to controls and four to samples treated with VPA. Thus, the clustering was not treatment-related. A similar situation was observed for ten samples of UNIGE (Fig. 6a). When only the 500 probe sets with the highest variance were considered for the PCA, the ‘outliers’ moved partially or completely back, that is, they clustered together with the other samples within their test system (Fig. 6b). This suggested that genes with a low variance had contributed to the outlier effect. A graphical presentation of the variances of all DMA performed for this study indeed indicated that the ‘outliers’ had a higher variance of the fluorescence signals, although the average signals were quite similar to all other DMA (Fig. 6c). These data suggest that the ‘distant clustering’ samples are the consequence of a batch effect.
The presented study is still ongoing and even larger numbers of samples will have to be studied. This makes it impossible to analyse all samples in a single batch. Methods to control for batch effects will therefore be required. As indicated here, one possibility is to include only the PS with the highest variability between the samples in the analysis. As an alternative approach, the corresponding control values were subtracted from the compound-treated samples before the PCA. This form of presentation clearly separated VPA and MeHg incubated samples, and the results obtained by clustering analysis within the individual test systems were confirmed, also when this multi-systems approach was chosen (Fig. 7a). The subtraction of the controls resulted in the visualization of treatment effects in the PCA that were not visible when the non-processed data were used (Fig. 6). When only the 500 PS with the highest variance, rather than all 54,675 PS, were included, there was a more defined clustering of the VPA samples compared to the MeHg samples (Fig. 7b). The reduction to 500 PS also resulted in a better clustering of other ‘distant clustering’ samples. A stepwise reduction of PS showed that 500 PS seems to represent a reasonable choice, although even smaller numbers, for example, 200 PS, would be possible (Fig. S4). An interesting implication of this observation is that the scattering of samples within one group can be caused by relatively large numbers of PS with low variability and not necessarily by the PS which show the highest variance. These ‘high variance PS’ appear to be highly relevant for further analysis.
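The two strategies described above (subtraction of the corresponding controls and restriction to the PS with the highest variance) can be sketched as follows. The example is only an illustration on simulated data: it assumes log-scale expression values arranged as samples × PS, matched control/treated pairs, and a plain SVD-based PCA. All names, effect sizes and sample numbers are hypothetical.

```python
import numpy as np


def top_variance_subset(expr, n_top=500):
    """Keep the n_top PS (columns) with the highest variance across samples."""
    variances = expr.var(axis=0, ddof=1)
    keep = np.argsort(variances)[::-1][:n_top]
    return expr[:, keep], keep


def pca_scores(expr, n_components=2):
    """Project samples onto the first principal components (SVD of centred data)."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated data: 2 test systems x 2 compounds x 3 matched control/treated pairs
rng = np.random.default_rng(1)
n_ps = 2000
system_offset = rng.normal(0.0, 3.0, size=(2, n_ps))  # large system-specific pattern
effect = np.zeros((2, n_ps))
effect[0, :100] = 2.0     # hypothetical compound A shifts 100 PS
effect[1, 100:120] = 2.0  # hypothetical compound B shifts 20 other PS

controls, treated, compound = [], [], []
for s in range(2):
    for c in range(2):
        for _ in range(3):
            ctrl = system_offset[s] + rng.normal(0.0, 0.2, n_ps)
            controls.append(ctrl)
            treated.append(ctrl + effect[c] + rng.normal(0.0, 0.2, n_ps))
            compound.append(c)
controls = np.array(controls)
treated = np.array(treated)
compound = np.array(compound)

# Step 1: subtract the matched control from each treated sample
delta = treated - controls
# Step 2: keep only the 500 PS with the highest variance
subset, kept = top_variance_subset(delta, n_top=500)
# Step 3: PCA on the reduced, control-subtracted matrix
scores = pca_scores(subset)
```

In this simulation, the system-specific offsets cancel out after control subtraction, so that the first principal component separates the samples by compound rather than by test system.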
Robustness analysis: role of the number of biological replicates
In the present study, five biological replicates (independent experiments performed on different days) were generated for most test conditions. One technical replicate (one DMA) was analysed per experiment. To study whether lower numbers of DMA would also lead to similar results in the present data set, we chose a statistical permutation approach that simulated the situation of choosing only 2, 3 or 4 of the 5 experimental replicates (note that each replicate consisted of a matched pair of DMA for control and for treated cells). For each possible combination of these pairs (here for simplicity called DMA or replicates), the number of PS that overlapped with the original set of PS was identified. In addition, new PS that had not been originally identified were also detected. The expectation was that if 5 DMA were redundant, then the percentage of original PS identified with 3 or 4 DMA should also be high, and the number of new PS arising from the new analysis should be low. This approach was run under different conditions. The significant genes were identified by the less stringent Benjamini–Hochberg FDR correction (Fig. 8) or by the very stringent Benjamini–Yekutieli correction (Fig. S5). Moreover, either all PS were considered, or only the ones regulated more than twofold (Fig. 8, Fig. S5).
The results showed that there was only a moderate advantage of using 5 DMA instead of 4 when only PS with ≥2-fold changes were considered in the current data set. Under this condition, and using less stringent FDR correction, even 3 DMA would have resulted in the identification of a large majority of genes. The permutation analysis was also found to be a suitable tool to test data consistency and robustness of the analysis method used. For most test systems, removal of any of the 5 DMA (pairs) to generate a new data set based on 4 DMA yielded largely similar results. This suggests that all different experiments had generated largely similar data, although they were performed with different cell cultures on different days. The situation was different for the MeHg samples from UKN1, where removal of one specific DMA resulted in the identification of more than twice as many significant PS compared to the remaining 4 DMA. All combinations of the three remaining DMA that lacked the apparent ‘outlier’ identified much larger numbers of PS compared to the combinations that included that specific DMA (pair) (Fig. 8). Such an analysis may therefore be used to develop statistical techniques for the identification of outliers.
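The replicate-subset analysis can be sketched as follows. The example enumerates all combinations of k out of 5 replicate pairs and reports, for each combination, the fraction of the original PS that is recovered and the number of newly arising PS. The significance call used here is a deliberately simple stand-in (a fold-change cutoff combined with a t-like statistic) and not the Limma/FDR procedure of this study. The data are simulated and all names are hypothetical.

```python
from itertools import combinations

import numpy as np


def toy_caller(diffs, min_log2_fc=1.0, min_t=4.0):
    """Stand-in significance call (NOT Limma): indices of PS that pass a
    twofold-change cutoff and a t-like cutoff on the paired log2 differences."""
    n = diffs.shape[0]
    mean = diffs.mean(axis=0)
    se = diffs.std(axis=0, ddof=1) / np.sqrt(n)
    t_like = np.abs(mean) / np.maximum(se, 1e-12)
    passed = (np.abs(mean) >= min_log2_fc) & (t_like >= min_t)
    return set(np.flatnonzero(passed).tolist())


def subset_robustness(diffs, k, caller=toy_caller):
    """For every k-subset of replicate pairs, return
    (combination, fraction of original PS recovered, number of new PS)."""
    reference = caller(diffs)
    results = []
    for combo in combinations(range(diffs.shape[0]), k):
        found = caller(diffs[list(combo)])
        recovered = len(found & reference) / len(reference) if reference else float("nan")
        results.append((combo, recovered, len(found - reference)))
    return results


# Simulated paired log2 differences (treated minus control): 5 replicates x 1000 PS
rng = np.random.default_rng(2)
diffs = rng.normal(0.0, 0.3, size=(5, 1000))
diffs[:, :50] += 2.0  # 50 hypothetical truly regulated PS

reference = toy_caller(diffs)
by_k = {k: subset_robustness(diffs, k) for k in (2, 3, 4)}
```

For five replicate pairs, this yields 10 combinations each for k = 2 and k = 3, and 5 combinations for k = 4. A combination that behaves very differently from all others, as described above for one DMA pair of UKN1, would show up as a group of entries with clearly deviating counts.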
The relationship between cytotoxic response and DNT-specific transcriptome changes
The choice of toxicant concentrations for gene expression analysis is a critical step. If too high concentrations are used, cell viability will be compromised. The cell death occurring under these conditions may result in unspecific ‘toxicity-associated’ gene expression responses. Conversely, the use of too low concentrations of test compounds would result in false-negative responses and in the inability to identify any alterations of the transcriptome. The magnitude of the response may be dependent on the concentration of the test compound, which is especially important when compounds are compared and possibly classified or ranked according to their specific responses. Furthermore, information on the concentration dependence may be used for more detailed characterisation of compound effects, and possibly for the identification of the hazardous responses as opposed to counter-regulations and unspecific responses (Theunissen et al. 2012a, b).
In the present study, the BMC of the cytotoxicity test (i.e. the highest non-cytotoxic concentration) was used as the standard test concentration (Fig. S1). Although transcriptional responses can be triggered by MeHg and VPA at concentrations considerably lower than the cytotoxic concentration (Balmer et al. 2012; Zimmer et al. 2011b), we found here that the majority of responses to MeHg in UKN1 were lost even at only fourfold lower concentrations than the BMC. We made similar observations for VPA in other test systems.
In in vivo studies, DNT is defined as effects on the pups in the absence of maternal toxicity. A corresponding definition for in vitro test systems would be ‘specific alterations of differentiation in the absence of overt (unspecific) cytotoxicity’. Fulfilment of this condition was carefully explored, and several features of our data indicate that measurements at the BMC do in fact allow us to draw conclusions on DNT-specific disturbances triggered by the test compounds: First, we tested whether known toxic concentrations (800 nM MeHg in UNIGE; BMC was 160 nM) would lead to unspecific transcriptional responses (Fig. 3). Also under this condition, no significant PS were identified, that is, no cell death genes were triggered. We also examined the effect of accidental variations of the cytotoxicity from experiment to experiment. The fixed BMC indicated here was determined from a set of pilot experiments. However, the actual cytotoxicity in the individual experiments in which mRNA levels were analysed showed some biological variation, which was documented, for example, for UKN1 and UKN4. Examination of these data showed that the MeHg concentration used for UKN4 reduced cell viability more than the one used for UKN1. However, no response was observed in UKN4, while an apparently specific response was triggered in UKN1. Second, some concentrations used for testing VPA in UKN1 triggered toxicities of more than 10 % (data not shown) in the experiments used for DMA analysis (due to daily experimental variations in sensitivity), but cell death-related GO terms were not identified. In contrast, MeHg in the same system did not trigger measurable cytotoxicity, but GO term analysis indicated an up-regulation of genes related to apoptosis and neuronal death. Thus, the use of compounds at the BMC does not seem to be problematic. In the case of MeHg, triggering of cytotoxic responses is rather a specific feature of the compound (protein modifier, trigger of oxidative stress). 
This may be an explanation for the low or absent transcriptional responses in the test systems. Third, candidate genes typically related to cell death, DNA damage and oxidative stress were examined in UKN1. Such genes were not overrepresented amongst the VPA-regulated genes. Moreover, their extent of regulation did not correlate with the overall magnitude of regulation in the individual experiments (not shown). Fourth, it was examined how far the responses to different toxicants overlapped. In case of a strong component of cytotoxicity, it was expected that typical stress genes were induced and similarities would be observed in the regulation pattern of different toxicants. However, only a small fraction of the overall altered PS overlapped between VPA and MeHg [as examined in detail below, (Fig. 10)]. Even though a ‘common transcription factor response’ between VPA and MeHg of 16 transcription factors (TFs) was observed, there was still a majority of TFs unique for MeHg or VPA. Thus, two compounds, both used at the BMC, triggered different responses, with no common cytotoxicity pattern.
In summary, the data indicate that the measurement of transcriptional responses at the BMC is a reasonable approach, although further studies are required for a better understanding of a possible ‘common toxicity-associated response’. Our limited set of data indicates that concentrations beyond the BMC do not necessarily result in an unspecific transcriptional response reflecting cytotoxicity.
Relationship of the BMC with respect to the in vivo relevant concentration range
Besides the technical considerations concerning the BMC and cytotoxicity, the relevance of the chosen concentrations for the in vivo conditions needs to be considered. When in vitro concentrations differ by more than one order of magnitude from concentrations causing toxicity in vivo, pathways of toxicity may become activated that are not relevant to the in vivo situation. Unfortunately, human exposure measurements of DNT compounds are often poorly documented and concentrations in the brain are only rarely known. Nevertheless, human relevant concentrations of 0.005–0.5 μM MeHg and 500–1,000 μM VPA have been reported in a recently published review (Kadereit et al. 2012). To obtain a clearer picture, we used physiology-based pharmacokinetic (PBPK) modelling to calculate in vivo relevant blood and brain concentrations from the doses that caused DNT in animal studies (Fig. 9; Fig. S6A). Oral exposure to MeHg of 0.01 mg/kg on gestation days 6–9 is predicted to result in a maximum total blood concentration of 0.9 μM (Fig. 9a). Thus, similar nominal concentrations should show activity in vitro, although the actual amount of MeHg penetrating the cells may additionally depend on cysteine concentrations in the different media of the test systems. A VPA plasma peak concentration of 6.6 mM is predicted after a single oral dose of 350 mg/kg. In the same model, this dose resulted in DNT (Rodier et al. 1996) (Fig. 9b). For extrapolation of such data to in vitro systems, corrections for differences in protein binding and lipid partitioning in plasma vs cell culture medium have to be considered (Fig. S6B). Our calculations suggest that the expected equivalent nominal concentrations in vitro are 3.3 mM for UKK, 2.7 mM for UKN1 and 0.9 mM for JRC, UKN4 and UNIGE. These results show that the BMC used in this study are within the same order of magnitude as the in vivo concentrations which caused DNT in humans and animals.
Remarkable overlap of overrepresented TFBS amongst genes influenced by VPA and MeHg
The main focus of this study was to investigate the technical feasibility of using transcriptomics as a major endpoint to characterise responses of hESC-based test systems. For a detailed characterisation of the biological responses of the test systems to the compounds, a different experimental design would be required. Nevertheless, we performed some initial comparisons of gene ontologies (GO) and transcription factor binding sites (TFBS) that were overrepresented amongst the regulated PS. The main aim was to find out whether simple analysis tools can reveal differences and commonalities of the transcriptome responses.
For this approach, five sets of data were compared: the responses of UKN1, JRC and UKK to VPA and the responses of UKN1 and UKK to MeHg (all at the BMC). To obtain an overview of the main biological processes affected by co-regulated genes, the statistically overrepresented GO terms were identified and displayed for each test system and condition (Fig. S7); for instance, the genes down-regulated in each test system by VPA pointed to effects of the toxicant on RNA processing, and on chromatin modification/histone acetylation. The latter results are consistent with the known activity of the compound as a histone deacetylase inhibitor (HDACi). GO terms related to effects on ‘neural tube formation’, ‘neuron development’ and ‘embryonic morphogenesis’ showed up for different conditions. These findings gave a hint that there may be an overlap of higher order biological responses across the test systems and compounds. However, we are aware of the fact that the GO term analysis is a very rough tool, and that GO term annotations of many genes can be problematic (Weng et al. 2012). Therefore, we chose the alternative approach of comparing the overlap of regulated PS between the test systems with the overrepresentation of 267 human TFBS (as indirect indicator of higher order linked biological processes).
First, the overlap of test systems treated with the same compound was analysed. VPA regulated 571 PS in all three test systems (Fig. 10a). Thus, only a relatively minor overlap occurred on the level of individual PS. The PS for VPA showed enrichment of binding sites for 56 (JRC), 57 (UKK) and 66 (UKN1) TFs. Twenty-five TFBSs overlapped between all samples treated with VPA (Fig. 10a), that is, there was a relatively high overlap of responses on the level of TFBS. A similar behaviour was observed after treatment with MeHg: less than 10 % of the PS overlapped between UKN1 and UKK. Amongst these PS, 46 TFBS (UKN1) or 44 TFBS (UKK) were overrepresented and out of these, twenty (>40 %) overlapped (Fig. 10b).
In view of these findings, it was interesting to look at an overlap of transcriptome changes common to each of the toxicants in all test systems. We identified the PS and TFBS jointly modified in all three test systems by VPA or in UKN1 and UKK by MeHg. Only 3 (0.5 %) of the PS generally altered by VPA were also significantly affected by MeHg (Fig. 10c). In contrast, more than 50 % of all TFBS common to MeHg or VPA also overlapped between the two compounds (Fig. 10c). The large overlap of commonly enriched TFBS between all test systems and compounds provides evidence for the existence of a set of ‘common transcription factors’ (including, e.g., E2F, ETF, SP1 and AP-2) (Fig. S8). The only TFBS enriched by all VPA treatments, but not MeHg, was the one for the homeobox gene Hmx3 (also known as NKX5.1). The only TFBS enriched by all MeHg treatments, but not VPA, was the one for GCM transcriptional regulators (Fig. S8).
Similar comparisons of compound responses were also performed in individual test systems; for instance, in UKK, only 205 of the 3,892 PS regulated by VPA overlapped with those affected by MeHg (Fig. 10d). On the level of TFBS, the overlap was much larger, as 22 of the 57 TFBS enriched in the genes regulated by VPA were also found for MeHg (Fig. S9A).
Treatment of the UKN1 test system with VPA or MeHg resulted in the regulation of genes associated with 66 TFBS in their promoter in the case of VPA and 46 TFBS in the case of MeHg. Of these, 29 (comprising, e.g., AP-2, EGR, STAT1, HIF-1, AhR and Sp1) were similar for both compounds, 37 (comprising, e.g., HSF-1, IRF-1, PAX5 and NKX2-5) were specific for VPA, and 17 (comprising, e.g., ATF4, HOXA4 and ZIC2) specific for MeHg (Fig. S9B). Again, the overlap of TFBS was much larger than the one of individual PS. Only 142 of the 3,697 genes regulated by VPA overlapped with those affected by MeHg (Fig. 10e).
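The overlap figures reported for UKK and UKN1 can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the helper function and its name are illustrative.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Split two sets of sizes n_a and n_b with n_shared common members into
    'only A', 'only B' and the shared fraction relative to set A."""
    return {
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "shared_fraction_of_a": n_shared / n_a,
    }


# UKN1, TFBS level: 66 (VPA), 46 (MeHg), 29 shared
ukn1_tfbs = overlap_summary(66, 46, 29)

# Fraction of VPA-regulated items also affected by MeHg
ukn1_ps_fraction = 142 / 3697   # UKN1, PS level
ukk_ps_fraction = 205 / 3892    # UKK, PS level
ukk_tfbs_fraction = 22 / 57     # UKK, TFBS level
```

The counts are internally consistent: 66 − 29 = 37 VPA-specific and 46 − 29 = 17 MeHg-specific TFBS in UKN1. On the PS level, roughly 4–5 % of the VPA-regulated PS overlap with MeHg, compared with roughly 39–44 % on the TFBS level.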
Besides the commonly regulated TFBS, we found for each compound also TFBS that were specific for the test system and the chemical used. These may be used as signatures for related chemicals within one class, while the commonly affected TFBS may give a general indication of toxicity (Supplementary Table S2). In conclusion, a remarkable observation of the present study is that the TFBS showed an astonishingly large overlap in view of the very small overlap on the level of the individual genes. Analysis of further compounds is required to determine whether the emerging concept of a ‘common toxic response TFBS’ and a ‘compound-specific TFBS’ is universal.