Identification of transcriptome signatures and biomarkers specific for potential developmental toxicants inhibiting human neural crest cell migration

The in vitro test battery of the European research consortium ESNATS (‘novel stem cell-based test systems’) has been used to screen for potential human developmental toxicants. As part of this effort, the migration of neural crest (MINC) assay has been used to evaluate chemical effects on neural crest function. It identified some drug-like compounds in addition to known environmental toxicants. The hits included the HSP90 inhibitor geldanamycin, the chemotherapeutic arsenic trioxide, the flame-retardant PBDE-99, the pesticide triadimefon and the histone deacetylase inhibitors valproic acid and trichostatin A. Transcriptome changes triggered by these substances in human neural crest cells were recorded and analysed here to answer three questions: (1) can toxicants be individually identified based on their transcript profile; (2) how can the toxicity pattern reflected by transcript changes be compacted (reduced in dimensionality) for practical regulatory use; (3) how can a reduced set of biomarkers be selected for large-scale follow-up? Transcript profiling allowed clear separation of different toxicants and the identification of toxicant types in a blinded test study. We also developed a diagrammatic system to visualize and compare the toxicity patterns of a group of chemicals by giving a quantitative overview of altered superordinate biological processes (e.g. activation of KEGG pathways or overrepresentation of gene ontology terms). The transcript data were mined for potential markers of toxicity, and 39 transcripts were selected to either indicate general developmental toxicity or distinguish compounds with different modes of action in read-across. In summary, we found that including transcriptome data substantially increased the information obtained from the MINC phenotypic test.

Electronic supplementary material: The online version of this article (doi:10.1007/s00204-015-1658-7) contains supplementary material, which is available to authorized users.


Table of contents
• Overview on content of supplemental-tables folder
• Supplemental Material, Figure S1: Evaluation of the SVM-based classifier.
• Supplemental Material, Figure S2: "Scoring approach" applied to the candidate biomarker genes.
• Supplemental Material, Figure S3: Expression pattern of the candidate biomarker genes among the different exposure conditions.

Overview on content of supplemental-tables folder
Suppl. Table S1 - Differentially regulated probe sets in neural crest cells (UNK2 system) after exposure to six different conditions. Data from five independent experiments on Affymetrix microarrays are shown, including gene names, fold changes and adjusted p-values for each exposure scenario.

Suppl. Table S2 - Overrepresented GO classes in the six test conditions. The table lists the gene ontology (GO) classes that were enriched in sets of differentially expressed genes (DEG) identified by microarray analysis. For each exposure scenario, the data include the name and ID of the GO class, the total number of genes belonging to the GO class, the number of DEG found in the GO class, and the adjusted p-value.

Suppl. Table S3 - Overrepresented KEGG pathways in the six test conditions. The table lists the KEGG pathways that were enriched in sets of differentially expressed genes (DEG) identified by microarray analysis. For each exposure scenario, the data include the name and ID of the KEGG pathway, the total number of genes belonging to the KEGG pathway, the number of DEG found in the KEGG pathway, and the adjusted p-value.

Suppl. Table S4 - Overrepresented GO classes using up- and down-regulated probe sets and respective classification in superordinate processes. The table lists the gene ontology (GO) classes that were enriched in sets of differentially expressed genes (DEG), using up- and down-regulated probe sets as input, and their distribution among superordinate processes. The data include the name and ID of the GO class, the total number of genes belonging to the GO class, the number of DEG found in the GO class, the adjusted p-value and the superordinate process to which the GO class belongs.

Suppl. Table S5 -

Supplemental material, Fig. S1:
The support vector machine (SVM)-based classifier described in Fig. 2 was used to predict different scenarios. (A) A simulation study of the SVM-based classifier (Fig. 2) was performed: three of the five replicates per compound were randomly chosen to form the training set and to build the classifier. The identity of the remaining two replicates (testing set) was then predicted. The procedure was repeated 1000 times. Finally, the best predictions were summed (pooling the replicates of the same compound) and normalized (number of predictions / 2000 × 100). (B) The compound TSA was excluded from the training and testing set. The 100 probe sets with the highest variance ("100 PS") within the training set were newly identified and used to build a new classifier. The best and second-best predictions, based on a support vector machine approach (relative probability given in brackets), are listed for each blind replicate (first column). The true identity of the samples (truth) is indicated in the last column.
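The resampling scheme in panel A can be illustrated with a minimal Python sketch. It is not the authors' implementation: a simple nearest-centroid rule stands in for the SVM so that the example runs on the standard library alone, the data are synthetic, and all names (`simulate`, `top_variance_features`, `make_toy_data`, `cmpd_A` and so on) are hypothetical. Re-selecting the highest-variance features from the training set in every iteration is also an assumption; the caption states this only for panel B.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pvariance


def top_variance_features(samples, k):
    """Return the indices of the k features with the highest variance."""
    n_features = len(samples[0])
    variances = [pvariance([s[j] for s in samples]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]


def nearest_centroid_predict(train, vec, features):
    """Stand-in for the SVM: pick the label with the closest class centroid."""
    by_label = defaultdict(list)
    for label, sample in train:
        by_label[label].append(sample)
    best_label, best_dist = None, float("inf")
    for label, group in by_label.items():
        dist = sum((mean(s[j] for s in group) - vec[j]) ** 2 for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def simulate(data, n_train=3, n_iter=1000, k=100, seed=0):
    """Repeated random split: n_train replicates train, the rest are predicted.

    data maps compound -> list of replicate vectors. Returns, per compound,
    the percentage of test replicates assigned to each predicted label.
    """
    rng = random.Random(seed)
    tallies = {compound: Counter() for compound in data}
    for _ in range(n_iter):
        train, test = [], []
        for compound, reps in data.items():
            idx = list(range(len(reps)))
            rng.shuffle(idx)
            train += [(compound, reps[i]) for i in idx[:n_train]]
            test += [(compound, reps[i]) for i in idx[n_train:]]
        # Assumption: features are re-selected from the training set each time.
        features = top_variance_features([s for _, s in train], k)
        for truth, vec in test:
            tallies[truth][nearest_centroid_predict(train, vec, features)] += 1
    # Normalization: count / (test replicates per split * n_iter) * 100,
    # i.e. count / 2000 * 100 for 2 test replicates and 1000 iterations.
    totals = {c: (len(reps) - n_train) * n_iter for c, reps in data.items()}
    return {c: {p: 100.0 * n / totals[c] for p, n in t.items()}
            for c, t in tallies.items()}


def make_toy_data(seed=1):
    """Synthetic data: 3 hypothetical compounds, 5 replicates, 20 features."""
    rng = random.Random(seed)
    data = {}
    for ci, compound in enumerate(["cmpd_A", "cmpd_B", "cmpd_C"]):
        data[compound] = [
            [(10.0 if j == ci else 0.0) + rng.gauss(0, 0.5) for j in range(20)]
            for _ in range(5)
        ]
    return data


result = simulate(make_toy_data(), n_iter=50, k=5)
for compound, predictions in result.items():
    print(compound, predictions)
```

Each row of the printed result corresponds to one true compound, and its percentages sum to 100, which mirrors the normalized prediction counts described for panel A. Replacing `nearest_centroid_predict` with a real SVM would not change the resampling or the normalization.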