Highlight report: quality control for genome-wide expression data: how to identify sample mix-up

Grinberg, Marianna

doi:10.1007/s00204-015-1644-0

Highlight report: quality control for genome-wide expression data: how to identify sample mix-up

Editorial
Published: 27 November 2015

Volume 89, pages 2459–2461, (2015)
Cite this article

Download PDF

Archives of Toxicology Aims and scope Submit manuscript

Highlight report: quality control for genome-wide expression data: how to identify sample mix-up

Download PDF

Marianna Grinberg¹

959 Accesses
Explore all metrics

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

In this issue of the Archives of Toxicology, Miriam Lohr and colleagues from TU Dortmund University in Germany contributed a method how to identify sample mix-up in gene expression datasets (Lohr et al. 2015). Today, large sets of genome-wide expression data obtained either by gene array or by RNAseq are publicly available. This offers the possibility to reuse previously analyzed data and to combine several datasets in order to improve statistical power (Lohr et al. 2015). However, a risk is that samples in public databases have been misannotated. This may lead to errors which are particularly critical, when the same error compromises several studies which all rely on the same publicly available expression data. Therefore, Lohr et al. (2015) established a biostatistical method that allows the retrospective identification of misannotated samples. The authors show that two types of error occur surprisingly often in public databases: (1) A sample is analyzed twice, and the duplicate is erroneously labeled with the identification number of another patient. (2) Two samples are mixed up. When this mix-up occurs between samples from male and female individuals, it can clearly be identified by a set of sex-specific genes. The authors applied the mix-up identification procedures to 45 publicly available datasets, including 4913 individuals. Erroneous sample annotation was identified in a surprisingly high fraction of 40 % of the analyzed datasets (Lohr et al. 2015). The authors also demonstrate that the removal of the identified erroneous samples may critically influence the results of statistical analysis of individual studies.

Publicly available datasets have been particularly often used in clinical cancer research to identify prognostic genes (Schmidt et al. 2008, 2012; Cadenas et al. 2010, 2014; Stewart et al. 2012; Godoy et al. 2014; Stock et al. 2015). However, genome-wide expression data are also increasingly used in toxicological research (Song et al. 2013; Faust et al. 2013; Shao et al. 2014; Shinde et al. 2015; Reif 2014; Stöber 2014; Marchan 2014; Hammad and Ahmed 2014; Hengstler 2011; Stewart 2010). Cutting-edge topics are the establishment of classifiers and signatures in developmental toxicity (Krug et al. 2013; Rempel et al. 2015; Balmer et al. 2014; Zimmer et al. 2014; Waldmann et al. 2014) and hepatotoxicity (Campos et al. 2014; Yafune et al. 2013; Doktorova et al. 2012; Zellmer et al. 2010; Godoy et al. 2015). For these purposes, large public databases are available (Grinberg et al. 2014). Using the novel methods for the identification of sample annotation errors described by Lohr et al. (2015) will improve the reliability of genome-wide biostatistical analyses in future.

References

Lohr M et al (2015) Identification of sample annotation errors in gene expression datasets. Arch Toxicol (this issue, epub ahead of print)
Balmer NV, Klima S, Rempel E, Ivanova VN, Kolde R, Weng MK, Meganathan K, Henry M, Sachinidis A, Berthold MR, Hengstler JG, Rahnenführer J, Waldmann T, Leist M (2014) From transient transcriptome responses to disturbed neurodevelopment: role of histone acetylation and methylation as epigenetic switch between reversible and irreversible drug effects. Arch Toxicol 88(7):1451–1468. doi:10.1007/s00204-014-1279-6
Article PubMed Central CAS PubMed Google Scholar
Cadenas C, Franckenstein D, Schmidt M, Gehrmann M, Hermes M, Geppert B, Schormann W, Maccoux LJ, Schug M, Schumann A, Wilhelm C, Freis E, Ickstadt K, Rahnenführer J, Baumbach JI, Sickmann A, Hengstler JG (2010) Role of thioredoxin reductase 1 and thioredoxin interacting protein in prognosis of breast cancer. Breast Cancer Res 12(3):R44. doi:10.1186/bcr2599
Article PubMed Central PubMed Google Scholar
Cadenas C, van de Sandt L, Edlund K, Lohr M, Hellwig B, Marchan R, Schmidt M, Rahnenführer J, Oster H, Hengstler JG (2014) Loss of circadian clock gene expression is associated with tumor progression in breast cancer. Cell Cycle 13(20):3282–3291. doi:10.4161/15384101.2014.954454
Article PubMed Central CAS PubMed Google Scholar
Campos G, Schmidt-Heck W, Ghallab A, Rochlitz K, Pütter L, Medinas DB, Hetz C, Widera A, Cadenas C, Begher-Tibbe B, Reif R, Günther G, Sachinidis A, Hengstler JG, Godoy P (2014) The transcription factor CHOP, a central component of the transcriptional regulatory network induced upon CCl4 intoxication in mouse liver, is not a critical mediator of hepatotoxicity. Arch Toxicol 88(6):1267–1280. doi:10.1007/s00204-014-1240-8
Article CAS PubMed Google Scholar
Doktorova TY, Ellinger-Ziegelbauer H, Vinken M, Vanhaecke T, van Delft J, Kleinjans J, Ahr HJ, Rogiers V (2012) Comparison of genotoxicant-modified transcriptomic responses in conventional and epigenetically stabilized primary rat hepatocytes with in vivo rat liver data. Arch Toxicol 86(11):1703–1715. doi:10.1007/s00204-012-0946-8
Article CAS PubMed Google Scholar
Faust D, Vondráček J, Krčmář P, Smerdová L, Procházková J, Hrubá E, Hulinková P, Kaina B, Dietrich C, Machala M (2013) AhR-mediated changes in global gene expression in rat liver progenitor cells. Arch Toxicol 87(4):681–698. doi:10.1007/s00204-012-0979-z
Article CAS PubMed Google Scholar
Godoy P, Cadenas C, Hellwig B, Marchan R, Stewart J, Reif R, Lohr M, Gehrmann M, Rahnenführer J, Schmidt M, Hengstler JG (2014) Interferon-inducible guanylate binding protein (GBP2) is associated with better prognosis in breast cancer and indicates an efficient T cell response. Breast Cancer 21(4):491–499. doi:10.1007/s12282-012-0404-8
Article PubMed Google Scholar
Godoy P, Schmidt-Heck W, Natarajan K, Lucendo-Villarin B, Szkolnicka D, Asplund A, Björquist P, Widera A, Stöber R, Campos G, Hammad S, Sachinidis A, Chaudhari U, Damm G, Weiss TS, Nüssler A, Synnergren J, Edlund K, Küppers-Munther B, Hay DC, Hengstler JG (2015) Gene networks and transcription factor motifs defining the differentiation of stem cells into hepatocyte-like cells. J Hepatol 63(4):934–942. doi:10.1016/j.jhep.2015.05.013
Article PubMed Central CAS PubMed Google Scholar
Grinberg M, Stöber RM, Edlund K, Rempel E, Godoy P, Reif R, Widera A, Madjar K, Schmidt-Heck W, Marchan R, Sachinidis A, Spitkovsky D, Hescheler J, Carmo H, Arbo MD, van de Water B, Wink S, Vinken M, Rogiers V, Escher S, Hardy B, Mitic D, Myatt G, Waldmann T, Mardinoglu A, Damm G, Seehofer D, Nüssler A, Weiss TS, Oberemm A, Lampen A, Schaap MM, Luijten M, van Steeg H, Thasler WE, Kleinjans JC, Stierum RH, Leist M, Rahnenführer J, Hengstler JG (2014) Toxicogenomics directory of chemically exposed human hepatocytes. Arch Toxicol 88(12):2261–2287. doi:10.1007/s00204-014-1400-x
Hammad S, Ahmed H (2014) Biomarker: the universe of chemically induced gene expression alterations in human hepatocyte. EXCLI J 13:1275–1277
PubMed Central PubMed Google Scholar
Hengstler JG (2011) Cutting-edge topics in toxicology. EXCLI J 10:117–119
Google Scholar
Krug AK, Kolde R, Gaspar JA, Rempel E, Balmer NV, Meganathan K, Vojnits K, Baquié M, Waldmann T, Ensenat-Waser R, Jagtap S, Evans RM, Julien S, Peterson H, Zagoura D, Kadereit S, Gerhard D, Sotiriadou I, Heke M, Natarajan K, Henry M, Winkler J, Marchan R, Stoppini L, Bosgra S, Westerhout J, Verwei M, Vilo J, Kortenkamp A, Hescheler J, Hothorn L, Bremer S, van Thriel C, Krause KH, Hengstler JG, Rahnenführer J, Leist M, Sachinidis A (2013) Human embryonic stem cell-derived test systems for developmental neurotoxicity: a transcriptomics approach. Arch Toxicol 87(1):123–143. doi:10.1007/s00204-012-0967-3
Article PubMed Central CAS PubMed Google Scholar
Marchan R (2014) Cancer research: from prognostic genes to therapeutic targets. EXCLI J 13:1278–1280
PubMed Central PubMed Google Scholar
Reif R (2014) Concepts of predictive toxicology. EXCLI J 2014(13):1292–1294
Google Scholar
Rempel E, Hoelting L, Waldmann T, Balmer NV, Schildknecht S, Grinberg M, Das Gaspar JA, Shinde V, Stöber R, Marchan R, van Thriel C, Liebing J, Meisig J, Blüthgen N, Sachinidis A, Rahnenführer J, Hengstler JG, Leist M (2015) A transcriptome-based classifier to identify developmental toxicants by stem cell testing: design, validation and optimization for histone deacetylase inhibitors. Arch Toxicol 89(9):1599–1618. doi:10.1007/s00204-015-1573-y
Article PubMed Central CAS PubMed Google Scholar
Schmidt M, Böhm D, von Törne C, Steiner E, Puhl A, Pilch H, Lehr HA, Hengstler JG, Kölbl H, Gehrmann M (2008) The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res 68(13):5405–5413. doi:10.1158/0008-5472
Article CAS PubMed Google Scholar
Schmidt M, Hellwig B, Hammad S, Othman A, Lohr M, Chen Z, Boehm D, Gebhard S, Petry I, Lebrecht A, Cadenas C, Marchan R, Stewart JD, Solbach C, Holmberg L, Edlund K, Kultima HG, Rody A, Berglund A, Lambe M, Isaksson A, Botling J, Karn T, Müller V, Gerhold-Ay A, Cotarelo C, Sebastian M, Kronenwett R, Bojar H, Lehr HA, Sahin U, Koelbl H, Gehrmann M, Micke P, Rahnenführer J, Hengstler JG (2012) A comprehensive analysis of human gene expression profiles identifies stromal immunoglobulin κ C as a compatible prognostic marker in human solid tumors. Clin Cancer Res 18(9):2695–2703. doi:10.1158/1078-0432.CCR-11-2210
Article CAS PubMed Google Scholar
Shao J, Berger LF, Hendriksen PJ, Peijnenburg AA, van Loveren H, Volger OL (2014) Transcriptome-based functional classifiers for direct immunotoxicity. Arch Toxicol 88(3):673–689. doi:10.1007/s00204-013-1179-1
Article CAS PubMed Google Scholar
Shinde V, Klima S, Sureshkumar PS, Meganathan K, Jagtap S, Rempel E, Rahnenführer J, Hengstler JG, Waldmann T, Hescheler J, Leist M, Sachinidis A (2015) Human pluripotent stem cell based developmental toxicity assays for chemical safety screening and systems biology data generation. J Vis Exp 17(100):e52333. doi:10.3791/52333
Google Scholar
Song M, Song MK, Choi HS, Ryu JC (2013) Monitoring of deiodinase deficiency based on transcriptomic responses in SH-SY5Y cells. Arch Toxicol 87(6):1103–1113. doi:10.1007/s00204-013-1018-4
Article CAS PubMed Google Scholar
Stewart JD (2010) In vitro test systems in toxicology. EXCLI J 9:156–158
Google Scholar
Stewart JD, Marchan R, Lesjak MS, Lambert J, Hergenroeder R, Ellis JK, Lau CH, Keun HC, Schmitz G, Schiller J, Eibisch M, Hedberg C, Waldmann H, Lausch E, Tanner B, Sehouli J, Sagemueller J, Staude H, Steiner E, Hengstler JG (2012) Choline-releasing glycerophosphodiesterase EDI3 drives tumor cell migration and metastasis. Proc Natl Acad Sci USA 109(21):8155–8160. doi:10.1073/pnas.1117654109
Article PubMed Central CAS PubMed Google Scholar
Stöber R (2014) Transcriptome based differentiation of harmless, teratogenetic and cytotoxic concentration ranges of valproic acid. EXCLI J 13:1281–1282
PubMed Central PubMed Google Scholar
Stock AM, Klee F, Edlund K, Grinberg M, Hammad S, Marchan R, Cadenas C, Niggemann B, Zänker KS, Rahnenführer J, Schmidt M, Hengstler JG, Entschladen F (2015) Gelsolin is associated with longer metastasis-free survival and reduced cell migration in estrogen receptor-positive breast cancer. Anticancer Res 35(10):5277–5285
PubMed Google Scholar
Waldmann T, Rempel E, Balmer NV, König A, Kolde R, Gaspar JA, Henry M, Hescheler J, Sachinidis A, Rahnenführer J, Hengstler JG, Leist M (2014) Design principles of concentration-dependent transcriptome deviations in drug-exposed differentiating stem cells. Chem Res Toxicol 27(3):408–420. doi:10.1021/tx400402j
Article PubMed Central CAS PubMed Google Scholar
Yafune A, Taniai E, Morita R, Nakane F, Suzuki K, Mitsumori K, Shibutani M (2013) Expression patterns of cell cycle proteins in the livers of rats treated with hepatocarcinogens for 28 days. Arch Toxicol 87(6):1141–1153. doi:10.1007/s00204-013-1011-y
Article CAS PubMed Google Scholar
Zellmer S, Schmidt-Heck W, Godoy P, Weng H, Meyer C, Lehmann T, Sparna T, Schormann W, Hammad S, Kreutz C, Timmer J, von Weizsäcker F, Thürmann PA, Merfort I, Guthke R, Dooley S, Hengstler JG, Gebhardt R (2010) Transcription factors ETF, E2F, and SP-1 are involved in cytokine-independent proliferation of murine hepatocytes. Hepatology 52(6):2127–2136. doi:10.1002/hep.23930
Article CAS PubMed Google Scholar
Zimmer B, Pallocca G, Dreser N, Foerster S, Waldmann T, Westerhout J, Julien S, Krause KH, van Thriel C, Hengstler JG, Sachinidis A, Bosgra S, Leist M (2014) Profiling of drugs and environmental chemicals for functional impairment of neural crest migration in a novel stem cell-based test battery. Arch Toxicol 88(5):1109–1126. doi:10.1007/s00204-014-1231-9
PubMed Central CAS PubMed Google Scholar

Download references

Author information

Authors and Affiliations

Fakultät Statistik, Technische Universität Dortmund, Vogelpothsweg 87, 44221, Dortmund, Germany
Marianna Grinberg

Authors

Marianna Grinberg
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Marianna Grinberg.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Grinberg, M. Highlight report: quality control for genome-wide expression data: how to identify sample mix-up. Arch Toxicol 89, 2459–2461 (2015). https://doi.org/10.1007/s00204-015-1644-0

Download citation

Published: 27 November 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s00204-015-1644-0

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Highlight report: quality control for genome-wide expression data: how to identify sample mix-up

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation