Abstract
“Reproducible research” refers to a publishing discipline, originating in the geosciences, in which journal articles are accompanied by publication of data resources and software sufficient to allow independent reproduction of all tables and figures presented in articles. This paper reviews concepts of reproducible research in connection with cancer bioinformatics. The importance of reproducible discipline in the face of the analytic complexity of microarray studies is documented with two case studies, and the role of portable self-documenting data and software archives in securing reproducibility is described. Legal protections for those engaged in reproducible research are discussed in the context of current US copyright law; a reproducible research standard that formalizes rights and obligations of those engaged in reproducible research is detailed. There is every indication that reproducible discipline is feasible for microarray studies, and the reliability of inferences in cancer bioinformatics will be enhanced if commitments to concrete reproducibility are broadly accepted in the research community.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Carey, V.J., Stodden, V. (2010). Reproducible Research Concepts and Tools for Cancer Bioinformatics. In: Ochs, M., Casagrande, J., Davuluri, R. (eds) Biomedical Informatics for Cancer Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5714-6_8
DOI: https://doi.org/10.1007/978-1-4419-5714-6_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5712-2
Online ISBN: 978-1-4419-5714-6
eBook Packages: Medicine, Medicine (R0)