
Building Portable and Reproducible Cancer Informatics Workflows: An RNA Sequencing Case Study

Protocol in: Cancer Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1878)

Abstract

The Seven Bridges Cancer Genomics Cloud (CGC) is part of the National Cancer Institute Cloud Resource project, which was created to explore the paradigm of co-locating massive datasets with the computational resources needed to analyze them. The CGC was designed to let researchers easily find the data they need and analyze it with robust applications in a scalable and reproducible fashion. To enable this, individual tools are packaged within Docker containers and described using the Common Workflow Language (CWL), an emerging standard for reproducible data analysis. On the CGC, researchers can deploy individual tools or build large custom workflows by chaining tools together. Here, we discuss a case study in which RNA sequencing data are analyzed with different methods on the Seven Bridges CGC and the results are compared. We highlight best practices for designing command line tools, Docker containers, and CWL descriptions that enable massively parallel, reproducible biomedical computation on cloud resources.
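The abstract describes tools that are packaged in Docker containers, described in CWL, and chained into workflows. As a rough illustration (a sketch under stated assumptions, not the chapter's own code), the Python below builds minimal CWL v1.0 documents as plain dictionaries. Each tool declares its container through a `DockerRequirement`, and a linear `Workflow` wires each step's output into the next step's input. The commands `trim_reads` and `quantify`, the `example.org/...` image names, and the helper functions `command_line_tool`, `chain`, and `check_wiring` are all hypothetical placeholders.

```python
import json

# Hypothetical images: placeholders, not images used in the chapter.
TRIM_IMAGE = "example.org/trim-reads:1.0"
QUANT_IMAGE = "example.org/quantify:1.0"


def command_line_tool(base_command, image, out_name):
    """Describe one containerized command as a CWL v1.0 CommandLineTool.

    The (hypothetical) tool takes one File input, `in_file`, is invoked as
    `<base_command> -o <out_name> <in_file>`, and returns `out_name` as `out_file`.
    """
    return {
        "class": "CommandLineTool",
        "baseCommand": base_command,
        # DockerRequirement pins the exact image, so the same binaries run
        # on a laptop or on a cloud platform.
        "requirements": [{"class": "DockerRequirement", "dockerPull": image}],
        # Plain-string arguments sort before inputs bound at position 1.
        "arguments": ["-o", out_name],
        "inputs": {"in_file": {"type": "File", "inputBinding": {"position": 1}}},
        "outputs": {"out_file": {"type": "File", "outputBinding": {"glob": out_name}}},
    }


def chain(tools):
    """Chain tools into a linear CWL Workflow: step i reads step i-1's output."""
    steps = {}
    source = "reads"  # the workflow-level input feeds the first step
    for i, tool in enumerate(tools):
        name = f"step{i}"
        steps[name] = {"run": tool, "in": {"in_file": source}, "out": ["out_file"]}
        source = f"{name}/out_file"  # next step consumes this step's output
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"reads": "File"},
        "outputs": {"result": {"type": "File", "outputSource": source}},
        "steps": steps,
    }


def check_wiring(workflow):
    """Return True if every step input and every workflow output refers to a
    workflow input or to an output of an earlier step (dict order = step order)."""
    known = set(workflow["inputs"])
    for name, step in workflow["steps"].items():
        if any(src not in known for src in step["in"].values()):
            return False
        known.update(f"{name}/{out}" for out in step["out"])
    return all(o["outputSource"] in known for o in workflow["outputs"].values())


if __name__ == "__main__":
    wf = chain([
        command_line_tool("trim_reads", TRIM_IMAGE, "trimmed.fastq"),
        command_line_tool("quantify", QUANT_IMAGE, "counts.tsv"),
    ])
    assert check_wiring(wf)
    print(json.dumps(wf, indent=2))
```

CWL documents may be written in JSON as well as YAML, so the printed output could be saved as a `.cwl` file; treat it as a starting point and check it with a CWL runner such as `cwltool --validate` before relying on it. Keeping the container reference inside each tool description, rather than in the workflow, is what lets the same tool be reused unchanged across different workflows.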



Acknowledgements

The Cancer Genomics Cloud is powered by Seven Bridges and has been funded in whole or in part with federal funds from the NCI, NIH, Department of Health and Human Services, under contract nos. HHSN261201400008C and HHSN261200800001E. We thank the entire Seven Bridges team; the Cancer Genomics Cloud Pilot teams from the NCI, the Broad Institute, and the Institute for Systems Biology; the Genomic Data Commons team; countless early users; and data donors. We also wish to acknowledge the sources of two datasets that are available to authorized users through the CGC and that were central to its development. The first is The Cancer Genome Atlas (TCGA, phs000178): the resources described here were developed in part based upon data generated by TCGA, managed by the NCI and NHGRI. Information about TCGA can be found at https://cancergenome.nih.gov/. The second is Therapeutically Applicable Research to Generate Effective Treatments (TARGET, phs000218): the resources described here were developed in part based on data generated by the TARGET initiative, managed by the NCI.

Author information

Correspondence to Brandi Davis-Dusenbery.


Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Kaushik, G., Davis-Dusenbery, B. (2019). Building Portable and Reproducible Cancer Informatics Workflows: An RNA Sequencing Case Study. In: Krasnitz, A. (eds) Cancer Bioinformatics. Methods in Molecular Biology, vol 1878. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8868-6_2


  • DOI: https://doi.org/10.1007/978-1-4939-8868-6_2


  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8866-2

  • Online ISBN: 978-1-4939-8868-6

  • eBook Packages: Springer Protocols
