Mining Cancer Transcriptomes: Bioinformatic Tools and the Remaining Challenges

Abstract

The development of next-generation sequencing technologies has had a profound impact on the field of cancer genomics. With enormous quantities of data being generated from tumor samples, researchers have had to rapidly adapt existing tools or develop new ones to analyze the raw data and maximize their value. While much of this effort has focused on improving specific algorithms to obtain faster and more precise results, the accessibility of the final data to the research community remains a significant problem. Large amounts of data exist but are not easily available to researchers who lack the resources and experience to download and reanalyze them. In this article, we focus on RNA-seq analysis in the context of cancer genomics and discuss the bioinformatic tools available to explore these data. We also highlight the importance of developing new and more intuitive tools that provide easier access to public data, and we discuss the related issues of data sharing and patient privacy.

Abbreviations

eQTL: expression Quantitative Trait Loci

GDC: Genomic Data Commons

ICGC: International Cancer Genome Consortium

TCGA: The Cancer Genome Atlas

Author information

Corresponding author

Correspondence to Brian T. Wilhelm.

Ethics declarations

Conflict of interest

Thomas Milan and Brian T. Wilhelm have no conflicts of interest.

Funding

This work was supported by grants to BTW from the Fonds de Recherche du Québec en Santé (32900) and the Terry Fox Research Institute (TFRI-NI-1042).

Cite this article

Milan, T., Wilhelm, B.T. Mining Cancer Transcriptomes: Bioinformatic Tools and the Remaining Challenges. Mol Diagn Ther 21, 249–258 (2017). https://doi.org/10.1007/s40291-017-0264-1

Keywords

  • Bioinformatic Tool
  • Cancer Genomic
  • Human Squamous Cell Carcinoma
  • Cancer Genomic Project
  • Intuitive Tool