Molecular Diagnosis & Therapy, Volume 21, Issue 3, pp 249–258

Mining Cancer Transcriptomes: Bioinformatic Tools and the Remaining Challenges

  • Thomas Milan
  • Brian T. Wilhelm
Current Opinion

Abstract

The development of next-generation sequencing technologies has had a profound impact on the field of cancer genomics. With the enormous quantities of data being generated from tumor samples, researchers have had to rapidly adapt tools or develop new ones to analyse the raw data to maximize its value. While much of this effort has been focused on improving specific algorithms to get faster and more precise results, the accessibility of the final data for the research community remains a significant problem. Large amounts of data exist but are not easily available to researchers who lack the resources and experience to download and reanalyze them. In this article, we focus on RNA-seq analysis in the context of cancer genomics and discuss the bioinformatic tools available to explore these data. We also highlight the importance of developing new and more intuitive tools to provide easier access to public data and discuss the related issues of data sharing and patient privacy.

Keywords

Bioinformatic Tool; Cancer Genomic; Human Squamous Cell Carcinoma; Cancer Genomic Project; Intuitive Tool
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Abbreviations

eQTL: expression Quantitative Trait Loci
GDC: Genomic Data Commons
ICGC: International Cancer Genome Consortium
TCGA: The Cancer Genome Atlas

Notes

Compliance with Ethical Standards

Conflict of interest

Thomas Milan and Brian T. Wilhelm have no conflicts of interest.

Funding

This work was supported by grants to BTW from the Fonds de Recherche du Québec en Santé (32900) and the Terry Fox Research Institute (TFRI-NI-1042).

Copyright information

© Springer International Publishing Switzerland 2017

Authors and Affiliations

  1. Laboratory for High Throughput Biology, Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Canada
  2. Department of Medicine, Université de Montréal, Montréal, Canada
