Identification of Mutated Cancer Driver Genes in Unpaired RNA-Seq Samples

  • David Mosen-Ansorena
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1878)

Abstract

Identifying cancer driver genes by analyzing mutations detected with high-throughput sequencing is both a useful tool and a key challenge in cancer genomics. The workflow presented here relies on unpaired tumor RNA-seq samples, thus leveraging already available RNA-seq data and providing the intrinsic benefits of directly targeting the transcriptome. Based on well-established methods for variant detection, the workflow also involves thorough data cleaning and extensive annotation, which enable the selection of somatic mutations with functional impact and the prioritization of genes relevant to the carcinogenic processes in the input samples.

Key words

High-throughput sequencing; RNA-seq; Mutations; Cancer; Driver genes

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, USA
  2. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, USA
