Finding cancer driver mutations in the era of big data research

  • Rebecca C. Poulos
  • Jason W. H. Wong


In the last decade, the costs of genome sequencing have decreased considerably. The commencement of large-scale cancer sequencing projects has enabled cancer genomics to join the big data revolution. One of the challenges still facing cancer genomics research is determining which mutations are the drivers in an individual cancer, as these make up only a small subset of the overall mutation profile of a tumour. Focusing primarily on somatic single nucleotide mutations in this review, we consider both coding and non-coding driver mutations, and discuss how such mutations might be identified from cancer sequencing datasets. We describe some of the tools and databases that are available for the annotation of somatic variants and the identification of cancer driver genes. We also address the use of genome-wide variation in mutation load to establish background mutation rates from which to identify driver mutations under positive selection. Finally, we describe the ways in which mutational signatures can act as clues for the identification of cancer drivers, as these mutations may cause, or arise from, certain mutational processes. By defining the molecular changes responsible for driving cancer development, new cancer treatment strategies may be developed or novel preventative measures proposed.
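The idea of testing a gene's mutation count against a background mutation rate can be illustrated with a minimal sketch. This is not the method of any tool discussed in the review: the function names, the single uniform background rate and the example numbers are all hypothetical, and a realistic analysis would let the background rate vary across the genome, as noted above.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    total = 0.0
    i = observed
    while True:
        # Work in log space so large counts do not overflow.
        log_term = -expected + i * math.log(expected) - math.lgamma(i + 1)
        term = math.exp(log_term)
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if i > expected and term <= 1e-17 * total:
            break
        i += 1
    return min(total, 1.0)


def excess_mutation_p_value(observed, background_rate, gene_length, n_samples):
    """P-value for seeing at least `observed` mutations in a gene by chance.

    Assumes (hypothetically) one uniform background rate per base per sample.
    """
    expected = background_rate * gene_length * n_samples
    return poisson_upper_tail(observed, expected)


# Illustrative numbers only: a 1,500 bp gene, 500 tumours, 2e-6 mutations/bp/sample.
p_value = excess_mutation_p_value(12, 2e-6, 1500, 500)
print(f"p = {p_value:.3g}")
```

A small p-value here indicates only that the gene is mutated more often than this simplified background predicts. Regional differences in mutation rate can produce the same excess without positive selection, which is why the choice of background model matters.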


Cancer genomics, Somatic, Driver mutation, Big data, Cancer, Sequencing, Genome, Mutational signatures, Selection


Compliance with ethical standards

Funding information

R.C.P. is supported by an Australian Government Research Training Program Scholarship. J.W.H.W. is supported by an Australian Research Council Future Fellowship (FT130100096) and a National Health and Medical Research Council Project Grant (APP1119932).

Conflicts of interest

Rebecca C. Poulos declares that she has no conflict of interest. Jason W.H. Wong declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© International Union for Pure and Applied Biophysics (IUPAB) and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Prince of Wales Clinical School and Lowy Cancer Research Centre, UNSW Sydney, Sydney, Australia
  2. Children’s Medical Research Institute, The University of Sydney, Sydney, Australia
  3. School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong
