Bioinformatics for Precision Medicine in Oncology

  • Nicolas Servant
  • Philippe Hupé


High-throughput technologies are now widely available, and their applications make them very attractive to cancer centers because they open the way to new clinical tools for daily practice. However, establishing such a clinical facility is not a trivial task, owing to the complexity of the precision medicine (PM) framework and the overwhelming amount of data. From the data management perspective, the data integration issue in oncology (i.e., merging heterogeneous data into a seamless information system) can be formulated as follows: a large volume of data is disseminated across a large variety of databases, which grow in size at high velocity.

Several challenges must be faced at different levels: (1) the technical level, to develop an adequate computational architecture (software/hardware); (2) the organizational and management levels, to define the procedures for collecting data with the highest confidence, quality, and traceability; (3) the scientific level, to create sophisticated bioinformatics workflows and statistical models that analyze the data and correlate them with the evolution of the disease and the risks to the patient; and (4) the reporting level, to allow the query, easy retrieval, and reporting in real time of any data that might be useful for a therapeutic decision, thereby allowing clinicians to propose a tailored therapy to the patient with the shortest delay.

An efficient informatics and bioinformatics architecture is therefore needed to support PM by recording, managing, and analyzing all the information collected (Simon and Roychowdhury 2013). This chapter presents the different bioinformatics solutions implemented to tackle these challenges. The key points of each part are detailed to give an overview of these solutions.


Keywords: Reference Genome, Somatic Mutation, Mapping Quality, Sequencing Depth, Precision Medicine


  1. Albers CA, Lunter G, MacArthur DG et al (2011) Dindel: accurate indel calls from short-read data. Genome Res 21(6):961–973. doi: 10.1101/gr.112326.110. arXiv:10040887v1
  2. Athey BD, Braxenthaler M, Haas M et al (2013) tranSMART: an open source and community-driven informatics and data sharing platform for clinical and translational research. AMIA Jt Summits Transl Sci Proc 2013:6–8
  3. Boeva V, Zinovyev A, Bleakley K et al (2011) Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 27(2):268–269
  4. Burrows M, Wheeler DJ (1994) A block sorting lossless data compression algorithm. Technical report 124. Digital Equipment Corporation, Palo Alto
  5. Canuel V, Rance B, Avillach P et al (2014) Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Brief Bioinformat. doi: 10.1093/bib/bbu006
  6. Cerami E, Gao J, Dogrusoz U et al (2012) The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2:401–404. doi: 10.1158/2159-8290.CD-12-0095
  7. Cingolani P, Platts A, Wang-le L et al (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6(2):80–92
  8. Dawson MA, Kouzarides T (2012) Cancer epigenetics: from mechanism to therapy. Cell 150(1):12–27. doi: 10.1016/j.cell.2012.06.013
  9. Downing GJ, Boyle SN, Brinner KM et al (2009) Information management to enable personalized medicine: stakeholder roles in building clinical decision support. BMC Med Inform Decis Mak 9:44. doi: 10.1186/1472-6947-9-44
  10. Durbin RM, Abecasis GR, Altshuler RM et al (2010) A map of human genome variation from population-scale sequencing. Nature 467(7319):1061–1073
  11. Fernald GH, Capriotti E, Daneshjou R et al (2011) Bioinformatics challenges for personalized medicine. Bioinformatics 27:1741–1748. doi: 10.1093/bioinformatics/btr295
  12. Houdayer C, Caux-Moncoutier V, Krieger S et al (2012) Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants. Hum Mutat 33(8):1228–1238. doi: 10.1002/humu.22101
  13. Hupé P, Stransky N, Thiery J-P, Radvanyi F, Barillot E (2004) Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 20:3413–3422. doi: 10.1093/bioinformatics/bth418
  14. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with bowtie 2. Nat Methods 9(4):357–359. doi: 10.1038/nmeth.1923
  15. Langmead B, Trapnell C, Pop M et al (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. doi: 10.1186/gb-2009-10-3-r25
  16. Li M, Nordborg M, Li LM (2004) Adjust quality scores from alignment and improve sequencing accuracy. Nucleic Acids Res 32(17):5183–5191
  17. Li H, Handsaker B, Wysoker A et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079. doi: 10.1093/bioinformatics/btp352
  18. Madhavan S, Gusev Y, Harris MA (2011) G-CODE: enabling systems medicine through innovative informatics. Genome Biol 12(Suppl 1):P38. doi: 10.1186/gb-2011-12-s1-p38
  19. Marco-Sola S, Sammeth M, Guigo R et al (2012) The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods 9:1185–1188. doi: 10.1038/nmeth.2221
  20. McKenna A, Hanna M, Banks E et al (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303
  21. O’Rawe J, Jiang T, Sun G et al (2013) Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 5:28. doi: 10.1186/gm432
  22. Popova T, Manié E, Stoppa-Lyonnet D, Rigaill G, Barillot E, Stern MH (2009) Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays. Genome Biol 10:R128. doi: 10.1186/gb-2009-10-11-r128
  23. Ramos AH, Lichtenstein L, Gupta M et al (2015) Oncotator: cancer variant annotation tool. Hum Mutat 36:E2423–E2429
  24. Rigaill G (2010) Pruned dynamic programming for optimal multiple change-point detection. ArXiv e-prints (May):9
  25. Servant N, Roméjon J, Gestraud P et al (2014) Bioinformatics for precision medicine in oncology: principles and application to the SHIVA clinical trial. Front Genet 5:152. doi: 10.3389/fgene.2014.00152
  26. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29(1):308–311
  27. Simon R, Roychowdhury S (2013) Implementing personalized cancer genomics in clinical trials. Nat Rev Drug Discov 12:358–369. doi: 10.1038/nrd3979
  28. Tan R, Wang Y, Kleinstein SE et al (2014) An evaluation of copy number variation detection tools from whole-exome sequencing data. Hum Mutat 35(7):899–907. doi: 10.1002/humu.22537
  29. Timp W, Feinberg AP (2013) Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nat Rev Cancer 13:497–510. doi: 10.1038/nrc3486
  30. Van der Auwera GA, Carneiro M, Hartl C et al (2013) From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics 43:11.10.1–11.10.33
  31. Veltman JA, Cuppen E, Vrijenhoek T (2013) Challenges for implementing next-generation sequencing-based genome diagnostics: it’s also the people, not just the machines. Personal Med 10:473–484. doi: 10.2217/pme.13.41
  32. Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164. doi: 10.1093/nar/gkq603
  33. Zeitouni B, Boeva V, Janoueix-Lerosey I et al (2010) SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Bioinformatics 26:1895–1896. doi: 10.1093/bioinformatics/btq293

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Bioinformatics Platform, Institut Curie, Paris, France
  2. Unité INSERM/Institut Curie U900, Paris, France
