Whole-Exome Sequencing Data – Identifying Somatic Mutations

Springer Handbook of Bio-/Neuroinformatics

Abstract

The use of next-generation sequencing instruments to study hematological malignancies generates a tremendous amount of sequencing data. Storing, managing, and analyzing terabytes of such data, often generated from highly heterogeneous sources, is a challenging bioinformatics problem. Our project focuses on sequence analysis of human cancer genomes in order to identify the genetic lesions underlying tumor development. However, both the automated detection of somatic mutations and the statistical testing used to identify genetic lesions remain open problems. We therefore propose a computational procedure for handling large-scale sequencing data to detect exonic somatic mutations in a tumor sample. The proposed pipeline comprises several steps based on open-source software and the R language: alignment, detection of mutations, annotation, functional classification, and visualization of results. We analyzed Illumina whole-exome sequencing data from five leukemic patients with five paired controls, plus one colon cancer sample with its paired control. The results were validated by Sanger sequencing.
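The core idea behind detecting somatic mutations from a tumor sample and its paired control can be illustrated with a minimal sketch. It assumes that variant calls are already available for both samples and treats a variant as a somatic candidate when it is called in the tumor but not in the control. The `Variant` type, the `somatic_candidates` function, and the example coordinates are hypothetical and are not taken from the chapter; the actual pipeline relies on dedicated open-source tools and R, and would also account for coverage and base quality.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A single-nucleotide variant call (hypothetical minimal record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor_calls: Iterable[Variant],
                       control_calls: Iterable[Variant]) -> List[Variant]:
    """Return variants called in the tumor but absent from the paired control.

    This is a plain set difference: it ignores coverage, base quality and
    allele frequency, all of which a real pipeline would need to consider.
    """
    control = set(control_calls)
    return sorted(v for v in set(tumor_calls) if v not in control)


# Illustrative (made-up) calls for one patient.
tumor = [
    Variant("chr9", 100, "C", "T"),   # present only in the tumor
    Variant("chr22", 200, "G", "A"),  # also present in the control (germline)
]
control = [
    Variant("chr22", 200, "G", "A"),
]

print(somatic_candidates(tumor, control))
```

Variants shared by tumor and control are treated as germline and discarded, which is why a paired control is needed for each patient.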

Abbreviations

ANNOVAR: annotation of genetic variants
BAM: binary alignment format
BC: blast crisis
BWA: Burrows–Wheeler alignment
BWT: Burrows–Wheeler transform
CIRCOS: circular visualization of tabular data
CML: chronic myeloid leukemia
CNV: copy number variation
DNA: deoxyribonucleic acid
IGV: integrative genomics viewer
INDEL: insertion or a deletion
IPA: ingenuity pathway analysis
LOH: loss of heterozygosity
LRT: linear regression technique
PDGFR: platelet-derived growth factor receptor
PML-RAR: promyelocytic leukemia-retinoic acid receptor
Ph: Philadelphia chromosome
R: R programming language
RTA: real time analyzer
SAM: sequence alignment map
SAMtools: tools for sequence alignment maps
SCS: sequencing control software
SIFT: sorts intolerant from tolerant
SNP: single-nucleotide polymorphism
SNV: single nucleotide variant
UCSC: University of California Santa Cruz
aCGH: array-comparative genomic hybridization
aCML: atypical chronic myeloid leukemia
ddNTP: dideoxynucleotide
gDNA: genomic DNA

Author information

Correspondence to Roberta Spinelli, Rocco Piazza, Alessandra Pirola, Simona Valletta, Roberta Rostagno, Angela Mogavero, Manuela Marega, Hima Raman or Carlo Gambacorti-Passerini.

Copyright information

© 2014 Springer-Verlag

About this chapter

Cite this chapter

Spinelli, R. et al. (2014). Whole-Exome Sequencing Data – Identifying Somatic Mutations. In: Kasabov, N. (eds) Springer Handbook of Bio-/Neuroinformatics. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30574-0_25

  • DOI: https://doi.org/10.1007/978-3-642-30574-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30573-3

  • Online ISBN: 978-3-642-30574-0

  • eBook Packages: Engineering, Engineering (R0)
