Abstract
Studying hematological malignancies with next-generation sequencing instruments generates a tremendous amount of sequencing data. Storing, managing, and analyzing terabytes of such data, often drawn from very different sources, is a challenging bioinformatics problem. Our project focuses mainly on sequence analysis of human cancer genomes, with the aim of identifying the genetic lesions underlying tumor development. However, both the automated detection of somatic mutations and the statistical testing used to identify genetic lesions remain open problems. We therefore propose a computational procedure for handling large-scale sequencing data to detect exonic somatic mutations in a tumor sample. The proposed pipeline comprises several steps based on open-source software and the R language: alignment, detection of mutations, annotation, functional classification, and visualization of results. We analyzed Illumina whole-exome sequencing data from five leukemic patients with their five paired controls, plus one colon cancer sample with its paired control. The results were validated by Sanger sequencing.
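The paired tumor/control design mentioned above can be illustrated with a minimal sketch. It assumes that variant calls for each sample are already available as simple records, and it treats a variant as a candidate somatic mutation when it is called in the tumor but not in the paired control. The `Variant` record, the `candidate_somatic` helper, and the toy coordinates are hypothetical and serve only as illustration; they are not the chapter's implementation, which relies on open-source tools and R and also involves quality filtering and statistical testing.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one called variant."""
    chrom: str
    pos: int   # 1-based position on the chromosome
    ref: str   # reference allele
    alt: str   # alternate allele


def candidate_somatic(tumor_calls: Iterable[Variant],
                      control_calls: Iterable[Variant]) -> List[Variant]:
    """Return variants called in the tumor but absent from the paired control.

    This is only the set-difference idea behind paired analysis; it does not
    model coverage, base quality, or allele frequency.
    """
    control = set(control_calls)
    return sorted(v for v in set(tumor_calls) if v not in control)


# Toy data (invented coordinates, not taken from the chapter).
tumor = [Variant("chr9", 1000, "C", "T"), Variant("chr22", 2000, "G", "A")]
control = [Variant("chr22", 2000, "G", "A")]

somatic = candidate_somatic(tumor, control)
print(somatic)
```

In practice a variant shared by tumor and control is more likely germline, which is why the control calls are subtracted; a real pipeline would additionally check that the control has enough coverage at the site before calling the variant somatic.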
Abbreviations
- ANNOVAR: annotation of genetic variants
- BAM: binary alignment format
- BC: blast crisis
- BWA: Burrows–Wheeler alignment
- BWT: Burrows–Wheeler transform
- CIRCOS: circular visualization of tabular data
- CML: chronic myeloid leukemia
- CNV: copy number variation
- DNA: deoxyribonucleic acid
- IGV: integrative genomics viewer
- INDEL: insertion or a deletion
- IPA: ingenuity pathway analysis
- LOH: loss of heterozygosity
- LRT: linear regression technique
- PDGFR: platelet-derived growth factor receptor
- PML-RAR: promyelocytic leukemia-retinoic acid receptor
- Ph: Philadelphia chromosome
- R: R programming language
- RTA: real time analyzer
- SAM: sequence alignment map
- SAMtools: tools for sequence alignment maps
- SCS: sequencing control software
- SIFT: sorts intolerant from tolerant
- SNP: single-nucleotide polymorphism
- SNV: single nucleotide variant
- UCSC: University of California Santa Cruz
- aCGH: array-comparative genomic hybridization
- aCML: atypical chronic myeloid leukemia
- ddNTP: dideoxynucleotide
- gDNA: genomic DNA
© 2014 Springer-Verlag
About this chapter
Cite this chapter
Spinelli, R. et al. (2014). Whole-Exome Sequencing Data – Identifying Somatic Mutations. In: Kasabov, N. (eds) Springer Handbook of Bio-/Neuroinformatics. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30574-0_25
DOI: https://doi.org/10.1007/978-3-642-30574-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30573-3
Online ISBN: 978-3-642-30574-0
eBook Packages: Engineering, Engineering (R0)