A Bioinformatics Procedure to Identify and Annotate Somatic Mutations in Whole-Exome Sequencing Data

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 7548)

Abstract

Next-generation sequencing instruments generate enormous amounts of data. Storing, managing and analyzing terabytes of sequencing data, often produced by highly heterogeneous sources, is therefore a challenging bioinformatics problem. Our project focuses mainly on the sequence analysis of human cancer genomes, with the aim of identifying the genetic lesions that underlie tumor development. However, automated detection of somatic mutations and statistically based testing procedures for identifying genetic lesions remain open problems. We therefore propose a computational procedure that manages large-scale sequencing data in order to detect exonic somatic mutations in a tumor sample. The proposed pipeline, built on open-source software and the R language, comprises several steps: alignment, mutation detection, annotation, functional classification and visualization of results. We analyzed whole-exome sequencing data from three patients with leukemia and their three paired controls, plus one colon cancer sample and its paired control. The results were validated by Sanger sequencing.
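
The abstract does not state how somatic calls are separated from germline variants in the mutation-detection step. As a hedged illustration only (not the authors' pipeline, which the abstract says is built on open-source tools and R), here is a minimal sketch of one common tumor/normal comparison. It keeps a site when the variant allele is well supported in the tumor, essentially absent from the matched control, and enriched in the tumor according to a one-sided Fisher's exact test on allele read counts. The `SiteCounts` record, the function names and every threshold (`min_depth`, `min_vaf`, `max_normal_vaf`, `alpha`) are hypothetical choices made for this example. The sketch also assumes that per-allele read counts for both samples at each candidate site have already been extracted from the alignments.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class SiteCounts:
    """Hypothetical record: allele read counts for one sample at one candidate site."""
    chrom: str
    pos: int
    ref: str
    alt: str
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self) -> float:
        # Variant allele fraction; 0.0 at uncovered sites avoids division by zero.
        return self.alt_reads / self.depth if self.depth else 0.0


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: tumor, normal. Columns: alt reads, ref reads.
    Returns P(tumor alt count >= a) under the hypergeometric null of equal
    alt-allele proportions in both samples, with the table margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


def somatic_candidates(tumor, normal, min_depth=10, min_vaf=0.10,
                       max_normal_vaf=0.02, alpha=0.05):
    """Return (tumor_site, p_value) pairs that look somatic.

    All thresholds are illustrative defaults, not values from the paper.
    """
    normal_by_site = {(s.chrom, s.pos, s.ref, s.alt): s for s in normal}
    hits = []
    for t in tumor:
        n = normal_by_site.get((t.chrom, t.pos, t.ref, t.alt))
        if n is None:
            continue  # no matched-normal counts: somatic status cannot be assessed
        if t.depth < min_depth or n.depth < min_depth:
            continue  # too few reads in either sample
        if t.vaf < min_vaf or n.vaf > max_normal_vaf:
            continue  # weak in tumor, or present in normal (likely germline)
        p = fisher_greater(t.alt_reads, t.ref_reads, n.alt_reads, n.ref_reads)
        if p < alpha:
            hits.append((t, p))
    return hits


if __name__ == "__main__":
    tumor = [SiteCounts("chr1", 100, "A", "G", 20, 20),  # candidate somatic site
             SiteCounts("chr2", 200, "C", "T", 15, 15),  # also seen in normal
             SiteCounts("chr3", 300, "G", "A", 3, 2)]    # too shallow in tumor
    normal = [SiteCounts("chr1", 100, "A", "G", 40, 0),
              SiteCounts("chr2", 200, "C", "T", 18, 17),
              SiteCounts("chr3", 300, "G", "A", 30, 0)]
    for site, p in somatic_candidates(tumor, normal):
        print(site.chrom, site.pos, f"{site.ref}>{site.alt}", f"p={p:.2e}")
```

With these toy counts only the chr1 site is reported. The chr2 variant is present at roughly 50% in the control, which suggests a germline heterozygous site, and the chr3 site fails the depth filter. Requiring both a near-zero normal allele fraction and a significant test is deliberately conservative. Dedicated callers additionally model sequencing error, tumor purity and copy number, and this sketch ignores all three.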

Keywords

  • next-generation sequencing
  • computational procedure
  • somatic mutations
  • leukemia
  • colon cancer

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Spinelli, R. et al. (2012). A Bioinformatics Procedure to Identify and Annotate Somatic Mutations in Whole-Exome Sequencing Data. In: Biganzoli, E., Vellido, A., Ambrogi, F., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2011. Lecture Notes in Computer Science, vol 7548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35686-5_7

  • DOI: https://doi.org/10.1007/978-3-642-35686-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35685-8

  • Online ISBN: 978-3-642-35686-5

  • eBook Packages: Computer Science, Computer Science (R0)