Many-Core Processor Bioinformatics and Next-Generation Sequencing

  • Conference paper
IT Revolutions (IT Revolutions 2011)

Abstract

The new massive DNA sequencing methods demand both computer hardware and bioinformatics software capable of handling huge amounts of data. This paper shows how many-core processors (in which each core can run a whole operating system) can be exploited to address problems that previously required expensive supercomputers. Needleman-Wunsch/Smith-Waterman pairwise alignments are described for long DNA sequences (>100 kb), including the implications for progressive multiple alignments. Likewise, the assembly algorithms used to generate contigs in sequencing projects (which therefore work with short sequences) and the future of computing methods for peptide (protein) folding are described. The study also covers the latest trends in many-core processors and their applications in the field of bioinformatics.
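As a rough illustration of the two pairwise alignment methods named in the abstract, the sketch below computes the global (Needleman-Wunsch) and local (Smith-Waterman) alignment scores while keeping only two rows of the dynamic programming matrix in memory, which is what makes sequences longer than 100 kb tractable. It is not the implementation described in the paper: the function names, the linear gap model, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example, and it is sequential, with no many-core parallelism.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score using two rows of memory.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Local (Smith-Waterman) alignment score: every cell is clamped at zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    print(sw_score("TTTACGTTT", "GGACGGG"))
```

Memory use is proportional to the length of the second sequence, but only the score is returned. Recovering the alignment itself in linear space needs a divide-and-conquer scheme such as Hirschberg's algorithm, which this sketch does not attempt.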

Copyright information

© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Esteban, F.J., Díaz, D., Hernández, P., Caballero, J.A., Dorado, G., Gálvez, S. (2012). Many-Core Processor Bioinformatics and Next-Generation Sequencing. In: Liñán Reyes, M., Flores Arias, J.M., González de la Rosa, J.J., Langer, J., Bellido Outeiriño, F.J., Moreno-Muñoz, A. (eds) IT Revolutions. IT Revolutions 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 82. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32304-1_15

  • DOI: https://doi.org/10.1007/978-3-642-32304-1_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32303-4

  • Online ISBN: 978-3-642-32304-1

  • eBook Packages: Computer Science, Computer Science (R0)
