Abstract
New massive DNA sequencing methods demand computer hardware and bioinformatics software capable of handling huge amounts of data. This paper shows how many-core processors (in which each core can run a whole operating system) can be exploited to address problems that previously required expensive supercomputers. Needleman-Wunsch/Smith-Waterman pairwise alignments of long DNA sequences (>100 kb) are described, including their implications for progressive multiple alignments. Likewise, the assembly algorithms used to generate contigs in sequencing projects (which therefore work with short sequences) and the future of peptide (protein) folding computing methods are described. The study also integrates the latest trends in many-core processors and their applications in the field of bioinformatics.
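To make the pairwise-alignment problem concrete, the following is a minimal sketch of Needleman-Wunsch global alignment scoring. It is an illustration only, not the implementation described in the chapter: the function name `nw_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example. It keeps only two rows of the dynamic-programming matrix, which is one common way to keep memory manageable for long sequences; it returns the score but not the alignment itself.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Return the Needleman-Wunsch global alignment score of a and b.

    Scoring values are illustrative assumptions, not taken from the chapter.
    Only two rows of the DP matrix are held in memory at any time.
    """
    # The score is symmetric in a and b, so put the shorter sequence
    # along the row to minimise the memory used per row.
    if len(b) > len(a):
        a, b = b, a

    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]

    for i, ca in enumerate(a, start=1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)  # align ca with cb
            up = prev[j] + gap          # ca aligned against a gap
            left = curr[j - 1] + gap    # cb aligned against a gap
            curr.append(max(diag, up, left))
        prev = curr

    return prev[-1]


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))
```

The running time is still proportional to the product of the two sequence lengths, which is why sequences above 100 kb are a natural target for parallel hardware; only the memory footprint is reduced here.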
Copyright information
© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Esteban, F.J., Díaz, D., Hernández, P., Caballero, J.A., Dorado, G., Gálvez, S. (2012). Many-Core Processor Bioinformatics and Next-Generation Sequencing. In: Liñán Reyes, M., Flores Arias, J.M., González de la Rosa, J.J., Langer, J., Bellido Outeiriño, F.J., Moreno-Muñoz, A. (eds) IT Revolutions. IT Revolutions 2011. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 82. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32304-1_15
DOI: https://doi.org/10.1007/978-3-642-32304-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32303-4
Online ISBN: 978-3-642-32304-1
eBook Packages: Computer Science, Computer Science (R0)