Bioinformatics Applications on the FPGA-Based High-Performance Computer RIVYERA

Chapter

Abstract

Sequence alignment is one of the most popular application areas in bioinformatics. The exponential growth of biological sequence data has become a severe problem when the data are processed on standard general-purpose PCs. Tackling this problem with large computing clusters is a widely accepted solution, although acquisition and maintenance, as well as space and energy requirements, introduce significant costs. This chapter shows that the problem can instead be addressed with RIVYERA, a high-performance computing platform based on reconfigurable hardware (in particular FPGAs). It describes the implementations of three widely used bioinformatics applications: optimal sequence alignment with the Needleman–Wunsch and Smith–Waterman algorithms, protein database search with BLASTp, and short-read sequence alignment with a BWA-like algorithm. The results show that RIVYERA clearly outperforms standard PCs and GPU systems and saves more than 90% of the energy consumed by PC clusters, while a single RIVYERA occupies only 3U–4U in a standard server rack.
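The two optimal-alignment algorithms named above share one dynamic-programming recurrence: Needleman–Wunsch computes a global alignment, while Smith–Waterman computes a local one by clamping cell scores at zero and taking the maximum over the whole matrix. The Python sketch below is a plain software reference for illustration only, not the chapter's FPGA design. The function name `align_score`, the match/mismatch scores, and the linear gap penalty are hypothetical choices; protein search in practice typically uses a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def align_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -1, local: bool = True) -> int:
    """Return the optimal alignment score of sequences a and b.

    local=True  -> Smith-Waterman (best local alignment)
    local=False -> Needleman-Wunsch (global alignment)
    Scoring (match/mismatch, linear gap) is an illustrative assumption,
    not the parameter set used in the chapter.
    """
    n, m = len(a), len(b)
    # prev holds row i-1 of the DP matrix; global alignment charges
    # leading gaps, local alignment starts every cell from zero.
    prev = [0] * (m + 1) if local else [j * gap for j in range(m + 1)]
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        cur[0] = 0 if local else i * gap
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                    prev[j] + gap,     # a[i-1] against a gap
                    cur[j - 1] + gap)  # b[j-1] against a gap
            if local:
                h = max(h, 0)          # a local alignment may restart here
                best = max(best, h)
            cur[j] = h
        prev = cur
    # Local: best cell anywhere; global: bottom-right cell.
    return best if local else prev[m]
```

For example, `align_score("TTACGTTT", "ACG")` returns 6, the score of the exact local match of `ACG`. Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal are independent of each other; this is the kind of fine-grained parallelism hardware accelerators commonly exploit, whereas the sketch evaluates all O(nm) cells sequentially.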

Copyright information

© Springer Science+Business Media, LLC 2013

Authors and Affiliations

  1. Department of Computer Science, Christian-Albrechts-University of Kiel, Kiel, Germany
