Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 4967)

Abstract

With the sharp increase in available DNA and protein sequence data, new precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based similarity search techniques (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity ratio. We present an implementation of such a seed-based technique on specialized parallel hardware embedding a reconfigurable architecture (FPGA), where the FPGA is tightly connected to large-capacity Flash memories. This parallel system allows large databases to be fully indexed and rapidly accessed. Compared to traditional approaches, represented by the Blastp software, we obtain both a significant speed-up and better results. To the best of our knowledge, this is the first attempt to exploit efficient seed-based algorithms for parallelizing sequence similarity search.
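The abstract does not give the seeds or the index layout used on the hardware, so the following is only a minimal software sketch of the general idea of indexing a protein database with a subset seed. Every name in it is hypothetical: the seed alphabet (`#`, `@`, `_`), the amino-acid grouping in `GROUPS`, and the functions `seed_key`, `build_index`, and `find_hits` are assumptions made for illustration, not the authors' design. Each seed position partitions the amino-acid alphabet into classes; two words form a hit when their residues fall into the same class at every position.

```python
from collections import defaultdict

# Illustrative grouping of similar residues (an assumption, not taken from the paper).
GROUPS = ["LIVM", "FWY", "KR", "DE", "NQ", "ST", "AG", "C", "H", "P"]


def position_class(symbol, residue):
    """Return the class of `residue` under one seed symbol (hypothetical alphabet)."""
    if symbol == "#":      # exact match: every residue is its own class
        return residue
    if symbol == "@":      # grouped match: residues in the same group are equivalent
        for group in GROUPS:
            if residue in group:
                return group
        raise ValueError(f"unknown residue: {residue!r}")
    if symbol == "_":      # don't care: all residues share a single class
        return "*"
    raise ValueError(f"unknown seed symbol: {symbol!r}")


def seed_key(seed, word):
    """Map a word of len(seed) residues to its index key."""
    return tuple(position_class(s, r) for s, r in zip(seed, word))


def build_index(seed, sequences):
    """Index every word of every database sequence by its seed key."""
    index = defaultdict(list)
    span = len(seed)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - span + 1):
            index[seed_key(seed, seq[pos:pos + span])].append((seq_id, pos))
    return index


def find_hits(seed, index, query):
    """Return (query_pos, seq_id, db_pos) for every word pair sharing a key."""
    span = len(seed)
    hits = []
    for qpos in range(len(query) - span + 1):
        key = seed_key(seed, query[qpos:qpos + span])
        for seq_id, dpos in index.get(key, ()):
            hits.append((qpos, seq_id, dpos))
    return hits


seed = "#@_#"
index = build_index(seed, ["MKLVRA"])
print(find_hits(seed, index, "KIAR"))
```

In this toy run the query word `KIAR` and the database word `KLVR` share a key because `I` and `L` belong to the same group and the third position is ignored. Hits of this kind would then be passed to an extension stage, which the sketch leaves out.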

Keywords

  • sequence
  • similarity search
  • spaced seeds
  • subset seeds
  • indexing
  • FPGA
  • reconfigurable architecture
  • dedicated hardware

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peterlongo, P. et al. (2008). Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_131

  • DOI: https://doi.org/10.1007/978-3-540-68111-3_131

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68105-2

  • Online ISBN: 978-3-540-68111-3

  • eBook Packages: Computer Science, Computer Science (R0)