Skip to main content

Exhaustive Peptide Searching Using Relations

  • Conference paper
Data Management. Data, Data Everywhere (BNCOD 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4587))

Included in the following conference series:

Abstract

We present a new robust solution to short peptide searching, tested on a relational platform, with a set of biological queries. Our algorithm is appropriate for large scale scientific data analysis, and has been tested with 1.4 GB of amino-acids. Protein sequences are indexed as short overlapping string windows, and stored in a relation. To find approximate matches, we use a neighbourhood generation algorithm. The words in the neighbourhood are then fetched and stored in a relation. We measure execution time and compare the matches found to those delivered by BLAST. We report some performance gains in exact matching and searching within edit distance 1, and very significant quality improvements over heuristics, as we guarantee to deliver all relevant matches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)

    Google Scholar 

  2. Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402 (1997)

    Article  Google Scholar 

  3. Baeza-Yates, R., Navarro, G.: A Hybrid Indexing Method for Approximate String Matching. JDA 1, 205–239 (2001)

    MathSciNet  Google Scholar 

  4. Burkhardt, S., et al.: q-gram Based Database Searching Using a Suffix Array. In: RECOMB, pp. 77–83. ACM Press, New York (1999)

    Chapter  Google Scholar 

  5. Eckman, B.A., Kaufmann, A.: Querying BLAST within a Data Federation. IEEE Data Eng. Bull. 27(3), 12–19 (2004)

    Google Scholar 

  6. Eidhammer, I., Jonassen, I., Taylor, W.R.: Protein Bioinformatics. Wiley, Chichester (2003)

    Google Scholar 

  7. Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty, B.A., Merrick, J.M., McKenney, K., Sutton, G., FitzHugh, W., Fields, C., Gocayne, J.D., Scott, J., Shirley, R., Liu, L.-I., Glodek, A., Kelley, J.M., Weidman, J.F., Phillips, C.A., Spriggs, T., Hedblom, E., Cotton, M.D., Utterback, T.R., Hanna, M.C., Nguyen, D.T., Saudek, D.M., Brandon, R.C., Fine, L.D., Fritchman, J.L., Fuhrmann, J.L., Geoghagen, N.S.M., Gnehm, C.L., McDonald, L.A., Small, K.V., Fraser, C.M., Smith, H.O., Venter, J.C.: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269(5223), 496–512 (1995)

    Article  Google Scholar 

  8. Gautier, L., et al.: Alternative mapping of probes to genes for Affymetrix chips. BMC Bioninformatics, p. 111 (2004)

    Google Scholar 

  9. Guccione, S.A., Keller, E.: Gene Matching using JBits. In: Glesner, M., Zipf, P., Renovell, M. (eds.) FPL 2002. LNCS, vol. 2438, pp. 1168–1171. Springer, Heidelberg (2002)

    Google Scholar 

  10. Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge (1997)

    MATH  Google Scholar 

  11. Hyyrö, H., Navarro, G.: A Practical Index for genome Searching. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 341–349. Springer, Heidelberg (2003)

    Google Scholar 

  12. Hunt, E.: Indexed Searching on Proteins Using a Suffix Sequoia. IEEE Data Eng. Bulletin 27(3), 24–31 (2004)

    Google Scholar 

  13. Hunt, E., Atkinson, M.P., Irving, R.W.: A database index to large biological sequences. In: VLDB, pp. 139–148. Morgan Kaufmann, San Francisco (2001)

    Google Scholar 

  14. Hunt, E., Atkinson, M.P., Irving, R.W.: Database Indexing for Large DNA and Protein Sequence Collections. The. VLDB Journal 11, 256–271 (2002)

    Article  MATH  Google Scholar 

  15. Kahveci, T., Singh, A.K.: An Efficient Index Structure for String Databases. In: VLDB, pp. 351–360. Morgan and Kaufmann, Washington (2001)

    Google Scholar 

  16. Kahveci, T., Singh, A.K.: Progressive searching of biological sequences. IEEE Data Eng. Bull. 27(3), 32–39 (2004)

    Google Scholar 

  17. Karakoç, E., Özsoyoglu, Z.M., Sahinalp, S.C., Tasan, M., Zhang, X.: Novel approaches to biomolecular sequence indexing. IEEE Data Eng. Bull. 27(3), 40–47 (2004)

    Google Scholar 

  18. Kent, W.J.: BLAT: The BLAST-like Alignment Tool. Genome Res. 12(4), 656–664 (2002)

    MathSciNet  Google Scholar 

  19. Kim, Y.J., Boyd, A., Athey, B.D., Patel, J.M.: miBLAST: Scalable Evaluation of a Batch of Nucleotide Sequence Queries with BLAST. Nucleic Acids Research 33, 4335–4344 (2005)

    Article  Google Scholar 

  20. Levenstein, V.I.: Binary codes capable of correcting insertions and reversals. Sov. Phys. Dokl. 10, 707–710 (1966)

    MathSciNet  Google Scholar 

  21. Meek, C., Patel, J.M., Kasetty, S.: OASIS: An Online and Accurate Technique for Local-alignment Searches on Biological Sequences. In: VLDB 2003, pp. 910–921 (2003)

    Google Scholar 

  22. Mewes, H.W., Hani, J., Pfeiffer, F., Frishman, D.: MIPS: a database for protein sequences and complete genomes. Nucleic Acids Research 26, 33–37 (1998)

    Article  Google Scholar 

  23. Miller, C., Gurd, J., Brass, A.: A RAPID algorithm for sequence database comparisons: application to the identification of vector contamination in the EMBL databases. Bioinformatics 15, 111–121 (1999)

    Article  Google Scholar 

  24. Miranker, D.P., Briggs, W.J., Mao, R., Ni, S., Xu, W.: Biosequence Use Cases in MoBIoS SQL. IEEE Data Eng. Bull. 27(3), 3–11 (2004)

    Google Scholar 

  25. Myers, E.W.: A sublinear algorithm for approximate key word searching. Algorithmica 12(4/5), 345–374 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  26. Navarro. G.: NR-grep: A Fast and Flexible Pattern Matching Tool. Technical report (2000). TR/DCC-2000-3. University of Chile, Departmento de Ciencias de la Computacion, www.dcc.uchile.cl/~gnavarro

  27. Needleman, S.B., Wunsch, C.D.: A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of two Proteins. J. Mol. Biol. 48, 443–453 (1970)

    Article  Google Scholar 

  28. Ning, Z., Cox, A.J., Mullikin, J.C.: SSAHA: A Fast Search Method for Large DNA Databases. Genome Res. 11(10), 1725–1729 (2001)

    Article  Google Scholar 

  29. Sethupathy, P., et al.: A guide through present computational approaches for the identification of mammalian microRNA targets. Nat Methods 3(11), 881–886 (2006)

    Article  Google Scholar 

  30. Sidhu, S.S.: Phage Display In Biotechnology and Drug Discovery. Taylor and Francis, Abington (2005)

    Google Scholar 

  31. Smith, T.A., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)

    Article  Google Scholar 

  32. Stephens, S., Chen, J.Y., Thomas, S.: ODM BLAST: Sequence Homology Search in the RDBMS. IEEE Data Eng. Bull. 27(3), 20–23 (2004)

    Google Scholar 

  33. Ukkonen, E.: Approximate string matching over suffix trees. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds.) Combinatorial Pattern Matching. LNCS, vol. 684, pp. 228–242. Springer, Heidelberg (1993)

    Chapter  Google Scholar 

  34. Wall, L., Schwartz, R.L., Christiansen, T., Potter, S.: Programming Perl. Nutshell Handbook, 2nd edn. O’Reilly & Associates (1996)

    Google Scholar 

  35. Work, L.M., Bining, H., Hunt, E., et al.: Vascular Bed-Targeted in vivo Gene Delivery Using Tropism-Modified Adeno-associated Viruses. Molecular Therapy 13(4), 683–693 (2006)

    Article  Google Scholar 

  36. Yamaguchi, Y., Miyajima, Y., Maruyama, T., Konagaya, A.: High Speed Homology Search Using Run-Time Reconfiguration. In: Glesner, M., Zipf, P., Renovell, M. (eds.) FPL 2002. LNCS, vol. 2438, pp. 281–291. Springer, Heidelberg (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Richard Cooper Jessie Kennedy

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hunt, E. (2007). Exhaustive Peptide Searching Using Relations. In: Cooper, R., Kennedy, J. (eds) Data Management. Data, Data Everywhere. BNCOD 2007. Lecture Notes in Computer Science, vol 4587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73390-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-73390-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73389-8

  • Online ISBN: 978-3-540-73390-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics