Sequence Motif Identification and Protein Family Classification Using Probabilistic Trees

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3594)

Abstract

Efficient family classification of newly discovered protein sequences is a central problem in bioinformatics. We present a new algorithm, based on Probabilistic Suffix Trees, that identifies equivalences between the amino acids at different positions of a motif for each family. We also show that better classification can be achieved by identifying representative fingerprints in the amino acid chains.
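
To make the general idea of classifying sequences with a context-tree model concrete, the following is a minimal sketch of a plain variable-order context model in Python. It is not the algorithm presented in the paper: it does not identify equivalences between amino acids or fingerprints, and the function names (`train_context_model`, `score`, `classify`), the parameters (`max_depth`, `min_count`, `pseudo`), and the toy training sequences are all assumptions made for illustration. Each family gets one model, and a query sequence is assigned to the family whose model gives it the highest average log-likelihood per residue.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_context_model(sequences, max_depth=3, min_count=2):
    """Count next-symbol occurrences after every context of length 0..max_depth.

    Illustrative only: contexts seen fewer than min_count times are dropped,
    which is a crude stand-in for the pruning a real suffix-tree model uses.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for d in range(min(max_depth, i) + 1):
                counts[seq[i - d:i]][symbol] += 1
    return {ctx: dict(nxt) for ctx, nxt in counts.items()
            if ctx == "" or sum(nxt.values()) >= min_count}


def next_symbol_log_prob(model, context, symbol, max_depth=3, pseudo=1.0):
    """Log-probability of symbol after the longest suffix of context in the model."""
    ctx = ""
    for d in range(min(max_depth, len(context)), -1, -1):
        ctx = context[len(context) - d:]
        if ctx in model:
            break
    nxt = model.get(ctx, {})
    total = sum(nxt.values())
    # Additive smoothing so unseen symbols never get probability zero.
    return math.log((nxt.get(symbol, 0) + pseudo) / (total + pseudo * len(ALPHABET)))


def score(model, seq, max_depth=3):
    """Average log-likelihood per residue of seq under the model."""
    if not seq:
        return float("-inf")
    total = sum(
        next_symbol_log_prob(model, seq[max(0, i - max_depth):i], seq[i], max_depth)
        for i in range(len(seq))
    )
    return total / len(seq)


def classify(models, seq, max_depth=3):
    """Return the family name whose model scores seq highest."""
    return max(models, key=lambda fam: score(models[fam], seq, max_depth))


# Toy data, invented for this sketch; real use would train on family databases.
models = {
    "family_a": train_context_model(["ACACACAC", "ACACAC"]),
    "family_b": train_context_model(["GGGGGGGG", "GGGGGG"]),
}
predicted = classify(models, "ACAC")
```

Normalizing the score by sequence length keeps long and short queries comparable; whether that normalization suits a given data set is a design choice rather than something the abstract specifies.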

This work is partially supported by CAPES and is part of PRONEX/FAPESP’s Project Stochastic behavior, critical phenomena and rhythmic pattern identification in natural languages (grant number 03/09930-9).

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leonardi, F., Galves, A. (2005). Sequence Motif Identification and Protein Family Classification Using Probabilistic Trees. In: Setubal, J.C., Verjovski-Almeida, S. (eds) Advances in Bioinformatics and Computational Biology. BSB 2005. Lecture Notes in Computer Science, vol 3594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532323_20

  • DOI: https://doi.org/10.1007/11532323_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28008-8

  • Online ISBN: 978-3-540-31861-3

  • eBook Packages: Computer Science, Computer Science (R0)