Informative Motifs in Protein Family Alignments

  • Conference paper
Algorithms in Bioinformatics (WABI 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4645)

Abstract

Consensus and sequence pattern analyses of family alignments are widely used to identify new family members and to determine functionally and structurally important identities. Because these common approaches emphasize the dominant characteristics of the family and assume that residue identities are independent at each position, they cannot describe residue preferences outside of the family consensus. In this study, we propose a novel information-theoretic approach to detecting motifs outside the consensus of a protein family alignment. Inspired by the classic Apriori algorithm, we implemented an algorithm that discovers frequent residue motifs that are high in information content and lie outside the family consensus; we call these informative motifs. We observed that these informative motifs are mostly spatially localized and present distinctive features of various members of the family.

Availability: The source code is available upon request from the authors.
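
As an illustration only, the following minimal Python sketch shows how an Apriori-style, level-wise search could enumerate frequent combinations of non-consensus residues in an alignment. It is not the authors' implementation: the function names, the `min_support` and `max_size` parameters, the majority-vote consensus, and the treatment of `-` as a gap are all assumptions, and the information-content scoring described in the abstract is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def consensus(alignment):
    """Most common residue in each column (simple majority vote, an assumption)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*alignment)]


def non_consensus_items(seq, cons):
    """(column, residue) pairs where a sequence departs from the consensus."""
    return frozenset(
        (i, r) for i, (r, c) in enumerate(zip(seq, cons)) if r != c and r != "-"
    )


def frequent_motifs(alignment, min_support=2, max_size=3):
    """Level-wise (Apriori-style) search for frequent non-consensus residue sets.

    Returns a dict mapping each frequent set of (column, residue) pairs to the
    number of sequences that contain it.
    """
    cons = consensus(alignment)
    transactions = [non_consensus_items(s, cons) for s in alignment]

    # Level 1 candidates: every non-consensus (column, residue) pair observed.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current and k <= max_size:
        # Count support for each candidate and keep the frequent ones.
        frequent = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)

        # Build size k+1 candidates whose size-k subsets are all frequent.
        k += 1
        current = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                current.add(union)
    return result
```

For a toy alignment such as `["ACDEF", "ACDEF", "ACDEF", "GCKEF", "GCKEF"]`, the sketch reports the co-occurring pair of non-consensus residues at columns 0 and 2 together with its support. A fuller treatment would rank or filter the resulting sets by an information-theoretic score, which this sketch leaves out.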

Editor information

Raffaele Giancarlo, Sridhar Hannenhalli

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ozer, H.G., Ray, W.C. (2007). Informative Motifs in Protein Family Alignments. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_15

  • DOI: https://doi.org/10.1007/978-3-540-74126-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74125-1

  • Online ISBN: 978-3-540-74126-8

  • eBook Packages: Computer Science, Computer Science (R0)