Informative Motifs in Protein Family Alignments

Ozer, Hatice Gulcin; Ray, William C.

doi:10.1007/978-3-540-74126-8_15

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4645)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Consensus and sequence pattern analyses of family alignments are used extensively to identify new family members and to determine functionally and structurally important identities. Because these common approaches emphasize the dominant characteristics of the family and assume that residue identities are independent at each position, they cannot describe residue preferences outside the family consensus. In this study, we propose a novel information-theoretic approach to detect motifs outside the consensus of a protein family alignment. Inspired by the classic Apriori algorithm, we implemented an algorithm that discovers frequent residue motifs that are high in information content and lie outside the family consensus; we call these informative motifs. We observed that these informative motifs are mostly spatially localized and present distinctive features of various members of the family. Availability: The source code is available upon request from the authors.
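
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the alignment is a list of equal-length strings with `-` as the gap character, treats every non-consensus (position, residue) pair as an item, and runs an Apriori-style level-wise search for item sets that co-occur in at least `min_support` sequences. The function names, the gap convention, and the support threshold are all hypothetical, and the information-content criterion mentioned in the abstract is not specified there, so this sketch filters on support alone.

```python
from collections import Counter
from itertools import combinations


def consensus(alignment):
    """Most common non-gap residue per column (None for all-gap columns)."""
    result = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r != "-")
        result.append(counts.most_common(1)[0][0] if counts else None)
    return result


def non_consensus_items(sequence, cons):
    """(position, residue) pairs where a sequence departs from the consensus."""
    return frozenset(
        (pos, res)
        for pos, (res, c) in enumerate(zip(sequence, cons))
        if res != "-" and res != c
    )


def frequent_motifs(alignment, min_support):
    """Apriori-style search; returns {item set: number of supporting sequences}."""
    cons = consensus(alignment)
    transactions = [non_consensus_items(seq, cons) for seq in alignment]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: single non-consensus residues that are frequent enough.
    singles = Counter(item for t in transactions for item in t)
    level = {frozenset([i]) for i, n in singles.items() if n >= min_support}

    motifs = {}
    k = 1
    while level:
        for itemset in level:
            motifs[itemset] = support(itemset)
        # Join step: merge frequent k-sets into candidate (k+1)-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return motifs
```

Calling `frequent_motifs(alignment, min_support)` returns a dictionary keyed by frozensets of 0-based (position, residue) pairs. A scoring step based on information content could be applied to these candidates afterwards, under whatever definition the full paper uses.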

Author information

Authors and Affiliations

Biophysics Program,
Hatice Gulcin Ozer
Columbus Children’s Research Institute,
Hatice Gulcin Ozer & William C. Ray
Department of Pediatrics, The Ohio State University, 700 Children’s Drive Columbus, OH 43205, USA
William C. Ray

Editor information

Raffaele Giancarlo, Sridhar Hannenhalli

About this paper

Cite this paper

Ozer, H.G., Ray, W.C. (2007). Informative Motifs in Protein Family Alignments. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_15

DOI: https://doi.org/10.1007/978-3-540-74126-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)
