Introducing Dependencies into Alignment Analysis and Its Use for Local Structure Prediction in Proteins

  • Szymon Nowakowski
  • Krzysztof Fidelis
  • Jerzy Tiuryn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3911)


In this paper we explore several techniques for analysing sequence alignments. Their main idea is to generalize an alignment by means of a probability distribution. The Dirichlet mixture method is used as the reference against which the new techniques are assessed. The techniques are compared in a cross-validation test on both synthetic and real data: we use them to identify sequence-structure relationships between a target protein and possible local motifs. We show that the Beta method is almost as successful as the reference method but is much faster (up to 17 times). MAP (Maximum a Posteriori) estimation for two PSSMs (Position Specific Score Matrices) introduces dependencies between columns of an alignment. In our experiments it is much more successful than the reference method, but it is very computationally expensive. To address this, we developed a parallel implementation.
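The abstract uses the Dirichlet mixture method as the reference for turning alignment columns into probability distributions. A minimal sketch of the standard Dirichlet mixture posterior-mean estimate for a single column follows. The 4-letter alphabet, the mixture weights, and the α parameters are made up for illustration (real mixtures use the 20 amino acids and trained components), and the function names `log_dm_likelihood` and `dirichlet_mixture_estimate` are hypothetical; the paper's exact setup may differ.

```python
import math


def log_dm_likelihood(counts, alpha):
    """log P(counts | alpha) under the Dirichlet-multinomial distribution."""
    n, a = sum(counts), sum(alpha)
    result = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for c, al in zip(counts, alpha):
        result += math.lgamma(c + al) - math.lgamma(c + 1) - math.lgamma(al)
    return result


def dirichlet_mixture_estimate(counts, mix_weights, alphas):
    """Posterior-mean residue probabilities for one alignment column."""
    # Posterior probability of each mixture component given the column counts.
    logs = [math.log(q) + log_dm_likelihood(counts, al)
            for q, al in zip(mix_weights, alphas)]
    top = max(logs)
    post = [math.exp(x - top) for x in logs]
    z = sum(post)
    post = [p / z for p in post]
    # Each component contributes (count + pseudocount) / (total + total pseudocount).
    n = sum(counts)
    estimate = [0.0] * len(counts)
    for pk, al in zip(post, alphas):
        a = sum(al)
        for i, (c, ai) in enumerate(zip(counts, al)):
            estimate[i] += pk * (c + ai) / (n + a)
    return estimate


# Hypothetical 4-letter alphabet and made-up mixture parameters, for illustration only.
mix_weights = [0.3, 0.7]
alphas = [[5.0, 0.5, 0.5, 0.5],   # component favouring letter 0
          [1.0, 1.0, 1.0, 1.0]]   # flat component
column_counts = [3, 0, 0, 0]      # letter 0 observed in all three sequences
print([round(p, 3) for p in dirichlet_mixture_estimate(column_counts, mix_weights, alphas)])
```

The estimate for letter 0 stays below the raw frequency of 1.0 while remaining far above the other three letters, which is the smoothing effect a prior is meant to provide for small alignments. Working in log space with `math.lgamma` avoids overflow of the Gamma function for realistic column counts.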

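A single PSSM treats alignment columns as independent. One generic way for two PSSMs to capture dependencies between columns is a two-component mixture fitted by EM, in which Dirichlet pseudocounts turn each M-step into a MAP update. The sketch below is a hypothetical illustration under that assumption, not the authors' MAP procedure or their parallel implementation; `fit_two_pssms`, `mixture_prob`, and the toy alignment are illustrative names and data.

```python
import math
import random


def fit_two_pssms(alignment, alphabet, pseudo=1.0, iters=50, seed=0):
    """EM for a mixture of two PSSMs over an ungapped alignment (iters >= 1).

    Adding `pseudo` to every count makes each M-step a MAP update under
    symmetric Dirichlet priors (alpha = pseudo + 1) instead of a
    maximum-likelihood update.
    """
    rng = random.Random(seed)
    n, length = len(alignment), len(alignment[0])
    # Random soft assignments break the symmetry between the two components.
    resp = []
    for _ in range(n):
        r = rng.random()
        resp.append([r, 1.0 - r])
    for _ in range(iters):
        # M-step: mixing weights and per-column residue probabilities.
        weights = [(sum(r[k] for r in resp) + pseudo) / (n + 2 * pseudo)
                   for k in range(2)]
        models = []
        for k in range(2):
            columns = []
            for j in range(length):
                counts = {a: pseudo for a in alphabet}
                for seq, r in zip(alignment, resp):
                    counts[seq[j]] += r[k]
                total = sum(counts.values())
                columns.append({a: c / total for a, c in counts.items()})
            models.append(columns)
        # E-step: posterior probability that each sequence belongs to each PSSM.
        resp = []
        for seq in alignment:
            logs = [math.log(weights[k])
                    + sum(math.log(models[k][j][a]) for j, a in enumerate(seq))
                    for k in range(2)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            z = sum(probs)
            resp.append([p / z for p in probs])
    return weights, models


def mixture_prob(weights, models, seq):
    """Probability of a segment under the two-PSSM mixture."""
    return sum(w * math.prod(col[a] for col, a in zip(model, seq))
               for w, model in zip(weights, models))


# Toy alignment in which the two columns are perfectly correlated.
alignment = ["AA"] * 10 + ["CC"] * 10
weights, models = fit_two_pssms(alignment, "AC")
for segment in ("AA", "AC", "CA", "CC"):
    print(segment, round(mixture_prob(weights, models, segment), 3))
```

A single PSSM fitted to this alignment (with one pseudocount per letter) gives each letter probability 0.5 in both columns, so it assigns all four segments the same probability of 0.25. The mixture instead gives AA and CC much higher probability than AC and CA, because each component specializes in one of the correlated patterns.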


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Szymon Nowakowski (1)
  • Krzysztof Fidelis (2)
  • Jerzy Tiuryn (1)
  1. Institute of Informatics, Warsaw University, Warszawa, Poland
  2. Genome Center, University of California, Davis, Genome and Biomedical Sciences Facility, Davis, USA