Introducing Dependencies into Alignment Analysis and Its Use for Local Structure Prediction in Proteins
In this paper we explore several techniques for analysing sequence alignments. Their common idea is to generalize an alignment by means of a probability distribution. The Dirichlet mixture method serves as the reference against which the new techniques are assessed. The methods are compared in a cross-validation test on both synthetic and real data: we use them to identify sequence-structure relationships between a target protein and possible local motifs. We show that the Beta method is almost as successful as the reference method but much faster (up to 17 times). MAP (Maximum a Posteriori) estimation for two PSSMs (Position Specific Score Matrices) introduces dependencies between columns of an alignment. In our experiments it is much more successful than the reference method, but it is very computationally expensive. To address this cost, we developed a parallel implementation.
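As a rough illustration of what it means to generalize an alignment column by a probability distribution, the sketch below computes the standard posterior-mean estimate under a Dirichlet mixture prior, the approach named above as the reference. It is a minimal sketch and not the authors' implementation: the function names, the four-letter toy alphabet, and the mixture parameters are all assumptions chosen for illustration, whereas real protein models use 20 amino acids and components trained on alignment data.

```python
import math


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_mixture_estimate(counts, weights, alphas):
    """Posterior-mean residue probabilities for one alignment column.

    counts  : observed count of each residue in the column
    weights : mixture coefficients q_j (assumed to sum to 1)
    alphas  : one Dirichlet parameter vector per mixture component
    """
    # Log of the unnormalised posterior weight of each component:
    # q_j * B(counts + alpha_j) / B(alpha_j)
    log_post = []
    for q, alpha in zip(weights, alphas):
        updated = [n + a for n, a in zip(counts, alpha)]
        log_post.append(math.log(q) + log_beta(updated) - log_beta(alpha))

    # Normalise in a numerically stable way (subtract the maximum first).
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    resp = [u / z for u in unnorm]

    # Mix the per-component posterior means by their posterior weights.
    total = sum(counts)
    probs = [0.0] * len(counts)
    for r, alpha in zip(resp, alphas):
        denom = total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            probs[i] += r * (n + a) / denom
    return probs


# Hypothetical toy example: 4-letter alphabet, two made-up components.
toy_weights = [0.5, 0.5]
toy_alphas = [[1.0, 1.0, 1.0, 1.0], [5.0, 0.5, 0.5, 0.5]]
toy_probs = dirichlet_mixture_estimate([3, 0, 1, 0], toy_weights, toy_alphas)
```

Residues that never occur in the column still receive a non-zero probability, which is the point of using a prior when the alignment contains only a few sequences.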