Journal of Mathematical Biology

Volume 53, Issue 1, pp 135–161

Symmetric time warping, Boltzmann pair probabilities and functional genomics

Abstract

Given two time series, possibly of different lengths, time warping is a method to construct an optimal alignment obtained by stretching or contracting time intervals. Unlike pairwise alignment of amino acid sequences, classical time warping, originally introduced for speech recognition, is not symmetric, in the sense that the time warping distance between two time series is not necessarily equal to the time warping distance between the reversals of those time series. Here we design a new symmetric version of time warping, and present a formal proof of symmetry for our algorithm as well as for one of the variants of Aach and Church [1]. We additionally design quadratic time dynamic programming algorithms to compute both the forward and backward Boltzmann partition functions for symmetric time warping, and hence compute the Boltzmann probability that any two time series points are aligned. In the future, with the availability of increasingly long and accurate time series gene expression data, our algorithm can provide a sense of biological significance for aligned time points; for example, it could be used to provide evidence that expression values of two genes have higher Boltzmann probability (say) in the G1 and S phases than in the G2 and M phases. Algorithms, source code and a web interface, developed by the first author, are made publicly available via the Boltzmann Time Warping web server at bioinformatics.bc.edu/clotelab/.
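
As a rough illustration of the quantities named in the abstract, the following Python sketch computes a classical (not the symmetric) time warping distance together with forward and backward partition functions, and from them the Boltzmann probability that two time points are aligned. It is not the authors' algorithm: the function names, the absolute-difference cost, the three-move step set and the `temperature` parameter are all assumptions chosen for illustration.

```python
import math


def dtw_distance(a, b):
    """Classical time warping distance (illustrative variant, assumed cost |a_i - b_j|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    dist = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    dist[i - 1][j] if i > 0 else inf,
                    dist[i][j - 1] if j > 0 else inf,
                    dist[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            dist[i][j] = abs(a[i] - b[j]) + best
    return dist[n - 1][m - 1]


def pair_probabilities(a, b, temperature=1.0):
    """Boltzmann probability that a[i] is aligned with b[j], for every (i, j).

    Each warping path from (0, 0) to (n-1, m-1) gets weight exp(-cost / temperature).
    Both tables are filled in O(n * m) time.
    """
    n, m = len(a), len(b)
    # Boltzmann weight of aligning a[i] with b[j].
    w = [[math.exp(-abs(a[i] - b[j]) / temperature) for j in range(m)] for i in range(n)]

    # fwd[i][j]: total weight of all partial paths from (0, 0) ending at (i, j).
    fwd = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 1.0
            else:
                prev = (
                    (fwd[i - 1][j] if i > 0 else 0.0)
                    + (fwd[i][j - 1] if j > 0 else 0.0)
                    + (fwd[i - 1][j - 1] if i > 0 and j > 0 else 0.0)
                )
            fwd[i][j] = w[i][j] * prev

    # bwd[i][j]: total weight of all partial paths from (i, j) to (n-1, m-1).
    bwd = [[0.0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if i == n - 1 and j == m - 1:
                nxt = 1.0
            else:
                nxt = (
                    (bwd[i + 1][j] if i < n - 1 else 0.0)
                    + (bwd[i][j + 1] if j < m - 1 else 0.0)
                    + (bwd[i + 1][j + 1] if i < n - 1 and j < m - 1 else 0.0)
                )
            bwd[i][j] = w[i][j] * nxt

    z = fwd[n - 1][m - 1]  # full partition function
    # w[i][j] is counted in both fwd and bwd, so divide it out once.
    return [[fwd[i][j] * bwd[i][j] / (w[i][j] * z) for j in range(m)] for i in range(n)]
```

Calling `pair_probabilities(a, b)` returns an n-by-m table in which the two corner entries equal 1, since every warping path in this formulation starts at the first pair of points and ends at the last. This sketch multiplies raw weights, so very large cost differences or very low temperatures can underflow; working in log space would be the usual remedy.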

Key words or phrases

Time warping; Boltzmann partition function; gene expression data; time series

References

  1. Aach, J., Church, G.: Aligning gene expression time series with time warping algorithms. Bioinformatics 17 (6), 495–508 (2001)
  2. Clote, P., Backofen, R.: Computational Molecular Biology: An Introduction. John Wiley & Sons, 2000. 286 pages
  3. Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868 (1998)
  4. Spellman, P., et al.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297 (1998)
  5. Cho, R., et al.: A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell 2, 65–73 (1998)
  6. Cho, R., et al.: Transcriptional regulation and function during the human cell cycle. Nature Genetics 27, 48–54 (2001)
  7. Kruskal, J.B., Liberman, M.: The symmetric time-warping problem: From continuous to discrete. In: Kruskal, J.B., Sankoff, D. (eds.), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, CSLI Publications, Stanford, 1999, pp. 125–161. Text originally published by Addison-Wesley in 1983
  8. McCaskill, J.S.: The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29, 1105–1119 (1990)
  9. Mückstein, U., Hofacker, I., Stadler, P.: Stochastic pairwise alignments. Bioinformatics 18, S153–S160 (2002)
  10. Miyazawa, S.: A reliable sequence alignment method based on probabilities of residue correspondences. Protein Eng. 8, 999–1009 (1994)
  11. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
  12. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
  13. Vingron, M., Argos, P.: Determination of reliable regions in protein sequence alignments. Protein Eng. 3 (7), 565–569 (1990)
  14. Waterman, M.S.: Introduction to Computational Biology - Maps, Sequences and Genomes. Chapman & Hall, 1995
  15. Xia, T., SantaLucia, J., Jr., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C., Turner, D.H.: Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37, 14719–35 (1999)
  16. Zuker, M.: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406–3415 (2003)
  17. Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133–148 (1981)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. Department of Biology (courtesy appt. in Computer Science), Boston College, Chestnut Hill, USA
  2. Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, USA
