Flexible identification of structural objects in nucleic acid sequences: Palindromes, mirror repeats, pseudoknots and triple helices

  • Marie-France Sagot
  • Alain Viari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1264)


This paper presents algorithms for flexibly identifying structural objects in nucleic acid sequences. These objects are palindromes, mirror repeats, pseudoknots and triple helices. We further explore the idea of a model against which the words in a sequence are compared in order to find these structural objects [17]. In the present case, models are words defined over the alphabet of nucleotides that have both direct and inverse occurrences in the sequence. Moreover, errors (substitutions, deletions and insertions) are allowed between a model and its inverse occurrences. Helix stems may therefore present bulges or interior loops, and mirror repeats need not be exact. Reasonably efficient performance comes from two facts: the parts composing the structures are kept separate until the end, and filtering for valid occurrences (occurrences that may form part of such a structure) can be done in $O(n)$ time, where $n$ is the length of the sequence. The time complexity of the searching phase (that is, before the structural parts are put together at the end) of both algorithms presented here (one for palindromes and mirror repeats, the other for pseudoknots and triple helices) is then $O\bigl(nk(e+1)(1+\min\{d_{max}-d_{min}+1+e,\; k^{e}|\Sigma|^{e}\})\bigr)$, where $n$ is the length of the sequence, $d_{max}$ and $d_{min}$ are, respectively, the maximal and minimal length of a hairpin loop, $k$ is the length of a model (either a fixed length or the maximum $k_{max}$ of a range of lengths), $e$ is the maximum number of errors allowed (substitutions, deletions and insertions) and $|\Sigma|$ is the size of the alphabet of nucleotides.
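The notions used in the abstract (a model, its direct occurrence, and an inverse occurrence within a bounded number of errors) can be illustrated with a small brute-force sketch. This is not the algorithm of the paper: it makes no attempt to reach the stated time bound, it restricts itself to the DNA alphabet with Watson-Crick pairs only, and it takes each word of length k in the sequence as its own model. The function names (find_structures, edit_distance, reverse_complement) and the parameters d_min and d_max for the loop length are hypothetical choices made for this illustration.

```python
from typing import Iterator, Tuple

# Assumption: DNA alphabet with Watson-Crick pairs only (no G-U pairs).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word: str) -> str:
    """Return the complementary inverse of a word."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def find_structures(seq: str, k: int, e: int, d_min: int, d_max: int,
                    complementary: bool = True) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, length) triples for candidate two-part structures.

    seq[i:i+k] is the direct occurrence of a model of length k, and
    seq[j:j+length] is an inverse occurrence within e errors, separated
    from the direct occurrence by a loop of d_min..d_max nucleotides.
    complementary=True looks for palindromes (hairpin stems);
    complementary=False looks for mirror repeats.
    """
    for i in range(len(seq) - k + 1):
        model = seq[i:i + k]
        target = reverse_complement(model) if complementary else model[::-1]
        for d in range(d_min, d_max + 1):
            j = i + k + d
            # With at most e errors, the inverse occurrence has length k-e..k+e.
            for length in range(max(1, k - e), k + e + 1):
                if j + length > len(seq):
                    break
                if edit_distance(target, seq[j:j + length]) <= e:
                    yield (i, j, length)


# Exact hairpin stem: GGAC, loop TTT, GTCC.
print(list(find_structures("GGACTTTGTCC", k=4, e=0, d_min=3, d_max=3)))
# Same stem with one substitution in the second strand, one error allowed.
print(list(find_structures("GGACTTTGACC", k=4, e=1, d_min=3, d_max=3)))
# Exact mirror repeat: AAGG, spacer CC, GGAA.
print(list(find_structures("AAGGCCGGAA", k=4, e=0, d_min=2, d_max=2,
                           complementary=False)))
```

Allowing insertions and deletions in the inverse occurrence is what lets a stem contain a bulge, and allowing substitutions on both strands at facing positions corresponds to an interior loop. The sketch recomputes an edit distance for every candidate pair, which is far slower than the bound quoted in the abstract.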


Keywords: nucleic acid sequence, nucleic structural object, palindrome, mirror repeat, pseudoknot, triple helix, approximate comparison, model, direct occurrence, (complementary) inverse occurrence




References

  1. J. P. Abrahams, M. v. d. Berg, E. v. Batenburg, and C. Pleij. Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Comput. Appli. Biosci., 8:243–248, 1992.
  2. B. Billoud, M. Kontic, and A. Viari. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence databases. Nucleic Acids Res., 24:1395–1403, 1996.
  3. D. Bouthinon, H. Soldano, and B. Billoud. Apprentissage d'un concept commun à un ensemble d'objets dont la description est hypothétique: application à la découverte de structures secondaires d'ARN. In 11èmes Journées Françaises d'Apprentissage, 1996.
  4. M. Brown and C. Wilson. RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. Manuscript, University of California, Santa Cruz, Oct. 1995.
  5. J.-H. Chen, S.-Y. Le, and J. V. Maizel. A procedure for RNA pseudoknot prediction. Comput. Appli. Biosci., 8:243–248, 1992.
  6. D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern-recognition methods for DNA sequences. Analysis of promoter sequences from Escherichia coli. J. Mol. Biol., 186:117–128, 1985.
  7. I. Tinoco Jr., P. W. Davis, C. C. Hardin, J. D. Puglisi, G. T. Walker, and J. Wyatt. RNA structures from A to Z. In Cold Spring Harbor Symposia on Quantitative Biology, volume LII, pages 135–146. Cold Spring Harbor Laboratory, 1987.
  8. N. A. Kolchanov, I. I. Titov, I. E. Vlassova, and V. V. Vlassov. Chemical and computer probing of RNA structure. In W. E. Cohn and K. Moldave, editors, Progress in Nucleic Acid Research and Molecular Biology, pages 131–196. Academic Press, 1996.
  9. M. Kontic. Palingol. Langage pour la description et la recherche de structures secondaires dans les séquences nucléotidiques, 1993. DEA d'Intelligence Artificielle, Université de Paris Nord.
  10. F. Lefebvre. An optimized parsing algorithm well suited for RNA folding. In Proceedings First International Conference on Intelligent Systems for Molecular Biology, Cambridge, England, 1995.
  11. B. Lewin. Genes V. Oxford University Press, 1994.
  12. H. M. Martinez. An efficient method for finding repeats in molecular sequences. Nucleic Acids Res., 11:4629–4634, 1983.
  13. H. M. Martinez. Detecting pseudoknots and other local base-pairing structures in RNA sequences. 183:306–317, 1990.
  14. S. M. Murkin, V. I. Lyamichev, K. N. Druhlyak, V. N. Dobrynin, S. A. Filipov, and M. D. Frank-Kamenetskii. DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature, 330:495–497, 1987.
  15. E. W. Myers. A sublinear algorithm for approximate keyword searching. Algorithmica, 12:345–374, 1994.
  16. C. W. A. Pleij and L. Bosch. RNA pseudoknots: structure, detection, and prediction. 180:289–303, 1989.
  17. M.-F. Sagot, V. Escalier, A. Viari, and H. Soldano. Searching for repeated words in a text allowing for mismatches and gaps. pages 87–100, Viña del Mar, Chile, 1995. Second South American Workshop on String Processing.
  18. M.-F. Sagot and A. Viari. A double combinatorial approach to discovering patterns in biological sequences. In D. Hirschberg and G. Myers, editors, Combinatorial Pattern Matching, volume 1075 of Lecture Notes in Computer Science, pages 186–208. Springer-Verlag, 1996.
  19. M.-F. Sagot, A. Viari, and H. Soldano. A distance-based block searching algorithm. pages 322–331, Cambridge, England, 1995. Third International Symposium on Intelligent Systems for Molecular Biology.
  20. M.-F. Sagot, A. Viari, and H. Soldano. Multiple comparison: a peptide matching approach. In Z. Galil and E. Ukkonen, editors, Combinatorial Pattern Matching, volume 937 of Lecture Notes in Computer Science, pages 366–385. Springer-Verlag, 1995. To appear in Theoret. Comput. Sci.
  21. Y. Sakakibara, M. Brown, R. Hughey, I. S. Mian, K. Sjolander, R. C. Underwood, and D. Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res., 22:5112–5120, 1994.
  22. D. Searls. The linguistics of DNA. American Scientist, 80:579–591, 1992.
  23. M. S. Waterman. Consensus methods for folding single-stranded nucleic acids. In M. S. Waterman, editor, Mathematical Methods for DNA Sequences, pages 185–224. CRC Press, 1989.
  24. S. Wu, U. Manber, and E. W. Myers. An O(NP) sequence comparison algorithm. Inf. Proc. Letters, 35:317–323, 1990.
  25. M. Zuker and D. Sankoff. RNA secondary structures and their prediction. Bull. Math. Biol., 46:591–621, 1984.
  26. M. Zuker and P. Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9:133–148, 1981.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Marie-France Sagot (1, 2)
  • Alain Viari (2)

  1. Institut Gaspard Monge, Université de Marne-la-Vallée, Noisy-le-Grand
  2. Atelier de BioInformatique, Université de Paris 6, Paris
