Abstract

In a world of constant change, conserved patterns of any kind are objects of interest for various reasons. Some are prosaic. If one has to perform a given operation on a set of objects, and some of these objects are identical, one may sometimes economize by performing the operation just once for each group of identical objects. If the objects are not identical but almost so (they differ in only a very limited number of well-characterized ways), one could perhaps adapt the operation so that it takes fewer steps than starting from scratch for each object in the group of almost identical ones.

Partially supported by CAPES-COFECUB (project 272/99-II), PRONEX project 107/97 (MCT/FINEP/CNPq), and CNPq (proc. 464114/00-4 and proc. 304527/89-0).

Copyright information

© 2003 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Sagot, MF., Wakabayashi, Y. (2003). Pattern Inference under many Guises. In: Reed, B.A., Sales, C.L. (eds) Recent Advances in Algorithms and Combinatorics. CMS Books in Mathematics / Ouvrages de mathématiques de la SMC. Springer, New York, NY. https://doi.org/10.1007/0-387-22444-0_8

  • DOI: https://doi.org/10.1007/0-387-22444-0_8

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4684-9268-2

  • Online ISBN: 978-0-387-22444-2

  • eBook Packages: Springer Book Archive
