Pattern Inference under many Guises

Sagot, M.-F.; Wakabayashi, Y.

doi:10.1007/0-387-22444-0_8


Part of the book series: CMS Books in Mathematics / Ouvrages de mathématiques de la SMC (CMSBM)


Abstract

In a world of constant change, conserved patterns of any kind are of interest for various reasons. Some of these reasons are prosaic. If one has to perform a given operation on a set of objects, and some of these objects are identical, one may sometimes economize by performing the operation just once for each group of identical objects. If the objects are not identical but almost identical (they differ in only a very limited number of well-characterized ways), one could perhaps adapt the operation so that it takes fewer steps than starting from scratch for each object in the group of almost identical ones.
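The economizing idea for identical objects (perform the operation once per group and reuse the result) can be sketched as below. This is a minimal illustration under our own assumptions, not a method taken from the chapter: the helper name `apply_once_per_group` and the example sequences are hypothetical, and objects are assumed to be hashable so that identical objects can be recognized by equality. The almost-identical case is not covered by this sketch.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)
R = TypeVar("R")


def apply_once_per_group(
    objects: Iterable[T], operation: Callable[[T], R]
) -> Tuple[List[R], int]:
    """Hypothetical helper: run `operation` once per group of identical objects.

    Identical objects are detected by equality/hashing. Returns the result for
    every input object (in input order) and the number of times `operation`
    was actually called, i.e. the number of distinct groups.
    """
    cache: Dict[T, R] = {}
    results: List[R] = []
    for obj in objects:
        if obj not in cache:
            # First member of its group: perform the operation.
            cache[obj] = operation(obj)
        # Every member of the group reuses the single computed result.
        results.append(cache[obj])
    return results, len(cache)


if __name__ == "__main__":
    # Hypothetical input: five short sequences, only two of them distinct.
    seqs = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA"]
    results, calls = apply_once_per_group(seqs, lambda s: s[::-1])
    print(results)  # ['TGCA', 'AGTT', 'TGCA', 'TGCA', 'AGTT']
    print(calls)    # 2
```

Here five objects cost only two calls of the operation. For a plain function of hashable arguments, Python's `functools.cache` gives the same effect without an explicit dictionary.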

Partially supported by CAPES-COFECUB (project 272/99-II), PRONEX project 107/97 (MCT/FINEP/CNPq), and CNPq (proc. 464114/00-4 and proc. 304527/89-0).




Editors and Affiliations

Bruce A. Reed
Equipe Combinatoire, CNRS, Paris, France
McGill University, Montreal, Canada

Cláudia L. Sales
Departamento de Computação, LIA, Universidade Federal do Ceará, Campus do Pici, Bloco 910, CEP 60455-760, Fortaleza, CE, Brasil


About this chapter

Cite this chapter

Sagot, M.-F., Wakabayashi, Y. (2003). Pattern Inference under many Guises. In: Reed, B.A., Sales, C.L. (eds) Recent Advances in Algorithms and Combinatorics. CMS Books in Mathematics / Ouvrages de mathématiques de la SMC. Springer, New York, NY. https://doi.org/10.1007/0-387-22444-0_8


DOI: https://doi.org/10.1007/0-387-22444-0_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4684-9268-2
Online ISBN: 978-0-387-22444-2
eBook Packages: Springer Book Archive
