Abstract
The number of approximate patterns in a sequence is exponential. In this paper, we present a new notion of basis, the primitive patterns, for the patterns with don’t cares occurring in a given text (sequence). Primitive patterns are of interest because there are fewer of them than under previously known definitions (in one case, their number is sub-linear in the size of the text), and because they can be used to extract all the patterns of a text.
We present an incremental algorithm that computes the primitive patterns occurring at least q times in a text of length n, given the N primitive patterns occurring at least q−1 times, in time $O(|\Sigma| N n^2 \log^2 n \log\log n)$. In the particular case where q = 2, the time complexity is only $O(|\Sigma| n^2 \log^2 n \log\log n)$. We also give an algorithm that decides whether a given pattern is primitive in a given text.
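To make the vocabulary of the abstract concrete, the following minimal sketch shows what it means for a pattern with don’t cares to occur at least q times in a text. It is an illustration only, not the paper’s algorithm: it uses a naive scan rather than the incremental method described above, and it does not compute or test primitivity. The wildcard symbol `.` and the function names `occurrences` and `occurs_at_least` are assumptions introduced here.

```python
from typing import List

DONT_CARE = "."  # assumed wildcard symbol; the paper's notation may differ


def occurrences(pattern: str, text: str) -> List[int]:
    """Return every start position at which `pattern` matches `text`.

    A DONT_CARE position in the pattern matches any text character.
    This is a naive O(len(pattern) * len(text)) scan, for illustration only.
    """
    m, n = len(pattern), len(text)
    return [
        i
        for i in range(n - m + 1)
        if all(p == DONT_CARE or p == text[i + j] for j, p in enumerate(pattern))
    ]


def occurs_at_least(pattern: str, text: str, q: int) -> bool:
    """Check the quorum condition: the pattern has at least q occurrences."""
    return len(occurrences(pattern, text)) >= q


if __name__ == "__main__":
    text = "abcaxcac"
    print(occurrences("a.c", text))         # start positions of "a.c"
    print(occurs_at_least("a.c", text, 2))  # quorum q = 2
```

Under these assumptions, the pattern `a.c` matches the text `abcaxcac` at the two positions where an `a` is followed two characters later by a `c`, so it satisfies a quorum of q = 2 but not q = 3.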
Supported by the Ministry of Research (France) with the “Bio-ingénierie 2000” program, and the CIFRE convention, in a collaboration between ExonHit Therapeutics and ABISS — http://johann.jalix.org/research
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pelfrêne, J., Abdeddaïm, S., Alexandre, J. (2003). Extracting Approximate Patterns. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_24
DOI: https://doi.org/10.1007/3-540-44888-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive