Abstract
We consider the problem of lossless spaced seed design for approximate pattern matching. We show that, using mathematical objects known as perfect rulers, we can derive a family of spaced seeds for matching with up to two errors. We analyze these seeds with respect to the trade-off they offer between seed weight and the minimum length of the pattern to be matched. We prove that, for patterns of length up to a few hundred, our seeds have a larger weight, and hence a better filtration efficiency, than those known in the literature. In this context, we study in depth the specific case of Wichmann rulers and prove some preliminary results on the generalization of our approach to the larger class of unrestricted rulers.
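To make the ruler terminology concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes that the property of interest is completeness, meaning that every integer distance from 1 up to the ruler length is measured by some pair of marks; the abstract gives neither the paper's exact definition of a perfect ruler nor the way seeds are derived from rulers. The function names are hypothetical, and the gap pattern used for Wichmann rulers is the commonly quoted form of the construction, with parameters r and s.

```python
from itertools import combinations


def measured_distances(marks):
    """Return the set of distances measured by some pair of marks."""
    return {b - a for a, b in combinations(sorted(marks), 2)}


def is_complete_ruler(marks):
    """Check that every distance 1..length is measured (assumed property).

    `marks` must be a non-empty collection of integer positions.
    """
    marks = sorted(marks)
    length = marks[-1] - marks[0]
    return measured_distances(marks) >= set(range(1, length + 1))


def wichmann_marks(r, s):
    """Marks of a Wichmann ruler, built from the commonly quoted gap pattern
    1^r, r+1, (2r+1)^r, (4r+3)^s, (2r+2)^(r+1), 1^r (an assumption here,
    not stated in the abstract)."""
    gaps = (
        [1] * r
        + [r + 1]
        + [2 * r + 1] * r
        + [4 * r + 3] * s
        + [2 * r + 2] * (r + 1)
        + [1] * r
    )
    marks = [0]
    for gap in gaps:
        marks.append(marks[-1] + gap)
    return marks


if __name__ == "__main__":
    print(is_complete_ruler([0, 1, 4, 6]))  # True: distances 1..6 all occur
    print(is_complete_ruler([0, 1, 3, 7]))  # False: distance 5 is missing
    w = wichmann_marks(1, 1)
    print(w)                                # [0, 1, 3, 6, 13, 17, 21, 22]
    print(is_complete_ruler(w))             # True
```

With r = 1 and s = 1 the sketch yields a ruler of length 22 with 8 marks that measures every distance from 1 to 22.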
This research is funded by the BioBITS Project Converging Technologies 2007, area: Biotechnology-ICT, Regione Piemonte.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Egidi, L., Manzini, G. (2011). Spaced Seeds Design Using Perfect Rulers. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_5
DOI: https://doi.org/10.1007/978-3-642-24583-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24582-4
Online ISBN: 978-3-642-24583-1
eBook Packages: Computer Science, Computer Science (R0)