Spaced Seeds Design Using Perfect Rulers

  • Lavinia Egidi
  • Giovanni Manzini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7024)

Abstract

We consider the problem of lossless spaced seed design for approximate pattern matching. We show that, using mathematical objects known as perfect rulers, we can derive a family of spaced seeds for matching with up to two errors. We analyze these seeds with respect to the trade-off they offer between seed weight and the minimum length of the pattern to be matched. We prove that, for patterns of length up to a few hundred, our seeds have a larger weight, and hence a better filtration efficiency, than those known in the literature. In this context, we study in depth the specific case of Wichmann rulers and prove some preliminary results on the generalization of our approach to the larger class of unrestricted rulers.
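The two notions the abstract relies on can be illustrated with a small Python sketch. It is not the authors' construction: how a seed is derived from a ruler is given in the paper itself and is not reproduced here. The sketch assumes the usual definitions: a ruler is complete if every distance from 1 to its length is the difference of two marks (perfect rulers are, in the usual terminology, complete rulers with the fewest possible marks for their length, which this check does not test), and a seed is lossless for pattern length m and k mismatches if, for every placement of k mismatches, some alignment of the seed touches none of them. The function names and the gap formula for Wichmann rulers (in its commonly stated form) are assumptions of this sketch.

```python
from itertools import combinations


def is_complete_ruler(marks):
    """True if every distance 1..L is a difference of two marks (L = ruler length)."""
    marks = sorted(marks)
    length = marks[-1] - marks[0]
    measured = {b - a for a, b in combinations(marks, 2)}
    return all(d in measured for d in range(1, length + 1))


def wichmann_marks(r, s):
    """Marks of a Wichmann ruler for integers r, s >= 0.

    Assumed (commonly stated) gap sequence between consecutive marks:
    1^r, r+1, (2r+1)^r, (4r+3)^s, (2r+2)^(r+1), 1^r
    """
    gaps = ([1] * r + [r + 1] + [2 * r + 1] * r
            + [4 * r + 3] * s + [2 * r + 2] * (r + 1) + [1] * r)
    marks = [0]
    for gap in gaps:
        marks.append(marks[-1] + gap)
    return marks


def is_lossless(seed, m, k):
    """Brute-force losslessness check (exponential in k, for small inputs only).

    seed: string over '1' (position must match) and '0' (don't care).
    m:    pattern length; k: number of mismatches.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    offsets = range(m - len(seed) + 1)
    for errors in combinations(range(m), k):
        bad = set(errors)
        # The seed "hits" if some offset places all '1' positions on matches.
        if not any(all(o + i not in bad for i in care) for o in offsets):
            return False
    return True


if __name__ == "__main__":
    ruler = wichmann_marks(1, 0)
    print(ruler, is_complete_ruler(ruler))
    print(is_lossless("111", 7, 1), is_lossless("111", 5, 1))
```

The brute-force check enumerates every set of k mismatch positions, so it is only practical for short patterns; it is meant to make the definition concrete, not to compete with the analysis in the paper.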

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Lavinia Egidi (1)
  • Giovanni Manzini (1)
  1. Dipartimento di Informatica, Università del Piemonte Orientale, Italy
