An Index Structure for Spaced Seed Search

  • Taku Onodera
  • Tetsuo Shibuya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7074)

Abstract

In this paper, we introduce an index structure for texts that supports fast search for patterns with “don’t care”s at predetermined positions. This data structure is a generalization of the suffix array and has many applications, especially in computational biology. We propose three algorithms to construct the index. Two of them are based on a variant of radix sort, but each uses a different type of referential information to sort suffixes by multiple characters at a time. The third handles the case where “don’t care”s appear periodically in patterns and can be combined with the other two.
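To make the idea of the index concrete, here is a minimal Python sketch of a suffix array generalized to a fixed mask of "care" and "don't care" positions. It is an illustration only: it builds the index with an ordinary comparison sort rather than any of the three construction algorithms proposed in the paper, and the names `masked_key`, `build_spaced_index`, and `search`, as well as the `'1'`/`'0'` mask encoding, are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def masked_key(text, start, mask):
    """Characters of text[start:] at the offsets where mask has '1'.

    Offsets that run past the end of the text are skipped, so keys
    near the end of the text may be shorter than the mask weight.
    """
    return "".join(
        text[start + offset]
        for offset, bit in enumerate(mask)
        if bit == "1" and start + offset < len(text)
    )


def build_spaced_index(text, mask):
    """Naive construction: sort all start positions by their masked key."""
    return sorted(range(len(text)), key=lambda i: (masked_key(text, i, mask), i))


def search(text, mask, index, pattern):
    """Return the sorted start positions where pattern matches under mask.

    Characters of pattern at '0' (don't care) offsets are ignored.
    """
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    query = "".join(c for c, bit in zip(pattern, mask) if bit == "1")
    # Keys are recomputed here for clarity; a real index would compare lazily.
    keys = [masked_key(text, i, mask) for i in index]
    lo = bisect_left(keys, query)
    hi = bisect_right(keys, query)
    # Keep only occurrences where the whole mask fits inside the text.
    return sorted(i for i in index[lo:hi] if i + len(mask) <= len(text))


text = "ACGTACGAACTT"
mask = "101"  # care, don't care, care
index = build_spaced_index(text, mask)
print(search(text, mask, index, "ANG"))
```

Because the positions are sorted by the characters at the "care" offsets only, all occurrences of a pattern form one contiguous range of the index, which is what lets a binary search replace a scan of the text.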

Keywords

Index Structure; Binary String; Fast Search; Empty String; Suffix Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Taku Onodera (1)
  • Tetsuo Shibuya (1)
  1. Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
