Scalable Parallel Suffix Array Construction

  • Fabian Kulla
  • Peter Sanders
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4192)

Abstract

Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications, in particular in bioinformatics. We describe the first implementation and experimental evaluation of a scalable parallel algorithm for suffix array construction. The implementation works on distributed memory computers using MPI. Experiments with up to 128 processors show good constant factors and suggest that the algorithm would also scale to considerably larger systems. This makes it possible to build suffix arrays for huge inputs very quickly. Our algorithm is a parallelization of the linear time DC3 algorithm.
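
To make the data structure concrete, the following minimal Python sketch builds a suffix array by naively sorting all suffixes and then uses it as a full text index via binary search. It is an illustration only: it is neither the DC3 algorithm nor the parallel MPI implementation described in the abstract, and the function names `build_suffix_array` and `find_occurrences` are hypothetical names chosen for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive approach: sorting by full suffix comparison, which is far slower
    than a linear time construction but easy to check by hand.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text`, in increasing order.

    Suffixes that begin with `pattern` form one contiguous block of the
    suffix array, so two binary searches locate the block boundaries.
    """
    m = len(pattern)

    # Lower boundary: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper boundary: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Once the array exists, each query needs only a logarithmic number of prefix comparisons, which is why fast construction for huge inputs is the bottleneck worth parallelizing.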

Keywords

Load Imbalance; Suffix Array; Array Construction; Parallel Sorting; Full Text Index
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Fabian Kulla (1)
  • Peter Sanders (1)
  1. Universität Karlsruhe, Karlsruhe, Germany
