Simple Linear Work Suffix Array Construction
recursively sort suffixes beginning at positions i mod 3 ≠ 0.
sort the remaining suffixes using the information obtained in step one.
merge the two sorted sequences obtained in steps one and two.
The algorithm is much simpler than previous linear time algorithms that are all based on the more complicated suffix tree data structure. Since sorting is a well studied problem, we obtain optimal algorithms for several other models of computation, e.g. external memory with parallel disks, cache oblivious, and parallel. The adaptations for BSP and EREW-PRAM are asymptotically faster than the best previously known algorithms.
KeywordsExternal Memory Annual Symposium Sorting Algorithm Suffix Tree Suffix Array
Unable to display preview. Download preview PDF.
- 3.S. Alstrup, C. Gavoille, H. Kaplan, and T. Rauhe. Nearest common ancestors: A survey and a new distributed algorithm. In Proc. 14th Annual Symposium on Parallel Algorithms and Architectures, pages 258–264. ACM, 2002.Google Scholar
- 4.M. A. Bender and M. Farach-Colton. The LCA problem revisited. In Proc. 4th Latin American Symposium on Theoretical INformatics, volume 1776 of LNCS, pages 88–94. Springer, 2000.Google Scholar
- 5.S. Burkhardt and J. Kärkkäinen. Fast lightweight suffix array construction and checking. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, June 2003. To appear.Google Scholar
- 6.M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, SRC (digital, Palo Alto), May 1994.Google Scholar
- 10.R. Dementiev and P. Sanders. Asynchronous parallel disk sorting. In Proc. 15th. Annual Symposium on Parallelism in Algorithms and Architectures. ACM, 2003. To appear.Google Scholar
- 11.M. Farach. Optimal suffix tree construction with large alphabets. In Proc. 38th Annual Symposium on Foundations of Computer Science, pages 137–143. IEEE, 1997.Google Scholar
- 12.M. Farach, P. Ferragina, and S. Muthukrishnan. Overcoming the memory bottleneck in suffix tree construction. In Proc. 39th Annual Symposium on Foundations of Computer Science, pages 174–183. IEEE, 1998.Google Scholar
- 13.M. Farach and S. Muthukrishnan. Optimal logarithmic time randomized suffix tree construction. In Proc. 23th International Conference on Automata, Languages and Programming, pages 550–561. IEEE, 1996.Google Scholar
- 15.M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symposium on Foundations of Computer Science, pages 285–298. IEEE, 1999.Google Scholar
- 16.N. Futamura, S. Aluru, and S. Kurtz. Parallel suffix sorting. In Proc. 9th International Conference on Advanced Computing and Communications, pages 76–81. Tata McGraw-Hill, 2001.Google Scholar
- 18.G. Gonnet, R. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures & Algorithms. Prentice-Hall, 1992.Google Scholar
- 20.R. Grossi and G. F. Italiano. Suffix trees and their applications in string algorithms. Rapporto di Ricerca CS-96-14, Università “Ca’ Foscari” di Venezia, Italy, 1996.Google Scholar
- 21.D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.Google Scholar
- 22.T. Hagerup and R. Raman. Waste makes haste: Tight bounds for loose parallel sorting. In Proc. 33rd Annual Symposium on Foundations of Computer Science, pages 628–637. IEEE, 1992.Google Scholar
- 25.J. Jájá. An Introduction to Parallel Algorithms. Addison Wesley, 1992.Google Scholar
- 26.J. Kärkkäinen. Suffix cactus: A cross between suffix tree and suffix array. In Z. Galil and E. Ukkonen, editors, Proc. 6th Annual Symposium on Combinatorial Pattern Matching, volume 937 of LNCS, pages 191–204. Springer, 1995.Google Scholar
- 27.T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. 12th Annual Symposium on Combinatorial Pattern Matching, volume 2089 of LNCS, pages 181–192. Springer, 2001.Google Scholar
- 28.D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, June 2003. To appear.Google Scholar
- 29.P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, June 2003. To appear.Google Scholar
- 30.N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical report LU-CSTR: 99-214, Dept. of Computer Science, Lund University, Sweden, 1999.Google Scholar
- 33.M. H. Nodine and J. S. Vitter. Deterministic distribution sort in shared and distributed memory multiprocessors. In Proc. 5th Annual Symposium on Parallel Algorithms and Architectures, pages 120–129. ACM, 1993.Google Scholar
- 39.P. Weiner. Linear pattern matching algorithm. In Proc. 14th Symposium on Switching. and Automata Theory, pages 1–11. IEEE, 1973.Google Scholar