Color Set Size problem with applications to string matching

  • Lucas Chi Kwong Hui
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 644)


The Color Set Size problem is: given a rooted tree of size n with l leaves colored from 1 to m, m ≤ l, find for each vertex u the number of different leaf colors in the subtree rooted at u. This problem formulation, together with the Generalized Suffix Tree data structure, has applications to string matching. This paper gives an optimal sequential solution of the Color Set Size problem and string matching applications, including a linear time algorithm for the problem of finding the longest substring common to at least k out of m input strings for all k between 1 and m. In addition, parallel solutions to the above problems are given. These solutions may shed light on problems in computational biology, such as the multiple string alignment problem.
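To make the problem statement concrete, the following is a minimal sketch of one way to compute the color set size of every vertex. It is an illustration under stated assumptions, not necessarily the algorithm of the paper, which the abstract does not spell out. The idea is to start from the number of leaves in each subtree and subtract one at the lowest common ancestor of every pair of same-colored leaves that are consecutive in depth-first order. The names `color_set_sizes`, `children`, and `leaf_color` are hypothetical, and the lowest common ancestor is found by naive parent walking, so this sketch does not achieve the linear bound claimed in the abstract.

```python
def color_set_sizes(children, leaf_color, root=0):
    """Return {vertex: number of distinct leaf colors in its subtree}.

    children:   dict mapping a vertex to the list of its children
                (assumed input format; leaves may be absent from it).
    leaf_color: dict mapping every leaf, and only leaves, to its color.
    """
    parent = {root: None}
    depth = {root: 0}
    order = []                      # vertices in DFS preorder
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        # Push children reversed so the first child is visited first.
        for c in reversed(children.get(u, [])):
            parent[c] = u
            depth[c] = depth[u] + 1
            stack.append(c)

    def lca(a, b):
        # Naive lowest common ancestor by walking parent pointers.
        # A constant-time LCA structure would be needed for linear time.
        while depth[a] > depth[b]:
            a = parent[a]
        while depth[b] > depth[a]:
            b = parent[b]
        while a != b:
            a, b = parent[a], parent[b]
        return a

    count = {u: 0 for u in order}
    last_leaf = {}                  # color -> previous leaf of that color
    for u in order:
        if u in leaf_color:
            count[u] += 1           # each leaf contributes one color
            col = leaf_color[u]
            if col in last_leaf:
                # This color is counted twice at and above the LCA.
                count[lca(last_leaf[col], u)] -= 1
            last_leaf[col] = u

    # Reverse preorder visits every child before its parent,
    # so a single pass accumulates subtree sums bottom-up.
    for u in reversed(order):
        if parent[u] is not None:
            count[parent[u]] += count[u]
    return count


if __name__ == "__main__":
    children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
    leaf_color = {3: 1, 4: 2, 5: 1, 6: 1}
    print(color_set_sizes(children, leaf_color))
    # {0: 2, 1: 2, 3: 1, 4: 1, 2: 1, 5: 1, 6: 1}
```

In the small example, vertex 2 has two leaves of the same color, so its color set size is 1, while the root sees colors 1 and 2 and has color set size 2. In the string matching setting of the abstract, the tree would be a generalized suffix tree and the color of a leaf would identify the input string its suffix comes from. That mapping is stated here only as the natural reading of the abstract.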


Internal Vertex, String Match, Suffix Tree, Input String, Machine Instruction
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Lucas Chi Kwong Hui (1)
  1. Computer Science, University of California, Davis
