Abstract
Approximate shortest common superstrings for a given set R of strings can be constructed by applying the greedy heuristic for finding a longest Hamiltonian path in the weighted graph that represents the pairwise overlaps between the strings in R. We develop an efficient implementation of this idea using a modified Aho-Corasick string-matching automaton. The resulting common superstring algorithm runs in time O(n) or in time O(n min(log m, log |Σ|)), depending on whether or not the goto transitions of the Aho-Corasick automaton can be implemented by direct indexing over the alphabet Σ. Here n is the total length of the strings in R and m is the number of such strings. The best previously known method requires time O(n log m) or O(n log n), depending on the availability of direct indexing.
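To make the greedy heuristic concrete, here is a minimal Python sketch of the overlap-graph formulation only. It computes every pairwise overlap naively and scans all m(m − 1) edges, so it does not reproduce the paper's Aho-Corasick-based linear-time construction. The names `overlap` and `greedy_superstring` are hypothetical, and breaking ties among equal-weight edges by index is an assumption not taken from the abstract.

```python
from typing import List, Optional


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: List[str]) -> str:
    """Greedy approximate shortest common superstring (naive overlaps, sketch only)."""
    # Drop duplicates and strings contained in another string: the
    # overlap-graph formulation assumes a substring-free set R.
    uniq = list(dict.fromkeys(strings))
    R = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    m = len(R)
    if m == 0:
        return ""

    # Edge (i, j) of the overlap graph has weight overlap(R[i], R[j]).
    # Heaviest edges come first; zero-weight edges complete the path.
    edges = sorted(
        ((overlap(R[i], R[j]), i, j) for i in range(m) for j in range(m) if i != j),
        reverse=True,
    )

    succ: List[Optional[int]] = [None] * m  # chosen successor of each string
    pred: List[Optional[int]] = [None] * m  # chosen predecessor of each string
    weight = [0] * m                        # overlap with the chosen successor

    def reaches(start: int, target: int) -> bool:
        """True if following chosen successors from start arrives at target."""
        k: Optional[int] = start
        while k is not None:
            if k == target:
                return True
            k = succ[k]
        return False

    for w, i, j in edges:
        # Greedy rule: accept i -> j if i has no successor yet, j has no
        # predecessor yet, and the edge would not close a cycle.
        if succ[i] is None and pred[j] is None and not reaches(j, i):
            succ[i], pred[j], weight[i] = j, i, w

    # The accepted edges form one Hamiltonian path; merge strings along it,
    # skipping the overlapping prefix of each following string.
    node: Optional[int] = pred.index(None)
    out, w_prev = "", 0
    while node is not None:
        out += R[node][w_prev:]
        w_prev = weight[node]
        node = succ[node]
    return out


print(greedy_superstring(["abc", "bcd", "cde"]))  # abcde
```

Because edges are only ever rejected for conditions that stay true once they hold (a node already has a successor or predecessor, or both endpoints already lie on one path), processing every edge, including zero-weight ones, always leaves a single Hamiltonian path. The merged string therefore contains every input string.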
Communicated by Robert Sedgewick.
This work was supported by the Academy of Finland.
Cite this article
Ukkonen, E. A linear-time algorithm for finding approximate shortest common superstrings. Algorithmica 5, 313–323 (1990). https://doi.org/10.1007/BF01840391