A linear-time algorithm for finding approximate shortest common superstrings

Ukkonen, Esko

doi:10.1007/BF01840391

Published: June 1990

Volume 5, pages 313–323, (1990)

Algorithmica

Abstract

Approximate shortest common superstrings for a given set R of strings can be constructed by applying the greedy heuristics for finding a longest Hamiltonian path in the weighted graph that represents the pairwise overlaps between the strings in R. We develop an efficient implementation of this idea using a modified Aho-Corasick string-matching automaton. The resulting common superstring algorithm runs in time O(n) or in time O(n min(log m, log |Σ|)) depending on whether or not the goto transitions of the Aho-Corasick automaton can be implemented by direct indexing over the alphabet Σ. Here n is the total length of the strings in R and m is the number of such strings. The best previously known method requires time O(n log m) or O(n log n) depending on the availability of direct indexing.
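
To make the greedy idea concrete, the following is a minimal sketch of the heuristic on the overlap graph. It is a naive illustration only: it computes all pairwise overlaps directly and therefore takes at least quadratic time, and it does not use the modified Aho-Corasick automaton that gives the paper its linear-time bound. The function names `overlap` and `greedy_superstring` are hypothetical and chosen for this example.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Naive greedy approximation of a shortest common superstring."""
    # Remove duplicates and strings that occur inside another string;
    # such strings are covered automatically by any superstring.
    uniq = list(dict.fromkeys(strings))
    r = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    if not r:
        return ""
    m = len(r)

    # Edges of the overlap graph, heaviest first (zero overlaps included,
    # so the selected edges always form a single Hamiltonian path).
    edges = sorted(
        ((overlap(r[i], r[j]), i, j)
         for i in range(m) for j in range(m) if i != j),
        key=lambda e: -e[0],
    )

    succ: dict[int, tuple[int, int]] = {}  # i -> (next string, overlap)
    has_pred: set[int] = set()
    comp = list(range(m))                  # union-find, used to avoid cycles

    def find(x: int) -> int:
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    for k, i, j in edges:
        # Accept an edge only if i has no successor yet, j has no
        # predecessor yet, and the edge would not close a cycle.
        if i in succ or j in has_pred or find(i) == find(j):
            continue
        succ[i] = (j, k)
        has_pred.add(j)
        comp[find(i)] = find(j)

    # Walk the path from its unique start, appending the non-overlapping tails.
    cur = next(i for i in range(m) if i not in has_pred)
    out = r[cur]
    while cur in succ:
        nxt, k = succ[cur]
        out += r[nxt][k:]
        cur = nxt
    return out


print(greedy_superstring(["abcd", "cdef", "efgh"]))  # abcdefgh
```

The result is a common superstring of the input, but as with any greedy approximation it is not guaranteed to be a shortest one.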

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, Teollisuuskatu 23, SF-00510, Helsinki, Finland
Esko Ukkonen

Additional information

Communicated by Robert Sedgewick.

This work was supported by the Academy of Finland.

About this article

Cite this article

Ukkonen, E. A linear-time algorithm for finding approximate shortest common superstrings. Algorithmica 5, 313–323 (1990). https://doi.org/10.1007/BF01840391

Received: 01 May 1987
Revised: 02 May 1988
Issue Date: June 1990
DOI: https://doi.org/10.1007/BF01840391
