Abstract
Given a finite set of strings, the median string problem consists of finding a string that minimizes the sum of the distances to the strings in the set. Approximations of the median string are used in a very broad range of applications where one needs a representative string that summarizes the information common to the strings of the set. This is the case in classification, in speech and pattern recognition, and in computational biology. In the latter, the median string problem is related to the key problem of multiple alignment. The recent literature contains a theorem stating that the median string problem is NP-complete for unbounded alphabets. However, in the above-mentioned areas, the alphabet is often finite. It therefore remains a crucial question whether the median string problem is NP-complete for finite, and even binary, alphabets. In this work, we answer this question and also give the complexity of the related centre string problem. Moreover, we study the parameterized complexity of both problems with respect to the number of input strings.
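To make the two objectives concrete, the following is a minimal Python sketch that solves toy instances by exhaustive search. It assumes the unit-cost edit (Levenshtein) distance, which the reference list suggests but the abstract does not state, and it assumes that the centre string minimizes the maximum distance to the input strings while the median minimizes the sum. The names `edit_distance`, `median_string` and `centre_string` are hypothetical. This is not the authors' method: it enumerates every candidate string, so its running time is exponential in the input length.

```python
# Assumptions (not stated in the abstract): unit-cost edit distance;
# median = minimum SUM of distances, centre = minimum MAXIMUM distance.
from itertools import product
from typing import Callable, Iterable, Iterator, Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (insertions, deletions, substitutions),
    computed one row at a time with the Wagner-Fischer dynamic programme."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def _candidates(alphabet: str, max_len: int) -> Iterator[str]:
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def _best(strings: Sequence[str], alphabet: str,
          objective: Callable[[Iterable[int]], int]) -> str:
    # A candidate longer than 2*M (M = longest input) is at distance > M >= |x|
    # from every input x, i.e. strictly farther than the empty string, so
    # searching lengths 0..2*M loses no optimum for either objective.
    bound = 2 * max(len(s) for s in strings)
    return min(_candidates(alphabet, bound),
               key=lambda c: objective(edit_distance(c, s) for s in strings))


def median_string(strings: Sequence[str], alphabet: str = "01") -> str:
    """A string minimizing the SUM of edit distances (exhaustive search)."""
    return _best(strings, alphabet, sum)


def centre_string(strings: Sequence[str], alphabet: str = "01") -> str:
    """A string minimizing the MAXIMUM edit distance (exhaustive search)."""
    return _best(strings, alphabet, max)
```

For example, `median_string(["0", "1"])` returns `"0"`: both `"0"` and `"1"` have total distance 1, and ties go to the first candidate in shortest-first, then lexicographic, order. The only difference between the two problems in this sketch is the aggregation function (`sum` versus `max`) passed to the shared search.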
References
C. de la Higuera and F. Casacuberta. Topology of strings: Median string is NP-complete. Theoretical Computer Science, 230:39–48, 2000.
X. Deng, G. Li, Z. Li, B. Ma, and L. Wang. A PTAS for distinguishing (sub)string selection. In ICALP, pages 740–751, 2002.
R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, 1999.
Michael R. Fellows, Jens Gramm, and Rolf Niedermeier. On the parameterized intractability of CLOSEST SUBSTRING and related problems. In Symposium on Theoretical Aspects of Computer Science, pages 262–273, 2002.
Jens Gramm, Rolf Niedermeier, and Peter Rossmanith. Exact solutions for CLOSEST STRING and related problems. In ISAAC, volume 2223 of LNCS, pages 441–453, 2001.
Dan Gusfield. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bull. Math. Biol., 55:141–154, 1993.
Dan Gusfield. Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1997.
Tao Jiang, Eugene L. Lawler, and Lusheng Wang. Approximation algorithms for tree alignment with a given phylogeny. Algorithmica, 16(3):302–315, 1996.
T. Kohonen. Median strings. Pattern Recognition Letters, 3:309–313, 1985.
J. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing string selection problems. In SODA: ACM-SIAM Symposium on Discrete Algorithms, 1999.
V. I. Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Cybernetics and Control Theory, 10(8):707–710, 1966.
M. Li, B. Ma, and L. Wang. On the closest string and substring problems. Journal of the ACM, 49(2):157–171, 2002.
Ming Li, Bin Ma, and Lusheng Wang. Finding similar regions in many strings. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC’99), pages 473–482, 1999.
Bin Ma. A polynomial time approximation scheme for the closest substring problem. In CPM, volume 1848 of LNCS, pages 99–107, 2000.
D. Maier. The complexity of some problems on subsequences and supersequences. Journal of the ACM, 25:322–336, 1978.
L. Marsan and M. F. Sagot. Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J. Comput. Biol., 7(3–4):345–362, 2000.
C. D. Martinez, A. Juan, and F. Casacuberta. Improving classification using median string and NN rules. In Spanish Symp. on Pattern Recognition and Image Analysis, pages 391–395, 2001.
C. D. Martinez-Hinarejos, A. Juan, and F. Casacuberta. Use of median string for classification. In 15th International Conference on Pattern Recognition, volume 2, pages 907–910, September 2000.
Pavel Pevzner. Computational Molecular Biology. MIT Press, 2000.
Krzysztof Pietrzak. On the parameterized complexity of the fixed alphabet shortest common supersequence and longest common subsequence problems. Journal of Computer and System Sciences, 2003. To appear.
J. S. Sim and K. Park. The consensus string problem for a metric is NP-complete. In R. Raman and J. Simpson, editors, Proceedings of the 10th Australasian Workshop On Combinatorial Algorithms, pages 107–113, Perth, WA, Australia, 1999.
David J. States and Pankaj Agarwal. Compact encoding strategies for DNA sequence similarity search. In Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, pages 211–217. AAAI Press, 1996.
Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM (JACM), 21(1):168–173, 1974.
L. Wang and D. Gusfield. Improved approximation algorithms for tree alignment. J. Algorithms, 25(2):255–273, 1997.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nicolas, F., Rivals, E. (2003). Complexities of the Centre and Median String Problems. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_23
DOI: https://doi.org/10.1007/3-540-44888-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive