Complexities of the Centre and Median String Problems

  • Conference paper
Combinatorial Pattern Matching (CPM 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2676))

Abstract

Given a finite set of strings, the median string problem consists of finding a string that minimizes the sum of the distances to the strings in the set. Approximations of the median string are used in a broad range of applications where one needs a representative string that summarizes the information common to the strings of the set. This is the case in Classification, in Speech and Pattern Recognition, and in Computational Biology. In the latter, Median String is related to the key problem of Multiple Alignment. The recent literature contains a theorem stating that the median string problem is NP-complete for unbounded alphabets. However, in the above-mentioned areas, the alphabet is often finite. Thus, it remains a crucial question whether the median string problem is NP-complete for finite and even binary alphabets. In this work, we provide an answer to this question and also give the complexity of the related centre string problem. Moreover, we study the parameterized complexity of both problems with respect to the number of input strings.
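To make the two objectives concrete, the following is a minimal sketch and not the authors' method. It assumes the distance is the Levenshtein edit distance with unit costs, and it assumes the usual reading of the centre string as a string minimizing the maximum distance to the input strings; the abstract states neither explicitly. The function names (`edit_distance`, `sum_cost`, `max_cost`, `brute_force`) and the length bound `max_len` are hypothetical, and the exhaustive search is only usable on tiny instances.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (assumed distance, see lead-in)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def sum_cost(candidate: str, strings: list[str]) -> int:
    """Median objective: sum of distances to the input strings."""
    return sum(edit_distance(candidate, s) for s in strings)


def max_cost(candidate: str, strings: list[str]) -> int:
    """Centre objective (assumed definition): largest distance to an input string."""
    return max(edit_distance(candidate, s) for s in strings)


def brute_force(strings, alphabet, max_len, cost):
    """Try every string over `alphabet` of length <= max_len (hypothetical bound)."""
    candidates = (
        "".join(p)
        for n in range(max_len + 1)
        for p in product(alphabet, repeat=n)
    )
    return min(candidates, key=lambda c: cost(c, strings))


if __name__ == "__main__":
    inputs = ["0101", "0111", "0100"]  # toy instance over a binary alphabet
    median = brute_force(inputs, "01", 4, sum_cost)
    centre = brute_force(inputs, "01", 4, max_cost)
    print(median, sum_cost(median, inputs))
    print(centre, max_cost(centre, inputs))
```

The number of candidates grows exponentially with `max_len`, so this sketch only illustrates what is being minimized; it says nothing about the complexity results announced above.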

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicolas, F., Rivals, E. (2003). Complexities of the Centre and Median String Problems. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_23

  • DOI: https://doi.org/10.1007/3-540-44888-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive
