Faster Algorithm for the Set Variant of the String Barcoding Problem

  • Conference paper
Combinatorial Pattern Matching (CPM 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5029)


Abstract

The string barcoding problem asks for a minimum set of substrings that distinguishes between all strings in a given set of strings \({\cal S}\). In a biological sense, the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings have to be chosen from a particular set of substrings of cardinality n. This variant can also be obtained from the more general test set problem (see, e.g., [1]) by fixing appropriate parameters. We present an almost optimal \(O(n|{\cal S}|\log^3 n)\)-time approximation algorithm for the considered problem. Our approximation procedure is a modification of the algorithm due to Berman et al. [1], which obtains the best possible approximation ratio (1 + ln n), provided that \(NP\not\subseteq DTIME(n^{\log\log n})\). The improved time complexity is a direct consequence of more careful management of processed sets, the use of several specialised graph and string data structures, and a tighter time complexity analysis based on an amortised argument.
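
To make the set variant concrete, the following is a minimal sketch of a naive greedy baseline. It is not the algorithm of the paper or of Berman et al. [1], and it does not achieve the stated running time. It assumes that a candidate substring distinguishes two strings when it occurs in exactly one of them, and it repeatedly picks the candidate that separates the most still-unseparated pairs. The function name `greedy_barcode` and the sample strings are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def greedy_barcode(strings, candidates):
    """Greedily choose candidates until every pair of strings is separated.

    Assumption: a candidate separates a pair of strings if it occurs
    as a substring in exactly one of the two.
    """
    # Occurrence signature of each candidate: one boolean per input string.
    occurs = {c: tuple(c in s for s in strings) for c in candidates}
    # All unordered pairs of string indices that still need separating.
    unresolved = set(combinations(range(len(strings)), 2))
    chosen = []

    def gain(c):
        # Number of unresolved pairs that candidate c would separate.
        sig = occurs[c]
        return sum(1 for i, j in unresolved if sig[i] != sig[j])

    while unresolved:
        # Ties are broken by position in the candidate list.
        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("candidates cannot distinguish all strings")
        chosen.append(best)
        sig = occurs[best]
        # Keep only the pairs that the chosen candidate fails to separate.
        unresolved = {(i, j) for i, j in unresolved if sig[i] == sig[j]}
    return chosen


strings = ["acgt", "acga", "ttga", "cgtt"]
candidates = ["ac", "tt", "ga", "cg"]
probes = greedy_barcode(strings, candidates)
# Each string's barcode is its occurrence pattern over the chosen probes.
barcodes = [tuple(p in s for p in probes) for s in strings]
print(probes, barcodes)
```

Each round of this sketch rescans every unresolved pair for every candidate, so it is far slower than the \(O(n|{\cal S}|\log^3 n)\) bound quoted above; it only illustrates what a valid set of distinguishing substrings looks like.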

References

  1. Berman, P., DasGupta, B., Kao, M.Y.: Tight approximability results for test set problems in bioinformatics. Journal of Computer and System Sciences 71(2), 145–162 (2005)

  2. Borneman, J., Chrobak, M., Vedova, G.D., Figueroa, A., Jiang, T.: Probe selection algorithms with applications in the analysis of microbial communities. Bioinformatics 17, 39–48 (2001)

  3. DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: DNA-BAR: distinguisher selection for DNA barcoding. Bioinformatics 21(16), 3424–3426 (2005)

  4. DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: Highly scalable algorithms for robust string barcoding. International Journal of Bioinformatics Research and Applications 1(2), 145–161 (2005)

  5. Gerhold, D., Rushmore, T., Caskey, C.T.: DNA chips: promising toys have become powerful tools. Trends Biochem. Sci. 24(5), 168–173 (1999)

  6. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proc. 4th Symposium on Theory of Computing (STOC), pp. 125–136 (1972)

  7. Klau, G.W., Rahmann, S., Schliep, A., Vingron, M., Reinert, K.: Optimal robust non-unique probe selection using Integer Linear Programming. Bioinformatics 20, 186–193 (2004)

  8. Lancia, G., Rizzi, R.: The approximability of the string barcoding problem. Algorithms for Molecular Biology 1(12), 1–7 (2006)

  9. Rash, S., Gusfield, D.: String Barcoding: Uncovering Optimal Virus Signatures. In: Proc. 6th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 254–261 (2002)

Editor information

Paolo Ferragina, Gad M. Landau

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gąsieniec, L., Li, C.Y., Zhang, M. (2008). Faster Algorithm for the Set Variant of the String Barcoding Problem. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_10

  • DOI: https://doi.org/10.1007/978-3-540-69068-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69066-5

  • Online ISBN: 978-3-540-69068-9

  • eBook Packages: Computer Science, Computer Science (R0)
