Faster Algorithm for the Set Variant of the String Barcoding Problem

Gąsieniec, Leszek; Li, Cindy Y.; Zhang, Meng

doi:10.1007/978-3-540-69068-9_10

Leszek Gąsieniec¹,
Cindy Y. Li² &
Meng Zhang³

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5029)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

The string barcoding problem asks for a minimum set of substrings that distinguish between all strings in a given set of strings \({\cal S}\). In a biological sense, the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings have to be chosen from a particular set of substrings of cardinality n. This variant can also be obtained from the more general test set problem (see, e.g., [1]) by fixing appropriate parameters. We present an almost optimal \(O(n|{\cal S}|\log^3 n)\)-time approximation algorithm for the considered problem. Our approximation procedure is a modification of the algorithm due to Berman et al. [1], which obtains the best possible approximation ratio (1 + ln n), provided that \(NP\not\subseteq DTIME(n^{\log\log n})\). The improved time complexity is a direct consequence of more careful management of processed sets, the use of several specialised graph and string data structures, and a tighter time complexity analysis based on an amortised argument.
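
To make the problem statement concrete, the following is a minimal sketch of a naive greedy baseline for the set variant. It is not the algorithm of this paper, nor the procedure of Berman et al. [1], and it makes no attempt to reach the stated running time. The function name `greedy_barcode` and the toy inputs are hypothetical, and a candidate is assumed to distinguish two strings when it occurs in exactly one of them.

```python
from itertools import combinations


def greedy_barcode(strings, candidates):
    """Pick candidates until every pair of strings is distinguished.

    A candidate distinguishes a pair when it occurs in exactly one of the
    two strings (an assumption of this sketch). Returns the chosen
    candidates in the order they were picked; raises ValueError when some
    pair cannot be separated by any candidate.
    """
    # For each candidate, record which strings contain it as a substring.
    contains = {c: [c in s for s in strings] for c in candidates}
    # All pairs of string indices that still need to be told apart.
    unresolved = set(combinations(range(len(strings)), 2))
    chosen = []
    while unresolved:
        def gain(c):
            # Number of unresolved pairs that candidate c would separate.
            row = contains[c]
            return sum(1 for i, j in unresolved if row[i] != row[j])

        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("candidates cannot distinguish all strings")
        chosen.append(best)
        row = contains[best]
        # Keep only the pairs the chosen candidate failed to separate.
        unresolved = {(i, j) for i, j in unresolved if row[i] == row[j]}
    return chosen


# Toy example (hypothetical data): four short sequences, five candidate probes.
strings = ["acgt", "acga", "ttga", "ccgt"]
candidates = ["ac", "ga", "gt", "tt", "cc"]
probes = greedy_barcode(strings, candidates)
# The barcode of a string is its presence/absence vector over the probes.
barcodes = [tuple(p in s for p in probes) for s in strings]
```

Each round of this baseline rescans every unresolved pair for every candidate, which is far slower than the bound in the abstract; it only illustrates what a valid barcode is.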

References

1. Berman, P., DasGupta, B., Kao, M.Y.: Tight approximability results for test set problems in bioinformatics. Journal of Computer and System Sciences 71(2), 145–162 (2005)
2. Borneman, J., Chrobak, M., Vedova, G.D., Figueroa, A., Jiang, T.: Probe selection algorithms with applications in the analysis of microbial communities. Bioinformatics 17, 39–48 (2001)
3. DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: DNA-BAR: distinguisher selection for DNA barcoding. Bioinformatics 21(16), 3424–3426 (2005)
4. DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: Highly scalable algorithms for robust string barcoding. International Journal of Bioinformatics Research and Applications 1(2), 145–161 (2005)
5. Gerhold, D., Rushmore, T., Caskey, C.T.: DNA chips: promising toys have become powerful tools. Trends Biochem. Sci. 24(5), 168–173 (1999)
6. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proc. 4th Symposium on Theory of Computing (STOC), pp. 125–136 (1972)
7. Klau, G.W., Rahmann, S., Schliep, A., Vingron, M., Reinert, K.: Optimal robust non-unique probe selection using Integer Linear Programming. Bioinformatics 20, 186–193 (2004)
8. Lancia, G., Rizzi, R.: The approximability of the string barcoding problem. Algorithms for Molecular Biology 1(12), 1–7 (2006)
9. Rash, S., Gusfield, D.: String Barcoding: Uncovering Optimal Virus Signatures. In: Proc. 6th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 254–261 (2002)

Author information

Authors and Affiliations

Department of Computer Science, University of Liverpool, Liverpool, UK
Leszek Gąsieniec
Histocompatibility and Immunogenetics Laboratory, National Blood Service, Bristol, UK
Cindy Y. Li
College of Computer Science and Technology, Jilin University, Changchun, China
Meng Zhang

Editor information

Paolo Ferragina, Gad M. Landau

Cite this paper

Gąsieniec, L., Li, C.Y., Zhang, M. (2008). Faster Algorithm for the Set Variant of the String Barcoding Problem. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_10

DOI: https://doi.org/10.1007/978-3-540-69068-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69066-5
Online ISBN: 978-3-540-69068-9
eBook Packages: Computer Science, Computer Science (R0)
