Abstract
The string barcoding problem asks for a minimum set of substrings that distinguishes between all strings in a given set of strings \({\cal S}\). In a biological setting, the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings have to be chosen from a particular set of substrings of cardinality \(n\). This variant can also be obtained from the more general test set problem (see, e.g., [1]) by fixing appropriate parameters. We present an almost optimal \(O(n|{\cal S}|\log^3 n)\)-time approximation algorithm for the considered problem. Our approximation procedure is a modification of the algorithm due to Berman et al. [1], which obtains the best possible approximation ratio \((1 + \ln n)\), provided that \(NP\not\subseteq DTIME(n^{\log\log n})\). The improved time complexity is a direct consequence of more careful management of the processed sets, the use of several specialised graph and string data structures, and a tighter time complexity analysis based on an amortised argument.
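To make the problem statement concrete, the following is a minimal sketch of the set variant, assuming a plain greedy rule that repeatedly picks the candidate substring separating the most still-undistinguished pairs of strings. It is not the algorithm of this paper, nor that of Berman et al. [1], and it makes no attempt to reach the stated running time. The function names `greedy_barcode` and `barcode` and the sample data are hypothetical, introduced only for illustration.

```python
from itertools import combinations


def greedy_barcode(strings, candidates):
    """Greedily choose candidates until every pair of strings is distinguished.

    A candidate distinguishes a pair when it occurs in exactly one of the
    two strings. This is a naive illustration, not an optimised method.
    """
    # All pairs of string indices that still share the same barcode.
    undistinguished = set(combinations(range(len(strings)), 2))
    # For each candidate, the indices of the strings that contain it.
    occurs = {c: {i for i, s in enumerate(strings) if c in s} for c in candidates}

    chosen = []
    while undistinguished:
        def gain(c):
            # Number of remaining pairs that candidate c would separate.
            return sum((i in occurs[c]) != (j in occurs[c]) for i, j in undistinguished)

        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("the candidates cannot distinguish all strings")
        chosen.append(best)
        # Keep only the pairs that the new candidate fails to separate.
        undistinguished = {
            (i, j) for i, j in undistinguished
            if (i in occurs[best]) == (j in occurs[best])
        }
    return chosen


def barcode(string, chosen):
    """Binary signature of a string: 1 where the chosen substring occurs."""
    return tuple(int(c in string) for c in chosen)


# Hypothetical toy input: four short sequences and four candidate probes.
strings = ["ACGT", "ACGA", "TTGA", "CCGT"]
candidates = ["AC", "GT", "TT", "GA"]
chosen = greedy_barcode(strings, candidates)
codes = [barcode(s, chosen) for s in strings]
```

Each string ends up with a distinct binary signature over the chosen substrings, which is exactly the distinguishing property the problem requires. The naive version rescans every remaining pair for every candidate in each round, so it is far slower than the bound quoted in the abstract.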
References
Berman, P., DasGupta, B., Kao, M.Y.: Tight approximability results for test set problems in bioinformatics. Journal of Computer and System Sciences 71(2), 145–162 (2005)
Borneman, J., Chrobak, M., Vedova, G.D., Figueroa, A., Jiang, T.: Probe selection algorithms with applications in the analysis of microbial communities. Bioinformatics 17, 39–48 (2001)
DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: DNA-BAR: distinguisher selection for DNA barcoding. Bioinformatics 21(16), 3424–3426 (2005)
DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: Highly scalable algorithms for robust string barcoding. International Journal of Bioinformatics Research and Applications 1(2), 145–161 (2005)
Gerhold, D., Rushmore, T., Caskey, C.T.: DNA chips: promising toys have become powerful tools. Trends Biochem. Sci. 24(5), 168–173 (1999)
Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proc. 4th Symposium on Theory of Computing (STOC), pp. 125–136 (1972)
Klau, G.W., Rahmann, S., Schliep, A., Vingron, M., Reinert, K.: Optimal robust non-unique probe selection using Integer Linear Programming. Bioinformatics 20, 186–193 (2004)
Lancia, G., Rizzi, R.: The approximability of the string barcoding problem. Algorithms for Molecular Biology 1(12), 1–7 (2006)
Rash, S., Gusfield, D.: String Barcoding: Uncovering Optimal Virus Signatures. In: Proc. 6th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 254–261 (2002)
© 2008 Springer-Verlag Berlin Heidelberg
Gąsieniec, L., Li, C.Y., Zhang, M. (2008). Faster Algorithm for the Set Variant of the String Barcoding Problem. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_10
DOI: https://doi.org/10.1007/978-3-540-69068-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69066-5
Online ISBN: 978-3-540-69068-9
eBook Packages: Computer Science, Computer Science (R0)