Highly Scalable Algorithms for Robust String Barcoding

  • B. DasGupta
  • K. M. Konwar
  • I. I. Măndoiu
  • A. A. Shvartsman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3515)


String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection results in a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.
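To make the notion of distinguisher selection concrete, the following is a minimal sketch, assuming a simple set-cover-style greedy heuristic on a toy input. It is not the engineered implementation described in the paper: the function names (`candidate_substrings`, `greedy_barcode`, `lower_bound`), the fixed candidate length, and the four short example sequences are all hypothetical, and robustness requirements are not modeled. A substring is taken to distinguish two sequences when it occurs in exactly one of them, and the lower bound reflects that each distinguisher contributes at most one bit, so at least ceil(log2 n) distinguishers are needed for n sequences.

```python
from itertools import combinations
from math import ceil, log2


def candidate_substrings(genomes, length):
    """Collect every substring of the given length occurring in any genome."""
    return sorted({g[i:i + length]
                   for g in genomes
                   for i in range(len(g) - length + 1)})


def greedy_barcode(genomes, length):
    """Greedily pick substrings until every pair of genomes is distinguished.

    A substring distinguishes a pair when it occurs in exactly one of the two.
    """
    candidates = candidate_substrings(genomes, length)
    # Presence vector of each candidate across all genomes.
    presence = {c: tuple(c in g for g in genomes) for c in candidates}
    undistinguished = set(combinations(range(len(genomes)), 2))
    chosen = []
    while undistinguished:
        def gain(c):
            # Number of still-undistinguished pairs this candidate separates.
            p = presence[c]
            return sum(1 for i, j in undistinguished if p[i] != p[j])

        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("some genomes cannot be told apart at this length")
        chosen.append(best)
        p = presence[best]
        undistinguished = {(i, j) for i, j in undistinguished if p[i] == p[j]}
    return chosen


def lower_bound(num_genomes):
    """Each distinguisher yields one bit, so at least ceil(log2 n) are needed."""
    return ceil(log2(num_genomes)) if num_genomes > 1 else 0


# Hypothetical toy sequences, far smaller than real genomes.
genomes = ["ACGTAC", "ACGGTA", "TTGACG", "GGTTAC"]
barcode = greedy_barcode(genomes, 2)
# Barcode of each genome: presence/absence of every chosen distinguisher.
codes = [tuple(d in g for d in barcode) for g in genomes]
```

Each genome's barcode is its presence/absence vector over the chosen substrings; the selection is valid when all vectors in `codes` are pairwise distinct, and `len(barcode)` can be compared against `lower_bound(len(genomes))`. Scanning all candidates against all pairs, as done here, is only practical for tiny inputs.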


Keywords: Scalable Algorithm, Greedy Selection, Candidate Length, Degenerate Base, Bacterial Size



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • B. DasGupta (1)
  • K. M. Konwar (2)
  • I. I. Măndoiu (2)
  • A. A. Shvartsman (2)
  1. Department of Computer Science, University of Illinois at Chicago, Chicago
  2. Computer Science and Engineering Department, University of Connecticut, Storrs
