Fingerprint Clustering with Bounded Number of Missing Values

  • Paola Bonizzoni
  • Gianluca Della Vedova
  • Riccardo Dondi
  • Giancarlo Mauri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)


Clustering fingerprint vectors with missing values is a problem in Computational Biology that was proposed in [6]. In this paper we narrow the gaps between the known lower and upper bounds on the approximability of variants of this biological problem, and we study two additional variants of the original problem. We prove that all such problems are APX-hard even when each fingerprint contains only two unknown positions, and we present a greedy algorithm that achieves constant approximation factors for these variants. Despite the hardness of these restricted versions, we show that the general clustering problem with an unbounded number of missing values is solvable in polynomial time when, for every fixed position of an input vector, a missing value occurs in at most one fingerprint.
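
To make the setting concrete, here is a minimal Python sketch, assuming the usual formulation of the problem: fingerprints are vectors over {0, 1, N}, where N marks a missing value, and two fingerprints are compatible when they agree at every position where both are known. The symbol N, the function names, and the first-fit strategy are illustrative assumptions. This is not the greedy algorithm analyzed in the paper, whose details are not given in the abstract, and the sketch carries no approximation guarantee.

```python
from typing import List

MISSING = "N"  # assumed symbol for an unknown position


def compatible(a: str, b: str) -> bool:
    """Return True if a and b agree at every position where both are known."""
    return len(a) == len(b) and all(
        x == y or MISSING in (x, y) for x, y in zip(a, b)
    )


def merge(a: str, b: str) -> str:
    """Combine two compatible fingerprints, keeping every known value."""
    return "".join(y if x == MISSING else x for x, y in zip(a, b))


def greedy_cluster(fingerprints: List[str]) -> List[List[int]]:
    """First-fit heuristic (hypothetical, for illustration only).

    Each fingerprint joins the first cluster whose merged representative
    it is compatible with; otherwise it starts a new cluster.
    Returns clusters as lists of input indices.
    """
    reps: List[str] = []
    clusters: List[List[int]] = []
    for i, f in enumerate(fingerprints):
        for k, r in enumerate(reps):
            if compatible(r, f):
                reps[k] = merge(r, f)  # resolve missing values seen so far
                clusters[k].append(i)
                break
        else:
            reps.append(f)
            clusters.append([i])
    return clusters
```

Compatibility is not transitive: "0N" is compatible with both "00" and "01", yet "00" and "01" are incompatible. That is why the sketch compares each fingerprint against a merged representative rather than against a single cluster member.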


Keywords: Vertex Cover, Maximal Clique, Bound Number, Constant Approximation Factor, Minimum Vertex Cover
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Alimonti, P., Kann, V.: Some APX-completeness results for cubic graphs. Theoretical Computer Science 237(1–2), 123–134 (2000)
  2. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial optimization problems and their approximability properties. Springer, Heidelberg (1999)
  3. Drmanac, R.: cDNA screening by array hybridization. Meth. in Enzym. 303, 165–178 (1999)
  4. Drmanac, S., Drmanac, R.: Processing of cDNA and genomic kilobase-size clones for massive screening mapping and sequencing by hybridization. Biotechn. 17, 328–336 (1994)
  5. Drmanac, S., Stavropoulos, N., Labat, I., Vonau, J., Hauser, B., Soares, M., Drmanac, R.: Gene-representation cDNA clusters defined by hybridization of 57 419 clones from infant brain libraries with short oligonucleotide probes. Genomics 37, 29–40 (1996)
  6. Figueroa, A., Borneman, J., Jiang, T.: Clustering binary fingerprint vectors with missing values for DNA array data analysis. Journal of Computational Biology 11(5), 887–901 (2004)
  7. Figueroa, A., Goldstein, A., Jiang, T., Kurowski, M., Lingas, A., Persson, M.: Approximate clustering of fingerprint vectors with missing values. In: Proc. 11th Computing: The Australasian Theory Symposium (CATS). CRPIT, vol. 41, pp. 57–60 (2005)
  8. Valinsky, L., Della Vedova, G., Jiang, T., Borneman, J.: Oligonucleotide fingerprinting of rRNA genes for analysis of fungal community composition. Applied and Environmental Microbiology 68(12), 5999–6004 (2002)
  9. Valinsky, L., Della Vedova, G., Scupham, A., Alvey, S., Figueroa, A., Yin, B., Hartin, R., Chrobak, M., Crowley, D., Jiang, T., Borneman, J.: Analysis of bacterial microbial community composition by oligonucleotide fingerprinting of rRNA genes. Applied and Environmental Microbiology 68(7), 3243–3250 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Paola Bonizzoni (1)
  • Gianluca Della Vedova (2)
  • Riccardo Dondi (3)
  • Giancarlo Mauri (1)

  1. DISCo, Università degli Studi di Milano-Bicocca, Milano, Italy
  2. Dip. Statistica, Università degli Studi di Milano-Bicocca, Milano, Italy
  3. Dipartimento di Scienze dei Linguaggi, della Comunicazione e degli Studi Culturali, Università degli Studi di Bergamo, Bergamo, Italy
