
Clustering Binary Oligonucleotide Fingerprint Vectors for DNA Clone Classification Analysis

Journal of Combinatorial Optimization

Abstract

We considered the problem of clustering binarized oligonucleotide fingerprints. Oligonucleotide fingerprinting is a powerful DNA array based method for characterizing cDNA and rRNA libraries, with many applications including gene expression profiling and DNA clone classification. DNA clone classification is the main application of the problem considered in this paper. Most existing clustering approaches use normalized real-valued intensities and thus do not treat positive and negative hybridization signals equally. This is demonstrated in a series of recent publications that propose a discrete approach, typically useful in the classification of microbial rRNA clones. In the discrete approach, hybridization intensities are normalized and thresholds are set such that a value of 1 represents hybridization, a value of 0 represents no hybridization, and an N represents an unknown, also called a missing value. A combinatorial optimization problem is then formulated that attempts to cluster the fingerprints and resolve the missing values simultaneously. It has been observed that missing values cause considerable difficulty in clustering analysis and that most clustering methods are very sensitive to them. In this paper, we returned to the traditional clustering problem, which admits no missing values, but with the revised goal of stabilizing the number of clusters while maintaining the clustering quality. We adopted the binarizing scheme used in the discrete approach, as it has been shown to be typically useful for clone classification. We formulated this problem as another combinatorial optimization problem, and studied its computational complexity and its relationships to the discrete approach and to the traditional clustering problem. We designed an exact algorithm for the new clustering problem, an A* search algorithm that finds a minimum number of clusters. Experimental results on two commonly tested real datasets demonstrated that the A* search algorithm runs fast and performs better than some popular hierarchical clustering methods in separating clones that have different characteristics with respect to the given oligonucleotide probes.
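
The thresholding step of the discrete approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `binarize` and the two cutoffs `low` and `high` are hypothetical, and the abstract does not state how the thresholds are actually chosen. The sketch assumes intensities are already normalized, maps values at or above `high` to 1, values at or below `low` to 0, and everything in between to N.

```python
from typing import Sequence


def binarize(intensities: Sequence[float], low: float, high: float) -> str:
    """Map normalized hybridization intensities to a fingerprint string.

    Hypothetical scheme (an assumption, not taken from the paper):
      value >= high        -> '1' (hybridization)
      value <= low         -> '0' (no hybridization)
      low < value < high   -> 'N' (unknown / missing value)
    """
    if low > high:
        raise ValueError("low must not exceed high")
    symbols = []
    for value in intensities:
        if value >= high:
            symbols.append("1")
        elif value <= low:
            symbols.append("0")
        else:
            symbols.append("N")
    return "".join(symbols)


# One clone probed with four oligonucleotides (made-up intensities).
fingerprint = binarize([0.9, 0.1, 0.5, 0.8], low=0.3, high=0.7)
```

Under this assumed scheme, setting `low` equal to `high` leaves no gap between the two cutoffs, so every fingerprint is purely binary. That corresponds to the setting without missing values that the paper returns to.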

Author information

Corresponding author

Correspondence to Guohui Lin.

Additional information

Supported by NSERC and CFI.

Supported by NSERC.

Supported partially by NSERC, CFI, and NNSF Grant 60373012.

About this article

Cite this article

Cai, Z., Heydari, M. & Lin, G. Clustering Binary Oligonucleotide Fingerprint Vectors for DNA Clone Classification Analysis. J Comb Optim 9, 199–211 (2005). https://doi.org/10.1007/s10878-005-6857-3

  • DOI: https://doi.org/10.1007/s10878-005-6857-3
