Geometric Biclustering and Its Applications to Cancer Tissue Classification Based on DNA Microarray Gene Expression Data

  • Hongya Zhao
  • Hong Yan
Part of the Applied Bioinformatics and Biostatistics in Cancer Research book series (ABB)


Biclustering is an important tool in microarray data analysis when only a subset of genes is coregulated under a subset of conditions. It is a useful technique for cancer tissue classification based on gene expression data. Unlike standard clustering analysis, biclustering classifies simultaneously along both dimensions of a data matrix, genes and conditions. However, the biclustering problem is inherently intractable and computationally complex. In this chapter, we present a novel geometric perspective on the biclustering problem and the related geometric algorithms. Under this geometric interpretation, different types of biclusters can be mapped to linear geometric structures, such as points, lines, or hyperplanes, in a high-dimensional data space. Such a perspective makes it possible to unify the formulation of biclusters, so the biclustering process can be interpreted as a search for linear geometries in the data space. Based on the linear geometry formulation, we develop Hough transform-based biclustering algorithms. Considering the computational complexity of the search, the existence of noise in microarray data, and the biological meaning of biclusters, we propose several methods to improve the geometric biclustering algorithms. Simulation studies show that the algorithms can discover significant biclusters despite increased noise levels and regulatory complexity. Furthermore, the algorithms are also effective in extracting biologically meaningful biclusters from real microarray gene expression data.
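The geometric idea can be illustrated in two dimensions. The following is a minimal sketch, not the chapter's algorithm: it assumes only two conditions, so each gene is a point (x, y), and genes sharing an additive pattern y = x + c fall on one line. A classical Hough transform in the normal form rho = x cos(theta) + y sin(theta) then finds that line by voting. The function name `hough_line_bicluster`, its parameters, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np

def hough_line_bicluster(x, y, n_theta=180, rho_res=0.25):
    """Find the dominant line through the points (x, y) by Hough voting.

    Returns (theta, rho, members), where members are the indices of the
    points (genes) that voted for the winning accumulator cell.
    """
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # rho of the line through each point, for every candidate angle
    rhos = np.outer(x, np.cos(thetas)) + np.outer(y, np.sin(thetas))
    # Quantise rho into accumulator rows; shift so the indices start at 0
    bins = np.round(rhos / rho_res).astype(int)
    offset = bins.min()
    bins -= offset
    acc = np.zeros((bins.max() + 1, n_theta), dtype=int)
    cols = np.broadcast_to(np.arange(n_theta), bins.shape)
    np.add.at(acc, (bins, cols), 1)  # each point casts one vote per angle
    # The most-voted cell is the line supported by the most points
    r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
    members = np.flatnonzero(bins[:, t_idx] == r_idx)
    return thetas[t_idx], (r_idx + offset) * rho_res, members

# Toy data (assumed): genes 0-9 follow y = x + 2, the other 20 are noise
rng = np.random.default_rng(0)
x = np.concatenate([np.linspace(0.0, 20.0, 10), rng.uniform(0.0, 20.0, 20)])
y = np.concatenate([x[:10] + 2.0, rng.uniform(0.0, 20.0, 20)])

theta, rho, members = hough_line_bicluster(x, y)
print(np.degrees(theta), rho, members)
```

In this parameterisation a line of slope 1 has theta = 135 degrees, so the winning cell identifies the additive pattern and `members` lists the genes that support it. With more than two conditions the same voting idea would have to be applied to hyperplanes, where the accumulator grows quickly with dimension; that is consistent with the computational concern raised in the abstract.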



This work is supported by a grant from the Hong Kong Research Grants Council (Projects CityU 122506 and 122607).



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
  2. School of Electrical and Information Engineering, University of Sydney, Sydney, Australia
