Geometric Biclustering and Its Applications to Cancer Tissue Classification Based on DNA Microarray Gene Expression Data
Biclustering is an important tool in microarray data analysis when only a subset of genes is coregulated under a subset of conditions, and it is a useful technique for cancer tissue classification based on gene expression data. Unlike standard clustering analysis, biclustering classifies the two dimensions of a data matrix, genes and conditions, simultaneously. However, the biclustering problem is inherently intractable and computationally complex. In this chapter, we present a novel geometric perspective on the biclustering problem and the related geometric algorithms. Under this geometric interpretation, different types of biclusters map to linear geometric structures, such as points, lines, or hyperplanes, in a high-dimensional data space. This perspective makes it possible to unify the formulation of biclusters, so that the biclustering process can be interpreted as a search for linear geometric structures in the data space. Based on this linear-geometry formulation, we develop Hough transform-based biclustering algorithms. Taking into account the computational complexity of the search, the presence of noise in microarray data, and the biological meaning of biclusters, we propose several methods to improve the geometric biclustering algorithms. Simulation studies show that the algorithms can discover significant biclusters even as the noise level and regulatory complexity increase. Furthermore, the algorithms are effective in extracting biologically meaningful biclusters from real microarray gene expression data.
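To make the geometric view concrete, consider a single pair of conditions a and b and plot each gene i as the point (e_ia, e_ib). Assuming the standard additive bicluster model e_ij = μ + α_i + β_j, the bicluster's genes satisfy e_ib = e_ia + (β_b − β_a), i.e. they lie on a line of slope 1; under a multiplicative model e_ij = μ·α_i·β_j they lie on the line e_ib = (β_b/β_a)·e_ia through the origin; a constant bicluster collapses to a single point. The sketch below is a minimal illustration under those assumptions, not the chapter's algorithm: it uses the textbook normal-form Hough transform (ρ = x·cos θ + y·sin θ) on one condition pair with synthetic data, and the function names `hough_accumulate` and `strongest_line`, the bin sizes, and the tolerance `tol` are all hypothetical choices. The full method works in higher-dimensional space and must merge such partial results, which this sketch does not attempt.

```python
import numpy as np


def hough_accumulate(points, n_theta=180, rho_res=0.1):
    """Vote for lines rho = x*cos(theta) + y*sin(theta) through 2-D points."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # rho for every (point, angle) pair: shape (n_points, n_theta).
    rhos = points[:, :1] * np.cos(thetas) + points[:, 1:2] * np.sin(thetas)
    rho_max = np.abs(rhos).max()
    n_rho = int(np.ceil(2 * rho_max / rho_res)) + 1
    rho_bins = np.round((rhos + rho_max) / rho_res).astype(int)
    theta_idx = np.broadcast_to(np.arange(n_theta), rho_bins.shape)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    np.add.at(acc, (rho_bins, theta_idx), 1)  # one vote per point per angle
    return acc, thetas, rho_max


def strongest_line(points, tol=0.2, n_theta=180, rho_res=0.1):
    """Return indices of points near the most-voted line, plus slope and intercept."""
    acc, thetas, rho_max = hough_accumulate(points, n_theta, rho_res)
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    theta, rho = thetas[t], r * rho_res - rho_max  # centre of the winning cell
    dist = np.abs(points[:, 0] * np.cos(theta) + points[:, 1] * np.sin(theta) - rho)
    members = np.flatnonzero(dist <= tol)
    s = np.sin(theta)
    if s < 1e-12:  # vertical line x = rho has no finite slope
        return members, np.inf, np.nan
    # Convert normal form to y = slope * x + intercept.
    return members, -np.cos(theta) / s, rho / s


# Synthetic demo: 30 genes follow a hypothetical additive pattern on
# conditions a, b (beta_b - beta_a = 2); 70 background genes are random.
rng = np.random.default_rng(0)
x_bic = rng.uniform(-5, 3, 30)
bicluster = np.column_stack([x_bic, x_bic + 2.0])
background = rng.uniform(-5, 5, (70, 2))
points = np.vstack([bicluster, background])

members, slope, intercept = strongest_line(points)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, genes on line={len(members)}")
```

A recovered slope close to 1 is what the additive model predicts; a line through the origin with another slope would instead suggest a multiplicative pattern. Background genes that happen to fall within `tol` of the line are also returned, which is one reason noise handling matters in practice.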
This work is supported by grants from the Hong Kong Research Grant Council (Projects CityU 122506 and 122607).