Abstract
The availability of large microarray data sets has led to growing interest in biclustering methods over the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression data, in which each row can be a member of only a single bicluster while columns can participate in multiple biclusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients, where each patient (row) is expected to carry only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method for identifying these exclusive row biclusters in the spirit of the optimal set cover problem. Our algorithmic solution combines existing biclustering algorithms with combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span submatrices with non-overlapping rows, while considering their unique nature.
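To make the exclusivity constraint concrete, the following is a minimal sketch of how a set of candidate biclusters could be reduced to a row-exclusive selection, in the manner of winner determination in a combinatorial auction. It is an illustration only and not the authors' algorithm: the abstract does not specify the scoring function, the candidate generator, or the search strategy, so the `Candidate` type, the scores, and the function `select_exclusive` are all hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# A candidate bicluster: (row indices, column indices, score).
# The score is a hypothetical placeholder; the abstract does not define one.
Candidate = Tuple[FrozenSet[int], FrozenSet[int], float]


def select_exclusive(candidates: Sequence[Candidate]) -> Tuple[float, List[int]]:
    """Pick candidates with pairwise disjoint row sets and maximal total score.

    Columns are allowed to overlap between the chosen candidates.
    Returns (best total score, sorted indices of the chosen candidates).
    """
    # Explore high-scoring candidates first so the bound prunes early.
    order = sorted(range(len(candidates)), key=lambda i: -candidates[i][2])
    # suffix[k]: sum of positive scores from position k onward (upper bound).
    suffix = [0.0] * (len(order) + 1)
    for k in range(len(order) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + max(candidates[order[k]][2], 0.0)

    best_score = 0.0
    best_pick: List[int] = []

    def search(k: int, used_rows: FrozenSet[int], score: float, picked: List[int]) -> None:
        nonlocal best_score, best_pick
        if score > best_score:
            best_score, best_pick = score, list(picked)
        # Stop if nothing is left or the bound rules out any improvement.
        if k == len(order) or score + suffix[k] <= best_score:
            return
        idx = order[k]
        rows, _cols, value = candidates[idx]
        # Branch 1: take the candidate if none of its rows is already used.
        if value > 0 and not (rows & used_rows):
            picked.append(idx)
            search(k + 1, used_rows | rows, score + value, picked)
            picked.pop()
        # Branch 2: skip the candidate.
        search(k + 1, used_rows, score, picked)

    search(0, frozenset(), 0.0, [])
    return best_score, sorted(best_pick)


# Toy input: four candidates over five rows (patients) and three columns (genes).
toy = [
    (frozenset({0, 1, 2}), frozenset({0, 1}), 5.0),
    (frozenset({2, 3}), frozenset({1, 2}), 4.0),
    (frozenset({3, 4}), frozenset({0, 1}), 3.0),
    (frozenset({0, 1}), frozenset({2}), 2.5),
]
print(select_exclusive(toy))  # (8.0, [0, 2])
```

In the toy input, candidates 0 and 2 share columns but no rows, so they can be selected together. The exhaustive search is exponential in the worst case and is meant only to show the constraint, not to suggest a practical solver.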
Additional information
This research was funded in part by the Israeli Science Foundation under Grant No. 1227/09 and by a grant to Amichai Painsky from the Israeli Center for Absorption in Science.
A preliminary version of the paper was published in the Proceedings of ICDM 2012.
Cite this article
Painsky, A., Rosset, S. Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression. J. Comput. Sci. Technol. 29, 423–435 (2014). https://doi.org/10.1007/s11390-014-1440-y
DOI: https://doi.org/10.1007/s11390-014-1440-y
Keywords
- biclustering
- exclusive row biclustering
- projected clustering
- gene expression