Journal of Computer Science and Technology, Volume 29, Issue 3, pp 423–435

Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression

Regular Paper

Abstract

The availability of large microarray datasets has led to growing interest in biclustering methods over the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression, in which each row can be a member of only a single bicluster while columns can participate in multiple biclusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients, where each patient (row) is expected to carry only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters in the spirit of the optimal set cover problem. Our algorithmic solution combines existing biclustering algorithms with combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span submatrices with non-overlapping rows, while respecting their unique nature.
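The abstract casts the selection of exclusive row biclusters as a combinatorial auction: rows are the items, each candidate bicluster is a bid on its row set, and no row may be allocated twice, while columns are unconstrained. The sketch below illustrates that selection step only. It assumes the candidates (row set, column set, score) were already produced by some existing biclustering algorithm and that the objective is simply the sum of candidate scores; the function name `best_exclusive_row_cover`, the tuple format, and the scores are hypothetical, not the paper's formulation. It also swaps in plain weighted set packing (rows must be disjoint, but not every row has to be covered) solved by exact branch and bound, which is exponential in the worst case because winner determination is NP-hard, so it suits only small candidate pools.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical candidate format: (row set, column set, score).
Candidate = Tuple[FrozenSet[int], FrozenSet[int], float]


def best_exclusive_row_cover(candidates: List[Candidate]) -> Tuple[float, List[int]]:
    """Pick row-disjoint candidates that maximize the total score.

    Rows act as auction items and each candidate as a bid on its row set.
    Columns are ignored by the constraint, so chosen biclusters may share
    columns. Exact branch and bound; exponential in the worst case.
    """
    n = len(candidates)
    # Visit high-scoring candidates first so good solutions are found early.
    order = sorted(range(n), key=lambda i: candidates[i][2], reverse=True)
    best_score = 0.0
    best_choice: List[int] = []

    def search(pos: int, used_rows: FrozenSet[int], total: float, chosen: List[int]) -> None:
        nonlocal best_score, best_choice
        if total > best_score:
            best_score, best_choice = total, list(chosen)
        # Optimistic bound: assume every remaining positive bid could be added.
        bound = total + sum(max(candidates[order[k]][2], 0.0) for k in range(pos, n))
        if bound <= best_score:
            return
        for k in range(pos, n):
            i = order[k]
            rows, _cols, score = candidates[i]
            # Skip non-positive bids and any bid that reuses an allocated row.
            if score <= 0 or rows & used_rows:
                continue
            chosen.append(i)
            search(k + 1, used_rows | rows, total + score, chosen)
            chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best_score, sorted(best_choice)


# Toy candidates with made-up rows, columns, and scores.
candidates = [
    (frozenset({0, 1, 2}), frozenset({0, 1}), 5.0),
    (frozenset({2, 3}), frozenset({1, 2}), 4.0),
    (frozenset({3, 4}), frozenset({0, 3}), 3.0),
    (frozenset({0, 1}), frozenset({2, 3}), 2.5),
]
print(best_exclusive_row_cover(candidates))  # → (8.0, [0, 2])
```

In the toy example, candidates 0 and 2 share column 0 but no rows, so both can be selected; this is the exclusive-row, overlapping-column structure the abstract describes. A larger pool would call for a dedicated winner-determination solver or an integer program rather than this exhaustive search.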

Keywords

biclustering, exclusive row biclustering, projected clustering, gene expression


Supplementary material

11390_2014_1440_MOESM1_ESM.pdf (78 kb)
ESM 1 (PDF, 77 kb)


Copyright information

© Springer Science+Business Media New York & Science Press, China 2014

Authors and Affiliations

  1. School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel
