Approximation Algorithms for Bi-clustering Problems

  • Lusheng Wang
  • Yu Lin
  • Xiaowen Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)


One of the main goals in the analysis of microarray data is to identify groups of genes and groups of experimental conditions (including environments, individuals and tissues) that exhibit similar expression patterns. This is the so-called bi-clustering problem. In this paper, we consider two variations of the bi-clustering problem: the Consensus Submatrix Problem and the Bottleneck Submatrix Problem. The input to both problems consists of an m×n matrix A and integers l and k. The Consensus Submatrix Problem is to find an l×k submatrix with l<m and k<n and a consensus vector such that the sum of the distances between the rows of the submatrix and the vector is minimized. The Bottleneck Submatrix Problem is to find an l×k submatrix with l<m and k<n, an integer d and a center vector such that the distance between every row of the submatrix and the vector is at most d, and d is minimized. We show that both problems are NP-hard and give randomized approximation algorithms for special cases of the two problems. Using standard techniques, we can derandomize the algorithms to get polynomial time approximation schemes for the two problems. To our knowledge, this is the first time that approximation algorithms with a guaranteed ratio are presented for microarray analysis.
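To make the two objectives concrete, the following minimal Python sketch evaluates them on a small matrix. It is only an illustration of the problem definitions, not the approximation algorithms of the paper. It assumes Hamming distance as the row-to-vector distance (the abstract does not name the distance), and all function names (`hamming`, `consensus_cost`, `bottleneck_radius`, `brute_force_consensus`) are hypothetical. The exhaustive search takes exponential time and is meant for tiny inputs only.

```python
from collections import Counter
from itertools import combinations


def hamming(u, v):
    """Number of positions at which two equal-length sequences differ (assumed distance)."""
    return sum(a != b for a, b in zip(u, v))


def consensus_cost(A, rows, cols):
    """Return a best consensus vector for the chosen submatrix and its total cost.

    Under Hamming distance, taking the most frequent symbol in each selected
    column minimizes the sum of distances to the selected rows.
    """
    sub = [[A[i][j] for j in cols] for i in rows]
    z = [Counter(col).most_common(1)[0][0] for col in zip(*sub)]
    return z, sum(hamming(row, z) for row in sub)


def bottleneck_radius(A, rows, cols, center):
    """Largest distance from any selected row to a given center vector (the value d)."""
    return max(hamming([A[i][j] for j in cols], center) for i in rows)


def brute_force_consensus(A, l, k):
    """Try every l-by-k submatrix and keep the one with the smallest consensus cost."""
    m, n = len(A), len(A[0])
    best = None
    for rows in combinations(range(m), l):
        for cols in combinations(range(n), k):
            z, cost = consensus_cost(A, rows, cols)
            if best is None or cost < best[0]:
                best = (cost, rows, cols, z)
    return best


# Toy 4x4 binary matrix: rows play the role of genes, columns of conditions.
A = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
cost, rows, cols, z = brute_force_consensus(A, l=2, k=3)
```

On this toy matrix, rows 0 and 1 agree on columns 0, 1 and 3, so the search finds a 2×3 submatrix with consensus cost 0. The bottleneck objective is only evaluated here for a given center vector; choosing the best center is itself part of the optimization problem.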


Keywords: Approximation Algorithm; Gene Expression Data; Fractional Solution; Center Vector; Consensus Score





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lusheng Wang (1)
  • Yu Lin (1, 2)
  • Xiaowen Liu (1)
  1. Department of Computer Science, City University of Hong Kong, Hong Kong
  2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
