Approximation Algorithms for Bi-clustering Problems

Wang, Lusheng; Lin, Yu; Liu, Xiaowen

doi:10.1007/11851561_29

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4175)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

One of the main goals in the analysis of microarray data is to identify groups of genes and groups of experimental conditions (including environments, individuals and tissues) that exhibit similar expression patterns. This is the so-called bi-clustering problem. In this paper, we consider two variations of the bi-clustering problem: the Consensus Submatrix Problem and the Bottleneck Submatrix Problem. The input to both problems consists of an m×n matrix A and integers l and k. The Consensus Submatrix Problem is to find an l×k submatrix with l<m and k<n and a consensus vector such that the sum of the distances between the rows of the submatrix and the vector is minimized. The Bottleneck Submatrix Problem is to find an l×k submatrix with l<m and k<n, an integer d and a center vector such that the distance between every row in the submatrix and the vector is at most d and d is minimized. We show that both problems are NP-hard and give randomized approximation algorithms for special cases of the two problems. Using standard techniques, we can derandomize the algorithms to get polynomial time approximation schemes for the two problems. To our knowledge, this is the first time that approximation algorithms with guaranteed ratios have been presented for microarray analysis.
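
The two objectives defined in the abstract can be made concrete with a small sketch. The abstract does not say which distance is used, so the code below assumes Hamming distance over discrete matrix entries; under that assumption, the best consensus vector for a fixed set of rows and columns is the column-wise majority symbol. All function names are hypothetical, and the exhaustive search is only an illustration of the problem statement for tiny inputs, not the randomized approximation algorithm of the paper.

```python
from collections import Counter
from itertools import combinations


def hamming(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def consensus_cost(A, rows, cols):
    """Best consensus vector and total distance for a fixed submatrix.

    Assumes Hamming distance: the total then splits per column, so the
    most frequent symbol in each chosen column is an optimal entry.
    """
    sub = [[A[i][j] for j in cols] for i in rows]
    consensus = [Counter(col).most_common(1)[0][0] for col in zip(*sub)]
    return consensus, sum(hamming(r, consensus) for r in sub)


def bottleneck_radius(A, rows, cols, center):
    """Largest distance d from any chosen row to a given center vector."""
    return max(hamming([A[i][j] for j in cols], center) for i in rows)


def brute_force_consensus(A, l, k):
    """Exhaustive search over all l x k submatrices (tiny inputs only)."""
    m, n = len(A), len(A[0])
    best = None
    for rows in combinations(range(m), l):
        for cols in combinations(range(n), k):
            consensus, cost = consensus_cost(A, rows, cols)
            if best is None or cost < best[0]:
                best = (cost, rows, cols, consensus)
    return best


# Toy 4 x 4 binary matrix (made-up data, for illustration only).
A = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
consensus, total = consensus_cost(A, (0, 1, 2), (0, 1, 2))
radius = bottleneck_radius(A, (0, 1, 2), (0, 1, 2), consensus)
best = brute_force_consensus(A, 2, 2)
```

The exhaustive search examines every choice of l rows and k columns, so its running time grows exponentially with the matrix size; since the abstract states that both problems are NP-hard, this is the motivation for approximation schemes rather than exact search.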

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Hong Kong
Lusheng Wang, Yu Lin & Xiaowen Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yu Lin

Editor information

Editors and Affiliations

Ecole Polytechnique Fédérale de Lausanne, Switzerland
Philipp Bücher
Laboratory for Computational Biology and Bioinformatics, EPFL (Ecole Polytechnique Fédérale de Lausanne), Swiss Institute of Bioinformatics, Lausanne, Switzerland
Bernard M. E. Moret

About this paper

Cite this paper

Wang, L., Lin, Y., Liu, X. (2006). Approximation Algorithms for Bi-clustering Problems. In: Bücher, P., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11851561_29

DOI: https://doi.org/10.1007/11851561_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39583-6
Online ISBN: 978-3-540-39584-3
eBook Packages: Computer Science, Computer Science (R0)
