Approximation Algorithms for Bi-clustering Problems
One of the main goals in the analysis of microarray data is to identify groups of genes and groups of experimental conditions (including environments, individuals and tissues) that exhibit similar expression patterns. This is the so-called bi-clustering problem. In this paper, we consider two variations of the bi-clustering problem: the Consensus Submatrix Problem and the Bottleneck Submatrix Problem. The input of both problems consists of an m×n matrix A and integers l and k. The Consensus Submatrix Problem is to find an l×k submatrix with l<m and k<n and a consensus vector such that the sum of the distances between the rows of the submatrix and the vector is minimized. The Bottleneck Submatrix Problem is to find an l×k submatrix with l<m and k<n, an integer d and a center vector such that the distance between every row of the submatrix and the vector is at most d, and d is minimized. We show that both problems are NP-hard and give randomized approximation algorithms for special cases of the two problems. Using standard techniques, we can derandomize the algorithms to get polynomial time approximation schemes for the two problems. To our knowledge, this is the first time that approximation algorithms with guaranteed ratios have been presented for microarray analysis.
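The two objectives defined above can be made concrete with a small sketch. It is illustrative only and is not the paper's approximation algorithm. It assumes the distance between a row and a vector is the Hamming distance, which the abstract does not specify. All function names (`hamming`, `consensus_score`, `bottleneck_radius`, `brute_force_consensus`) are hypothetical. The brute-force search is exponential and is meant only for checking tiny instances.

```python
from collections import Counter
from itertools import combinations


def hamming(u, v):
    """Number of positions where u and v differ (assumed distance measure)."""
    return sum(a != b for a, b in zip(u, v))


def submatrix(A, rows, cols):
    """Restrict matrix A (list of lists) to the chosen row and column indices."""
    return [[A[i][j] for j in cols] for i in rows]


def consensus_score(A, rows, cols):
    """Consensus objective for a fixed submatrix.

    Under Hamming distance, the column-wise majority symbol minimizes the
    sum of distances, so it serves as the consensus vector.
    """
    S = submatrix(A, rows, cols)
    consensus = [Counter(col).most_common(1)[0][0] for col in zip(*S)]
    return consensus, sum(hamming(r, consensus) for r in S)


def bottleneck_radius(A, rows, cols, center):
    """Bottleneck objective d for a fixed submatrix and a given center vector."""
    S = submatrix(A, rows, cols)
    return max(hamming(r, center) for r in S)


def brute_force_consensus(A, l, k):
    """Exhaustively search all l x k submatrices (exponential; tiny inputs only)."""
    m, n = len(A), len(A[0])
    best = None
    for rows in combinations(range(m), l):
        for cols in combinations(range(n), k):
            z, score = consensus_score(A, rows, cols)
            if best is None or score < best[0]:
                best = (score, rows, cols, z)
    return best


# Toy 4 x 4 binary matrix (made-up data for illustration).
A = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
best_score, best_rows, best_cols, best_vector = brute_force_consensus(A, 3, 2)
```

The exhaustive search enumerates every choice of l rows and k columns, so its cost grows combinatorially with m and n. That blow-up is consistent with the NP-hardness result stated above and motivates approximation algorithms.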
Keywords: Approximation Algorithm, Gene Expression Data, Fractional Solution, Center Vector, Consensus Score