
A biclustering algorithm for binary matrices based on penalized Bernoulli likelihood

  • Published in: Statistics and Computing

Abstract

We propose a new biclustering method for binary data matrices using the maximum penalized Bernoulli likelihood estimation. Our method applies a multi-layer model defined on the logits of the success probabilities, where each layer represents a simple bicluster structure and the combination of multiple layers is able to reveal complicated, multiple biclusters. The method allows for non-pure biclusters, and can simultaneously identify the 1-prevalent blocks and 0-prevalent blocks. A computationally efficient algorithm is developed and guidelines are provided for specifying the tuning parameters, including initial values of model parameters, the number of layers, and the penalty parameters. Missing-data imputation can be handled in the EM framework. The method is tested using synthetic and real datasets and shows good performance.
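As a rough illustration of the multi-layer model described above, the following sketch simulates a binary matrix whose logits are a sum of rank-one layers; the matrix size, layer effects mu, and membership blocks are hypothetical choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 40                      # illustrative matrix size

# Each layer contributes mu * r c^T to the logit matrix Theta, where
# r and c are 0/1 row- and column-membership indicator vectors.
layers = [( 3.0, range(0, 20), range(0, 15)),    # 1-prevalent block
          (-3.0, range(30, 50), range(20, 35))]  # 0-prevalent block
Theta = np.zeros((n, p))
for mu, rows, cols in layers:
    r = np.zeros(n); r[list(rows)] = 1.0
    c = np.zeros(p); c[list(cols)] = 1.0
    Theta += mu * np.outer(r, c)

P = 1.0 / (1.0 + np.exp(-Theta))   # success probabilities pi(Theta)
X = rng.binomial(1, P)             # observed binary data matrix

print(X[:20, :15].mean(), X[30:50, 20:35].mean())
```

With logits ±3 the two blocks have success probabilities of about 0.95 and 0.05, so the first block is 1-prevalent and the second 0-prevalent against a background probability of 1/2.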



Acknowledgements

The authors would like to thank the editor, the associate editor, and two reviewers for helpful comments. Dr. Lan Zhou carefully read the paper and gave many useful suggestions for improving the writing. Lee’s work was supported by Basic Science Research Program through the National Research Foundation (NRF) of Korea (2011-0011608). Huang’s work was partially supported by NCI (CA57030), NSF (DMS-0907170, DMS-1007618, DMS-1208952), and King Abdullah University of Science and Technology (KUS-CI-016-04).

Author information


Corresponding author

Correspondence to Seokho Lee.

Appendix

1.1 Derivations of (4) and (5)

Lemma 1

The function \(\pi(x)\{1-\pi(x)\}\) is decreasing in x≥0, where \(\pi(x)=\{1+\exp(-x)\}^{-1}\).

Proof

The first derivative of the function is \(\pi'(x)\{1-\pi(x)\}-\pi(x)\pi'(x)=\pi'(x)\{1-2\pi(x)\}=\pi(x)\{1-\pi(x)\}\{1-2\pi(x)\}\), using \(\pi'(x)=\pi(x)\{1-\pi(x)\}\). Since 1/2≤π(x)≤1 for x≥0, the factor 1−2π(x) is nonpositive, so the derivative is nonpositive and the function is therefore decreasing. □
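Lemma 1 is easy to sanity-check numerically; the short sketch below (ours, not part of the proof) confirms that π(x){1−π(x)} is nonincreasing on a grid of nonnegative x.

```python
import numpy as np

pi = lambda x: 1.0 / (1.0 + np.exp(-x))

x = np.linspace(0.0, 10.0, 1001)   # grid on x >= 0
v = pi(x) * (1.0 - pi(x))          # the function of Lemma 1

# All successive differences are nonpositive: the function decreases.
print(bool(np.all(np.diff(v) <= 0)))
```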

Lemma 2

The function \(r(x)=\log\pi(\sqrt{x})-\sqrt{x}/2\) is convex.

Proof

The second derivative of r(x) is

$$r''(x) = \frac{1}{4x} \biggl[ \frac{2\pi(\sqrt{x})-1}{2\sqrt{x}} - \pi(\sqrt{x})\bigl\{1-\pi(\sqrt{x})\bigr\} \biggr]. $$

Note that

$$\frac{2\pi(\sqrt{x})-1}{2\sqrt{x}} = \frac{\pi(\sqrt{x})-\pi(-\sqrt{x})}{2\sqrt{x}} = \pi'(\xi) = \pi(\xi)\bigl\{1-\pi(\xi)\bigr\} $$

with \(\xi\in(-\sqrt{x},\sqrt{x})\) from the mean value theorem, where we used \(\pi(-x)=1-\pi(x)\). Since \(\pi(x)\{1-\pi(x)\}\) is even in x and, by Lemma 1, decreasing in |x|, the inequality \(|\xi|<\sqrt{x}\) implies \(\pi(\xi)\{1-\pi(\xi)\}>\pi(\sqrt{x})\{1-\pi(\sqrt{x})\}\), so the bracketed term is positive and hence the second derivative of r(x) is positive. This completes the proof of Lemma 2. □
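Likewise, the convexity asserted in Lemma 2 can be checked numerically: on a grid, the discrete second differences of r(x) should all be nonnegative (a sketch of ours, not part of the proof).

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))

def r(x):
    # r(x) = log pi(sqrt(x)) - sqrt(x)/2, the function of Lemma 2
    s = np.sqrt(x)
    return np.log(pi(s)) - s / 2.0

x = np.linspace(1e-4, 25.0, 2001)
second_diff = np.diff(r(x), n=2)   # discrete analogue of r''(x)

print(bool(np.all(second_diff >= 0)))
```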

From the convexity of r(x), we get r(x)≥r(y)+r′(y)(x−y) at any y, so that

$$\log\pi(\sqrt{x})-\frac{\sqrt{x}}{2} \ge \log\pi(\sqrt{y})-\frac{\sqrt{y}}{2} - \frac{2\pi(\sqrt{y})-1}{4\sqrt{y}}(x-y), $$

or

$$-\log\pi(\sqrt{x}) \le -\log\pi(\sqrt{y}) - \frac{\sqrt{x}-\sqrt{y}}{2} + \frac{2\pi(\sqrt{y})-1}{4\sqrt{y}}(x-y). $$

By replacing \(\sqrt{x}\) and \(\sqrt{y}\) with x and y respectively, we obtain

$$-\log\pi(x) \le -\log\pi(y) - \frac{x-y}{2} + \frac{2\pi(y)-1}{4y}\bigl(x^{2}-y^{2}\bigr). $$

The coefficient of the quadratic term is bounded above by 1/8 because

$$\frac{2\pi(y)-1}{4y} = \frac{\pi(y)-\pi(-y)}{4y} = \frac{\pi'(\xi)}{2} = \frac{\pi(\xi)\{1-\pi(\xi)\}}{2} \le \frac{1}{8} $$

for ξ∈(−y,y) by the mean value theorem, using \(\pi(-y)=1-\pi(y)\) and \(\pi(\xi)\{1-\pi(\xi)\}\le1/4\). At y=0, the coefficient of the quadratic term is not properly defined. In this case, we use the limit as y approaches zero. By L’Hôpital’s rule we get

$$\lim_{y\rightarrow0} \frac{2\pi(y)-1}{4y} = \lim_{y\rightarrow 0}\frac {2\pi'(y)}{4} = \lim_{y\rightarrow0}\frac{\pi(y)\{1-\pi(y)\}}{2} = \frac{1}{8}. $$
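The limit can also be confirmed numerically; in this quick check (ours) the coefficient (2π(y)−1)/(4y) approaches 1/8 as y shrinks.

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))

y = 10.0 ** -np.arange(1, 8)             # y -> 0+
coef = (2.0 * pi(y) - 1.0) / (4.0 * y)   # coefficient of the quadratic term

print(coef)  # tends to 0.125
```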

Writing \(x^{2}-y^{2}=(x-y)^{2}+2y(x-y)\) and noting that \(2y\cdot\frac{2\pi(y)-1}{4y}=\pi(y)-\frac{1}{2}\), the bound becomes \(-\log\pi(y)-\{1-\pi(y)\}(x-y)+\frac{2\pi(y)-1}{4y}(x-y)^{2}\). Now, bounding the quadratic coefficient by 1/8, completing squares around x, and enlarging the constant term \(-2\{1-\pi(y)\}^{2}\) produced by the completion to \(2\{1-\pi(y)\}^{2}\), which only loosens the bound, we get the upper bound as

$$-\log\pi(y) + 2\bigl\{1-\pi(y)\bigr\}^2 + \frac{1}{8}\bigl[x- \bigl\{y+4\bigl(1-\pi(y)\bigr)\bigr\}\bigr]^2. $$

Replacing x and y by \(q_{ij}\theta_{ij}\) and \(q_{ij}\theta_{ij}^{o}\) respectively, we obtain the upper bound in (4). The first two terms in (4) follow straightforwardly after the replacement. The third term of the above displayed formula becomes

$$\frac{1}{8}\bigl[q_{ij}\theta_{ij}-\bigl\{q_{ij}\theta_{ij}^{o}+4\bigl(1-\pi\bigl(q_{ij}\theta_{ij}^{o}\bigr)\bigr)\bigr\}\bigr]^{2} = \frac{1}{8}q_{ij}^{2}\bigl[\theta_{ij}-\theta_{ij}^{o}-4q_{ij}\bigl\{1-\pi\bigl(q_{ij}\theta_{ij}^{o}\bigr)\bigr\}\bigr]^{2}. $$

In the above, we used \(q_{ij}^{2}=1\) since \(q_{ij}\) takes values −1 or 1 only. Now, if we define \(x_{ij}=\theta_{ij}^{o}+4q_{ij}\{1-\pi(q_{ij}\theta_{ij}^{o})\}\), the third term equals \(\frac{1}{8}(\theta_{ij}-x_{ij})^{2}\) and the desired result is obtained.
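Before it is applied entrywise in (4), the scalar inequality itself can be verified on a grid; the sketch below (our own check, not from the paper) confirms that the completed-square expression majorizes −log π(x) for every (x, y) pair tested.

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))

def upper_bound(x, y):
    # -log pi(y) + 2{1-pi(y)}^2 + (1/8)[x - {y + 4(1-pi(y))}]^2
    return (-np.log(pi(y)) + 2.0 * (1.0 - pi(y)) ** 2
            + (x - (y + 4.0 * (1.0 - pi(y)))) ** 2 / 8.0)

grid = np.linspace(-10.0, 10.0, 201)
X, Y = np.meshgrid(grid, grid)
gap = upper_bound(X, Y) - (-np.log(pi(X)))

print(bool(np.all(gap >= 0)))  # the majorization holds everywhere on the grid
```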

Applying (4) to (1), we obtain the upper bound \(g(\mbox{\boldmath$\varXi$}|\mbox{\boldmath$\varXi$}^{o})+C\) of the penalized Bernoulli log likelihood, where \(g(\mbox{\boldmath$\varXi$}|\mbox{\boldmath$\varXi$}^{o})\) is given in equation (5) of the manuscript, and C is the constant term \(-\sum_{i=1}^{n}\sum_{j=1}^{p}\log\pi(q_{ij}\theta_{ij}^{o}) + 2\sum_{i=1}^{n}\sum_{j=1}^{p}\{1-\pi(q_{ij}\theta_{ij}^{o})\}^{2}\), which does not involve \(\theta_{ij}\). Thus, g can be used as the surrogate function for the upper-bound optimization.
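To illustrate how such a quadratic surrogate drives an upper-bound (MM) optimization, here is a deliberately simplified scalar toy problem: minimizing f(θ) = −log π(qθ) + (λ/2)θ² by repeatedly minimizing the quadratic surrogate built from the bound above. The ridge penalty and all numerical choices are ours for illustration; this is not the paper's algorithm.

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))
q, lam = 1.0, 0.5                  # observed sign and ridge penalty (illustrative)

def f(theta):
    # penalized negative Bernoulli log likelihood for a single entry
    return -np.log(pi(q * theta)) + 0.5 * lam * theta ** 2

theta = -2.0                       # arbitrary starting point
values = [f(theta)]
for _ in range(50):
    # minimize (1/8)(theta - x_star)^2 + (lam/2) theta^2 in closed form
    x_star = theta + 4.0 * q * (1.0 - pi(q * theta))
    theta = x_star / (1.0 + 4.0 * lam)
    values.append(f(theta))

values = np.array(values)
# monotone descent, converging to the stationary point lam*theta = q{1 - pi(q*theta)}
print(theta, bool(np.all(np.diff(values) <= 1e-12)))
```

Each iteration minimizes a curvature-1/8 quadratic that majorizes the logistic loss at the current iterate (up to an additive constant), so the objective is nonincreasing and the limit satisfies the stationarity condition λθ = q{1−π(qθ)}.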


Cite this article

Lee, S., Huang, J.Z. A biclustering algorithm for binary matrices based on penalized Bernoulli likelihood. Stat Comput 24, 429–441 (2014). https://doi.org/10.1007/s11222-013-9379-3
