
A biclustering algorithm for binary matrices based on penalized Bernoulli likelihood

  • Published in: Statistics and Computing

Abstract

We propose a new biclustering method for binary data matrices using the maximum penalized Bernoulli likelihood estimation. Our method applies a multi-layer model defined on the logits of the success probabilities, where each layer represents a simple bicluster structure and the combination of multiple layers is able to reveal complicated, multiple biclusters. The method allows for non-pure biclusters, and can simultaneously identify the 1-prevalent blocks and 0-prevalent blocks. A computationally efficient algorithm is developed and guidelines are provided for specifying the tuning parameters, including initial values of model parameters, the number of layers, and the penalty parameters. Missing-data imputation can be handled in the EM framework. The method is tested using synthetic and real datasets and shows good performance.
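As a rough illustration of the multi-layer model described above, the following sketch simulates a binary matrix whose logits are a sum of rank-one layers; the matrix size, layer effects mu, and membership blocks are hypothetical choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 40                      # illustrative matrix size

# Each layer contributes mu * r c^T to the logit matrix Theta, where
# r and c are 0/1 row- and column-membership indicator vectors.
layers = [( 3.0, range(0, 20), range(0, 15)),    # 1-prevalent block
          (-3.0, range(30, 50), range(20, 35))]  # 0-prevalent block
Theta = np.zeros((n, p))
for mu, rows, cols in layers:
    r = np.zeros(n); r[list(rows)] = 1.0
    c = np.zeros(p); c[list(cols)] = 1.0
    Theta += mu * np.outer(r, c)

P = 1.0 / (1.0 + np.exp(-Theta))   # success probabilities pi(Theta)
X = rng.binomial(1, P)             # observed binary data matrix

print(X[:20, :15].mean(), X[30:50, 20:35].mean())
```

With logits ±3 the two blocks have success probabilities of about 0.95 and 0.05, so the first block is 1-prevalent and the second 0-prevalent against a background probability of 1/2.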



Acknowledgements

The authors would like to thank the editor, the associate editor, and two reviewers for helpful comments. Dr. Lan Zhou carefully read the paper and gave many useful suggestions for improving the writing. Lee’s work was supported by Basic Science Research Program through the National Research Foundation (NRF) of Korea (2011-0011608). Huang’s work was partially supported by NCI (CA57030), NSF (DMS-0907170, DMS-1007618, DMS-1208952), and King Abdullah University of Science and Technology (KUS-CI-016-04).

Author information


Corresponding author

Correspondence to Seokho Lee.

Appendix

1.1 Derivations of (4) and (5)

Lemma 1

The function \(\pi(x)\{1-\pi(x)\}\) is decreasing in x≥0, where \(\pi(x)=\{1+\exp(-x)\}^{-1}\).

Proof

The first derivative of the function is \(\pi'(x)\{1-\pi(x)\}-\pi(x)\pi'(x)=\pi'(x)\{1-2\pi(x)\}=\pi(x)\{1-\pi(x)\}\{1-2\pi(x)\}\), using \(\pi'(x)=\pi(x)\{1-\pi(x)\}\). Since 1/2≤π(x)≤1 for x≥0, the factor 1−2π(x) is nonpositive, so the derivative is nonpositive and the function is therefore decreasing. □
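Lemma 1 is easy to sanity-check numerically; the short sketch below (ours, not part of the proof) confirms that π(x){1−π(x)} is nonincreasing on a grid of nonnegative x.

```python
import numpy as np

pi = lambda x: 1.0 / (1.0 + np.exp(-x))

x = np.linspace(0.0, 10.0, 1001)   # grid on x >= 0
v = pi(x) * (1.0 - pi(x))          # the function of Lemma 1

# All successive differences are nonpositive: the function decreases.
print(bool(np.all(np.diff(v) <= 0)))
```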

Lemma 2

The function \(r(x)=\log\pi(\sqrt{x})-\sqrt{x}/2\) is convex.

Proof

The second derivative of r(x) is

$$r''(x) = \frac{1}{4x} \biggl[ \frac{2\pi(\sqrt{x})-1}{2\sqrt{x}} - \pi(\sqrt{x})\bigl\{1-\pi(\sqrt{x})\bigr\} \biggr]. $$

Note that

$$\frac{2\pi(\sqrt{x})-1}{2\sqrt{x}} = \frac{\pi(\sqrt{x})-\pi(-\sqrt{x})}{2\sqrt{x}} = \pi'(\xi) = \pi(\xi)\bigl\{1-\pi(\xi)\bigr\} $$

with \(\xi\in(-\sqrt{x},\sqrt{x})\) from the mean value theorem, where we used \(\pi(-x)=1-\pi(x)\). Since \(\pi(x)\{1-\pi(x)\}\) is even in x and, by Lemma 1, decreasing in |x|, the inequality \(|\xi|<\sqrt{x}\) implies \(\pi(\xi)\{1-\pi(\xi)\}>\pi(\sqrt{x})\{1-\pi(\sqrt{x})\}\), so the bracketed term is positive and hence the second derivative of r(x) is positive. This completes the proof of Lemma 2. □
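Likewise, the convexity asserted in Lemma 2 can be checked numerically: on a grid, the discrete second differences of r(x) should all be nonnegative (a sketch of ours, not part of the proof).

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))

def r(x):
    # r(x) = log pi(sqrt(x)) - sqrt(x)/2, the function of Lemma 2
    s = np.sqrt(x)
    return np.log(pi(s)) - s / 2.0

x = np.linspace(1e-4, 25.0, 2001)
second_diff = np.diff(r(x), n=2)   # discrete analogue of r''(x)

print(bool(np.all(second_diff >= 0)))
```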

From the convexity of r(x), we get r(x)≥r(y)+r′(y)(x−y) at any y, so that

$$\log\pi(\sqrt{x})-\frac{\sqrt{x}}{2} \ge \log\pi(\sqrt{y})-\frac{\sqrt{y}}{2} - \frac{2\pi(\sqrt{y})-1}{4\sqrt{y}}(x-y), $$

or

$$-\log\pi(\sqrt{x}) \le -\log\pi(\sqrt{y}) - \frac{\sqrt{x}-\sqrt{y}}{2} + \frac{2\pi(\sqrt{y})-1}{4\sqrt{y}}(x-y). $$

By replacing \(\sqrt{x}\) and \(\sqrt{y}\) with x and y respectively, we obtain

$$-\log\pi(x) \le -\log\pi(y) - \frac{x-y}{2} + \frac{2\pi(y)-1}{4y}\bigl(x^{2}-y^{2}\bigr). $$

The coefficient of the quadratic term is bounded above by 1/8 because

$$\frac{2\pi(y)-1}{4y} = \frac{\pi(y)-\pi(-y)}{4y} = \frac{\pi'(\xi)}{2} = \frac{\pi(\xi)\{1-\pi(\xi)\}}{2} \le \frac{1}{8} $$

for ξ∈(−y,y) by the mean value theorem, using \(\pi(-y)=1-\pi(y)\) and \(\pi(\xi)\{1-\pi(\xi)\}\le1/4\). At y=0, the coefficient of the quadratic term is not properly defined. In this case, we use the limit as y approaches zero. By L’Hôpital’s rule we get

$$\lim_{y\rightarrow0} \frac{2\pi(y)-1}{4y} = \lim_{y\rightarrow 0}\frac {2\pi'(y)}{4} = \lim_{y\rightarrow0}\frac{\pi(y)\{1-\pi(y)\}}{2} = \frac{1}{8}. $$
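The limit can also be confirmed numerically; in this quick check (ours) the coefficient (2π(y)−1)/(4y) approaches 1/8 as y shrinks.

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))

y = 10.0 ** -np.arange(1, 8)             # y -> 0+
coef = (2.0 * pi(y) - 1.0) / (4.0 * y)   # coefficient of the quadratic term

print(coef)  # tends to 0.125
```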

Writing \(x^{2}-y^{2}=(x-y)^{2}+2y(x-y)\) and noting that \(2y\cdot\frac{2\pi(y)-1}{4y}=\pi(y)-\frac{1}{2}\), the bound becomes \(-\log\pi(y)-\{1-\pi(y)\}(x-y)+\frac{2\pi(y)-1}{4y}(x-y)^{2}\). Now, bounding the quadratic coefficient by 1/8, completing squares around x, and enlarging the constant term \(-2\{1-\pi(y)\}^{2}\) produced by the completion to \(2\{1-\pi(y)\}^{2}\), which only loosens the bound, we get the upper bound as

$$-\log\pi(y) + 2\bigl\{1-\pi(y)\bigr\}^2 + \frac{1}{8}\bigl[x- \bigl\{y+4\bigl(1-\pi(y)\bigr)\bigr\}\bigr]^2. $$

Replacing x and y by \(q_{ij}\theta_{ij}\) and \(q_{ij}\theta_{ij}^{o}\) respectively, we obtain the upper bound in (4). The first two terms in (4) follow straightforwardly after the replacement. The third term of the above displayed formula becomes

$$\frac{1}{8}\bigl[q_{ij}\theta_{ij}-\bigl\{q_{ij}\theta_{ij}^{o}+4\bigl(1-\pi\bigl(q_{ij}\theta_{ij}^{o}\bigr)\bigr)\bigr\}\bigr]^{2} = \frac{1}{8}q_{ij}^{2}\bigl[\theta_{ij}-\theta_{ij}^{o}-4q_{ij}\bigl\{1-\pi\bigl(q_{ij}\theta_{ij}^{o}\bigr)\bigr\}\bigr]^{2}. $$

In the above, we used \(q_{ij}^{2}=1\) since \(q_{ij}\) takes values −1 or 1 only. Now, if we define \(x_{ij}=\theta_{ij}^{o}+4q_{ij}\{1-\pi(q_{ij}\theta_{ij}^{o})\}\), the third term equals \(\frac{1}{8}(\theta_{ij}-x_{ij})^{2}\) and the desired result is obtained.
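Before it is applied entrywise in (4), the scalar inequality itself can be verified on a grid; the sketch below (our own check, not from the paper) confirms that the completed-square expression majorizes −log π(x) for every (x, y) pair tested.

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))

def upper_bound(x, y):
    # -log pi(y) + 2{1-pi(y)}^2 + (1/8)[x - {y + 4(1-pi(y))}]^2
    return (-np.log(pi(y)) + 2.0 * (1.0 - pi(y)) ** 2
            + (x - (y + 4.0 * (1.0 - pi(y)))) ** 2 / 8.0)

grid = np.linspace(-10.0, 10.0, 201)
X, Y = np.meshgrid(grid, grid)
gap = upper_bound(X, Y) - (-np.log(pi(X)))

print(bool(np.all(gap >= 0)))  # the majorization holds everywhere on the grid
```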

Applying (4) to (1), we obtain the upper bound \(g(\mbox{\boldmath$\varXi$}|\mbox{\boldmath$\varXi$}^{o})+C\) of the penalized Bernoulli log likelihood, where \(g(\mbox{\boldmath$\varXi$}|\mbox{\boldmath$\varXi$}^{o})\) is given in equation (5) of the manuscript, and C is the constant term \(-\sum_{i=1}^{n}\sum_{j=1}^{p}\log\pi(q_{ij}\theta_{ij}^{o}) + 2\sum_{i=1}^{n}\sum_{j=1}^{p}\{1-\pi(q_{ij}\theta_{ij}^{o})\}^{2}\), which does not involve \(\theta_{ij}\). Thus, g can be used as the surrogate function for the upper-bound optimization.
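To illustrate how such a quadratic surrogate drives an upper-bound (MM) optimization, here is a deliberately simplified scalar toy problem: minimizing f(θ) = −log π(qθ) + (λ/2)θ² by repeatedly minimizing the quadratic surrogate built from the bound above. The ridge penalty and all numerical choices are ours for illustration; this is not the paper's algorithm.

```python
import numpy as np

pi = lambda t: 1.0 / (1.0 + np.exp(-t))
q, lam = 1.0, 0.5                  # observed sign and ridge penalty (illustrative)

def f(theta):
    # penalized negative Bernoulli log likelihood for a single entry
    return -np.log(pi(q * theta)) + 0.5 * lam * theta ** 2

theta = -2.0                       # arbitrary starting point
values = [f(theta)]
for _ in range(50):
    # minimize (1/8)(theta - x_star)^2 + (lam/2) theta^2 in closed form
    x_star = theta + 4.0 * q * (1.0 - pi(q * theta))
    theta = x_star / (1.0 + 4.0 * lam)
    values.append(f(theta))

values = np.array(values)
# monotone descent, converging to the stationary point lam*theta = q{1 - pi(q*theta)}
print(theta, bool(np.all(np.diff(values) <= 1e-12)))
```

Each iteration minimizes a curvature-1/8 quadratic that majorizes the logistic loss at the current iterate (up to an additive constant), so the objective is nonincreasing and the limit satisfies the stationarity condition λθ = q{1−π(qθ)}.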


Cite this article

Lee, S., Huang, J.Z. A biclustering algorithm for binary matrices based on penalized Bernoulli likelihood. Stat Comput 24, 429–441 (2014). https://doi.org/10.1007/s11222-013-9379-3
