
Tensor latent block model for co-clustering

  • Regular Paper
  • Published in: International Journal of Data Science and Analytics

Abstract

With the exponential growth of data collected in fields such as recommender systems (users, items), text mining (documents, terms), and bioinformatics (individuals, genes), co-clustering, the simultaneous clustering of both dimensions of a data matrix, has become a popular technique. Co-clustering aims to obtain homogeneous blocks that lead to a straightforward joint interpretation of row clusters and column clusters. Many approaches exist; in this paper, we rely on the latent block model (LBM), which is flexible enough to model different types of data matrices. We extend it to tensor (3D) data by proposing a Tensor LBM (TLBM) that captures different relations between entities. To show the interest of TLBM, we consider continuous, binary, and contingency-table datasets. To estimate the parameters, we develop a variational EM algorithm and evaluate its performance on synthetic and real datasets, highlighting several possible applications.



Notes

  1. http://grouplens.org/datasets/movielens/.

  2. https://linqs.soe.ucsc.edu/data.

  3. https://aminer.org/citation.

References

  1. Ailem, M., Role, F., Nadif, M.: Model-based co-clustering for the effective handling of sparse data. Pattern Recognit. 72, 108–122 (2017)

  2. Ailem, M., Role, F., Nadif, M.: Sparse Poisson latent block model for document clustering. IEEE Trans. Knowl. Data Eng. 29(7), 1563–1576 (2017)

  3. Banerjee, A., Krumpelman, C., Ghosh, J., Basu, S., Mooney, R.J.: Model-based overlapping clustering. In: Proceedings of the Eleventh ACM SIGKDD, pp. 532–537 (2005)

  4. Bouchareb, A., Boullé, M., Clérot, F., Rossi, F.: Co-clustering based exploratory analysis of mixed-type data tables. In: Advances in Knowledge Discovery and Management, pp. 23–41. Springer (2019)

  5. Bourgeois, F., Lassalle, J.C.: An extension of the Munkres algorithm for the assignment problem to rectangular matrices. Commun. ACM 14(12), 802–804 (1971)

  6. Boutalbi, R., Labiod, L., Nadif, M.: Co-clustering from tensor data. In: Yang, Q., Zhou, Z.H., Gong, Z., Zhang, M.L., Huang, S.J. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 370–383 (2019)

  7. Briand, A.S., Côme, E., El Mahrsi, M.K., Oukhellou, L.: A mixture model clustering approach for temporal passenger pattern characterization in public transport. Int. J. Data Sci. Anal. 1(1), 37–50 (2016)

  8. Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Comput. Stat. Data Anal. 14(3), 315–332 (1992)

  9. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38 (1977)

  10. Deodhar, M., Ghosh, J.: SCOAL: a framework for simultaneous co-clustering and learning from complex data. ACM Trans. Knowl. Discov. Data 4, 1–31 (2010)

  11. Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretic co-clustering. In: Proceedings of the Ninth ACM SIGKDD, pp. 89–98 (2003)

  12. Feizi, S., Javadi, H., Tse, D.: Tensor biclustering. In: Advances in Neural Information Processing Systems, vol. 30, pp. 1311–1320. Curran Associates, Inc. (2017)

  13. Fraley, C., Raftery, A.E.: How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput. J. 41(8), 578–588 (1998)

  14. Govaert, G., Nadif, M.: Comparison of the mixture and the classification maximum likelihood in cluster analysis with binary data. Comput. Stat. Data Anal. 23(1), 65–81 (1996)

  15. Govaert, G., Nadif, M.: Clustering with block mixture models. Pattern Recognit. 36, 463–473 (2003)

  16. Govaert, G., Nadif, M.: An EM algorithm for the block mixture model. IEEE Trans. Pattern Anal. Mach. Intell. 27(4), 643–647 (2005)

  17. Govaert, G., Nadif, M.: Fuzzy clustering to estimate the parameters of block mixture models. Soft. Comput. 10(5), 415–422 (2006)

  18. Govaert, G., Nadif, M.: Block clustering with Bernoulli mixture models: comparison of different approaches. Comput. Stat. Data Anal. 52(6), 3233–3245 (2008)

  19. Govaert, G., Nadif, M.: Co-clustering. Wiley-IEEE Press, Hoboken (2013)

  20. Govaert, G., Nadif, M.: Mutual information, phi-squared and model-based co-clustering for contingency tables. Adv. Data Anal. Classif. 12(3), 455–488 (2018)

  21. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), 610–621 (1973)

  22. Kossaifi, J., Panagakis, Y., Anandkumar, A., Pantic, M.: TensorLy: tensor learning in Python (2018). CoRR arXiv:1610.09555

  23. Kumar, R.M., Sreekumar, K.: A survey on image feature descriptors. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 5(1), 7668–7673 (2014)

  24. Kurban, H., Jenne, M., Dalkilic, M.M.: Using data to build a better EM: EM* for big data. Int. J. Data Sci. Anal. 4(2), 83–97 (2017)

  25. Labiod, L., Nadif, M.: Co-clustering under nonnegative matrix tri-factorization. In: International Conference on Neural Information Processing, pp. 709–717. Springer (2011)

  26. Labiod, L., Nadif, M.: A unified framework for data visualization and coclustering. IEEE Trans. Neural Netw. Learn. Syst. 26(9), 2194–2199 (2014)

  27. Munkres, J.: Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. 5(1), 32–38 (1957)

  28. Pagès, J.: Multiple Factor Analysis by Example Using R. Chapman and Hall, London (2014)

  29. Role, F., Morbieu, S., Nadif, M.: Coclust: a Python package for co-clustering. J. Stat. Softw. 88, 1–29 (2019)

  30. Salah, A., Nadif, M.: Model-based von Mises-Fisher co-clustering with a conscience. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 246–254. SIAM (2017)

  31. Salah, A., Nadif, M.: Directional co-clustering. Adv. Data Anal. Classif., 1–30 (2018)

  32. Steinley, D.: Properties of the Hubert-Arabie adjusted Rand index. Psychol. Methods 9(3), 386 (2004)

  33. Strehl, A., Ghosh, J.: Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

  34. Vu, D., Aitkin, M.: Variational algorithms for biclustering models. Comput. Stat. Data Anal., 12–24 (2015)

  35. Wu, T., Benson, A.R., Gleich, D.F.: General tensor spectral co-clustering for higher-order data. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 2559–2567. Curran Associates, Inc., Red Hook (2016)

  36. Zhang, H., Yang, M., Yang, W., Lv, J.: Spatial-aware hyperspectral image classification via multifeature kernel dictionary learning. Int. J. Data Sci. Anal. 7(2), 115–129 (2019)


Author information

Corresponding author

Correspondence to Rafika Boutalbi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This submission is an extended version of the PAKDD 2019 paper “Co-clustering from Tensor Data”.

Appendices

Appendix: Update \(\tilde{z}_{ik}\) and \(\tilde{w}_{j\ell }\) \(\forall i,k,j,\ell \)

To obtain the expression of \(\tilde{z}_{ik}\), we maximize the soft criterion \(F_C({\tilde{{\mathbf {z}}}},{\tilde{{\mathbf {w}}}};\Omega )\) with respect to \(\tilde{z}_{ik}\), subject to the constraint \(\sum _k \tilde{z}_{ik}=1\). The corresponding Lagrangian, up to terms that do not depend on \(\tilde{z}_{ik}\), is given by:

$$\begin{aligned} L({\widetilde{{\mathbf {z}}}},\beta )&= \sum _{i,k} \tilde{z}_{ik} \log \pi _{k} + \sum _{i,j,k,\ell } \tilde{z}_{ik}\tilde{w}_{j\ell } \log \varPhi ({\mathbf {x}}_{ij},\varvec{\lambda }_{k\ell })\\&\quad -\sum _{i,k}\tilde{z}_{ik}\log \tilde{z}_{ik} + \beta \Big (1-\sum _{k} \tilde{z}_{ik}\Big ). \end{aligned}$$
(9)

Taking derivatives with respect to \(\tilde{z}_{ik}\), we obtain:

$$\begin{aligned} \frac{\partial L({\widetilde{{\mathbf {z}}}},\beta )}{\partial \tilde{z}_{ik}} = \log \pi _{k} + \sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi ({\mathbf {x}}_{ij},\varvec{\lambda }_{k\ell }) - \log \tilde{z}_{ik} - 1 - \beta . \end{aligned}$$

Setting this derivative to zero yields:

$$\begin{aligned} \tilde{z}_{ik} = \frac{ \pi _{k} \exp \big (\sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi ({\mathbf {x}}_{ij},\varvec{\lambda }_{k\ell })\big )}{\exp (\beta +1)}. \end{aligned}$$

Summing both sides over \(k'\) and using the constraint \(\sum _{k'} \tilde{z}_{ik'}=1\) yields

$$\begin{aligned} \exp (\beta + 1)= \sum _{k'} \pi _{k'} \exp \big (\sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi ({\mathbf {x}}_{ij},\varvec{\lambda }_{k'\ell })\big ). \end{aligned}$$

Plugging \(\exp (\beta +1)\) back into the expression of \(\tilde{z}_{ik}\) leads to:

$$\begin{aligned} \tilde{z}_{ik} \propto \pi _{k} \exp \big (\sum _{j,\ell }\tilde{w}_{j\ell }\log \varPhi ({\mathbf {x}}_{ij},\varvec{\lambda }_{k\ell })\big ). \end{aligned}$$

In the same way, we can estimate \(\tilde{w}_{j\ell }\) by maximizing \(F_C({\tilde{{\mathbf {z}}}},{\tilde{{\mathbf {w}}}};\Omega )\) with respect to \(\tilde{w}_{j\ell }\), subject to the constraint \(\sum _\ell \tilde{w}_{j\ell }=1\); we obtain

$$\begin{aligned} \tilde{w}_{j\ell } \propto \rho _{\ell } \exp \big (\sum _{i,k}\tilde{z}_{ik}\log \varPhi ({\mathbf {x}}_{ij},\varvec{\lambda }_{k\ell })\big ). \end{aligned}$$
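These two coupled updates form the E-step of the variational EM algorithm. As a small illustration, here is a minimal NumPy sketch of both updates computed in log-space, assuming the cell-block log-densities \(\log \varPhi ({\mathbf {x}}_{ij},\varvec{\lambda }_{k\ell })\) have been precomputed into a hypothetical array `log_phi` of shape (n, d, K, L); the function name and signature are illustrative, not the authors' implementation.

```python
import numpy as np

def update_assignments(log_phi, pi, rho, tilde_w):
    """One variational pass over the two updates above (sketch).

    log_phi : (n, d, K, L) array, log Phi(x_ij, lambda_kl) for every
              cell (i, j) and block (k, l) -- assumed precomputed.
    pi, rho : (K,) and (L,) row/column mixing proportions.
    tilde_w : (d, L) current soft column assignments.
    """
    # tilde_z_ik ∝ pi_k exp( sum_{j,l} w_jl log Phi(x_ij, lambda_kl) )
    log_z = np.log(pi) + np.einsum('jl,ijkl->ik', tilde_w, log_phi)
    log_z -= log_z.max(axis=1, keepdims=True)      # stabilize before exp
    tilde_z = np.exp(log_z)
    tilde_z /= tilde_z.sum(axis=1, keepdims=True)  # enforce sum_k z_ik = 1

    # tilde_w_jl ∝ rho_l exp( sum_{i,k} z_ik log Phi(x_ij, lambda_kl) )
    log_w = np.log(rho) + np.einsum('ik,ijkl->jl', tilde_z, log_phi)
    log_w -= log_w.max(axis=1, keepdims=True)
    tilde_w = np.exp(log_w)
    tilde_w /= tilde_w.sum(axis=1, keepdims=True)  # enforce sum_l w_jl = 1
    return tilde_z, tilde_w
```

Normalizing after shifting by the row-wise maximum realizes the proportionality constant \(\exp (\beta +1)\) without ever computing it explicitly.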

Appendix: Estimation of the \(\mu _{k\ell }\)’s and \(\varSigma _{k\ell }\)’s of the Gaussian TLBM

The \(\mu _{k\ell }\)’s and \(\varSigma _{k\ell }\)’s can be obtained from the following derivatives:

$$\begin{aligned} \frac{\partial {{{{\mathcal {F}}}}_C^{k\ell }}}{\partial {\varvec{\mu }_{k\ell }}}=\frac{\partial {{{{\mathcal {L}}}}_C^{k\ell }}}{\partial {\varvec{\mu }_{k\ell }}} \quad \text{ and } \quad \frac{\partial {{{{\mathcal {F}}}}_C^{k\ell }}}{\partial {\varvec{\varSigma }_{k\ell }}}=\frac{\partial {{{{\mathcal {L}}}}_C^{k\ell }}}{\partial {\varvec{\varSigma }_{k\ell }}} \end{aligned}$$

where

$$\begin{aligned} {{{\mathcal {L}}}}_C^{k\ell } = -\frac{1}{2}\, \tilde{z}_{.k}\tilde{w}_{.\ell }\log |\varvec{\varSigma }_{k\ell }| - \frac{1}{2}\sum _{i,j}\tilde{z}_{ik}\tilde{w}_{j\ell }({\mathbf {x}}_{ij}-\varvec{\mu }_{k\ell })^\top \varvec{\varSigma }_{k\ell }^{-1}({\mathbf {x}}_{ij}-\varvec{\mu }_{k\ell }), \end{aligned}$$

with \( \tilde{z}_{.k}=\sum _i{\tilde{z}_{ik}}\) and \(\tilde{w}_{.\ell }=\sum _j \tilde{w}_{j\ell }\). The following formulas for the derivatives with respect to a vector \({\mathbf {x}}\) and a matrix \({\mathbf {M}}\) (the first one holding for symmetric \({\mathbf {M}}\))

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{\partial {{\mathbf {x}}^\top {\mathbf {M}}{\mathbf {x}}}}{\partial {{\mathbf {x}}}}=2 {\mathbf {M}}{\mathbf {x}}, \\ \dfrac{\partial {\log |{\mathbf {M}}|}}{\partial {{\mathbf {M}}}}=({\mathbf {M}}^{-1})^\top ,\\ \dfrac{\partial {{\mathbf {x}}^\top {\mathbf {M}}^{-1}{\mathbf {x}}}}{\partial {{\mathbf {M}}}}=-({\mathbf {M}}^{-1})^\top {\mathbf {x}}{\mathbf {x}}^{\top }({\mathbf {M}}^{-1})^\top \end{array}\right. } \end{aligned}$$

lead to

$$\begin{aligned} \frac{\partial {{{{\mathcal {L}}}}_C^{k\ell }}}{\partial {\varvec{\mu }_{k\ell }}}=\sum _{i,j}\tilde{z}_{ik}\tilde{w}_{j\ell }\varvec{\varSigma }_{k\ell }^{-1} ({\mathbf {x}}_{ij}-\varvec{\mu }_{k\ell }) \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \frac{\partial {{{{\mathcal {L}}}}_C^{k\ell }}}{\partial {\varvec{\varSigma }_{k\ell }}}&=-\frac{1}{2}\,\tilde{z}_{.k}\tilde{w}_{.\ell }(\varvec{\varSigma }_{k\ell }^{-1})^\top \\&\quad + \frac{1}{2}\sum _{i,j}\tilde{z}_{ik}\tilde{w}_{j\ell } (\varvec{\varSigma }_{k\ell }^{-1})^\top ({\mathbf {x}}_{ij}-\varvec{\mu }_{k\ell })\\&\qquad \times ({\mathbf {x}}_{ij}-\varvec{\mu }_{k\ell })^\top (\varvec{\varSigma }_{k\ell }^{-1})^\top . \end{aligned} \end{aligned}$$

Setting these two partial derivatives to zero leads to

$$\begin{aligned} {\hat{\varvec{\mu }}}_{k\ell }=\frac{\sum _{i,j}\tilde{z}_{ik} \tilde{w}_{j\ell }{\mathbf {x}}_{ij}}{\sum _{i,j}\tilde{z}_{ik}\tilde{w}_{j\ell }}, \end{aligned}$$

and

$$\begin{aligned} {\hat{\varvec{\varSigma }}}_{k\ell }=\frac{\sum _{i,j}\tilde{z}_{ik}\tilde{w}_{j\ell }({\mathbf {x}}_{ij} -{\hat{\varvec{\mu }}}_{k\ell })({\mathbf {x}}_{ij} -{\hat{\varvec{\mu }}}_{k\ell })^{\top }}{\sum _{i,j}\tilde{z}_{ik}\tilde{w}_{j\ell }}. \end{aligned}$$
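As a worked illustration, both closed-form estimators are weighted means and covariances over the cells of a block. The sketch below assumes a hypothetical data tensor `X` of shape (n, d, v), where each cell \({\mathbf {x}}_{ij}\) is a vector over the v tensor slices; it is a minimal NumPy rendering of the two formulas, not the authors' code.

```python
import numpy as np

def m_step_gaussian(X, tilde_z, tilde_w):
    """M-step of the Gaussian TLBM for all blocks (sketch).

    X       : (n, d, v) data tensor.
    tilde_z : (n, K) soft row assignments; tilde_w : (d, L) soft columns.
    Returns mu of shape (K, L, v) and sigma of shape (K, L, v, v).
    """
    n, d, v = X.shape
    K, L = tilde_z.shape[1], tilde_w.shape[1]
    # Block weights: sum_{i,j} z_ik w_jl factorizes as (sum_i z_ik)(sum_j w_jl).
    weights = tilde_z.sum(0)[:, None] * tilde_w.sum(0)[None, :]        # (K, L)
    # Weighted block means: mu_kl = sum_{i,j} z_ik w_jl x_ij / weight_kl.
    mu = np.einsum('ik,jl,ijv->klv', tilde_z, tilde_w, X) / weights[..., None]
    # Weighted block covariances around the estimated means.
    sigma = np.empty((K, L, v, v))
    for k in range(K):
        for l in range(L):
            diff = X - mu[k, l]                                        # (n, d, v)
            cell_w = tilde_z[:, k][:, None] * tilde_w[:, l][None, :]   # (n, d)
            sigma[k, l] = np.einsum('ij,ijv,ijw->vw',
                                    cell_w, diff, diff) / weights[k, l]
    return mu, sigma
```

In practice, these M-step updates alternate with the variational updates of \(\tilde{z}_{ik}\) and \(\tilde{w}_{j\ell }\) until the criterion stabilizes.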


Cite this article

Boutalbi, R., Labiod, L. & Nadif, M. Tensor latent block model for co-clustering. Int J Data Sci Anal 10, 161–175 (2020). https://doi.org/10.1007/s41060-020-00205-5

