Noise-free latent block model for high dimensional data


Co-clustering is known to be a very powerful and efficient approach in unsupervised learning because of its ability to partition data based on both the observations and the variables of a given dataset. However, in a high-dimensional context, co-clustering methods may fail to provide a meaningful result due to the presence of noisy and/or irrelevant features. In this paper, we tackle this issue by proposing a novel co-clustering model which assumes the existence of a noise cluster that contains all irrelevant features. A variational expectation-maximization-based algorithm is derived for this task, where the automatic variable selection as well as the joint clustering of objects and variables are achieved via a Bayesian framework. Experimental results on synthetic datasets show the efficiency of our model in the context of high-dimensional noisy data. Finally, we highlight the interest of the approach on two real datasets whose goal is to study genetic diversity across the world.
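To make the noise-cluster idea concrete, here is a minimal Python/NumPy sketch, assuming Gaussian cell entries. It is a toy, not the authors' algorithm: it runs plain maximum-likelihood variational EM without the priors of the Bayesian formulation, and the function names (`noise_lbm_vem`, `log_normal`, `softmax_rows`) and their parameters are hypothetical. Column cluster 0 plays the role of the noise cluster: its mean and variance are shared by every row cluster, so columns assigned to it contribute the same term to every row cluster and drop out of the row update.

```python
import numpy as np

def log_normal(x, mu, var):
    """Elementwise Gaussian log-density log N(x; mu, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def softmax_rows(logits):
    """Turn each row of unnormalised log-weights into probabilities."""
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

def noise_lbm_vem(X, G, H, n_iter=50, seed=0, eps=1e-6):
    """Hypothetical toy VEM for a Gaussian latent block model with a noise
    column cluster (not the paper's algorithm; no priors are used).

    Rows: clusters 0..G-1. Columns: cluster 0 is noise (one mean/variance
    shared by every row cluster), clusters 1..H are informative blocks.
    Builds an (n, d, G, H) tensor, so it is only meant for small inputs.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    s = rng.dirichlet(np.ones(G), size=n)       # q(row i in cluster g)
    t = rng.dirichlet(np.ones(H + 1), size=d)   # q(column j in cluster h)
    for _ in range(n_iter):
        # M-step: mixing proportions and block means/variances.
        pi, rho = s.mean(axis=0), t.mean(axis=0)
        tw = t[:, 1:]                                        # informative part
        w = np.outer(s.sum(axis=0), tw.sum(axis=0)) + eps    # (G, H)
        mu = s.T @ X @ tw / w
        var = np.maximum(s.T @ X**2 @ tw / w - mu**2, eps)
        w0 = n * t[:, 0].sum() + eps
        mu0 = (X @ t[:, 0]).sum() / w0
        var0 = max((X**2 @ t[:, 0]).sum() / w0 - mu0**2, eps)

        # Log-density of every cell under every informative block.
        L = log_normal(X[:, :, None, None], mu, var)         # (n, d, G, H)

        # E-step for rows: the noise block adds the same term for every g,
        # so it cancels after normalisation and is left out.
        s = softmax_rows(np.log(pi + eps) + np.einsum("ijgh,jh->ig", L, tw))

        # E-step for columns: informative blocks vs. the row-independent noise.
        inf = np.einsum("ijgh,ig->jh", L, s)                 # (d, H)
        noise = log_normal(X, mu0, var0).sum(axis=0)         # (d,)
        t = softmax_rows(np.log(rho + eps) + np.column_stack([noise, inf]))
    return s, t
```

Usage: `s, t = noise_lbm_vem(X, G=2, H=1)`, after which `t.argmax(axis=1) == 0` flags the columns this sketch treats as irrelevant. The result depends on the random initialisation, so one would normally restart from several seeds and keep the run with the best variational bound, which this sketch does not compute.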

    The datasets can be found here: and the code will be available upon publication.

Laclau, C., Brault, V. Noise-free latent block model for high dimensional data. Data Min Knowl Disc 33, 446–473 (2019).

  • Latent block model
  • Feature selection
  • Clustering
  • Biclustering
  • High dimensional data