Abstract
In this paper, we introduce a novel distributed Markov Chain Monte Carlo (MCMC) inference method for the Bayesian Non-Parametric Latent Block Model (DisNPLBM), built on the Master/Worker architecture. Our non-parametric co-clustering algorithm partitions both observations and features, using latent multivariate Gaussian block distributions. The row workload is distributed evenly among the workers, which communicate only with the master and never with one another. Experimental results demonstrate the impact of DisNPLBM on cluster labeling accuracy and execution time. Moreover, we present a real-world use case in which we apply our approach to co-cluster gene expression data. The source code is publicly available at https://github.com/redakhoufache/Distributed-NPLBM
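The communication pattern described above (rows split evenly across workers, each worker reporting only to the master) can be illustrated with a minimal sketch. This is not the DisNPLBM sampler and is not taken from the authors' repository: the function names `split_rows`, `worker_summary`, and `master_merge` are hypothetical, the workers are simulated sequentially in a single process, and the only thing exchanged is a set of Gaussian sufficient statistics, which is an assumption chosen for illustration.

```python
import numpy as np

def split_rows(n_rows, n_workers):
    """Split row indices 0..n_rows-1 into n_workers near-equal contiguous shards."""
    return np.array_split(np.arange(n_rows), n_workers)

def worker_summary(X, row_idx):
    """Hypothetical worker step: reduce a row shard to Gaussian sufficient statistics."""
    shard = X[row_idx]
    return {
        "n": shard.shape[0],       # number of rows in the shard
        "sum": shard.sum(axis=0),  # per-feature sum
        "sq": shard.T @ shard,     # sum of outer products of the rows
    }

def master_merge(summaries):
    """Hypothetical master step: combine worker summaries into a global mean and covariance."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    sq = sum(s["sq"] for s in summaries)
    mean = total / n
    cov = sq / n - np.outer(mean, mean)  # biased (maximum-likelihood) covariance
    return n, mean, cov

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(103, 4))

shards = split_rows(X.shape[0], n_workers=4)
# Each worker sees only its own rows and sends its summary to the master.
summaries = [worker_summary(X, idx) for idx in shards]
n, mean, cov = master_merge(summaries)
```

Because the summaries are additive, the master recovers the same mean and covariance it would compute from the full data, without any worker-to-worker communication.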
Acknowledgements
This work has been supported by the Paris Île-de-France Région in the framework of DIM AI4IDF. I thank Grid5000 for providing the essential computational resources and the start-up HephIA for the invaluable exchange on scalable algorithms.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Khoufache, R., Belhadj, A., Azzag, H., Lebbah, M. (2024). Distributed MCMC Inference for Bayesian Non-parametric Latent Block Model. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14645. Springer, Singapore. https://doi.org/10.1007/978-981-97-2242-6_22
DOI: https://doi.org/10.1007/978-981-97-2242-6_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2241-9
Online ISBN: 978-981-97-2242-6
eBook Packages: Computer Science (R0)