Abstract
In this paper, we present a new SOM-based bi-clustering approach for continuous data, called Bi-SOM (Bi-clustering based on Self-Organizing Map). The goal of bi-clustering is to group the rows and the columns of a given data matrix simultaneously. This work also addresses several issues related to this task: (1) the topological visualization of bi-clusters with respect to their neighborhood relation, (2) the optimization of these bi-clusters into macro-blocks, and (3) dimensionality reduction by iteratively eliminating noise blocks. Finally, experiments on several data sets validate our approach in comparison with other bi-clustering methods.
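The abstract does not spell out the Bi-SOM algorithm, so the following is only a rough illustration of what "SOM-based bi-clustering" can look like. It is not Bi-SOM. The sketch trains two independent 1-D Kohonen maps, one on the rows of the matrix and one on its columns, and then reads off blocks as (row unit, column unit) pairs with their mean values. The function names `train_som_1d` and `som_bicluster`, the linear decay schedules, the Gaussian neighbourhood, and the evenly spaced initialisation are all hypothetical choices made for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som_1d(vectors, n_units, epochs=30, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map (a chain of n_units prototypes) with the online
    update rule; return the best-matching unit (BMU) index of each vector."""
    rng = random.Random(seed)
    n = len(vectors)
    # Hypothetical deterministic init: copy evenly spaced input vectors.
    idx = [k * (n - 1) // max(n_units - 1, 1) for k in range(n_units)]
    protos = [[float(v) for v in vectors[i]] for i in idx]
    sigma0 = max(n_units / 2.0, 1.0)
    total, t = epochs * n, 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)  # online SOM: present samples in random order
        for i in order:
            x = vectors[i]
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = min(range(n_units), key=lambda k: sq_dist(x, protos[k]))
            for k, p in enumerate(protos):
                # Gaussian neighbourhood on the 1-D map grid: units adjacent
                # on the map are pulled towards similar inputs.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(p)):
                    p[d] += lr * h * (x[d] - p[d])
            t += 1
    return [min(range(n_units), key=lambda k: sq_dist(x, protos[k])) for x in vectors]


def som_bicluster(matrix, row_units, col_units, epochs=30, seed=0):
    """Cluster rows and columns with separate 1-D SOMs, then summarise each
    (row unit, column unit) block by its mean value."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]  # transpose: columns as vectors
    row_labels = train_som_1d(rows, row_units, epochs, seed=seed)
    col_labels = train_som_1d(cols, col_units, epochs, seed=seed)
    sums, counts = {}, {}
    for i, r in enumerate(matrix):
        for j, v in enumerate(r):
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + v
            counts[key] = counts.get(key, 0) + 1
    means = {k: sums[k] / counts[k] for k in sums}
    return row_labels, col_labels, means
```

Take a 6×4 matrix with two planted blocks, where the first three rows are high on the first two columns and the last three rows are high on the last two. Calling `som_bicluster(m, 2, 2)` on it should put the first three rows in one row unit and the last three in the other, and split the columns into two pairs. The resulting block means then expose the two high-valued blocks. The paper's further steps, which organise bi-clusters into macro-blocks and iteratively remove noise blocks, are not reproduced here.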
Benabdeslem, K., Allab, K. Bi-clustering continuous data with self-organizing map. Neural Comput & Applic 22, 1551–1562 (2013). https://doi.org/10.1007/s00521-012-1047-6