Abstract
Bi-clustering, or co-clustering, is the task of finding sub-matrices (each indexed by a group of rows and a group of columns) within a matrix such that the elements of each sub-matrix are related in some way, for example by being similar under some metric. As in traditional clustering, a crucial parameter in bi-clustering methods is the number of groups that one expects to find in the data, which is not always known or easy to guess. This paper proposes a novel bi-clustering method based on low-rank sparse non-negative matrix factorization (S-NMF), with the additional benefit that the optimum rank k is chosen automatically by a minimum description length (MDL) selection procedure, which favors models that can represent the data with fewer bits. To assess its validity, the MDL procedure is tested on a simulated example in combination with three different S-NMF algorithms, two of which are novel.
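The selection principle described above can be illustrated with a toy sketch. It is not one of the S-NMF algorithms of the paper: it assumes a binary matrix, assumes that an ordered list of candidate bi-clusters is already given, and uses a simple enumerative code to count bits. The names `support_bits`, `description_length` and `select_rank` are hypothetical and introduced only for this illustration.

```python
from math import comb, log2

def support_bits(n, s):
    """Bits needed to describe an s-element subset of n items:
    the size s (one of n + 1 values), then the index of the subset."""
    return log2(n + 1) + log2(comb(n, s))

def description_length(X, biclusters):
    """Two-part code length, in bits, of the binary matrix X given a
    list of (rows, cols) bi-clusters. Illustrative assumption only."""
    m, n = len(X), len(X[0])
    # Model part: row support and column support of each bi-cluster.
    model = sum(support_bits(m, len(r)) + support_bits(n, len(c))
                for r, c in biclusters)
    # Reconstruction: a cell is 1 if at least one bi-cluster covers it.
    covered = {(i, j) for r, c in biclusters for i in r for j in c}
    # Residual part: cells where X and the reconstruction disagree.
    errors = sum((X[i][j] == 1) != ((i, j) in covered)
                 for i in range(m) for j in range(n))
    return model + support_bits(m * n, errors)

def select_rank(X, candidates):
    """Return the k whose first k candidates give the shortest code."""
    lengths = [description_length(X, candidates[:k])
               for k in range(len(candidates) + 1)]
    best_k = min(range(len(lengths)), key=lengths.__getitem__)
    return best_k, lengths

# Toy data: two planted 4x4 blocks of ones in an 8x8 matrix, plus one noisy cell.
X = [[1 if (i < 4) == (j < 4) else 0 for j in range(8)] for i in range(8)]
X[0][7] = 1

# Candidate bi-clusters: the two planted blocks, then one that fits only the noise.
candidates = [
    (range(0, 4), range(0, 4)),
    (range(4, 8), range(4, 8)),
    ([0], [7]),
]

best_k, lengths = select_rank(X, candidates)
print(best_k, [round(bits, 1) for bits in lengths])
```

In this toy setting the third candidate removes the last residual error, but describing it costs more bits than it saves, so the shortest total code is obtained with two bi-clusters.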
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ramírez, I., Tepper, M. (2013). Bi-clustering via MDL-Based Matrix Factorization. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41822-8_29
DOI: https://doi.org/10.1007/978-3-642-41822-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41821-1
Online ISBN: 978-3-642-41822-8
eBook Packages: Computer Science, Computer Science (R0)