Abstract
Image clustering plays an important role in computer vision and machine learning. However, most existing clustering algorithms flatten images into one-dimensional vectors as representations for subsequent learning, without fully considering the spatial relationships between pixels; this may discard useful intrinsic structural information of the matrix data samples and result in high computational complexity. In this paper, we propose a novel two-dimensional k-subspace clustering method (2DkSC). By projecting data samples into a discriminant low-dimensional space, 2DkSC maximizes the between-cluster difference while minimizing the within-cluster distance of matrix data samples in the projected space, so that dimensionality reduction and clustering are realized simultaneously. The weight between the between-cluster and within-cluster terms is derived from a Bhattacharyya upper bound and is determined by the input data samples themselves. This weighting constant makes the proposed 2DkSC adaptive without any parameters to tune, which improves computational efficiency. Moreover, 2DkSC can be solved efficiently via a standard eigenvalue decomposition. Experimental results on three different types of image datasets show that 2DkSC achieves the best clustering results in terms of average clustering accuracy and average normalized mutual information, which demonstrates the superiority of the proposed method.
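The optimization described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the scatter forms `Sb` and `Sw` and the objective `Sb - Delta * Sw` are assumptions pieced together from the abstract, while the adaptive weight `Delta` follows the formula stated in Proposition 1 of the appendix.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4                      # each sample is an m x n matrix ("2D" data)
clusters = [rng.normal(loc=c, size=(20, m, n)) for c in (0.0, 1.5, 3.0)]

N = sum(len(c) for c in clusters)
means = [c.mean(axis=0) for c in clusters]
k = len(clusters)

# Adaptive weight from the Bhattacharyya bound (Proposition 1 in the appendix):
# Delta = 1/4 * sum_{i<j} sqrt(N_i N_j)/N * ||Xbar_i - Xbar_j||_F^2
Delta = 0.25 * sum(
    np.sqrt(len(clusters[i]) * len(clusters[j])) / N
    * np.linalg.norm(means[i] - means[j], "fro") ** 2
    for i in range(k) for j in range(i + 1, k)
)

# Hypothetical between-cluster and within-cluster scatter matrices (m x m)
Sb = sum(
    np.sqrt(len(clusters[i]) * len(clusters[j])) / N
    * (means[i] - means[j]) @ (means[i] - means[j]).T
    for i in range(k) for j in range(i + 1, k)
)
Sw = sum((X - mu) @ (X - mu).T for c, mu in zip(clusters, means) for X in c)

# One projection direction: leading eigenvector of the symmetric matrix Sb - Delta*Sw,
# i.e. a standard eigenvalue decomposition, with no free parameter to tune.
evals, evecs = np.linalg.eigh(Sb - Delta * Sw)
w = evecs[:, -1]                 # unit-norm projection vector in R^m
print(Delta > 0, np.isclose(np.linalg.norm(w), 1.0))
```

Since the objective matrix is symmetric, `eigh` suffices, which is what makes the projection step cheap relative to iterative subspace solvers.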
References
Tan PN, Steinbach M, Kumar V (2005) Introduction to Data Mining. Addison Wesley, Boston
Zheng CT, Liu C, Wong HS (2018) Corpus-based topic diffusion for short text clustering. Neurocomputing 275:2444–2458
Abasi AK, Khader AT, Al-Betar MA et al (2020) Link based multi verse optimizer for text documents clustering. Appl Soft Comput 87:106002
Costa G, Ortale R (2021) Jointly modeling and simultaneously discovering topics and clusters in text corpora using word vectors. Inf Sci 563:226–240
Thirumoorthy K, Muneeswaran K (2021) A hybrid approach for text document clustering using jaya optimization algorithm. Expert Syst Appl 178:115040
Jiang Z, Li T, Min W et al (2017) Fuzzy c-means clustering based on weights and gene expression programming. Pattern Recogn Lett 90:1–7
Shukla AK, Muhuri PK (2019) Big data clustering with interval type 2 fuzzy uncertainty modeling in gene expression datasets. Eng Appl Artif Intell 77:268–282
Zeng YP, Xu ZS, He Y et al (2020) Fuzzy entropy clustering by searching local border points for the analysis of gene expression data. Knowl Based Syst 190:105309
Rahman MA, Ang LM, Seng KP (2020) Clustering biomedical and gene expression datasets with kernel density and unique neighborhood set based vein detection. Inf Syst 91:101490
Wang M, Deng WH (2020) Deep face recognition with clustering based domain adaptation. Neurocomputing 393:1–14
Liu N, Guo B, Li XJ et al (2021) Gradient clustering algorithm based on deep learning aerial image detection. Pattern Recogn Lett 141:37–44
Fang U, Li JX, Lu XQ et al (2021) Self-supervised cross-iterative clustering for unlabeled plant disease images. Neurocomputing 456:36–48
Pham TX, Siarry P, Oulhadj H (2018) Integrating fuzzy entropy clustering with an improved PSO for MRI brain image segmentation. Appl Soft Comput 65:230–242
Mahata N, Kahali S, Adhikari SK et al (2018) Local contextual information and Gaussian function induced fuzzy clustering algorithm for brain MR image segmentation and intensity inhomogeneity estimation. Appl Soft Comput 68:586–596
Lei T, Jia X, Zhang Y et al (2019) Superpixel-based fast fuzzy C-means clustering for color image segmentation. IEEE Trans Fuzzy Syst 27(9):1753–1766
Wei D, Wang ZB, Si L et al (2021) An image segmentation method based on a modified local information weighted intuitionistic fuzzy C-means clustering and gold panning algorithm. Eng Appl Artif Intell 101:104209
Wu J, Liu H, Xiong H et al (2015) k-means based consensus clustering: a unified view. IEEE Trans Knowl Data Eng 27(1):155–169
Bradley PS, Mangasarian OL (2000) k-plane clustering. J Global Optim 16(1):23–32
Tseng P (2000) Nearest q-flat to m points. J Optim Theory Appl 105:249–252
Liu LM, Guo YR, Wang Z et al (2017) k-proximal plane clustering. Int J Mach Learn Cybern 8(5):1537–1554
Wang Z, Shao YH, Bai L et al (2015) Twin support vector machine for clustering. IEEE Trans Neural Netw Learn Sys 26(10):2583–2588
Khemchandani R, Pal A, Chandra S (2018) Fuzzy least squares twin support vector clustering. Neural Comput Appl 29(2):553–563
Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Arun Kumar M, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36(4):7535–7543
Ye Q, Zhao H, Li Z et al (2017) L1-norm distance minimization-based fast robust twin support vector \(k\)-plane clustering. IEEE Trans Neural Netw Learn Sys 29(9):4494–4503
Li CN, Shao YH, Guo YR et al (2019) Robust k-subspace discriminant clustering. Appl Soft Comput 85:105858
Li Z, Yao L, Wang S et al (2020) Adaptive two-dimensional embedded image clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence 34(04):4796–4803
Lu Y, Yuan C, Lai Z et al (2019) Horizontal and vertical nuclear norm based 2DLDA for image representation. IEEE Trans Circuits Syst Video Technol 29(4):941–955
Li CN, Shao YH, Deng NY (2015) Robust L1-norm two-dimensional linear discriminant analysis. Neural Netw 65:92–104
Li CN, Shang MQ, Shao YH et al (2019) Sparse L1-norm two dimensional linear discriminant analysis via the generalized elastic net regularization. Neurocomputing 337:80–96
Lu Y, Yuan C, Lai Z et al (2018) Horizontal and vertical nuclear norm-based 2DLDA for image representation. IEEE Trans Circuits Syst Video Technol 29(4):941–955
Li CN, Shao YH, Chen WJ et al (2021) Generalized two-dimensional linear discriminant analysis with regularization. Neural Netw 142:73–91
Li CN, Shao YH, Wang Z et al (2019) Robust bilateral Lp-norm two-dimensional linear discriminant analysis. Inf Sci 500:274–297
Guo YR, Bai YQ, Li CN et al (2021) Two dimensional Bhattacharyya bound linear discriminant analysis with its applications. Appl Intell 1-17
Ma Z, Lai Y, Kleijn WB et al (2019) Variational Bayesian learning for Dirichlet process mixture of inverted Dirichlet distributions in non-Gaussian image feature modeling. IEEE Trans Neural Netw Learn Sys 30(2):449–463
Cai D, He X, Han J (2005) Document clustering using locality preserving indexing. IEEE Trans Knowl Data Eng 17(12):1624–1637
Yang J, Parikh D, Batra D (2016) Joint unsupervised learning of deep representations and image clusters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, pp 5147–5156. https://doi.org/10.1109/CVPR.2016.556
Xie Y, Lin B, Qu Y et al (2020) Joint deep multi-view learning for image clustering. IEEE Trans Knowl Data Eng 33(11):3594–3606
Nene SA, Nayar SK, Murase H (1996) Columbia object image library: Coil-100. Technical Report CUCS-006-96, Department of Computer Science, Columbia University, New York
Georghiades AS, Belhumeur PN, Kriegman DJ (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660
Jain V (2002) The Indian face database, http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase/
Phillips PJ, Moon H, Rizvi SA et al (2000) The FERET evaluation methodology for face-recognition algorithms. IEEE Trans Pattern Anal Mach Intell 22(10):1090–1104
Nielsen F (2014) Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means. Pattern Recogn Lett 42:25–34
Fukunaga K (2013) Introduction to statistical pattern recognition. Academic Press, New York
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No.12171307) and Zhejiang Soft Science Research Project (No.2021C35003).
Appendix
In this appendix, we present the proof of the relevant Bhattacharyya error bound. In particular, we show that the weighting constant \(\Delta _{i}\), which balances the between-cluster and within-cluster terms, is obtained by minimizing an upper bound on the Bhattacharyya error, so that the weight is theoretically grounded rather than hand-tuned.
The Bhattacharyya error [43] is a close upper bound to the Bayes error, which is given by
$$\begin{aligned} \epsilon _B=\sum \limits _{i<j}^{k}\sqrt{P_iP_j}\int \sqrt{p_i({\textbf {X}})p_j({\textbf {X}})}\,d{\textbf {X}}, \end{aligned}$$(1)
where \({\textbf {X}}\) is a data sample, \(P_i\) is the prior probability, and \(p_i({\textbf {X}})\) is the probability density function of the i-th class of the data.
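As a quick numerical sanity check (not taken from the paper), the bounding property can be verified for two unit-variance one-dimensional Gaussian classes by direct quadrature:

```python
import numpy as np

# Two 1-D Gaussian classes, equal priors, means 0 and 2, unit variance
P1 = P2 = 0.5
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
p1 = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
p2 = np.exp(-0.5 * (x - 2) ** 2) / np.sqrt(2 * np.pi)

# Bhattacharyya error: eps_B = sqrt(P1*P2) * integral of sqrt(p1*p2) dx
eps_B = np.sqrt(P1 * P2) * np.sum(np.sqrt(p1 * p2)) * dx

# Bayes error: integral of min(P1*p1, P2*p2) dx
bayes = np.sum(np.minimum(P1 * p1, P2 * p2)) * dx
print(bayes <= eps_B)
```

For equal unit covariances the integral has the closed form \(e^{-(\mu _1-\mu _2)^2/8}\), so here \(\epsilon _B=0.5\,e^{-0.5}\approx 0.303\), comfortably above the Bayes error \(\Phi (-1)\approx 0.159\).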
Proposition 1
Assume \(P_i\) and \(p_i({\textbf {X}})\) are the prior probability and the probability density function of the i-th class for the training data set T, respectively, and that the data samples in each class are independent and identically normally distributed. Let \(p_1({\textbf {X}}), p_2({\textbf {X}}),\ldots , p_k({\textbf {X}})\) be the Gaussian functions given by \(p_i({\textbf {X}})=\mathcal {N}({\textbf {X}}\mid {\overline{{\textbf {X}}}}_i, \varvec{\Sigma }_i)\), where \({\overline{{\textbf {X}}}}_i\) and \(\varvec{\Sigma }_i\) are the class mean and the class covariance matrix, respectively. We further suppose \(\varvec{\Sigma }_i=\varvec{\Sigma }\), \(i=1,2,\ldots ,k\), where \(\varvec{\Sigma }\) is the covariance matrix of the data set T, and \({\overline{{\textbf {X}}}}_i\) and \(\varvec{\Sigma }\) can be estimated accurately from T. Then for an arbitrary projection vector \({\textbf {w}}\in \mathbb {R}^{m}\) with \(\Vert {\textbf {w}}\Vert _2=1\), the Bhattacharyya error bound \(\epsilon _B\) defined by (1) on the data set \(\widetilde{T}=\{\widetilde{{\textbf {X}}}_i\mid \widetilde{{\textbf {X}}}_i={\textbf {w}}^T{\textbf {X}}_i\in \mathbb {R}^{1\times n}\}\) satisfies the following [34]:
$$\begin{aligned} \epsilon _B\le \sum \limits _{i<j}^k\sqrt{P_iP_j}-\frac{a}{8}\left( \sum \limits _{i<j}^k\sqrt{P_iP_j}\Vert {\textbf {w}}^T({\overline{{\textbf {X}}}}_{i}-{\overline{{\textbf {X}}}}_{j})\Vert _2^2-\Delta \sum \limits _{i=1}^{k}\sum \limits _{s=1}^{N_i}\Vert {\textbf {w}}^T({\textbf {X}}_{s}^{i}-{\overline{{\textbf {X}}}}_i)\Vert _2^2\right) , \end{aligned}$$(2)
where \(\Delta =\frac{1}{4}\sum \nolimits _{i<j}^k\frac{\sqrt{N_iN_j}}{N}\Vert {\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}}\Vert _F^2\), \(P_i=\frac{N_i}{N}\), \(P_j=\frac{N_j}{N}\), and \(a>0\) is some constant.
Proof
We first note that \(p_i(\widetilde{{{\textbf {X}}}})=\mathcal {N}(\widetilde{{{\textbf {X}}}}\mid \widetilde{{\overline{{\textbf {X}}}}}_i, \widetilde{\varvec{\Sigma }})\), where \(\widetilde{ {{\textbf {X}}}}_i={\textbf {w}}^T{\textbf {X}}_i\), \(\widetilde{{\overline{{\textbf {X}}}}}_i={\textbf {w}}^T\overline{{\textbf {X}}}_i\in \mathbb {R}^{1\times n}\) is the i-th class mean, and \(\widetilde{\varvec{\Sigma }}\) is the covariance matrix in the \(1\times n\) projected space. Denote
$$\begin{aligned} {\textbf {D}}=\big (({\textbf {w}}^T{\textbf {X}}_{1}^{1})^T,\ldots ,({\textbf {w}}^T{\textbf {X}}_{N_1}^{1})^T,\ldots ,({\textbf {w}}^T{\textbf {X}}_{1}^{k})^T,\ldots ,({\textbf {w}}^T{\textbf {X}}_{N_k}^{k})^T\big )\in \mathbb {R}^{n\times N}, \end{aligned}$$(3)
and let \(\widetilde{\overline{{\textbf {X}}}}_{{\textbf {I}}}\in \mathbb {R}^{n\times N}\) collect the corresponding projected class means \(\widetilde{{\overline{{\textbf {X}}}}}_i^T\), each repeated \(N_i\) times. Then \(\widetilde{\varvec{\Sigma }}=({\textbf {D}}-\widetilde{\overline{{\textbf {X}}}}_{{\textbf {I}}}) ({\textbf {D}}-\widetilde{\overline{{\textbf {X}}}}_{{\textbf {I}}})^T\).
According to [44], we have
$$\begin{aligned} \epsilon _B=\sum \limits _{i<j}^k\sqrt{P_iP_j}\,e^{-\frac{1}{8}\Vert ({\textbf {w}}^T{\overline{{\textbf {X}}}}_{i}-{\textbf {w}}^T{\overline{{\textbf {X}}}}_{j})\widetilde{\varvec{\Sigma }}^{-\frac{1}{2}}\Vert _2^2}. \end{aligned}$$(4)
The upper bound of the error \(\epsilon _B\) can be estimated as
$$\begin{aligned} \epsilon _B&\le \sum \limits _{i<j}^k\sqrt{P_iP_j}\Big (1-\frac{a}{8}\Vert ({\textbf {w}}^T{\overline{{\textbf {X}}}}_{i}-{\textbf {w}}^T{\overline{{\textbf {X}}}}_{j})\widetilde{\varvec{\Sigma }}^{-\frac{1}{2}}\Vert _2^2\Big )\nonumber \\&\le \sum \limits _{i<j}^k\sqrt{P_iP_j}\Big (1-\frac{a}{8}\frac{\Vert {\textbf {w}}^T({\overline{{\textbf {X}}}}_{i}-{\overline{{\textbf {X}}}}_{j})\Vert _2^2}{\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2}\Big )\nonumber \\&\le \sum \limits _{i<j}^k\sqrt{P_iP_j}-\frac{a}{8}\sum \limits _{i<j}^k\sqrt{P_iP_j}\big (\Vert {\textbf {w}}^T({\overline{{\textbf {X}}}}_{i}-{\overline{{\textbf {X}}}}_{j})\Vert _2^2-\Delta _{ij}'\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2\big ), \end{aligned}$$(5)
where \(\Delta _{ij}'= \frac{1}{4}\Vert {\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}}\Vert _F^2\) and \(a>0\) is some constant.
For the first inequality of (5), note that the real-valued function \(f(z)=e^{-z}\) is concave when \(z\in [0,b]\), \(b>0\); therefore, \(e^{-z}\le 1-\frac{1-e^{-b}}{b}z\). By taking \(a=\frac{1-e^{-b}}{b}\) and noting \(\widetilde{{\overline{{\textbf {X}}}}_{i}}={\textbf {w}}^T{\overline{{\textbf {X}}}}_{i}\), the first inequality is obtained. For the second inequality, we first note that for any \({\textbf {z}}\in \mathbb {R}^{1\times {n}}\) and an invertible \({\textbf {A}}\in \mathbb {R}^{n\times n}\), \(\Vert {\textbf {z}}\Vert _2=\Vert ({\textbf {z}}{} {\textbf {A}}){\textbf {A}}^{-1}\Vert _2\le \Vert {\textbf {z}}{} {\textbf {A}}\Vert _2\cdot \Vert {\textbf {A}}^{-1}\Vert _F\), which implies \(\Vert {\textbf {z}}{} {\textbf {A}}\Vert _2\ge \frac{\Vert {\textbf {z}}\Vert _2}{\Vert {\textbf {A}}^{-1}\Vert _F}\). By taking \({\textbf {z}}={\textbf {w}}^T{{\overline{{\textbf {X}}}}_{i}}-{\textbf {w}}^T{{\overline{{\textbf {X}}}}_{j}}\) and \({\textbf {A}}=\widetilde{\varvec{\Sigma }}^{-\frac{1}{2}}\), we get the second inequality. For the last inequality, since \(\Vert {\textbf {w}}\Vert _2=1\), \(\Vert {\textbf {w}}^T({\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}})\Vert _2^2\le \Vert {\textbf {w}}\Vert _2^2\cdot \Vert {\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}}\Vert _F^2 = \Vert {\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}}\Vert _F^2\) and \(\frac{1}{\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2} \left( 1-\frac{1}{\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2}\right) \le \frac{1}{4}\), we have
$$\begin{aligned} \Vert {\textbf {w}}^T({\overline{{\textbf {X}}}}_{i}-{\overline{{\textbf {X}}}}_{j})\Vert _2^2\Big (1-\frac{1}{\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2}\Big )\le \Delta _{ij}'\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2, \end{aligned}$$(6)
which implies
$$\begin{aligned} \frac{\Vert {\textbf {w}}^T({\overline{{\textbf {X}}}}_{i}-{\overline{{\textbf {X}}}}_{j})\Vert _2^2}{\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2}\ge \Vert {\textbf {w}}^T({\overline{{\textbf {X}}}}_{i}-{\overline{{\textbf {X}}}}_{j})\Vert _2^2-\Delta _{ij}'\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2. \end{aligned}$$(7)
By multiplying both sides of (7) by \(\frac{a}{8}\sqrt{P_iP_j}\) and summing over all \(1\le i<j\le k\), we obtain the last inequality of (5).
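The matrix-norm step used for the second inequality above, \(\Vert {\textbf {z}}\Vert _2\le \Vert {\textbf {z}}{} {\textbf {A}}\Vert _2\cdot \Vert {\textbf {A}}^{-1}\Vert _F\), can be checked numerically; a minimal sketch with randomly generated data (not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
z = rng.normal(size=(1, n))                    # row vector, as in the proof
A = rng.normal(size=(n, n)) + n * np.eye(n)    # diagonally dominated, hence invertible

# ||z||_2 = ||(zA)A^{-1}||_2 <= ||zA||_2 * ||A^{-1}||_2 <= ||zA||_2 * ||A^{-1}||_F
lhs = np.linalg.norm(z)
rhs = np.linalg.norm(z @ A) * np.linalg.norm(np.linalg.inv(A), "fro")
print(lhs <= rhs)
```

The chain holds because the spectral norm is submultiplicative and is dominated by the Frobenius norm.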
Taking \(\Delta =\sum \nolimits _{i<j}^k\sqrt{P_iP_j}\Delta _{ij}'= \frac{1}{4}\sum \nolimits _{i<j}^k\frac{\sqrt{N_iN_j}}{N}\Vert {\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}}\Vert _F^2\) and noting that \(\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2=\sum \nolimits _{i=1}^{k}\sum \nolimits _{s=1}^{N_i}\Vert {\textbf {w}}^T({\textbf {X}}_{s}^{i}-\overline{{\textbf {X}}}_i)\Vert _2^2\), we then obtain (2). \(\square\)
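Two facts used in this final step can likewise be verified numerically: the identity \(\Vert \widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\Vert _F^2=\sum \nolimits _{i}\sum \nolimits _{s}\Vert {\textbf {w}}^T({\textbf {X}}_{s}^{i}-\overline{{\textbf {X}}}_i)\Vert _2^2\) and the scalar inequality behind (7). A minimal sketch (single class, random data, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 4
X = rng.normal(size=(30, m, n))     # matrix samples of one class, for simplicity
Xbar = X.mean(axis=0)
w = rng.normal(size=m)
w /= np.linalg.norm(w)              # unit-norm projection vector

# Projected covariance: Sigma = sum_s (w^T(X_s - Xbar))^T (w^T(X_s - Xbar))
P = np.stack([w @ (Xs - Xbar) for Xs in X])   # each row is a 1 x n projection
Sigma = P.T @ P                                # n x n, positive semidefinite

# ||Sigma^{1/2}||_F^2 = tr(Sigma) = sum_s ||w^T(X_s - Xbar)||_2^2
lhs = np.trace(Sigma)
rhs = sum(np.linalg.norm(w @ (Xs - Xbar)) ** 2 for Xs in X)
print(np.isclose(lhs, rhs))

# Scalar inequality behind (7): A/x >= A - Dp*x whenever A <= 4*Dp and x > 0
A, Dp = 3.0, 1.0
xs = np.linspace(0.1, 10, 100)
print(np.all(A / xs >= A - Dp * xs))
```

The trace identity is what lets the within-cluster scatter appear in bound (2) without ever forming \(\widetilde{\varvec{\Sigma }}^{\frac{1}{2}}\) explicitly.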
Cite this article
Guo, Y.R., Bai, Y.Q. Two-dimensional k-subspace clustering and its applications on image recognition. Int. J. Mach. Learn. & Cyber. 14, 2671–2683 (2023). https://doi.org/10.1007/s13042-023-01790-0