Abstract
Clustering is an active research topic in machine learning. For high-dimensional data, nonnegative matrix factorization (NMF) is an important clustering technique. However, NMF has two disadvantages. First, it clusters data in the original space, where outliers and noise weaken the clustering results. Second, it does not take into account the local structure of the data, which is beneficial for clustering. To address these two disadvantages, we propose a new algorithm called nonnegative matrix factorization with the nearest neighbor after per-treatments (PNNMF). Per-treatments are used to alleviate the effects of outliers and noise. After the per-treatments, credible connected components generated by the nearest neighbors of the data are chosen to capture local structure. Moreover, a new initialization for the basis matrix is proposed based on these credible connected components. Experiments on real data sets confirm the effectiveness of PNNMF.
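The idea of connected components generated by nearest neighbors can be illustrated with a minimal sketch. It links each sample to its single nearest neighbor under Euclidean distance and labels the connected components of the resulting undirected graph. The function name `nearest_neighbor_components`, the distance choice, and the toy data are assumptions made for illustration; the per-treatments and the rule for selecting which components are credible are not reproduced here.

```python
import numpy as np

def nearest_neighbor_components(X):
    """Label each row of X by the connected component of its 1-nearest-neighbor graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; the diagonal is masked so that
    # a point is never chosen as its own nearest neighbor.
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d, np.inf)
    nn = np.argmin(d, axis=1)

    # Union-find over the undirected edges (i, nn[i]).
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        ri, rj = find(i), find(nn[i])
        if ri != rj:
            parent[rj] = ri

    roots = np.array([find(i) for i in range(n)])
    # Relabel the component roots as consecutive integers 0..k-1.
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# Two well-separated groups of three points each (toy data, not from the paper).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = nearest_neighbor_components(X)
```

On this toy input the two groups fall into two separate components. A component found this way could, for instance, supply the mean of its members as an initial basis vector, although that particular use is only a guess at how such components might feed an initialization.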
Data availability statement
The datasets analysed during the current study are available on the homepage of Deng Cai (http://www.cad.zju.edu.cn/home/dengcai/).
Acknowledgments
This work was supported by the National Natural Science Foundation of China (11961010, 61967004).
Ethics declarations
Conflict of interest statement
The authors declare that they have no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
To prove Theorem 1, we first give a definition and two lemmas.
Definition 1: \(G(v, v^{\prime })\) is an auxiliary function of \(F(v)\) if it satisfies: (1) \(G(v, v^{\prime }) \geqslant F(v)\); (2) \(G(v, v) = F(v)\).
Lemma 1: If \(G\) is an auxiliary function of \(F\), then \(F\) is non-increasing under the update
\[v^{t+1} = \arg \min _{v} G(v, v^{t}).\]
Proof: \(F(v^{t+1}) = G(v^{t+1}, v^{t+1}) \leqslant G(v^{t+1}, v^{t}) \leqslant G(v^{t}, v^{t}) = F(v^{t})\). \(_{\square }\)
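The descent argument can be checked numerically on a toy problem. The sketch below assumes a one-dimensional quadratic objective and a quadratic upper bound with curvature `K` larger than the second derivative of the objective; the names `F`, `dF`, `G`, `step`, and `K` are illustrative and are not the functions used in the paper.

```python
import numpy as np

# Toy quadratic objective F(v) = (v - 3)^2 + 1, chosen only for illustration.
def F(v):
    return (v - 3.0) ** 2 + 1.0

def dF(v):
    return 2.0 * (v - 3.0)

# F''(v) = 2 everywhere, so any curvature K >= 2 gives a valid upper bound.
K = 5.0

def G(v, v_prev):
    """Quadratic upper bound of F that touches F at v_prev."""
    return F(v_prev) + dF(v_prev) * (v - v_prev) + 0.5 * K * (v - v_prev) ** 2

def step(v_prev):
    """Closed-form minimizer of G(., v_prev) over v."""
    return v_prev - dF(v_prev) / K

v = 10.0
values = [F(v)]
for _ in range(20):
    v = step(v)
    values.append(F(v))
values = np.array(values)
```

Here `G(v, v) = F(v)` holds by construction, and `G(v, v_prev) - F(v) = (K/2 - 1)(v - v_prev)^2` is nonnegative, so both conditions of an auxiliary function are met and the recorded objective values never increase.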
We can rewrite the cost function (16) as
Now we consider the element \(v_{ab}\) in V. To update \(v_{ab}\), we denote the part of O related to \(v_{ab}\) as:
Based on \(F_{ab}\), we can calculate its first-order and second-order derivatives as follows:
Lemma 2: The function
is an auxiliary function of \(F_{ab}\).
Proof: It is obvious that \(G(v^{t}_{ab}, v^{t}_{ab}) = F_{ab}(v^{t}_{ab})\). Next we prove that \(G(v, v^{t}_{ab}) \geqslant F_{ab}(v)\).
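Because \(F_{ab}(v)\) is a quadratic function, its Taylor expansion about \(v_{ab}^{t}\) is exact at second order:
\[F_{ab}(v) = F_{ab}(v^{t}_{ab}) + F^{\prime }_{ab}(v^{t}_{ab})(v - v^{t}_{ab}) + \frac{1}{2} F^{\prime \prime }_{ab}(v^{t}_{ab})(v - v^{t}_{ab})^{2}.\]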
For \(G(v, v^{t}_{ab}) \geqslant F_{ab}(v)\) to hold, we only need
We can rewrite (20) as
Because V and U are nonnegative, we have
Hence (20) holds and Lemma 2 is proved. \(_{\square }\)
Now we give the proof of Theorem 1.
Proof: For each element in V, we can find its auxiliary function \(G(v, v^{t}_{ab})\). Because \(G(v, v^{t}_{ab})\) is a quadratic function, we calculate its first-order derivative as follows:
Setting the derivative to 0 yields the update rule:
This is the same as update rule (19). The update rule for U can be proved by a similar method.
This completes the proof of Theorem 1. \(_{\square }\)
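The same kind of monotonicity can be observed empirically for plain NMF. The sketch below uses the standard Lee and Seung multiplicative updates for the Frobenius cost \(\Vert X - UV^{T}\Vert _{F}^{2}\); it is not the cost function (16) or update rule (19) of PNNMF. The factorization convention \(X \approx UV^{T}\), the function name `nmf_multiplicative`, the small constant `eps`, and the random test matrix are all assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF, X ~ U @ V.T, with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Random positive initialization (not the initialization proposed in the paper).
    U = rng.random((m, k)) + eps
    V = rng.random((n, k)) + eps
    costs = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so U and V stay nonnegative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        costs.append(np.linalg.norm(X - U @ V.T, "fro") ** 2)
    return U, V, np.array(costs)

# Random nonnegative test matrix (illustrative data only).
rng = np.random.default_rng(1)
X = rng.random((30, 20))
U, V, costs = nmf_multiplicative(X, k=4)
```

Plotting or printing `costs` should show a sequence that does not increase beyond floating-point rounding, which is the behaviour an auxiliary-function argument guarantees for updates of this type.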
About this article
Cite this article
Jia, M., Li, X. & Zhang, Y. An algorithm of non-negative matrix factorization with the nearest neighbor after per-treatments. Multimed Tools Appl 82, 30669–30688 (2023). https://doi.org/10.1007/s11042-023-14571-2
DOI: https://doi.org/10.1007/s11042-023-14571-2