
An algorithm of non-negative matrix factorization with the nearest neighbor after per-treatments


Abstract

Clustering is a hot topic in machine learning. For high-dimensional data, nonnegative matrix factorization (NMF) is a crucial technique for clustering. However, NMF has two disadvantages. First, NMF clusters data in the original space, where outliers and noise weaken the clustering results. Second, NMF does not take into consideration the local structure of the data, which is beneficial for clustering. To address these two disadvantages, a new algorithm called nonnegative matrix factorization with the nearest neighbor after per-treatments (PNNMF) is proposed. Per-treatments are used to alleviate the effects of outliers and noise. After the per-treatments, credible connected components generated by the nearest neighbor of the data are chosen to capture local structure. Moreover, a new initialization for the basis matrix is proposed based on these credible connected components. Experiments on real data sets confirm the effectiveness of PNNMF.
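To make the pipeline concrete, here is a minimal Python sketch of the nearest-neighbor connected-component idea and the centroid-based initialization it enables, assuming a nonnegative data matrix with one sample per column. The component selection shown (simply keeping the largest components) is a hypothetical placeholder, not the paper's credibility criterion.

```python
# Illustrative sketch only, not the authors' exact PNNMF procedure.
# Builds the 1-nearest-neighbor graph, extracts its connected components,
# and initializes the basis matrix U from component centroids.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def nn_components(X):
    """X: (m, n) data matrix, one sample per column.
    Returns the number of connected components of the 1-NN graph
    and a component label for each sample."""
    n = X.shape[1]
    d = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)  # pairwise distances
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                      # each sample's nearest neighbor
    g = csr_matrix((np.ones(n), (np.arange(n), nn)), shape=(n, n))
    return connected_components(g, directed=False)

def init_basis(X, labels, k):
    """Hypothetical initialization: centroids of the k largest components
    become the columns of the basis matrix U (shape (m, k))."""
    sizes = np.bincount(labels)
    top = np.argsort(sizes)[::-1][:k]          # assumes at least k components
    U = np.stack([X[:, labels == c].mean(axis=1) for c in top], axis=1)
    return np.maximum(U, 1e-12)                # keep U strictly nonnegative
```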


Data availability statement

The datasets analysed during the current study are available on the homepage of Deng Cai (http://www.cad.zju.edu.cn/home/dengcai/).


Acknowledgments

This work is supported by the National Natural Science Foundation of China (11961010, 61967004).

Author information


Corresponding author

Correspondence to Xiangli Li.

Ethics declarations

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


To prove Theorem 1, we first give a definition and two lemmas.

Definition 1: \(G(v, v^{\prime})\) is an auxiliary function of F(v) if \(G(v, v^{\prime})\) satisfies: (1) \(G(v, v^{\prime}) \geqslant F(v)\); (2) G(v, v) = F(v).

Lemma 1: If G is an auxiliary function of F, then F is non-increasing under the update

$$ \begin{array}{@{}rcl@{}} v^{t+1} = \mathop{\arg\min}_{v} G(v, v^{t}). \end{array} $$

Proof: \(F(v^{t+1}) = G(v^{t+1}, v^{t+1}) \leqslant G(v^{t+1}, v^{t}) \leqslant G(v^{t}, v^{t}) = F(v^{t})\). \(_{\square}\)
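The mechanism behind Lemma 1 is the standard majorize-minimize argument. As a minimal numeric sketch (not from the paper), take a one-dimensional quadratic F and a quadratic majorizer G whose curvature dominates that of F; minimizing G at each step never increases F:

```python
# Minimal majorize-minimize demo: G(v, v_t) >= F(v) with G(v_t, v_t) = F(v_t),
# so v_{t+1} = argmin_v G(v, v_t) gives F(v_{t+1}) <= F(v_t) (Lemma 1).
def F(v):
    return (v - 3.0) ** 2

def argmin_G(v_t, c=2.0):
    # G(v, v_t) = F(v_t) + F'(v_t)(v - v_t) + c (v - v_t)^2; any c >= 1
    # (half the curvature of F) makes G a majorizer of F.
    dF = 2.0 * (v_t - 3.0)
    return v_t - dF / (2.0 * c)

v = 10.0
for _ in range(10):
    v_next = argmin_G(v)
    assert F(v_next) <= F(v) + 1e-12   # F is non-increasing, as Lemma 1 states
    v = v_next
print(round(v, 4))                      # approaches the minimizer v = 3
```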

We can rewrite the cost function (16) as

$$ \begin{array}{@{}rcl@{}} O &=& tr({X^{P}}^{T}X^{P}) - 2\sum\limits^{n}_{i=1}\sum\limits^{k}_{j=1} V_{ij}(U^{T}X^{P})_{ji} + \sum\limits^{n}_{i=1}\sum\limits^{k}_{j=1} V_{ij}(U^{T}UV^{T})_{ji} \\ && + \lambda \sum\limits^{k}_{i=1}\sum\limits^{n}_{j=1} V^{T}_{ij}((H^{T}-B^{T})(H-B)V)_{ji}. \end{array} $$

Now consider the element \(v_{ab}\) in V. To update \(v_{ab}\), we denote the part of O related to \(v_{ab}\) as:

$$ \begin{array}{@{}rcl@{}} F_{ab} &=& -2v_{ab}({X^{P}}^{T}U)_{ab}+v_{ab}(VU^{T}U)_{ab}+(VU^{T}U)_{ab}v_{ab} \\ && +\lambda v_{ab}((H^{T}-B^{T})(H-B)V)_{ab}+\lambda (V^{T}(H^{T}-B^{T})(H-B))_{ba}v_{ab}. \end{array} $$

Based on \(F_{ab}\), we can calculate its first-order and second-order derivatives as follows:

$$ \begin{array}{@{}rcl@{}} F_{ab}^{\prime} = -2({X^{P}}^{T}U)_{ab}+2(VU^{T}U)_{ab}+2\lambda ((H^{T}-B^{T})(H-B)V)_{ab}. \end{array} $$
$$ \begin{array}{@{}rcl@{}} F_{ab}^{\prime\prime} = 2\sum\limits_{i=1}^{m}U_{ib}^{2}+2\lambda \sum\limits_{i=1}^{n}(H_{ia}-B_{ia})^{2}. \end{array} $$

Lemma 2: Function

$$ \begin{array}{@{}rcl@{}} G(v, v^{t}_{ab}) = F_{ab}(v^{t}_{ab})+F^{\prime}_{ab}(v^{t}_{ab})(v-v^{t}_{ab})+\frac{(VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab}}{v^{t}_{ab}}(v-v^{t}_{ab})^{2} \end{array} $$

is an auxiliary function of \(F_{ab}\).

Proof: It is obvious that \(G(v^{t}_{ab}, v^{t}_{ab}) = F_{ab}(v^{t}_{ab})\). Next we will prove that \(G(v, v^{t}_{ab}) \geqslant F_{ab}(v)\).

Because \(F_{ab}(v)\) is a quadratic function, the Taylor expansion of \(F_{ab}(v)\) at \(v_{ab}^{t}\) is

$$ \begin{array}{@{}rcl@{}} F_{ab}(v) = F_{ab}(v^{t}_{ab})+F^{\prime}_{ab}(v^{t}_{ab})(v-v^{t}_{ab})+(\sum\limits_{i=1}^{m}U_{ib}^{2}+\lambda \sum\limits_{i=1}^{n}(H_{ia}-B_{ia})^{2})(v-v^{t}_{ab})^{2}. \end{array} $$

For \(G(v, v^{t}_{ab}) \geqslant F_{ab}(v)\) to hold, we only need

$$ \begin{array}{@{}rcl@{}} \frac{(VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab}}{v^{t}_{ab}} \geqslant \sum\limits_{i=1}^{m}U_{ib}^{2}+\lambda \sum\limits_{i=1}^{n}(H_{ia}-B_{ia})^{2}. \end{array} $$
(20)

We can rewrite (20) as

$$ \begin{array}{@{}rcl@{}} (VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab} \geqslant v^{t}_{ab}(U^{T}U)_{bb}+\lambda (H^{T}H-H^{T}B-B^{T}H+B^{T}B)_{aa}v^{t}_{ab}. \end{array} $$
(21)

Because U, V, H and B are all nonnegative, we have

$$ \begin{array}{@{}rcl@{}} (VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab} &\geqslant& v_{ab}^{t}(U^{T}U)_{bb}+\lambda (H^{T}H+B^{T}B)_{aa}v^{t}_{ab} \\ &\geqslant& v^{t}_{ab}(U^{T}U)_{bb}+\lambda (H^{T}H-H^{T}B-B^{T}H+B^{T}B)_{aa}v^{t}_{ab}. \end{array} $$

This means that (20) holds, and Lemma 2 is proved. \(_{\square}\)
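As a quick sanity check (not part of the paper), inequality (21) can be verified elementwise for random nonnegative matrices; the dimensions below are arbitrary:

```python
# Numeric spot-check of inequality (21) for random nonnegative U, V, H, B.
import numpy as np

rng = np.random.default_rng(0)
m, n, k, lam = 6, 8, 3, 0.5                    # arbitrary sizes and lambda
U, V = rng.random((m, k)), rng.random((n, k))
H, B = rng.random((n, n)), rng.random((n, n))

lhs = V @ (U.T @ U) + lam * (H.T @ H @ V + B.T @ B @ V)            # (n, k)
D = (H - B).T @ (H - B)                                            # (H-B)^T (H-B)
rhs = V * np.diag(U.T @ U)[None, :] + lam * np.diag(D)[:, None] * V
assert np.all(lhs >= rhs - 1e-12)              # (21) holds elementwise
```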

Now we give the proof of Theorem 1.

Proof: For each element in V, we can find its auxiliary function \(G(v, v^{t}_{ab})\). Because \(G(v, v^{t}_{ab})\) is a quadratic function of v, its first-order derivative is:

$$ \begin{array}{@{}rcl@{}} G^{\prime}(v, v^{t}_{ab})=F^{\prime}_{ab}(v_{ab}^{t})+\frac{2(VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab}}{v_{ab}^{t}}(v-v^{t}_{ab}). \end{array} $$

Setting the derivative to zero yields the update rule:

$$ \begin{array}{@{}rcl@{}} v^{t+1}_{ab} &=& v^{t}_{ab}-\frac{v^{t}_{ab}}{2} \frac{F^{\prime}_{ab}(v_{ab}^{t})}{(VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab}} \\ &=& v^{t}_{ab}-\frac{v^{t}_{ab}}{2} \frac{2(-{X^{P}}^{T}U+VU^{T}U+\lambda (H^{T}-B^{T})(H-B)V)_{ab}}{(VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab}} \\ &=& \frac{({X^{P}}^{T}U+\lambda H^{T}BV+\lambda B^{T}HV)_{ab}}{(VU^{T}U+\lambda H^{T}HV+\lambda B^{T}BV)_{ab}}v^{t}_{ab}. \end{array} $$

This is the same as update rule (19). For U, the update rule can be proved in the same way.

This completes the proof of Theorem 1. \(_{\square}\)
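In matrix form the derived rule, together with a standard NMF-style update for U, is straightforward to implement. The sketch below assumes \(X^{P}\) is the pre-treated m-by-n data matrix and H, B are the n-by-n matrices appearing in (16); since this excerpt does not show the U rule, the plain NMF update for U is an assumption, justified by the regularizer being independent of U.

```python
# One round of multiplicative updates; the V-update is rule (19) from Theorem 1.
import numpy as np

def pnnmf_step(Xp, U, V, H, B, lam, eps=1e-12):
    """Xp: (m, n) pre-treated data; U: (m, k) basis; V: (n, k) coefficients;
    H, B: (n, n) matrices from the regularizer in (16); lam: lambda."""
    num_V = Xp.T @ U + lam * (H.T @ (B @ V) + B.T @ (H @ V))
    den_V = V @ (U.T @ U) + lam * (H.T @ (H @ V) + B.T @ (B @ V)) + eps
    V = V * num_V / den_V                       # rule (19)

    # Assumed standard NMF update for U (the regularizer does not involve U).
    U = U * (Xp @ V) / (U @ (V.T @ V) + eps)
    return U, V
```

Per Theorem 1, the V-step never increases the objective \(\|X^{P}-UV^{T}\|_{F}^{2}+\lambda\, tr(V^{T}(H-B)^{T}(H-B)V)\), which gives a cheap convergence check when iterating pnnmf_step.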

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jia, M., Li, X. & Zhang, Y. An algorithm of non-negative matrix factorization with the nearest neighbor after per-treatments. Multimed Tools Appl 82, 30669–30688 (2023). https://doi.org/10.1007/s11042-023-14571-2

