Cauchy balanced nonnegative matrix factorization

Abstract

Nonnegative Matrix Factorization (NMF) plays an important role in many data mining and machine learning tasks. Standard NMF uses the Frobenius norm as the loss function, which is well known to be sensitive to noise. To address this issue, we propose a robust formulation of NMF, i.e., Cauchy-NMF, derived under the assumption that the noise follows an independent and identically distributed (i.i.d.) Cauchy distribution. In particular, we derive the Cauchy Balanced NMF model (Cauchy-B-NMF) using the Cauchy distribution, where (a) the numerical value of each element in the coefficient matrix is viewed as a posterior probability, which allows the clustering result to be obtained directly from the coefficient matrix without any additional post-processing; (b) a novel manifold regularization term is incorporated into the loss function, explicitly making distant data points have dissimilar embeddings, while implicitly making neighbouring data points have similar embeddings; (c) a balanced clustering term is enforced to encourage an equal number of data points across different clusters. We derive an efficient computational algorithm to solve the resulting optimization problem, and also provide a rigorous analysis of its convergence. Experimental results on several benchmarks demonstrate the effectiveness of our algorithm, which consistently provides better clustering results than many other NMF variants.

Notes

  1. It is also interesting to note that the difference between Eqs. (19) and (20) is that the minuend in Eq. (19) is always non-negative, although their absolute values are the same.

  2. In this paper, let LHS denote the left-hand side of an equation, and RHS the right-hand side.

Biography

He Xiong received the master's degree from the University of Science and Technology of China in 2013. He is currently a lecturer in the School of Computer and Information Engineering at BengBu University. His research interests include machine learning, data mining and computer vision.

Deguang Kong received his Ph.D. degree in Computer Science from the University of Texas in 2013. He currently works at Amazon, and previously worked at Google, Yahoo Research (Sunnyvale), Los Alamos National Lab, NEC Research Lab, Penn State University and Samsung Research America as a researcher. His research interests focus on feature learning and compressive sensing, user engagement understanding and recommendation, etc. He has published over 30 refereed articles in top conferences, including ICML, NIPS, AAAI, CVPR, KDD, ICDM, SDM, WSDM, CIKM, ECML/PKDD, etc. He has served as a program committee member for NIPS, AAAI, IJCAI, KDD and SDM, and as a reviewer for TPAMI, TKDE, DMKD, TIFS, TNNLS, TDSC, etc.

Feiping Nie received the Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2009. He is currently a professor with the Center for OPTical Imagery Analysis and Learning, Northwestern Polytechnical University, Xi'an, Shaanxi, China. He has published over 100 articles in prestigious journals and conferences. His current research interests include machine learning and its application fields, such as pattern recognition, data mining, computer vision, image processing, and information retrieval. Dr. Nie currently serves as an associate editor or a program committee member for several prestigious journals and conferences in the related fields.

References

  • Althoff T, Ulges A, Dengel A (2011) Balanced clustering for content-based image browsing. In: Informatiktage 2011 - Fachwissenschaftlicher Informatik-Kongress, 25-26 March 2011, B-IT Bonn-Aachen International Center for Information Technology, Bonn, Vol. S-10 of LNI, GI, pp 27–30

  • Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720

  • Bojchevski A, Matkovic Y, Günnemann S (2017) Robust spectral clustering for noisy data: modeling sparse corruptions improves latent embeddings. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, Halifax, August 13–17, pp 737–746. https://doi.org/10.1145/3097983.3098156

  • Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560. https://doi.org/10.1109/TPAMI.2010.231

  • Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J. ACM 58(3):1–11

  • Cao X, Chen Y, Zhao Q, Meng D, Wang Y, Wang D, Xu Z (2015) Low-rank matrix factorization under general mixture noise distributions. In: 2015 IEEE international conference on computer vision, ICCV 2015, Santiago, Chile, December 7–13, pp 1493–1501. https://doi.org/10.1109/ICCV.2015.175

  • Chen P, Wang N, Zhang NL, Yeung D (2015) Bayesian adaptive matrix factorization with automatic model selection. In: IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, June 7–12, pp. 1284–1292. https://doi.org/10.1109/CVPR.2015.7298733

  • Ding CHQ, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 32(1):45–55. https://doi.org/10.1109/TPAMI.2008.277

  • Du L, Li X, Shen Y (2012) Robust nonnegative matrix factorization via half-quadratic minimization. In: 12th IEEE International conference on Data Mining, ICDM 2012, Brussels, December 10–13, 2012, pp 201–210. https://doi.org/10.1109/ICDM.2012.39

  • Gao H, Nie F, Cai W, Huang H (2015) Robust capped norm nonnegative matrix factorization: capped norm NMF. In: Proceedings of the 24th ACM international on conference on information and knowledge management, CIKM ’15, ACM, New York, pp 871–880. https://doi.org/10.1145/2806416.2806568

  • Gao H, Nie F, Huang H (2017) Local centroids structured non-negative matrix factorization. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, February 4–9, San Francisco, 2017, pp 1905–1911. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/15027

  • Gobinet C, Perrin E, Huez R (2004) Application of non-negative matrix factorization to fluorescence spectroscopy. In: 2004 12th European signal processing conference, Vienna, September 6–10, pp 1095–1098. http://ieeexplore.ieee.org/document/7079789/

  • Graham DB, Allinson NM (1998) Characterising virtual eigensignatures for general purpose face recognition. In: Face recognition, Springer, pp 446–456

  • Guan N, Liu T, Zhang Y, Tao D, Davis LS (2019) Truncated Cauchy non-negative matrix factorization. IEEE Trans Pattern Anal Mach Intell 41(1):246–259. https://doi.org/10.1109/TPAMI.2017.2777841

  • Guo X, Pan B, Cai D, He X (2017) Robust asymmetric bayesian adaptive matrix factorization. In: Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI 2017, Melbourne, August 19–25, pp 1760–1766. https://doi.org/10.24963/ijcai.2017/244

  • Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: Proceedings of the 2006 IEEE computer society conference on computer vision and pattern recognition-volume 2, CVPR ’06, IEEE Computer Society, Washington, pp 1735–1742. https://doi.org/10.1109/CVPR.2006.100

  • Hamza A, Brady D (2006) Reconstruction of reflectance spectra using robust nonnegative matrix factorization. Trans Sig Proc 54(9):3637–3642. https://doi.org/10.1109/TSP.2006.879282

  • Huang J, Nie F, Huang H, Ding C (2014) Robust manifold nonnegative matrix factorization. ACM Trans Knowl Discov Data 8(3):1–11. https://doi.org/10.1145/2601434

  • Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554. https://doi.org/10.1109/34.291440

  • Kannan R, Ballard G, Park H (2016) A high-performance parallel algorithm for nonnegative matrix factorization. In: Asenjo R, Harris T (eds), Proceedings of the 21st ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP 2016, Barcelona, March 12–16, ACM, pp 9:1–9:11. https://doi.org/10.1145/2851141.2851152

  • Kim H, Park H, Drake BL (2007) Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations. BMC Bioinform 8(Suppl 9):S6. https://doi.org/10.1186/1471-2105-8-S9-S6

  • Kong D, Ding C, Huang H (2011) Robust nonnegative matrix factorization using l21-norm. In: Proceedings of the 20th ACM international conference on information and knowledge management, CIKM ’11, ACM, New York, pp 673–682. https://doi.org/10.1145/2063576.2063676

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  • Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Leen TK, Dietterich TG, Tresp V (eds), Advances in neural information processing systems 13, MIT Press, pp 556–562. http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf

  • Li Z, Tang J, Mei T (2019) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083. https://doi.org/10.1109/TPAMI.2018.2852750

  • Lian D, Liu R, Ge Y, Zheng K, Xie X, Cao L (2017) Discrete content-aware matrix factorization. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’17, ACM, New York, pp 325–334. https://doi.org/10.1145/3097983.3098008

  • Liao Q, Guan N, Zhang Q (2015) Logdet divergence based sparse non-negative matrix factorization for stable representation. In: 2015 IEEE international conference on data mining, ICDM 2015, Atlantic City, November 14–17, pp 871–876. https://doi.org/10.1109/ICDM.2015.52

  • Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Lin W, He Z, Xiao M (2019) Balanced clustering: a uniform model and fast algorithm. In: Kraus S (ed), Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI 2019, Macao, August 10–16, ijcai.org, 2019, pp 2987–2993. https://doi.org/10.24963/ijcai.2019/414

  • Li Z, Nie F, Chang X, Ma Z, Yang Y (2018) Balanced clustering via exclusive lasso: a pragmatic approach. In: McIlraith SA, Weinberger KQ (eds), Proceedings of the thirty-second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, February 2–7, 2018, AAAI Press, pp 3596–3603. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16711

  • Liu H, Wu Z, Li X, Cai D, Huang TS (2012) Constrained nonnegative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311. https://doi.org/10.1109/TPAMI.2011.217

  • Liu H, Han J, Nie F, Li X (2017) Balanced clustering with least square regression. In: Singh SP, Markovitch S (eds), Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, AAAI Press, pp 2231–2237. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14693

  • Liu T, Sun J, Zheng NN, Tang X, Shum HY (2007) Learning to detect a salient object. In: IEEE conference on computer vision and pattern recognition 2007, pp 1–8. https://doi.org/10.1109/CVPR.2007.383047

  • Liutkus A, Fitzgerald D, Badeau R (2015) Cauchy nonnegative matrix factorization. In: 2015 IEEE workshop on applications of signal processing to audio and acoustics (WASPAA), pp 1–5

  • Lu Y, Lai Z, Xu Y, Li X, Zhang D, Yuan C (2017) Nonnegative discriminant matrix factorization. IEEE Trans Circuits Syst Video Technol 27(7):1392–1405. https://doi.org/10.1109/TCSVT.2016.2539779

  • Luo L, Zhang Y, Huang H (2020) Adversarial nonnegative matrix factorization. In: Proceedings of the 37th international conference on machine learning, ICML 2020, 13–18 July, Virtual Event, Vol. 119 of Proceedings of machine learning research, PMLR, 2020, pp 6479–6488. http://proceedings.mlr.press/v119/luo20c.html

  • Malinen MI, Fränti P (2014) Balanced k-means for clustering. In: Fränti P, Brown G, Loog M, Escolano F, Pelillo M (eds), Structural, syntactic, and statistical pattern recognition-joint IAPR international workshop, S+SSPR 2014, Joensuu, August 20–22, 2014. Proceedings, Vol. 8621 of Lecture Notes in Computer Science, Springer, 2014, pp 32–41. https://doi.org/10.1007/978-3-662-44415-3_4

  • Meng D, la Torre FD (2013) Robust matrix factorization with unknown noise. In: IEEE international conference on computer vision, ICCV 2013, Sydney December 1–8, pp 1337–1344. https://doi.org/10.1109/ICCV.2013.169

  • Mitra A, Vijayan P, Parthasarathy S, Ravindran B (2020) A unified non-negative matrix factorization framework for semi supervised learning on graphs. In: Demeniconi C, Chawla NV (eds), Proceedings of the 2020 SIAM international conference on data mining, SDM 2020, Cincinnati, May 7–9, SIAM, pp 487–495. https://doi.org/10.1137/1.9781611976236.55

  • Moon GE, Ellis JA, Sukumaran-Rajam A, Parthasarathy S, Sadayappan P (2020) ALO-NMF: accelerated locality-optimized non-negative matrix factorization. In: Gupta R, Liu Y, Tang J, Prakash BA (eds), KDD ’20: the 26th ACM SIGKDD conference on knowledge discovery and data mining, virtual event, August 23–27, 2020, ACM, 2020, pp 1758–1767. https://doi.org/10.1145/3394486.3403227

  • Nene SA, Nayar SK, Murase H (1996) Columbia object image library (COIL-20). Technical Report CUCS-005-96, Columbia University

  • Peng C, Kang Z, Hu Y, Cheng J, Cheng Q (2017) Robust graph regularized nonnegative matrix factorization for clustering. ACM Trans Knowl Discov Data 11(3):1–33. https://doi.org/10.1145/3003730

  • Peng S, Ser W, Chen B, Lin Z (2021) Robust semi-supervised nonnegative matrix factorization for image clustering. Pattern Recognit 111:107683. https://doi.org/10.1016/j.patcog.2020.107683

  • Rodríguez-Domínguez U, Dalmau O (2022) Symmetric nonnegative matrix factorization with elastic-net regularized block-wise weighted representation for clustering. Pattern Anal Appl pp 1–11

  • Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Applications of computer vision, 1994., Proceedings of the second IEEE workshop on, IEEE, pp 138–142

  • Siavoshi S, Kavian YS, Sharif H (2016) Load-balanced energy efficient clustering protocol for wireless sensor networks. IET Wirel Sens Syst 6(3):67–73. https://doi.org/10.1049/iet-wss.2015.0069

  • Su W, Hu J, Lin C, Shen SX (2015) Sla-aware tenant placement and dynamic resource provision in SAAS. In: Miller JA, Zhu H (eds), 2015 IEEE international conference on web services, ICWS 2015, New York, June 27 - July 2, IEEE Computer Society, 2015, pp 615–622. https://doi.org/10.1109/ICWS.2015.87

  • Trigeorgis G, Bousmalis K, Zafeiriou S, Schuller BW (2017) A deep matrix factorization method for learning attribute representations. IEEE Trans Pattern Anal Mach Intell 39(3):417–429. https://doi.org/10.1109/TPAMI.2016.2554555

  • Wang J-Y, Almasri I, Gao X (2012) Adaptive graph regularized nonnegative matrix factorization via feature selection. In: Pattern recognition (ICPR), 2012 21st international conference on, IEEE, pp 963–966

  • Wang H, Yang W, Guan N (2019) Cauchy sparse NMF with manifold regularization: a robust method for hyperspectral Unmixing. Knowl Based Syst 184:104898. https://doi.org/10.1016/j.knosys.2019.104898

  • Wright J, Ganesh A, Rao SR, Peng Y, Ma Y (2009d) Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Advances in neural information processing systems 22: 23rd Annual conference on neural information processing systems 2009. Proceedings of a meeting held 7–10 December 2009, Vancouver, 2009, pp 2080–2088. http://papers.nips.cc/paper/3704-robust-principal-component-analysis-exact-recovery-of-corrupted-low-rank-matrices-via-convex-optimization

  • Wu Y, Shen B, Ling H (2014) Visual tracking via online nonnegative matrix factorization. IEEE Trans Circuits Syst Video Technol 24(3):374–383. https://doi.org/10.1109/TCSVT.2013.2278199

  • Xiong H, Kong D (2019) Elastic nonnegative matrix factorization. Pattern Recognit 90:464–475. https://doi.org/10.1016/j.patcog.2018.07.007

  • Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’03, ACM, New York, 2003, pp 267–273. https://doi.org/10.1145/860435.860485

  • Zhang S, Wang W, Ford J, Makedon F (2006) Learning from incomplete ratings using non-negative matrix factorization. In: Ghosh J, Lambert F, Skillicorn DB, Srivastava J (eds), SDM, SIAM, 2006, pp 549–553. http://dblp.uni-trier.de/db/conf/sdm/sdm2006.html#ZhangWFM06

  • Zhao N, Zhang L, Du B, Zhang Q, You J, Tao D (2017) Robust dual clustering with adaptive manifold regularization. IEEE Trans Knowl Data Eng 29(11):2498–2509. https://doi.org/10.1109/TKDE.2017.2732986

  • Zhao Y, Wang H, Pei J (2021) Deep non-negative matrix factorization architecture based on underlying basis images learning. IEEE Trans Pattern Anal Mach Intell 43(6):1897–1913. https://doi.org/10.1109/TPAMI.2019.2962679

  • Zhao Q, Meng D, Xu Z, Zuo W, Zhang L (2014) Robust principal component analysis with complex noise. In: Proceedings of the 31th international conference on machine learning, ICML 2014, Beijing, 21–26 June, pp 55–63. http://jmlr.org/proceedings/papers/v32/zhao14.html

  • Zhu Z, Li X, Liu K, Li Q (2018) Dropping symmetry for fast symmetric nonnegative matrix factorization. In: Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds), Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, pp 5160–5170. https://proceedings.neurips.cc/paper/2018/hash/d9ff90f4000eacd3a6c9cb27f78994cf-Abstract.html

Acknowledgments

This research was supported by the Anhui Province Excellent Talent Support Program for Universities (gxyq2020063) and the Key Projects of Natural Science Research in Anhui Universities (2022AH051915).

Author information

Corresponding author

Correspondence to Deguang Kong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1. Proof of Lemma 3

Proof

We utilize an auxiliary function to prove Lemma 3.

If \(R(\textbf{G}, \textbf{G}^\prime )\) satisfies the following conditions, it is an auxiliary function for \(P(\textbf{G})\).

$$\begin{aligned} P(\textbf{G})&\le R(\textbf{G}, \textbf{G}^\prime )\quad \forall \textbf{G}^\prime , \end{aligned}$$
(29)
$$\begin{aligned} P(\textbf{G})&=R(\textbf{G}, \textbf{G}). \end{aligned}$$
(30)

Define

$$\begin{aligned} \textbf{G}^{t+1}=\arg \min _{\textbf{G}}R(\textbf{G}, \textbf{G}^t), \end{aligned}$$
(31)

then we have

$$\begin{aligned} P(\textbf{G}^{t+1})\le R(\textbf{G}^{t+1}, \textbf{G}^t)\le R(\textbf{G}^t, \textbf{G}^t)=P(\textbf{G}^t). \end{aligned}$$
(32)

Equation (32) indicates that \(P(\textbf{G})\) does not increase under the updating rule of Eq. (31).
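To make the majorize-minimize mechanism of Eqs. (29)-(32) concrete, the following minimal Python/NumPy sketch (our illustration, not code from the paper) applies it to a toy quadratic objective; the toy \(P\), the matrices, and all names are illustrative assumptions.

```python
import numpy as np

# Toy illustration of the auxiliary-function scheme of Eqs. (29)-(32):
# P(g) = g^T A g - 2 b^T g, majorized via the diagonal upper bound of
# Eq. (36) with B = 1. The toy P and all names are ours.

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # symmetric, nonnegative
b = np.array([1.0, 2.0])

def P(g):
    return g @ A @ g - 2.0 * b @ g

def argmin_R(g_prev):
    # R(g, g') = sum_k (A g')_k g_k^2 / g'_k - 2 b^T g majorizes P(g)
    # and is tight at g = g'; its minimizer has the familiar
    # multiplicative-update form g_k = b_k g'_k / (A g')_k.
    return b * g_prev / (A @ g_prev)

g = np.array([0.5, 0.5])          # positive initialization
for t in range(30):
    g_next = argmin_R(g)
    assert P(g_next) <= P(g) + 1e-12   # monotone descent, Eq. (32)
    g = g_next
print(g, P(g))                    # approaches argmin P = A^{-1} b
```

At each step the surrogate is tight at the current iterate and upper-bounds \(P\) elsewhere, so minimizing it can only decrease \(P\); this is exactly the chain of inequalities in Eq. (32).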

Let

$$\begin{aligned} \begin{aligned} P(\textbf{G})=\,&\textrm{Tr}\left( (\textbf{X}-\textbf{F}\textbf{G})\textbf{D}(\textbf{X}-\textbf{F}\textbf{G})^{{\textrm{T}}}\right) \\&+\lambda \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{G}\textbf{Q})+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G})\\ =\,&\textrm{Tr}(\textbf{X}\textbf{D}\textbf{X}^{{\textrm{T}}}-2\textbf{F}\textbf{G}\textbf{D}\textbf{X}^{{\textrm{T}}})+\textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}\textbf{D})\\&+\lambda \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{G}\textbf{Q})+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G}),\\ \end{aligned} \end{aligned}$$
(33)

then Eq. (21) can be rewritten as:

$$\begin{aligned} P(\textbf{G}^{t+1})\le P(\textbf{G}^{t}). \end{aligned}$$
(34)

In the remainder of the proof, we need to find an auxiliary function for \(P(\textbf{G})\) and the global minimum of that auxiliary function.

An auxiliary function for \(P(\textbf{G})\) is given as follows:

$$\begin{aligned} \begin{aligned} R(\textbf{G}, \textbf{G}^\prime )=&\textrm{Tr}(\textbf{X}\textbf{D}\textbf{X}^{{\textrm{T}}}-2\textbf{F}\textbf{G}\textbf{D}\textbf{X}^{{\textrm{T}}})+\sum _{ki}\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}^2}{G_{ki}^\prime }\\&\quad +\lambda \sum _{ki}\frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}^2}{G_{ki}^\prime }+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G}). \end{aligned} \end{aligned}$$
(35)

The following matrix inequality in Ding et al. (2010) is utilized in our proof.

$$\begin{aligned} \textrm{Tr}(\textbf{H}{^{\textrm{T}}}\textbf{A}\textbf{H}\textbf{B})\le \sum _{ki}(\textbf{A}\textbf{H}^\prime \textbf{B})_{ki}\frac{H_{ki}^2}{H_{ki}^\prime }, \end{aligned}$$
(36)

where \(\textbf{A}, \textbf{B}, \textbf{H}\) are nonnegative matrices with appropriate sizes and \(\textbf{A}=\textbf{A}^{{\textrm{T}}}, \textbf{B}=\textbf{B}^{{\textrm{T}}}\). The equality holds when \(\textbf{H}=\textbf{H}^\prime\).
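As an independent sanity check (ours, not part of the original proof), inequality (36) can be verified numerically on random nonnegative matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    n, k = 4, 3
    A = rng.random((n, n)); A = A + A.T      # symmetric, nonnegative
    B = rng.random((k, k)); B = B + B.T      # symmetric, nonnegative
    H  = rng.random((n, k))                  # nonnegative
    Hp = rng.random((n, k)) + 0.1            # H', entries > 0
    lhs = np.trace(H.T @ A @ H @ B)
    # sum_{ki} (A H' B)_{ki} H_{ki}^2 / H'_{ki}, the RHS of Eq. (36)
    rhs = np.sum((A @ Hp @ B) * H**2 / Hp)
    assert lhs <= rhs + 1e-9
print("Eq. (36) held on 1000 random instances")
```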

Setting \(\textbf{H}=\textbf{G}, \textbf{A}=\textbf{F}^{{\textrm{T}}}\textbf{F}, \textbf{B}=\textbf{D}\) in inequality (36), we obtain

$$\begin{aligned} \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}\textbf{D})\le \sum _{ki}\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}^2}{G_{ki}^\prime }. \end{aligned}$$
(37)

Setting \(\textbf{H}=\textbf{G}, \textbf{A}=\textbf{I}, \textbf{B}=\textbf{Q}\) in inequality (36), we obtain

$$\begin{aligned} \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{G}\textbf{Q})\le \sum _{ki}\frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}^2}{G_{ki}^\prime }. \end{aligned}$$
(38)

Based on Eq. (37) and Eq. (38) (the latter scaled by \(\lambda \ge 0\)), \(R(\textbf{G}, \textbf{G}^\prime )\) of Eq. (35) is an auxiliary function for \(P(\textbf{G})\) of Eq. (33).

Now we need to find the global minimum of \(R(\textbf{G},\textbf{G}^\prime )\) under the constraints \(G_{ki}\ge 0\) and \(\sum _{k}G_{ki}=1\); the problem can be formulated as:

$$\begin{aligned} \min _{G_{ki}\ge 0,\sum _{k}G_{ki}=1}R(\textbf{G},\textbf{G}^\prime ). \end{aligned}$$
(39)

The Lagrangian function of the problem (39) is

$$\begin{aligned} \begin{aligned}&{\mathcal {L}}(G_{ki},\alpha _{i},\beta _{ki})\\ =\,&\textrm{Tr}(\textbf{X}\textbf{D}\textbf{X}^{{\textrm{T}}}-2\textbf{F}\textbf{G}\textbf{D}\textbf{X}^{{\textrm{T}}})+\sum _{ki}\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}^2}{G_{ki}^\prime }\\&+\lambda \sum _{ki}\frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}^2}{G_{ki}^\prime }+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G})\\&+\alpha _i\left( \sum _{k}G_{ki}-1\right) +\beta _{ki}G_{ki}, \end{aligned} \end{aligned}$$
(40)

where \(\alpha _{i}\) and \(\beta _{ki}\ge 0\) are the Lagrangian multipliers.

The derivative of Eq. (40) w.r.t. \(G_{ki}\) is:

$$\begin{aligned} \begin{aligned} \frac{\partial {\mathcal {L}}(G_{ki},\alpha _{i},\beta _{ki})}{\partial G_{ki}}&= -2(\textbf{F}^{{\textrm{T}}}\textbf{X}\textbf{D})_{ki}+2\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}}{G_{ki}^\prime }\\&\quad +2\lambda \frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}}{G_{ki}^\prime }+2\mu G_{ki}M_{kk}N_{ki}\\&\quad +\alpha _{i}+\beta _{ki}. \end{aligned} \end{aligned}$$
(41)

Setting \(\frac{\partial {\mathcal {L}}(G_{ki},\alpha _{i},\beta _{ki})}{\partial G_{ki}}=0\) and noting that \(\beta _{ki}G_{ki}=0\) by the KKT complementary slackness condition, we have

$$\begin{aligned} G_{ki}=\left( \frac{((\textbf{F}^{{\textrm{T}}}\textbf{X}\textbf{D})_{ki}-0.5\alpha _{i})G_{ki}^\prime }{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}+\mu G_{ki}^\prime M_{kk}N_{ki}+\lambda (\textbf{G}^\prime \textbf{Q})_{ki}}\right) _{+}, \end{aligned}$$
(42)

where \((e)_{+}=\max (0,e)\).

According to the constraint condition \(\sum _{k}G_{ki}=1\), we have the following equation:

$$\begin{aligned} h_i(\alpha _i)=0, \end{aligned}$$
(43)

where \(h_i(\alpha _i)\) is defined as Eq. (19).

Therefore, the value of \(\alpha _i\) is the root of the above equation, which can be obtained by Newton's method, and \(G_{ki}\) is then computed by substituting this value of \(\alpha _i\) into Eq. (42).
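For concreteness, the per-column root finding can be sketched as follows (a minimal illustration, not the authors' implementation), assuming, consistent with Eqs. (42) and (43), that \(h_i(\alpha )=\sum _k G_{ki}(\alpha )-1\) with \(G_{ki}(\alpha )\) given by Eq. (42); the argument names are ours.

```python
import numpy as np

def solve_column(c, gp, d, tol=1e-10, max_iter=100):
    """Find alpha_i with h_i(alpha_i) = 0 by Newton's method, assuming
    h_i(alpha) = sum_k G_ki(alpha) - 1 and
    G_ki(alpha) = max(0, (c_k - 0.5*alpha) * gp_k / d_k)   (cf. Eq. (42)).
    c : column of F^T X D;  gp : column of G' (entries > 0);
    d : per-entry denominator of Eq. (42) for this column (entries > 0).
    """
    alpha = 0.0
    for _ in range(max_iter):
        g = (c - 0.5 * alpha) * gp / d
        active = g > 0                      # entries not clipped by (.)_+
        h = g[active].sum() - 1.0           # h_i(alpha), Eq. (43)
        if abs(h) < tol:
            break
        dh = -0.5 * (gp[active] / d[active]).sum()  # slope of the segment
        if dh == 0.0:                       # all entries clipped to zero
            break
        alpha -= h / dh                     # Newton step
    return alpha, np.maximum(0.0, (c - 0.5 * alpha) * gp / d)

c  = np.array([0.9, 0.4, 0.1])              # toy statistics for one column
gp = np.array([0.5, 0.3, 0.2])
d  = np.ones(3)
alpha, G_col = solve_column(c, gp, d)
print(G_col, G_col.sum())                    # the column sums to 1
```

Because \(h_i\) is piecewise linear and nonincreasing in \(\alpha\), Newton's method converges in a finite number of steps once the correct active set is identified.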

Writing \(G_{ki}^{t+1}\leftarrow G_{ki}\) and \(G_{ki}^t\leftarrow G_{ki}^\prime\), Eq. (42) recovers the updating rule of Eq. (17). Therefore, under the updating rule of Eq. (17), \(P(\textbf{G})\) does not increase. \(\square\)

Appendix 2. Proof of Lemma 4

Proof

First of all, the following inequality is required:

$$\begin{aligned} \ln z\le z-1. \end{aligned}$$
(44)

We show the proof of Eq. (44) as follows.

Let \(f(z)=\ln z-z+1\); then the first- and second-order derivatives of f(z) are:

$$\begin{aligned} \frac{\partial f(z)}{\partial z}=\frac{1}{z}-1, \;\; \frac{\partial ^{2} f(z)}{\partial z^2}=-\frac{1}{z^2}. \end{aligned}$$
(45)

Since \(\frac{\partial ^{2} f(z)}{\partial z^2}<0\), the global maximum of f(z) is attained where the first derivative vanishes, i.e., at \(z=1\). Therefore,

$$\begin{aligned} f(z)\le f(1)=0. \end{aligned}$$
(46)

This implies that Eq. (44) holds.

Let

$$\begin{aligned} z=\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2+\gamma }{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }, \end{aligned}$$
(47)

then Eq. (44) becomes the following inequality:

$$\begin{aligned} \begin{aligned}&\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2+\gamma )-\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma )\le \\&\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }-\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }. \end{aligned} \end{aligned}$$
(48)

On the other hand, Eq. (22) can be written as:

$$\begin{aligned} \begin{aligned}&\sum _{i=1}^n\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2+\gamma )-\sum _{i=1}^n\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma )\le \\&\sum _{i=1}^n\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }-\sum _{i=1}^n\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }. \end{aligned} \end{aligned}$$
(49)

Equation (48) shows that each i-th term on the LHS of Eq. (49) is no larger than the corresponding i-th term on the RHS. Thus, summing over all i, Eq. (48) yields Eq. (49). \(\square\)
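As a quick numerical check of this mechanism (again ours, not the paper's), the bound \(\ln z\le z-1\) with the choice of z in Eq. (47) yields Eq. (49) for arbitrary positive residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.1

# Random stand-ins for the residuals ||x_i - F g_i||^2 at steps t and t+1.
r_t  = rng.random(100) + 0.01
r_t1 = rng.random(100) + 0.01

lhs = np.sum(np.log(r_t1 + gamma) - np.log(r_t + gamma))
rhs = np.sum(r_t1 / (r_t + gamma) - r_t / (r_t + gamma))
assert lhs <= rhs + 1e-12                    # Eq. (49), via ln z <= z - 1
print(f"lhs = {lhs:.4f} <= rhs = {rhs:.4f}")
```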

Appendix 3. Proof of Lemma 5

Proof

The following inequality holds by the Cauchy–Schwarz inequality, applied to the vectors with entries \(u_i=G_{ki}^{t+1}/\sqrt{G_{ki}^t}\) and \(v_i=\sqrt{G_{ki}^t}\):

$$\begin{aligned} \left( \sum _{i} G_{ki}^{t+1}\right) ^2\le \sum _{i} \frac{(G_{ki}^{t+1})^2}{G_{ki}^t}\sum _{i} G_{ki}^t. \end{aligned}$$
(50)

Then we have

$$\begin{aligned} \begin{aligned} \sum _k\left( \sum _{i} G_{ki}^{t+1}\right) ^2&\le \sum _k\left( \sum _{i}\frac{(G_{ki}^{t+1})^2}{G_{ki}^t}\sum _{i} G_{ki}^t\right) \\&=\textrm{Tr}(((\textbf{G}^{t+1})^{\textrm{T}}\textbf{M})\otimes \textbf{N}^{\textrm{T}}\textbf{G}^{t+1}). \end{aligned} \end{aligned}$$
(51)

Note that

$$\begin{aligned} \sum _k\left( \sum _{i} G_{ki}^{t}\right) ^2=\textrm{Tr}(((\textbf{G}^{t})^{\textrm{T}}\textbf{M})\otimes \textbf{N}^{\textrm{T}}\textbf{G}^{t}). \end{aligned}$$
(52)

Based on Eqs. (51) and (52), Eq. (23) holds. \(\square\)
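A quick numerical confirmation of inequality (50) (an illustrative sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
G_t1 = rng.random((5, 50))          # stand-in for G^{t+1} (nonnegative)
G_t  = rng.random((5, 50)) + 0.05   # stand-in for G^t (entries > 0)

# Row-wise check of Eq. (50):
# (sum_i G_ki^{t+1})^2 <= (sum_i (G_ki^{t+1})^2 / G_ki^t) * (sum_i G_ki^t),
# i.e. Cauchy-Schwarz with u_i = G_ki^{t+1}/sqrt(G_ki^t), v_i = sqrt(G_ki^t).
for k in range(G_t1.shape[0]):
    lhs = G_t1[k].sum() ** 2
    rhs = np.sum(G_t1[k] ** 2 / G_t[k]) * G_t[k].sum()
    assert lhs <= rhs + 1e-9
print("Eq. (50) verified row-wise")
```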

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, H., Kong, D. & Nie, F. Cauchy balanced nonnegative matrix factorization. Artif Intell Rev 56, 11867–11903 (2023). https://doi.org/10.1007/s10462-022-10379-y
