Cauchy balanced nonnegative matrix factorization

Abstract

Nonnegative Matrix Factorization (NMF) plays an important role in many data mining and machine learning tasks. Standard NMF uses the Frobenius norm as the loss function, which is well known to be sensitive to noise. To address this issue, we propose a robust formulation of NMF, i.e., Cauchy-NMF, derived under the assumption that the noise follows an independent and identically distributed (i.i.d.) Cauchy distribution. In particular, we derive the Cauchy Balanced NMF model (Cauchy-B-NMF) using the Cauchy distribution, where (a) the numerical value of each element in the coefficient matrix is viewed as a posterior probability, which allows the clustering result to be obtained directly from the coefficient matrix without any additional post-processing; (b) a novel manifold regularization term is incorporated into the loss function, explicitly making distant data points have dissimilar embeddings, while implicitly making neighbouring data points have similar embeddings; (c) a balanced clustering term is enforced to encourage an equal number of data points across different clusters. We derive an efficient computational algorithm to solve the resulting optimization problem, and also provide a rigorous analysis of its convergence. Experimental results on several benchmarks demonstrate the effectiveness of our algorithm, which consistently provides better clustering results than many other NMF variants.

Notes

  1. It is also interesting to note that the difference between Eqs. (19) and (20) is that the minuend in Eq. (19) is always non-negative, although their absolute values are the same.

  2. In this paper, let LHS denote the left-hand side of an equation, and RHS the right-hand side.

Biography

He Xiong received the master's degree from the University of Science and Technology of China in 2013. He is currently a lecturer in the School of Computer and Information Engineering at BengBu University. His research interests include machine learning, data mining and computer vision.

Deguang Kong received his Ph.D. degree in Computer Science from the University of Texas in 2013. He currently works at Amazon, and previously worked at Google, Yahoo Research (Sunnyvale), Los Alamos National Lab, NEC Research Lab, Penn State University and Samsung Research America as a researcher. His research interests focus on feature learning and compressive sensing, user engagement understanding and recommendation, etc. He has published over 30 refereed articles in top conferences, including ICML, NIPS, AAAI, CVPR, KDD, ICDM, SDM, WSDM, CIKM, ECML/PKDD, etc. He has served as a program committee member for NIPS, AAAI, IJCAI, KDD and SDM, and as a reviewer for TPAMI, TKDE, DMKD, TIFS, TNNLS, TDSC, etc.

Feiping Nie received the Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2009. He is currently a professor with the Center for OPTical Imagery Analysis and Learning, Northwestern Polytechnical University, Xi'an, Shaanxi, China. He has published over 100 articles in prestigious journals and conferences. His current research interests include machine learning and its application fields, such as pattern recognition, data mining, computer vision, image processing, and information retrieval. Dr. Nie currently serves as an associate editor or a program committee member for several prestigious journals and conferences in the related fields.

References

  • Althoff T, Ulges A, Dengel A (2011) Balanced clustering for content-based image browsing. In: Informatiktage 2011 - Fachwissenschaftlicher Informatik-Kongress, 25-26 March 2011, B-IT Bonn-Aachen International Center for Information Technology, Bonn, Vol. S-10 of LNI, GI, pp 27–30

  • Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720

  • Bojchevski A, Matkovic Y, Günnemann S (2017) Robust spectral clustering for noisy data: modeling sparse corruptions improves latent embeddings. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, Halifax, August 13–17, pp 737–746. https://doi.org/10.1145/3097983.3098156

  • Cai D, He X, Han J, Huang T (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33(8):1548–1560. https://doi.org/10.1109/TPAMI.2010.231

  • Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J. ACM 58(3):1–11

  • Cao X, Chen Y, Zhao Q, Meng D, Wang Y, Wang D, Xu Z (2015) Low-rank matrix factorization under general mixture noise distributions. In: 2015 IEEE international conference on computer vision, ICCV 2015, Santiago, Chile, December 7–13, pp 1493–1501. https://doi.org/10.1109/ICCV.2015.175

  • Chen P, Wang N, Zhang NL, Yeung D (2015) Bayesian adaptive matrix factorization with automatic model selection. In: IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, June 7–12, pp. 1284–1292. https://doi.org/10.1109/CVPR.2015.7298733

  • Ding CHQ, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 32(1):45–55. https://doi.org/10.1109/TPAMI.2008.277

  • Du L, Li X, Shen Y (2012) Robust nonnegative matrix factorization via half-quadratic minimization. In: 12th IEEE International conference on Data Mining, ICDM 2012, Brussels, December 10–13, 2012, pp 201–210. https://doi.org/10.1109/ICDM.2012.39

  • Gao H, Nie F, Cai W, Huang H (2015) Robust capped norm nonnegative matrix factorization: capped norm NMF. In: Proceedings of the 24th ACM international on conference on information and knowledge management, CIKM ’15, ACM, New York, pp 871–880. https://doi.org/10.1145/2806416.2806568

  • Gao H, Nie F, Huang H (2017) Local centroids structured non-negative matrix factorization. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, February 4–9, San Francisco, 2017, pp 1905–1911. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/15027

  • Gobinet C, Perrin E, Huez R (2004) Application of non-negative matrix factorization to fluorescence spectroscopy. In: 2004 12th European signal processing conference, Vienna, September 6–10, pp 1095–1098. http://ieeexplore.ieee.org/document/7079789/

  • Graham DB, Allinson NM (1998) Characterising virtual eigensignatures for general purpose face recognition. In: Face recognition, Springer, pp 446–456

  • Guan N, Liu T, Zhang Y, Tao D, Davis LS (2019) Truncated Cauchy non-negative matrix factorization. IEEE Trans Pattern Anal Mach Intell 41(1):246–259. https://doi.org/10.1109/TPAMI.2017.2777841

  • Guo X, Pan B, Cai D, He X (2017) Robust asymmetric bayesian adaptive matrix factorization. In: Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI 2017, Melbourne, August 19–25, pp 1760–1766. https://doi.org/10.24963/ijcai.2017/244

  • Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: Proceedings of the 2006 IEEE computer society conference on computer vision and pattern recognition-volume 2, CVPR ’06, IEEE Computer Society, Washington, pp 1735–1742. https://doi.org/10.1109/CVPR.2006.100

  • Hamza A, Brady D (2006) Reconstruction of reflectance spectra using robust nonnegative matrix factorization. Trans Sig Proc 54(9):3637–3642. https://doi.org/10.1109/TSP.2006.879282

  • Huang J, Nie F, Huang H, Ding C (2014) Robust manifold nonnegative matrix factorization. ACM Trans Knowl Discov Data 8(3):1–11. https://doi.org/10.1145/2601434

  • Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554. https://doi.org/10.1109/34.291440

  • Kannan R, Ballard G, Park H (2016) A high-performance parallel algorithm for nonnegative matrix factorization. In: Asenjo R, Harris T (eds), Proceedings of the 21st ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP 2016, Barcelona, March 12–16, ACM, pp 9:1–9:11. https://doi.org/10.1145/2851141.2851152

  • Kim H, Park H, Drake BL (2007) Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations. BMC Bioinform 8(Suppl 9):S6. https://doi.org/10.1186/1471-2105-8-S9-S6

  • Kong D, Ding C, Huang H (2011) Robust nonnegative matrix factorization using l21-norm. In: Proceedings of the 20th ACM international conference on information and knowledge management, CIKM ’11, ACM, New York, pp 673–682. https://doi.org/10.1145/2063576.2063676

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  • Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Leen TK, Dietterich TG, Tresp V (eds), Advances in neural information processing systems 13, MIT Press, pp 556–562. http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf

  • Li Z, Tang J, Mei T (2019) Deep collaborative embedding for social image understanding. IEEE Trans Pattern Anal Mach Intell 41(9):2070–2083. https://doi.org/10.1109/TPAMI.2018.2852750

  • Lian D, Liu R, Ge Y, Zheng K, Xie X, Cao L (2017) Discrete content-aware matrix factorization. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’17, ACM, New York, pp 325–334. https://doi.org/10.1145/3097983.3098008

  • Liao Q, Guan N, Zhang Q (2015) Logdet divergence based sparse non-negative matrix factorization for stable representation. In: 2015 IEEE international conference on data mining, ICDM 2015, Atlantic City, November 14–17, pp 871–876. https://doi.org/10.1109/ICDM.2015.52

  • Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Lin W, He Z, Xiao M (2019) Balanced clustering: a uniform model and fast algorithm. In: Kraus S (ed), Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI 2019, Macao, August 10–16, ijcai.org, 2019, pp 2987–2993. https://doi.org/10.24963/ijcai.2019/414

  • Li Z, Nie F, Chang X, Ma Z, Yang Y (2018) Balanced clustering via exclusive lasso: a pragmatic approach. In: McIlraith SA, Weinberger KQ (eds), Proceedings of the thirty-second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, February 2–7, 2018, AAAI Press, pp 3596–3603. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16711

  • Liu H, Wu Z, Li X, Cai D, Huang TS (2012) Constrained nonnegative matrix factorization for image representation. IEEE Trans Pattern Anal Mach Intell 34(7):1299–1311. https://doi.org/10.1109/TPAMI.2011.217

  • Liu H, Han J, Nie F, Li X (2017) Balanced clustering with least square regression. In: Singh SP, Markovitch S (eds), Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4–9, 2017, San Francisco, AAAI Press, pp 2231–2237. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14693

  • Liu T, Sun J, Zheng NN, Tang X, Shum HY (2007) Learning to detect a salient object. In: IEEE conference on computer vision and pattern recognition 2007, pp 1–8. https://doi.org/10.1109/CVPR.2007.383047

  • Liutkus A, Fitzgerald D, Badeau R (2015) Cauchy nonnegative matrix factorization. In: 2015 IEEE workshop on applications of signal processing to audio and acoustics (WASPAA), pp 1–5

  • Lu Y, Lai Z, Xu Y, Li X, Zhang D, Yuan C (2017) Nonnegative discriminant matrix factorization. IEEE Trans Circuits Syst Video Technol 27(7):1392–1405. https://doi.org/10.1109/TCSVT.2016.2539779

  • Luo L, Zhang Y, Huang H (2020) Adversarial nonnegative matrix factorization. In: Proceedings of the 37th international conference on machine learning, ICML 2020, 13–18 July, Virtual Event, Vol. 119 of Proceedings of machine learning research, PMLR, 2020, pp 6479–6488. http://proceedings.mlr.press/v119/luo20c.html

  • Malinen MI, Fränti P (2014) Balanced k-means for clustering. In: Fränti P, Brown G, Loog M, Escolano F, Pelillo M (eds), Structural, syntactic, and statistical pattern recognition-joint IAPR international workshop, S+SSPR 2014, Joensuu, August 20–22, 2014. Proceedings, Vol. 8621 of Lecture Notes in Computer Science, Springer, 2014, pp 32–41. https://doi.org/10.1007/978-3-662-44415-3_4

  • Meng D, la Torre FD (2013) Robust matrix factorization with unknown noise. In: IEEE international conference on computer vision, ICCV 2013, Sydney December 1–8, pp 1337–1344. https://doi.org/10.1109/ICCV.2013.169

  • Mitra A, Vijayan P, Parthasarathy S, Ravindran B (2020) A unified non-negative matrix factorization framework for semi supervised learning on graphs. In: Demeniconi C, Chawla NV (eds), Proceedings of the 2020 SIAM international conference on data mining, SDM 2020, Cincinnati, May 7–9, SIAM, pp 487–495. https://doi.org/10.1137/1.9781611976236.55

  • Moon GE, Ellis JA, Sukumaran-Rajam A, Parthasarathy S, Sadayappan P (2020) ALO-NMF: accelerated locality-optimized non-negative matrix factorization. In: Gupta R, Liu Y, Tang J, Prakash BA (eds), KDD ’20: the 26th ACM SIGKDD conference on knowledge discovery and data mining, virtual event, August 23–27, 2020, ACM, 2020, pp 1758–1767. https://doi.org/10.1145/3394486.3403227

  • Nene SA, Nayar SK, Murase H (1996) Columbia object image library (COIL-20). Technical Report CUCS-005-96, Columbia University

  • Peng C, Kang Z, Hu Y, Cheng J, Cheng Q (2017) Robust graph regularized nonnegative matrix factorization for clustering. ACM Trans Knowl Discov Data 11(3):1–33. https://doi.org/10.1145/3003730

  • Peng S, Ser W, Chen B, Lin Z (2021) Robust semi-supervised nonnegative matrix factorization for image clustering. Pattern Recognit 111:107683. https://doi.org/10.1016/j.patcog.2020.107683

  • Rodríguez-Domínguez U, Dalmau O (2022) Symmetric nonnegative matrix factorization with elastic-net regularized block-wise weighted representation for clustering. Pattern Anal Appl pp 1–11

  • Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Applications of computer vision, 1994., Proceedings of the second IEEE workshop on, IEEE, pp 138–142

  • Siavoshi S, Kavian YS, Sharif H (2016) Load-balanced energy efficient clustering protocol for wireless sensor networks. IET Wirel Sens Syst 6(3):67–73. https://doi.org/10.1049/iet-wss.2015.0069

  • Su W, Hu J, Lin C, Shen SX (2015) Sla-aware tenant placement and dynamic resource provision in SAAS. In: Miller JA, Zhu H (eds), 2015 IEEE international conference on web services, ICWS 2015, New York, June 27 - July 2, IEEE Computer Society, 2015, pp 615–622. https://doi.org/10.1109/ICWS.2015.87

  • Trigeorgis G, Bousmalis K, Zafeiriou S, Schuller BW (2017) A deep matrix factorization method for learning attribute representations. IEEE Trans Pattern Anal Mach Intell 39(3):417–429. https://doi.org/10.1109/TPAMI.2016.2554555

  • Wang J-Y, Almasri I, Gao X (2012) Adaptive graph regularized nonnegative matrix factorization via feature selection. In: Pattern recognition (ICPR), 2012 21st international conference on, IEEE, pp 963–966

  • Wang H, Yang W, Guan N (2019) Cauchy sparse NMF with manifold regularization: a robust method for hyperspectral Unmixing. Knowl Based Syst 184:104898. https://doi.org/10.1016/j.knosys.2019.104898

  • Wright J, Ganesh A, Rao SR, Peng Y, Ma Y (2009d) Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Advances in neural information processing systems 22: 23rd Annual conference on neural information processing systems 2009. Proceedings of a meeting held 7–10 December 2009, Vancouver, 2009, pp 2080–2088. http://papers.nips.cc/paper/3704-robust-principal-component-analysis-exact-recovery-of-corrupted-low-rank-matrices-via-convex-optimization

  • Wu Y, Shen B, Ling H (2014) Visual tracking via online nonnegative matrix factorization. IEEE Trans Circuits Syst Video Technol 24(3):374–383. https://doi.org/10.1109/TCSVT.2013.2278199

  • Xiong H, Kong D (2019) Elastic nonnegative matrix factorization. Pattern Recognit 90:464–475. https://doi.org/10.1016/j.patcog.2018.07.007

  • Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’03, ACM, New York, 2003, pp 267–273. https://doi.org/10.1145/860435.860485

  • Zhang S, Wang W, Ford J, Makedon F (2006) Learning from incomplete ratings using non-negative matrix factorization. In: Ghosh J, Lambert F, Skillicorn DB, Srivastava J (eds), SDM, SIAM, 2006, pp 549–553. http://dblp.uni-trier.de/db/conf/sdm/sdm2006.html#ZhangWFM06

  • Zhao N, Zhang L, Du B, Zhang Q, You J, Tao D (2017) Robust dual clustering with adaptive manifold regularization. IEEE Trans Knowl Data Eng 29(11):2498–2509. https://doi.org/10.1109/TKDE.2017.2732986

  • Zhao Y, Wang H, Pei J (2021) Deep non-negative matrix factorization architecture based on underlying basis images learning. IEEE Trans Pattern Anal Mach Intell 43(6):1897–1913. https://doi.org/10.1109/TPAMI.2019.2962679

  • Zhao Q, Meng D, Xu Z, Zuo W, Zhang L (2014) Robust principal component analysis with complex noise. In: Proceedings of the 31th international conference on machine learning, ICML 2014, Beijing, 21–26 June, pp 55–63. http://jmlr.org/proceedings/papers/v32/zhao14.html

  • Zhu Z, Li X, Liu K, Li Q (2018) Dropping symmetry for fast symmetric nonnegative matrix factorization. In: Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds), Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, pp 5160–5170. https://proceedings.neurips.cc/paper/2018/hash/d9ff90f4000eacd3a6c9cb27f78994cf-Abstract.html

Acknowledgments

This research was supported by the Anhui Province Excellent Talent Support Program for Universities (gxyq2020063) and the Key Projects of Natural Science Research in Anhui Universities (2022AH051915).

Author information

Corresponding author

Correspondence to Deguang Kong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1. Proof of Lemma 3

Proof

We utilize an auxiliary function to prove Lemma 3.

If \(R(\textbf{G}, \textbf{G}^\prime )\) satisfies the following conditions, it is an auxiliary function for \(P(\textbf{G})\).

$$\begin{aligned} P(\textbf{G})&\le R(\textbf{G}, \textbf{G}^\prime )\quad \forall \textbf{G}^\prime , \end{aligned}$$
(29)
$$\begin{aligned} P(\textbf{G})&=R(\textbf{G}, \textbf{G}). \end{aligned}$$
(30)

Define

$$\begin{aligned} \textbf{G}^{t+1}=\arg \min _{\textbf{G}}R(\textbf{G}, \textbf{G}^t), \end{aligned}$$
(31)

then we have

$$\begin{aligned} P(\textbf{G}^{t+1})\le R(\textbf{G}^{t+1}, \textbf{G}^t)\le R(\textbf{G}^t, \textbf{G}^t)=P(\textbf{G}^t). \end{aligned}$$
(32)

Equation (32) indicates that \(P(\textbf{G})\) does not increase under the updating rule of Eq. (31).
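To make the majorize-minimize mechanism of Eqs. (29)-(32) concrete, the following minimal Python/NumPy sketch (our illustration, not code from the paper) applies it to a toy quadratic objective; the toy \(P\), the matrices, and all names are illustrative assumptions.

```python
import numpy as np

# Toy illustration of the auxiliary-function scheme of Eqs. (29)-(32):
# P(g) = g^T A g - 2 b^T g, majorized via the diagonal upper bound of
# Eq. (36) with B = 1. The toy P and all names are ours.

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # symmetric, nonnegative
b = np.array([1.0, 2.0])

def P(g):
    return g @ A @ g - 2.0 * b @ g

def argmin_R(g_prev):
    # R(g, g') = sum_k (A g')_k g_k^2 / g'_k - 2 b^T g majorizes P(g)
    # and is tight at g = g'; its minimizer has the familiar
    # multiplicative-update form g_k = b_k g'_k / (A g')_k.
    return b * g_prev / (A @ g_prev)

g = np.array([0.5, 0.5])          # positive initialization
for t in range(30):
    g_next = argmin_R(g)
    assert P(g_next) <= P(g) + 1e-12   # monotone descent, Eq. (32)
    g = g_next
print(g, P(g))                    # approaches argmin P = A^{-1} b
```

At each step the surrogate is tight at the current iterate and upper-bounds \(P\) elsewhere, so minimizing it can only decrease \(P\); this is exactly the chain of inequalities in Eq. (32).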

Let

$$\begin{aligned} \begin{aligned} P(\textbf{G})=\,&\textrm{Tr}\left( (\textbf{X}-\textbf{F}\textbf{G})\textbf{D}(\textbf{X}-\textbf{F}\textbf{G})^{{\textrm{T}}}\right) \\&+\lambda \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{G}\textbf{Q})+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G})\\ =\,&\textrm{Tr}(\textbf{X}\textbf{D}\textbf{X}^{{\textrm{T}}}-2\textbf{F}\textbf{G}\textbf{D}\textbf{X}^{{\textrm{T}}})+\textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}\textbf{D})\\&+\lambda \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{G}\textbf{Q})+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G}),\\ \end{aligned} \end{aligned}$$
(33)

then Eq. (21) can be rewritten as:

$$\begin{aligned} P(\textbf{G}^{t+1})\le P(\textbf{G}^{t}). \end{aligned}$$
(34)

In the remainder of the proof, we need to find an auxiliary function for \(P(\textbf{G})\) and the global minimum of that auxiliary function.

An auxiliary function for \(P(\textbf{G})\) is given as follows:

$$\begin{aligned} \begin{aligned} R(\textbf{G}, \textbf{G}^\prime )=&\textrm{Tr}(\textbf{X}\textbf{D}\textbf{X}^{{\textrm{T}}}-2\textbf{F}\textbf{G}\textbf{D}\textbf{X}^{{\textrm{T}}})+\sum _{ki}\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}^2}{G_{ki}^\prime }\\&\quad +\lambda \sum _{ki}\frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}^2}{G_{ki}^\prime }+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G}). \end{aligned} \end{aligned}$$
(35)

The following matrix inequality in Ding et al. (2010) is utilized in our proof.

$$\begin{aligned} \textrm{Tr}(\textbf{H}{^{\textrm{T}}}\textbf{A}\textbf{H}\textbf{B})\le \sum _{ki}(\textbf{A}\textbf{H}^\prime \textbf{B})_{ki}\frac{H_{ki}^2}{H_{ki}^\prime }, \end{aligned}$$
(36)

where \(\textbf{A}, \textbf{B}, \textbf{H}\) are nonnegative matrices with appropriate sizes and \(\textbf{A}=\textbf{A}^{{\textrm{T}}}, \textbf{B}=\textbf{B}^{{\textrm{T}}}\). The equality holds when \(\textbf{H}=\textbf{H}^\prime\).
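As an independent sanity check (ours, not part of the original proof), inequality (36) can be verified numerically on random nonnegative matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    n, k = 4, 3
    A = rng.random((n, n)); A = A + A.T      # symmetric, nonnegative
    B = rng.random((k, k)); B = B + B.T      # symmetric, nonnegative
    H  = rng.random((n, k))                  # nonnegative
    Hp = rng.random((n, k)) + 0.1            # H', entries > 0
    lhs = np.trace(H.T @ A @ H @ B)
    # sum_{ki} (A H' B)_{ki} H_{ki}^2 / H'_{ki}, the RHS of Eq. (36)
    rhs = np.sum((A @ Hp @ B) * H**2 / Hp)
    assert lhs <= rhs + 1e-9
print("Eq. (36) held on 1000 random instances")
```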

Setting \(\textbf{H}=\textbf{G}, \textbf{A}=\textbf{F}^{{\textrm{T}}}\textbf{F}, \textbf{B}=\textbf{D}\) in inequality (36), we obtain

$$\begin{aligned} \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}\textbf{D})\le \sum _{ki}\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}^2}{G_{ki}^\prime }. \end{aligned}$$
(37)

Setting \(\textbf{H}=\textbf{G}, \textbf{A}=\textbf{I}, \textbf{B}=\textbf{Q}\) in inequality (36), we obtain

$$\begin{aligned} \textrm{Tr}(\textbf{G}^{{\textrm{T}}}\textbf{G}\textbf{Q})\le \sum _{ki}\frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}^2}{G_{ki}^\prime }. \end{aligned}$$
(38)

Based on Eq. (37) and Eq. (38) (the latter scaled by \(\lambda \ge 0\)), \(R(\textbf{G}, \textbf{G}^\prime )\) of Eq. (35) is an auxiliary function for \(P(\textbf{G})\) of Eq. (33).

Now we need to find the global minimum of \(R(\textbf{G},\textbf{G}^\prime )\) under the constraints \(G_{ki}\ge 0\) and \(\sum _{k}G_{ki}=1\); the problem can be formulated as:

$$\begin{aligned} \min _{G_{ki}\ge 0,\sum _{k}G_{ki}=1}R(\textbf{G},\textbf{G}^\prime ). \end{aligned}$$
(39)

The Lagrangian function of the problem (39) is

$$\begin{aligned} \begin{aligned}&{\mathcal {L}}(G_{ki},\alpha _{i},\beta _{ki})\\ =\,&\textrm{Tr}(\textbf{X}\textbf{D}\textbf{X}^{{\textrm{T}}}-2\textbf{F}\textbf{G}\textbf{D}\textbf{X}^{{\textrm{T}}})+\sum _{ki}\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}^2}{G_{ki}^\prime }\\&+\lambda \sum _{ki}\frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}^2}{G_{ki}^\prime }+\mu \textrm{Tr}((\textbf{G}^{{\textrm{T}}}\textbf{M})\otimes \textbf{N}^{{\textrm{T}}}\textbf{G})\\&+\alpha _i\left( \sum _{k}G_{ki}-1\right) +\beta _{ki}G_{ki}, \end{aligned} \end{aligned}$$
(40)

where \(\alpha _{i}\) and \(\beta _{ki}\ge 0\) are the Lagrangian multipliers.

The derivative of Eq. (40) w.r.t. \(G_{ki}\) is:

$$\begin{aligned} \begin{aligned} \frac{\partial {\mathcal {L}}(G_{ki},\alpha _{i},\beta _{ki})}{\partial G_{ki}}&= -2(\textbf{F}^{{\textrm{T}}}\textbf{X}\textbf{D})_{ki}+2\frac{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}G_{ki}}{G_{ki}^\prime }\\&\quad +2\lambda \frac{(\textbf{G}^\prime \textbf{Q})_{ki}G_{ki}}{G_{ki}^\prime }+2\mu G_{ki}M_{kk}N_{ki}\\&\quad +\alpha _{i}+\beta _{ki}. \end{aligned} \end{aligned}$$
(41)

Setting \(\frac{\partial {\mathcal {L}}(G_{ki},\alpha _{i},\beta _{ki})}{\partial G_{ki}}=0\) and noting that \(\beta _{ki}G_{ki}=0\) by the KKT complementary slackness condition, we have

$$\begin{aligned} G_{ki}=\left( \frac{((\textbf{F}^{{\textrm{T}}}\textbf{X}\textbf{D})_{ki}-0.5\alpha _{i})G_{ki}^\prime }{(\textbf{F}^{{\textrm{T}}}\textbf{F}\textbf{G}^\prime \textbf{D})_{ki}+\mu G_{ki}^\prime M_{kk}N_{ki}+\lambda (\textbf{G}^\prime \textbf{Q})_{ki}}\right) _{+}, \end{aligned}$$
(42)

where \((e)_{+}=\max (0,e)\).

According to the constraint condition \(\sum _{k}G_{ki}=1\), we have the following equation:

$$\begin{aligned} h_i(\alpha _i)=0, \end{aligned}$$
(43)

where \(h_i(\alpha _i)\) is defined as Eq. (19).

Therefore, the value of \(\alpha _i\) is the root of the above equation, which can be obtained by Newton's method, and \(G_{ki}\) is then computed by substituting this value of \(\alpha _i\) into Eq. (42).
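For concreteness, the per-column root finding can be sketched as follows (a minimal illustration, not the authors' implementation), assuming, consistent with Eqs. (42) and (43), that \(h_i(\alpha )=\sum _k G_{ki}(\alpha )-1\) with \(G_{ki}(\alpha )\) given by Eq. (42); the argument names are ours.

```python
import numpy as np

def solve_column(c, gp, d, tol=1e-10, max_iter=100):
    """Find alpha_i with h_i(alpha_i) = 0 by Newton's method, assuming
    h_i(alpha) = sum_k G_ki(alpha) - 1 and
    G_ki(alpha) = max(0, (c_k - 0.5*alpha) * gp_k / d_k)   (cf. Eq. (42)).
    c : column of F^T X D;  gp : column of G' (entries > 0);
    d : per-entry denominator of Eq. (42) for this column (entries > 0).
    """
    alpha = 0.0
    for _ in range(max_iter):
        g = (c - 0.5 * alpha) * gp / d
        active = g > 0                      # entries not clipped by (.)_+
        h = g[active].sum() - 1.0           # h_i(alpha), Eq. (43)
        if abs(h) < tol:
            break
        dh = -0.5 * (gp[active] / d[active]).sum()  # slope of the segment
        if dh == 0.0:                       # all entries clipped to zero
            break
        alpha -= h / dh                     # Newton step
    return alpha, np.maximum(0.0, (c - 0.5 * alpha) * gp / d)

c  = np.array([0.9, 0.4, 0.1])              # toy statistics for one column
gp = np.array([0.5, 0.3, 0.2])
d  = np.ones(3)
alpha, G_col = solve_column(c, gp, d)
print(G_col, G_col.sum())                    # the column sums to 1
```

Because \(h_i\) is piecewise linear and nonincreasing in \(\alpha\), Newton's method converges in a finite number of steps once the correct active set is identified.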

Writing \(G_{ki}^{t+1}\leftarrow G_{ki}\) and \(G_{ki}^t\leftarrow G_{ki}^\prime\), Eq. (42) recovers the updating rule of Eq. (17). Therefore, under the updating rule of Eq. (17), \(P(\textbf{G})\) does not increase. \(\square\)

Appendix 2. Proof of Lemma 4

Proof

First of all, the following inequality is required:

$$\begin{aligned} \ln z\le z-1. \end{aligned}$$
(44)

We show the proof of Eq. (44) as follows.

Let \(f(z)=\ln z-z+1\); then the first- and second-order derivatives of f(z) are:

$$\begin{aligned} \frac{\partial f(z)}{\partial z}=\frac{1}{z}-1, \;\; \frac{\partial ^{2} f(z)}{\partial z^2}=-\frac{1}{z^2}. \end{aligned}$$
(45)

Since \(\frac{\partial ^{2} f(z)}{\partial z^2}<0\), the global maximum of f(z) is attained where the first derivative vanishes, i.e., at \(z=1\). Therefore,

$$\begin{aligned} f(z)\le f(1)=0. \end{aligned}$$
(46)

This implies that Eq. (44) holds.

Let

$$\begin{aligned} z=\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2+\gamma }{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }, \end{aligned}$$
(47)

then Eq. (44) becomes the following inequality:

$$\begin{aligned} \begin{aligned}&\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2+\gamma )-\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma )\le \\&\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }-\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }. \end{aligned} \end{aligned}$$
(48)

On the other hand, Eq. (22) can be written as:

$$\begin{aligned} \begin{aligned}&\sum _{i=1}^n\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2+\gamma )-\sum _{i=1}^n\ln (\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma )\le \\&\sum _{i=1}^n\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t+1}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }-\sum _{i=1}^n\frac{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2}{\Vert \textbf{x}_i-\textbf{F}\textbf{g}_i^{t}\Vert ^2+\gamma }. \end{aligned} \end{aligned}$$
(49)

Equation (48) shows that each i-th term on the LHS of Eq. (49) is no larger than the corresponding i-th term on the RHS. Thus, summing over all i, Eq. (48) yields Eq. (49). \(\square\)
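As a quick numerical check of this mechanism (again ours, not the paper's), the bound \(\ln z\le z-1\) with the choice of z in Eq. (47) yields Eq. (49) for arbitrary positive residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.1

# Random stand-ins for the residuals ||x_i - F g_i||^2 at steps t and t+1.
r_t  = rng.random(100) + 0.01
r_t1 = rng.random(100) + 0.01

lhs = np.sum(np.log(r_t1 + gamma) - np.log(r_t + gamma))
rhs = np.sum(r_t1 / (r_t + gamma) - r_t / (r_t + gamma))
assert lhs <= rhs + 1e-12                    # Eq. (49), via ln z <= z - 1
print(f"lhs = {lhs:.4f} <= rhs = {rhs:.4f}")
```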

Appendix 3. Proof of Lemma 5

Proof

The following inequality holds by the Cauchy–Schwarz inequality, applied to the vectors with entries \(u_i=G_{ki}^{t+1}/\sqrt{G_{ki}^t}\) and \(v_i=\sqrt{G_{ki}^t}\):

$$\begin{aligned} \left( \sum _{i} G_{ki}^{t+1}\right) ^2\le \sum _{i} \frac{(G_{ki}^{t+1})^2}{G_{ki}^t}\sum _{i} G_{ki}^t. \end{aligned}$$
(50)

Then we have

$$\begin{aligned} \begin{aligned} \sum _k\left( \sum _{i} G_{ki}^{t+1}\right) ^2&\le \sum _k\left( \sum _{i}\frac{(G_{ki}^{t+1})^2}{G_{ki}^t}\sum _{i} G_{ki}^t\right) \\&=\textrm{Tr}(((\textbf{G}^{t+1})^{\textrm{T}}\textbf{M})\otimes \textbf{N}^{\textrm{T}}\textbf{G}^{t+1}). \end{aligned} \end{aligned}$$
(51)

Note that

$$\begin{aligned} \sum _k\left( \sum _{i} G_{ki}^{t}\right) ^2=\textrm{Tr}(((\textbf{G}^{t})^{\textrm{T}}\textbf{M})\otimes \textbf{N}^{\textrm{T}}\textbf{G}^{t}). \end{aligned}$$
(52)

Based on Eqs. (51) and (52), Eq. (23) holds. \(\square\)
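A quick numerical confirmation of inequality (50) (an illustrative sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
G_t1 = rng.random((5, 50))          # stand-in for G^{t+1} (nonnegative)
G_t  = rng.random((5, 50)) + 0.05   # stand-in for G^t (entries > 0)

# Row-wise check of Eq. (50):
# (sum_i G_ki^{t+1})^2 <= (sum_i (G_ki^{t+1})^2 / G_ki^t) * (sum_i G_ki^t),
# i.e. Cauchy-Schwarz with u_i = G_ki^{t+1}/sqrt(G_ki^t), v_i = sqrt(G_ki^t).
for k in range(G_t1.shape[0]):
    lhs = G_t1[k].sum() ** 2
    rhs = np.sum(G_t1[k] ** 2 / G_t[k]) * G_t[k].sum()
    assert lhs <= rhs + 1e-9
print("Eq. (50) verified row-wise")
```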

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, H., Kong, D. & Nie, F. Cauchy balanced nonnegative matrix factorization. Artif Intell Rev 56, 11867–11903 (2023). https://doi.org/10.1007/s10462-022-10379-y
