
Inexact proximal DC Newton-type method for nonconvex composite functions

Computational Optimization and Applications

Abstract

We consider a class of difference-of-convex (DC) optimization problems where the objective function is the sum of a smooth function and a possibly nonsmooth DC function. The application of proximal DC algorithms to address this problem class is well-known. In this paper, we combine a proximal DC algorithm with an inexact proximal Newton-type method to propose an inexact proximal DC Newton-type method. We demonstrate global convergence properties of the proposed method. In addition, we give a memoryless quasi-Newton matrix for scaled proximal mappings and consider a two-dimensional system of semi-smooth equations that arise in calculating scaled proximal mappings. To efficiently obtain the scaled proximal mappings, we adopt a semi-smooth Newton method to inexactly solve the system. Finally, we present some numerical experiments to investigate the efficiency of the proposed method, which show that the proposed method outperforms existing methods.


Data availability

The data and code that support the findings of this study are available from the corresponding author upon request.

Notes

  1. Conditions (24) and (29) hold with \({\underline{\nu }}=10^{-6}\), \({\bar{\nu }}=L+10^{-6}\), \({\underline{\gamma }}=\frac{{\underline{\nu }}}{(L+{\bar{\nu }})^2}\) and \({\overline{\gamma }}=\frac{1}{{\underline{\nu }}}\).

  2. In pDCAe, the constant L in (15) is computed via the MATLAB code "L = norm(A*A');" when \(m\le 2000\), and via "opts.issym = 1; L = eigs(A*A',1,'LM',opts);" otherwise (a Python analogue is sketched after these notes).

  3. We implement Algorithm 4 in the supplementary material of [17].
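For reference, here is a minimal Python analogue of the computation quoted in note 2, assuming the smooth term is \(g(x)=\frac{1}{2}\Vert Ax-b\Vert ^2\) so that \(L=\Vert AA^T\Vert _2\), the largest eigenvalue of \(AA^T\); the function name and the dense/iterative threshold are illustrative, not the authors' code.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def lipschitz_constant(A, dense_threshold=2000):
    """Largest eigenvalue of A A^T, i.e. the Lipschitz constant of grad g."""
    if A.shape[0] <= dense_threshold:
        return np.linalg.norm(A @ A.T, 2)   # spectral norm, dense route
    # Iterative route for large m, mirroring MATLAB's eigs(..., 1, 'LM').
    return float(eigsh(A @ A.T, k=1, which="LM",
                       return_eigenvectors=False)[0])
```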

References

  1. Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia (2017)


  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542


  3. Becker, S., Fadili, J., Ochs, P.: On quasi-Newton forward-backward splitting: proximal calculus and convergence. SIAM J. Optim. 29(4), 2445–2481 (2019). https://doi.org/10.1137/18M1167152


  4. Becker, S.R., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165 (2011). https://doi.org/10.1007/s12532-011-0029-5


  5. Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for \(\ell _1\) regularized optimization. Math. Program. 157(2), 375–396 (2016). https://doi.org/10.1007/s10107-015-0941-y


  6. Candes, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008). https://doi.org/10.1007/s00041-008-9045-x


  7. Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer (2011)

  8. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 1. Springer, New York (2003)


  9. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 2. Springer, New York (2003)


  10. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001). https://doi.org/10.1198/016214501753382273


  11. Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12(8), 989–1000 (1981). https://doi.org/10.1080/00207728108963798


  12. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: International Conference on Machine Learning, pp. 37–45 (2013)

  13. Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Program. 169(1), 141–176 (2018). https://doi.org/10.1007/s10107-017-1181-0


  14. Lee, C.P., Wright, S.J.: Inexact successive quadratic approximation for regularized optimization. Comput. Optim. Appl. 72(3), 641–674 (2019). https://doi.org/10.1007/s10589-019-00059-z


  15. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014). https://doi.org/10.1137/130921428


  16. Li, D.H., Fukushima, M.: A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 129, 15–35 (2001). https://doi.org/10.1016/S0377-0427(00)00540-9


  17. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)

  18. Li, J., Andersen, M.S., Vandenberghe, L.: Inexact proximal Newton methods for self-concordant functions. Math. Methods Oper. Res. 85(1), 19–41 (2017). https://doi.org/10.1007/s00186-016-0566-9

    Article  MathSciNet  Google Scholar 

  19. Liu, T., Takeda, A.: An inexact successive quadratic approximation method for a class of difference-of-convex optimization problems. Comput. Optim. Appl. 82(1), 141–173 (2022). https://doi.org/10.1007/s10589-022-00357-z

    Article  MathSciNet  Google Scholar 

  20. Liu, X., Hsieh, C.J., Lee, J.D., Sun, Y.: An inexact subsampled proximal Newton-type method for large-scale machine learning (2017). arXiv:1708.08552

  21. Lu, Z., Li, X.: Sparse recovery via partial regularization: models, theory, and algorithms. Math. Oper. Res. 43(4), 1290–1316 (2018). https://doi.org/10.1287/moor.2017.0905


  22. Nakayama, S., Gotoh, J.: On the superiority of PGMs to PDCAs in nonsmooth nonconvex sparse regression. Optim. Lett. 15, 2831–2860 (2021). https://doi.org/10.1007/s11590-021-01716-1


  23. Nakayama, S., Narushima, Y., Yabe, H.: Memoryless quasi-Newton methods based on spectral-scaling Broyden family for unconstrained optimization. J. Ind. Manag. Optim. 15(4), 1773–1793 (2019). https://doi.org/10.3934/jimo.2018122


  24. Nakayama, S., Narushima, Y., Yabe, H.: Inexact proximal memoryless quasi-Newton methods based on the Broyden family for minimizing composite functions. Comput. Optim. Appl. 79(1), 127–154 (2021). https://doi.org/10.1007/s10589-021-00264-9


  25. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35(151), 773–782 (1980). https://doi.org/10.2307/2006193


  26. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (2006)


  27. Patrinos, P., Stella, L., Bemporad, A.: Forward-backward truncated Newton methods for convex composite optimization (2014). arXiv:1402.6655

  28. Qi, L.: Convergence analysis of some algorithms for solving nonsmooth equations. Math. Oper. Res. 18(1), 227–244 (1993). https://doi.org/10.1287/moor.18.1.227


  29. Qi, L., Sun, D.: A survey of some nonsmooth equations and smoothing Newton methods. In: Progress in Optimization, pp. 121–146. Springer (1999)

  30. Qi, L., Sun, D., Zhou, G.: A new look at smoothing Newton methods for nonlinear complementarity problems and box constrained variational inequalities. Math. Program. 87(1), 1–35 (2000). https://doi.org/10.1007/s101079900127


  31. Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58(1), 353–367 (1993). https://doi.org/10.1007/BF01581275


  32. Rakotomamonjy, A., Flamary, R., Gasso, G.: DC proximal Newton for nonconvex optimization problems. IEEE Trans. Neural Netw. Learn. Syst. 27(3), 636–647 (2016). https://doi.org/10.1109/TNNLS.2015.2418224


  33. Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160(1), 495–529 (2016). https://doi.org/10.1007/s10107-016-0997-3


  34. Sun, W., Yuan, Y.X.: Optimization Theory and Methods: Nonlinear Programming. Springer, New York (2006)


  35. Tao, P.D., Hoai An, L.T.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)


  36. Wen, B., Chen, X., Pong, T.K.: A proximal difference-of-convex algorithm with extrapolation. Comput. Optim. Appl. 69(2), 297–324 (2018). https://doi.org/10.1007/s10589-017-9954-1


  37. Xiao, X., Li, Y., Wen, Z., Zhang, L.: A regularized semi-smooth Newton method with projection steps for composite convex programs. J. Sci. Comput. 76(1), 364–389 (2018). https://doi.org/10.1007/s10915-017-0624-3


  38. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37(1), A536–A563 (2015). https://doi.org/10.1137/140952363


  39. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010). https://doi.org/10.1214/09-AOS729


  40. Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11, 1081–1107 (2010)



Acknowledgements

We would like to thank the anonymous referees for their valuable comments, which helped us improve the quality of this paper. This work was supported by the Research Institute for Mathematical Sciences, an International Joint Usage/Research Center located in Kyoto University.

Funding

This research was supported in part by JSPS KAKENHI (Grant Numbers 18K11179, 20K11698, 20K14986, and 23K10999). All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Author information


Corresponding author

Correspondence to Shummin Nakayama.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Proof of Lemma 1

Proof

It follows from \(\eta \in (0,1]\), \(x_k+\eta d_k = \eta x_k^+ + (1-\eta )x_k\) and the convexity of \(h_1\) that

$$\begin{aligned} h_1(x_k+\eta d_k) \le \eta h_1(x_k^+) +(1-\eta )h_1(x_k). \end{aligned}$$

On the other hand, \(\xi _k\in \partial h_2(x_k)\) implies

$$\begin{aligned} h_2(x_k)+\eta \xi _k^Td_k \le h_2(x_k+\eta d_k). \end{aligned}$$

From these inequalities and Assumption 1, we obtain

$$\begin{aligned} f(x_k + \eta d_k)-f(x_k)&= g(x_k + \eta d_k)-g(x_k) \\&\qquad + h_1(x_k + \eta d_k) -h_1(x_k)-h_2(x_k + \eta d_k)+h_2(x_k) \\&\le (\nabla g(x_k)-\xi _k)^T(\eta d_k) + \frac{L}{2}\Vert \eta d_k\Vert ^2 + \eta \left( h_1(x_k^+) - h_1(x_k)\right) \\&=\eta \left( (\nabla g(x_k)-\xi _k)^Td_k+h_1(x_k^+)-h_1(x_k)\right) +\frac{\eta ^2L}{2}\Vert d_k\Vert ^2. \end{aligned}$$

Therefore, (17) holds.

Since it follows from (4) and (11) that

$$\begin{aligned} r_k - \nabla g(x_k) + \xi _k -B_kd_k \in \partial h_1(x_k^+), \end{aligned}$$

we obtain

$$\begin{aligned} h_1(x_k^+) + (r_k - \nabla g(x_k) +\xi _k- B_k d_k)^T(-d_k) \le h_1(x_k). \end{aligned}$$

Hence, we have

$$\begin{aligned} (\nabla g(x_k)-\xi _k)^Td_k + h_1(x_k^+) - h_1(x_k) \le r_k^Td_k-\Vert d_k\Vert _{B_k}^2. \end{aligned}$$
(A1)

Using (7) and the Cauchy–Schwarz inequality, we get

$$\begin{aligned} r_k^Td_k-\Vert d_k\Vert ^2_{B_k}\le \Vert r_k\Vert _{H_k}\Vert d_k\Vert _{B_k} - \Vert d_k\Vert _{B_k}^2\le -{\bar{\theta }} \Vert d_k\Vert _{B_k}^2. \end{aligned}$$
(A2)

Combining (A1) with (A2), we obtain (18), completing the proof. \(\square \)

Appendix B Proof of Lemma 2

Proof

For any \(0<\eta \le \frac{2m}{L}{\bar{\theta }}(1-\delta )\), we have from (16) and (18),

$$\begin{aligned} \frac{L\eta }{2}\Vert d_k\Vert ^2&\le m{\bar{\theta }}(1-\delta )\Vert d_k\Vert ^2\\&\le (1-\delta ){\bar{\theta }} \Vert d_k\Vert _{B_k}^2\\&\le -(1-\delta )((\nabla g(x_k)-\xi _k)^Td_k + h_1(x_k^+) - h_1(x_k)). \end{aligned}$$

Hence, it follows from (17) that

$$\begin{aligned} f(x_k + \eta d_k)-f(x_k) \le \eta \delta ((\nabla g(x_k)-\xi _k)^Td_k + h_1(x_k^+) - h_1(x_k)). \end{aligned}$$

This means that the line search condition (12) is satisfied for all

$$\begin{aligned} 0<\eta \le \min \left\{ 1,\frac{2m}{L}{\bar{\theta }}(1-\delta )\right\} . \end{aligned}$$

Therefore, since we use the backtracking line search with \(\beta _k\in (0,1)\),

$$\begin{aligned} \beta _k\min \left\{ 1,\frac{2m}{L}{\bar{\theta }}(1-\delta )\right\} \le \eta _k\le 1 \end{aligned}$$

holds. Combining this with \(\beta _{\min }\le \beta _k\) yields (20), which proves the lemma. \(\square \)
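For illustration, here is a minimal Python sketch of the backtracking rule analysed in this lemma, assuming a callable objective f and the model decrease \(\Delta _k=(\nabla g(x_k)-\xi _k)^Td_k+h_1(x_k^+)-h_1(x_k)\) appearing in (12); the parameter values are illustrative rather than the paper's defaults.

```python
def backtracking_step(f, x, d, Delta, delta=0.5, beta=0.5):
    """Shrink eta until the sufficient-decrease condition (12) holds."""
    eta = 1.0
    # Lemma 2: the loop stops with eta >= beta*min{1, (2m/L)*theta_bar*(1-delta)}.
    while f(x + eta * d) - f(x) > eta * delta * Delta:
        eta *= beta
    return eta
```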

Appendix C Proof of Theorem 4

To prove Theorem 4, we introduce the following theorem [3, Theorem 3.4].

Theorem 7

Let \(V=D\pm \sum _{i=1}^r u_iu_i^T \in {\mathbb {R}}^{n\times n}\) be symmetric positive definite, where \(D\in {\mathbb {R}}^{n\times n}\) is symmetric positive definite and \(u_i\in {\mathbb {R}}^n\). Let \(U=(u_1,\ldots ,u_r)\). If \(r\le n\), \(U\) has full rank, and \(h_1\) is proper lsc convex, then

$$\begin{aligned} \textrm{Prox}_{h_1}^V({\bar{x}}) = \textrm{Prox}_{h_1}^D({\bar{x}} \mp D^{-1}U\alpha ^*), \end{aligned}$$

where the mapping \({\mathcal {L}}:{\mathbb {R}}^r\rightarrow {\mathbb {R}}^r\) is defined by

$$\begin{aligned} {\mathcal {L}}(\alpha ) = U^T({\bar{x}} - \textrm{Prox}_{h_1}^D({\bar{x}} \mp D^{-1}U\alpha )) + \alpha \end{aligned}$$

and \(\alpha ^*\in {\mathbb {R}}^r\) is the unique root of \({\mathcal {L}}(\alpha )=0\).
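Before turning to the proof of Theorem 4, the following minimal Python sketch illustrates Theorem 7 in the rank-one case \(r=1\) with \(V=\tau I+uu^T\) and \(h_1(x)=\lambda \Vert x\Vert _1\), so that \(\textrm{Prox}_{h_1}^{\tau I}\) reduces to soft-thresholding with threshold \(\lambda /\tau \). All names (tau, lam, u, xbar) are illustrative and not taken from the paper's code.

```python
import numpy as np
from scipy.optimize import brentq

def soft_threshold(y, t):
    """Prox of t*||.||_1, i.e. componentwise soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def scaled_prox_rank_one(xbar, u, tau, lam):
    """Prox_{h_1}^{tau*I + u u^T}(xbar) via the scalar root of L(a) = 0."""
    def L(a):
        return u @ (xbar - soft_threshold(xbar - a * u / tau, lam / tau)) + a
    # L has slope >= 1 and |L(a) - a*(1 + ||u||^2/tau)| <= lam*||u||_1/tau,
    # so [-M, M] below brackets the unique root.
    M = 1.0 + lam * np.linalg.norm(u, 1) / tau
    a_star = brentq(L, -M, M)
    return soft_threshold(xbar - a_star * u / tau, lam / tau)
```

The output can be checked against the optimality condition \(V({\bar{x}}-p)\in \lambda \partial \Vert p\Vert _1\) componentwise.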

By using this theorem, we can prove Theorem 4.

Proof of Theorem 4

Let \(P =\tau I + u_1u_1^T\), \(B= P - u_2u_2^T\). Then, from Theorem 7 with \(V=B\) and \(D=P\), we have

$$\begin{aligned} \textrm{Prox}_{h_1}^{B}({\bar{x}}) = \textrm{Prox}_{h_1}^{P} ({\bar{x}} + \alpha _2^*P^{-1}u_2), \end{aligned}$$

where the mapping \({\mathcal {L}}_2:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is defined by

$$\begin{aligned} {\mathcal {L}}_2(\alpha _2) = u_2^T({\bar{x}} - \textrm{Prox}_{h_1}^{P} ({\bar{x}} + \alpha _2P^{-1}u_2)) + \alpha _2 \end{aligned}$$

and \(\alpha _2^*\in {\mathbb {R}}\) is the root of \({\mathcal {L}}_2(\alpha _2)=0\). We next consider \(\textrm{Prox}_{h_1}^{P} ({\bar{x}} + \alpha _2^*P^{-1}u_2)\). Applying Theorem 7 with \(D = \tau I \) and \(V = P\), we have

$$\begin{aligned} \textrm{Prox}_{h_1}^{P}({\bar{x}} + \alpha _2^*P^{-1}u_2) = \textrm{Prox}_{h_1}^{\tau I} \left( {\bar{x}} + \alpha _2^*P^{-1}u_2 - \frac{\alpha _1^*}{\tau } u_1\right) \end{aligned}$$

where the mapping \({\mathcal {L}}_1:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is defined by

$$\begin{aligned} {\mathcal {L}}_1(\alpha _1) = u_1^T({\bar{x}} + \alpha _2^*P^{-1}u_2 - \textrm{Prox}_{h_1}^{\tau I} ({\bar{x}} + \alpha _2^*P^{-1}u_2 - \frac{\alpha _1}{\tau }u_1)) + \alpha _1 \end{aligned}$$

and \(\alpha _1^*\in {\mathbb {R}}\) is the root of \({\mathcal {L}}_1(\alpha _1)=0\). We now note that

$$\begin{aligned} \textrm{Prox}_{h_1}^{\tau I}(\cdot ) = \mathop {\textrm{argmin}}\limits _{x\in {\mathbb {R}}^n}~ h_1(x)+\frac{1}{2}\Vert x-\cdot \Vert _{\tau I}^2 = \mathop {\textrm{argmin}}\limits _{x\in {\mathbb {R}}^n}~ \frac{1}{\tau }h_1(x)+\frac{1}{2}\Vert x-\cdot \Vert ^2=\textrm{Prox}_{\frac{1}{\tau }h_1}(\cdot ).\end{aligned}$$

Summarizing the above relations, we have (32).

We next aim to show the existence and the uniqueness of the solution \(\alpha ^*\). The existence is immediately guaranteed by Theorem 7. To show uniqueness, we choose any two solutions of \({\mathcal {L}}(\alpha )=0\), say \({\hat{\alpha }}=({\hat{\alpha }}_1,{\hat{\alpha }}_2)^T,\ {\bar{\alpha }}=({\bar{\alpha }}_1,{\bar{\alpha }}_2)^T\in {\mathbb {R}}^2\). Then, it follows from \({\mathcal {L}}({\hat{\alpha }})={\mathcal {L}}({\bar{\alpha }})\) that

$$\begin{aligned} \left\{ \begin{array}{l} u_1^T({\hat{\alpha }}_2(\tau I+u_1u_1^T)^{-1}u_2-\textrm{Prox}_ {\frac{1}{\tau }h_1}(\zeta ({\hat{\alpha }})))+{\hat{\alpha }}_1\\ \qquad =u_1^T({\bar{\alpha }}_2(\tau I+u_1u_1^T)^{-1}u_2-\textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta ({\bar{\alpha }})))+{\bar{\alpha }}_1,\\ -u_2^T\textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta ({\hat{\alpha }}))+{\hat{\alpha }}_2 =-u_2^T\textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta ({\bar{\alpha }}))+{\bar{\alpha }}_2. \end{array} \right. \end{aligned}$$

Thus, the relations \(\textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta ({\bar{\alpha }}))=\textrm{Prox}_{h_1}^{B}({\bar{x}}) =\textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta ({\hat{\alpha }}))\) and the second equality yield \({\hat{\alpha }}_2={\bar{\alpha }}_2\). Further, the first equality implies \({\hat{\alpha }}_1={\bar{\alpha }}_1\). Therefore, we have \({\hat{\alpha }}={\bar{\alpha }}\), which implies that the solution of \({\mathcal {L}}(\alpha )=0\) is unique, completing the proof. \(\square \)
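To make the construction concrete, the following sketch assembles \(\zeta \) from (31) and the two-dimensional mapping \({\mathcal {L}}\) from (33) for \(B=\tau I+u_1u_1^T-u_2u_2^T\), using the explicit form (G8) derived in Appendix G and the soft-thresholding function from the previous sketch. This is an illustration under the assumption \(h_1(x)=\lambda \Vert x\Vert _1\), not the authors' implementation.

```python
import numpy as np

def make_system(xbar, u1, u2, tau, lam):
    """zeta from (31) and the residual mapping calL from (33)/(G8)."""
    ubar1 = u1 / tau
    # Sherman-Morrison: ubar2 = (tau*I + u1 u1^T)^{-1} u2
    ubar2 = u2 / tau - ((u1 @ u2) / (tau**2 + tau * (u1 @ u1))) * u1

    def zeta(alpha):
        return xbar - alpha[0] * ubar1 + alpha[1] * ubar2

    def calL(alpha):
        p = soft_threshold(zeta(alpha), lam / tau)   # Prox_{(1/tau) h_1}
        return np.array([alpha[0] + u1 @ (xbar + alpha[1] * ubar2 - p),
                         alpha[1] + u2 @ (xbar - p)])

    return zeta, calL
# At the unique root alpha*, formula (32) gives
# Prox_{h_1}^B(xbar) = soft_threshold(zeta(alpha*), lam/tau).
```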

Appendix D Proof of Proposition 2

Proof

For simplicity, we set \({\hat{x}}=\textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta (\alpha ))\). It follows from (30), (33) and \( u_1u_1^T(\tau I+u_1u_1^T)^{-1}=I-\tau (\tau I+u_1u_1^T)^{-1} \) that

$$\begin{aligned} U{\mathcal {L}}(\alpha )&= -u_1u_1^T{{\bar{x}}} - \alpha _2u_1u_1^T(\tau I+u_1u_1^T)^{-1} u_2+u_1u_1^T {\hat{x}} - \alpha _1 u_1 + u_2u_2^T\bar{x} -u_2u_2^T{\hat{x}} + \alpha _2 u_2 \nonumber \\&=(-u_1u_1^T+u_2u_2^T)({\bar{x}}-{\hat{x}}) - \alpha _2u_1u_1^T(\tau I+u_1u_1^T)^{-1} u_2 - \alpha _1 u_1 + \alpha _2 u_2 \nonumber \\&=(\tau I-B)({\bar{x}}-{\hat{x}}) - \alpha _2u_2 + \alpha _2\tau (\tau I+u_1u_1^T)^{-1} u_2 - \alpha _1 u_1 + \alpha _2 u_2 \nonumber \\&=B({\hat{x}}-{{\bar{x}}}) + \tau ({\bar{x}}-{\hat{x}}) - \alpha _1 u_1 + \tau \alpha _2(\tau I+u_1u_1^T)^{-1} u_2. \end{aligned}$$
(D3)

On the other hand, \({\hat{x}}=\textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta (\alpha ))\) implies

$$\begin{aligned} \tau (\zeta (\alpha )-{\hat{x}})\in \partial h_1({\hat{x}}). \end{aligned}$$

Therefore, it follows from (31), (D3), \({\bar{x}}=x-H(\nabla g(x)- \xi )\), and \(BH=I\) that

$$\begin{aligned} U{\mathcal {L}}(\alpha )&=B({\hat{x}}-{{\bar{x}}}) +\tau \left( {{\bar{x}}} -{\hat{x}}- \frac{ \alpha _1 }{\tau } u_1 + \alpha _2(\tau I+u_1u_1^T)^{-1} u_2 \right) \\&=B({\hat{x}}-{{\bar{x}}})+\tau ( \zeta (\alpha )-{\hat{x}})\\&=\nabla g(x)- \xi + B({\hat{x}}-x)+\tau (\zeta (\alpha )-{\hat{x}})\\&\in \nabla g(x) - \xi + B({\hat{x}}-x)+\partial h_1({\hat{x}}). \end{aligned}$$

This completes the proof. \(\square \)
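In practice, Proposition 2 is what makes an inexact inner solve usable: the vector \(U{\mathcal {L}}(\alpha )\) is a residual of the scaled proximal subproblem, so its norm can serve as the stopping measure for the inner solver. A small sketch follows; the stacking \(U=(-u_1,u_2)\) is inferred from the expansion (D3) and should be treated as an assumption.

```python
import numpy as np

def inner_residual(u1, u2, calL, alpha):
    """Residual r of the inexact scaled prox, per Proposition 2."""
    U = np.column_stack((-u1, u2))   # n x 2, signs matching (D3)
    return U @ calL(alpha)           # r - grad g + xi - B d lies in dh_1(x^+)
```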

Appendix E Proof of Theorem 5

To prove Theorem 5, we first give the following lemma.

Lemma 3

Assume that \(\textrm{Prox}_{\frac{1}{\tau }h_1}\) is B-differentiable. Let \({\bar{\alpha }}\in {\mathbb {R}}^2\) be a point such that \({\mathcal {L}}({\bar{\alpha }})\ne 0\) and any element of \(\partial ^C {\mathcal {L}}({\bar{\alpha }})\) is nonsingular. Then, there exist a positive constant \({{\bar{t}}}\) and a compact neighborhood \({\mathcal {N}}({\bar{\alpha }})\) of \({\bar{\alpha }}\) such that the following statements hold for any \(\alpha \in {\mathcal {N}}({\bar{\alpha }})\):

  (a) \({\mathcal {L}}(\alpha )\ne 0\) and any element of \(\partial ^C {\mathcal {L}}(\alpha )\) is nonsingular.

  (b) For \(p=-V^{-T}{\mathcal {L}}(\alpha )\) with \(V\in \partial ^C {\mathcal {L}}(\alpha )\) satisfying

    $$\begin{aligned} \Psi ^\prime (\alpha ;p)\le (V{\mathcal {L}}(\alpha ))^Tp, \end{aligned}$$
    (E4)

    the inequality

    $$\begin{aligned} \Psi (\alpha +t p)\le (1-2\sigma t)\Psi (\alpha ) \end{aligned}$$
    (E5)

    holds for any \(t\in (0,{{\bar{t}}}]\).

Proof

Since \(\textrm{Prox}_{\frac{1}{\tau }h_1}\) is locally Lipschitz continuous, \({\mathcal {L}}\) is also locally Lipschitz continuous, and so \(\partial ^C {\mathcal {L}}(\alpha )\) is compact for any \(\alpha \). Since any element of \(\partial ^C {\mathcal {L}}({\bar{\alpha }})\) is nonsingular, there exists a compact neighborhood \({\mathcal {T}}({\bar{\alpha }}) \supset \partial ^C {\mathcal {L}}({\bar{\alpha }})\) such that any element of \({\mathcal {T}}({\bar{\alpha }})\) is nonsingular. Because \(\partial ^C {\mathcal {L}}\) is upper semi-continuous and \(\partial ^C {\mathcal {L}}(\alpha )\) is compact for any \(\alpha \), we can choose \({\mathcal {T}}({\bar{\alpha }}) \supset \partial ^C {\mathcal {L}}({\bar{\alpha }})\) and a compact neighborhood \({\mathcal {N}}({\bar{\alpha }})\) of \({\bar{\alpha }}\) such that \({\mathcal {L}}(\alpha )\ne 0\) and \(\partial ^C {\mathcal {L}}(\alpha )\subset {\mathcal {T}}({\bar{\alpha }})\) hold for any \(\alpha \in {\mathcal {N}}({\bar{\alpha }})\). Thus, (a) is satisfied.

Next, we show (b). Since \(\textrm{Prox}_{\frac{1}{\tau }h_1}\) is locally Lipschitz continuous and directionally differentiable, \({\mathcal {L}}\) is B-differentiable [8, Definition 3.1.2]. Thus, it follows from (E4) and [8, Proposition 3.1.3] that the following relations hold for any \(t>0\):

$$\begin{aligned} \Psi (\alpha +t p)&=\Psi (\alpha )+\Psi ^\prime (\alpha ;tp)+o(\Vert t p\Vert )\nonumber \\&=\Psi (\alpha )+t\Psi ^\prime (\alpha ;p)+o(\Vert t p\Vert )\nonumber \\&\le \Psi (\alpha )+t(V{\mathcal {L}}(\alpha ))^T p+o(\Vert t p\Vert )\nonumber \\&=\Psi (\alpha )-t \Vert {\mathcal {L}}(\alpha )\Vert ^2+o(\Vert t p\Vert )\nonumber \\&=(1-2t)\Psi (\alpha )+o(\Vert t p\Vert ). \end{aligned}$$
(E6)

From the above arguments, for any \(\alpha \in {\mathcal {N}}(\bar{\alpha })\), it holds that \(\partial ^C {\mathcal {L}}(\alpha )\subset {\mathcal {T}}({\bar{\alpha }})\) and \({\mathcal {T}}({\bar{\alpha }})\) is compact. Hence, \(p=-V^{-T}{\mathcal {L}}(\alpha )\) is bounded. In addition, since \({\mathcal {N}}(\bar{\alpha })\) is compact and \({\mathcal {L}}(\alpha )\ne 0\) for any \(\alpha \in {\mathcal {N}}(\bar{\alpha })\), there exists a positive constant \({\tilde{\Psi }}\) such that \({\tilde{\Psi }}\le \Psi (\alpha )\) for any \(\alpha \in {\mathcal {N}}(\bar{\alpha })\). Therefore, it follows from \(\sigma \in (0,1/2)\) and (E6) that

$$\begin{aligned} \Psi (\alpha +t p)\le (1-2\sigma t)\Psi (\alpha ) -2t(1-\sigma )\tilde{\Psi }+o(t). \end{aligned}$$

Thus, there exists a positive constant \({\bar{t}}\) such that (E5) holds for any \(t\in (0,{{\bar{t}}}]\). \(\square \)

From Lemma 3, we immediately have the following property.

Remark 3

Consider Algorithm 2. If any element of \(\partial ^C {\mathcal {L}}(\alpha _j)\) is nonsingular and (40) holds, then the line search condition (38) is satisfied for some finite integer \(l\).

By using Lemma 3, we prove Theorem 5.

Proof of Theorem 5

If \({\mathcal {L}}(\alpha _j)=0\) for some \(j\ge 0\), we have the desired result. Thus, we consider the case where \({\mathcal {L}}(\alpha _j)\ne 0\) for all \(j\ge 0\). It follows from Remark 3 and the line search condition (38) that \(\{\Psi (\alpha _j)\}\) is a nonincreasing sequence. Hence, \(\{\alpha _j\}\subset {\mathcal {S}}_0\) holds. Since the level set \({\mathcal {S}}_0\) is compact, \(\{\alpha _j\}\) has at least one accumulation point.

We show the theorem by contradiction. Assume that there exists an accumulation point \({\widehat{\alpha }}\) such that \({\mathcal {L}}({{\widehat{\alpha }}})\ne 0\) (namely, \(\Psi ({{\widehat{\alpha }}})>0\)), and consider a subsequence \(\{\alpha _{j_i}\}\) such that \(\{\alpha _{j_i}\}\rightarrow {\widehat{\alpha }}\ (i\rightarrow \infty )\). For sufficiently large i, the relation \(\{\alpha _{j_i}\}\subset {\mathcal {N}}({\widehat{\alpha }})\) holds, where \({\mathcal {N}}({\widehat{\alpha }})\) is the neighborhood appearing in Lemma 3 with \({\bar{\alpha }}={\widehat{\alpha }}\). Let \({\hat{l}}\) be the smallest nonnegative integer such that \(\rho ^{{\hat{l}}}\le {{\bar{t}}}\), where \({{\bar{t}}}\) is the positive constant appearing in Lemma 3. Then, it follows from (E5) that

$$\begin{aligned} \Psi \left( \alpha _{j_i}+\rho ^{{\hat{l}}} p_{j_i}\right) \le \left( 1-2\sigma \rho ^{{\hat{l}}}\right) \Psi (\alpha _{j_i}) \end{aligned}$$

holds for sufficiently large i. From the backtracking rule of the algorithm, \(\rho ^{{\hat{l}}}\le t_{j_i}\) is satisfied. Hence, taking into account \(j_i+1\le j_{i+1}\), we have

$$\begin{aligned} \Psi (\alpha _{j_{i+1}})\le \Psi (\alpha _{j_i+1})=\Psi (\alpha _{j_i}+t_{j_i} p_{j_i}) \le (1-2\sigma t_{j_i})\Psi (\alpha _{j_i})\le \left( 1-2\sigma \rho ^{{\hat{l}}}\right) \Psi (\alpha _{j_i}). \end{aligned}$$

Since \(1-2\sigma \rho ^{{\hat{l}}}\in (0,1)\) is a constant independent of i, we obtain

$$\begin{aligned} \Psi ({\widehat{\alpha }})=\lim _{i\rightarrow \infty }\Psi (\alpha _{j_i})=0. \end{aligned}$$

Since this contradicts the assumption \({\mathcal {L}}({{\widehat{\alpha }}})\ne 0\), any accumulation point of \(\{\alpha _j\}\) is a solution of (36). Moreover, from Theorem 4, problem (36) has a unique solution. Hence, the proof is complete. \(\square \)

Appendix F Proof of Theorem 6

Proof

It follows from Theorem 5 that the sequence \(\{\alpha _j\}\) converges to the unique solution \(\alpha ^*\). In the same way as in the proof of Lemma 3(a), we can show that there exists a compact neighborhood \({\mathcal {N}}^\prime (\alpha ^*)\) such that any element of \(\partial ^C {\mathcal {L}}(\alpha )\) is nonsingular for any \(\alpha \in {\mathcal {N}}^\prime (\alpha ^*)\). Since \({\mathcal {N}}^\prime (\alpha ^*)\) is a compact set, \(\partial ^C {\mathcal {L}}\) is upper semi-continuous, and \(\alpha _j\in {\mathcal {N}}^\prime (\alpha ^*)\) for sufficiently large j, there exists a positive constant \({\widehat{c}}_1\) such that

$$\begin{aligned} \Vert V_j^{-1}\Vert \le {\widehat{c}}_1 \quad \text{ for } \forall V_j\in \partial ^C {\mathcal {L}}(\alpha _j) \end{aligned}$$

holds. Therefore, (strong) semi-smoothness yields

$$\begin{aligned} \begin{aligned} \Vert \alpha _j+p_j-\alpha ^*\Vert&=\Vert \alpha _j-V_j^{-T}{\mathcal {L}}(\alpha _j)-\alpha ^*\Vert \\&\le {\widehat{c}}_1\Vert V_j^T(\alpha _j-\alpha ^*)-{\mathcal {L}}(\alpha _j) +{\mathcal {L}}(\alpha ^*)\Vert =o(\Vert \alpha _j-\alpha ^*\Vert )\\&\quad \bigl (=O(\Vert \alpha _j-\alpha ^*\Vert ^2)\ \text { for the strongly semi-smooth case}\bigr ). \end{aligned} \end{aligned}$$
(F7)

On the other hand, from the local Lipschitz continuity of \({\mathcal {L}}\) and [28, Theorem 3.1], there exist positive constants \({\widehat{c}}_2,\ {\widehat{c}}_3\) satisfying

$$\begin{aligned} {\widehat{c}}_2\Vert \alpha _j-\alpha ^*\Vert \le \Vert {\mathcal {L}}(\alpha _j)-{\mathcal {L}}(\alpha ^*)\Vert \le {\widehat{c}}_3 \Vert \alpha _j-\alpha ^*\Vert . \end{aligned}$$

Therefore, by (F7), we have

$$\begin{aligned} \Psi (\alpha _j+p_j)&=\frac{1}{2} \Vert {\mathcal {L}}(\alpha _j+p_j)-{\mathcal {L}}(\alpha ^*)\Vert ^2\\&=O(\Vert \alpha _j+p_j-\alpha ^*\Vert ^2)=o(\Vert \alpha _j-\alpha ^*\Vert ^2)=o(\Vert {\mathcal {L}}(\alpha _j)\Vert ^2)=o(\Psi (\alpha _j)), \end{aligned}$$

which implies that the line search condition (38) holds with \(l=0\), namely, \(t_j=1\). Thus, using (F7), we obtain

$$\begin{aligned} \Vert \alpha _{j+1}-\alpha ^*\Vert&=o(\Vert \alpha _j-\alpha ^*\Vert )\\&\quad \bigl (=O(\Vert \alpha _j-\alpha ^*\Vert ^2)\ \text { for the strongly semi-smooth case}\bigr ), \end{aligned}$$

and hence the proof is complete. \(\square \)

Appendix G Proof of Proposition 3

Proof

The definition (34) yields

$$\begin{aligned} u_2^Tu_2=\tau _k,\quad u_1^Tu_1=\frac{\gamma _k z_{k-1}^Tz_{k-1}}{s_{k-1}^Tz_{k-1}}, \quad u_1^Tu_2=\frac{\sqrt{\tau _k\gamma _ks_{k-1}^Tz_{k-1}}}{\Vert s_{k-1}\Vert }. \end{aligned}$$

It follows from \(s_{k-1}^Tz_{k-1}>0\) and the Cauchy–Schwarz inequality that

$$\begin{aligned} \frac{s_{k-1}^Tz_{k-1}}{s_{k-1}^Ts_{k-1}}\le \frac{z_{k-1}^Tz_{k-1}}{s_{k-1}^Tz_{k-1}}. \end{aligned}$$

Therefore, using (15), (23), (24), and (29), we have

$$\begin{aligned} \underline{\tau }\le u_2^Tu_2\le {\bar{\tau }},\quad \underline{\gamma }\underline{\nu } \le \frac{\gamma _k s_{k-1}^Tz_{k-1}}{s_{k-1}^Ts_{k-1}} \le u_1^Tu_1\le \frac{{\bar{\gamma }}(\bar{\nu }+L)^2}{\underline{\nu }}, \end{aligned}$$

and

$$\begin{aligned} \sqrt{\underline{\tau }\underline{\gamma }\underline{\nu }}\le u_1^Tu_2 \le \sqrt{\bar{\tau }\bar{\gamma }(\bar{\nu }+L)}. \end{aligned}$$

From \((\tau _k I+u_1u_1^T)^{-1}=\frac{1}{\tau _k}I-\frac{u_1u_1^T}{\tau _k^2+\tau _k\Vert u_1\Vert ^2}\), we get

$$\begin{aligned} u_2^T(\tau _k I+u_1u_1^T)^{-1}u_2=\frac{1}{\tau _k} u_2^Tu_2-\frac{(u_1^Tu_2)^2}{\tau _k^2+\tau _k\Vert u_1\Vert ^2} =1-\frac{(u_1^Tu_2)^2}{\tau _k^2+\tau _k\Vert u_1\Vert ^2}, \end{aligned}$$

which implies that

$$\begin{aligned} \frac{\underline{\tau }\underline{\gamma } \underline{\nu }}{\bar{\tau }^2+\bar{\tau }\frac{{\bar{\gamma }}(\bar{\nu }+L)^2}{\underline{\nu }}} \le 1-u_2^T(\tau _k I+u_1u_1^T)^{-1}u_2 \le \frac{(u_1^Tu_2)^2}{\tau _k^2} \le \frac{\bar{\tau }\bar{\gamma }(\bar{\nu }+L)}{\underline{\tau }^2}. \end{aligned}$$

By letting \(v=\tau _k(\zeta (\alpha ) - \textrm{Prox}_{\frac{1}{\tau _k}h_1}(\zeta (\alpha )))\in \partial h_1(\textrm{Prox}_{\frac{1}{\tau _k}h_1}(\zeta (\alpha )))\), it follows from (31) and (33) that

$$\begin{aligned} {\mathcal {L}}(\alpha ) = \begin{pmatrix} \frac{1}{\tau _k}u_1^Tv+(1+\frac{1}{\tau _k}u_1^Tu_1)\alpha _1\\ \frac{1}{\tau _k}u_2^Tv+\frac{1}{\tau _k}u_1^Tu_2 \alpha _1+(1-u_2^T(\tau _k I+u_1u_1^T)^{-1}u_2)\alpha _2 \end{pmatrix}. \end{aligned}$$
(G8)

On the other hand, from assumption (41) and the bounds above, the following relations hold:

$$\begin{aligned} |u_1^Tv |\le {\bar{c}} \sqrt{\frac{{\bar{\gamma }}(\bar{\nu }+L)^2}{\underline{\nu }}},\quad |u_2^Tv |\le {\bar{c}}\sqrt{\bar{\tau }}. \end{aligned}$$

Therefore, it follows from the bounds above, (29), and (G8) that there exist positive constants \({\widehat{c}}_4,{\widehat{c}}_5\), and \({\widehat{c}}_6\) satisfying

$$\begin{aligned} {\widehat{c}}_4\alpha _1^2+({\widehat{c}}_5\alpha _1+{\widehat{c}}_6\alpha _2)^2\le \frac{1}{2} \Vert {\mathcal {L}}(\alpha )\Vert ^2=\Psi (\alpha ) \end{aligned}$$

when \(\Vert \alpha \Vert \) is sufficiently large. Therefore, the proof is complete. \(\square \)

Appendix H Choice for \(V_j\)

Proposition 4

Suppose that \(h_1(x)=\lambda \Vert x\Vert _1\) \((\lambda >0)\). Let \(\zeta \) and \({\mathcal {L}}\) be given in (31) and (33), and let

$$\begin{aligned} V_j = \begin{pmatrix} 1+\frac{1}{\tau }u_1^TWu_1 & \frac{1}{\tau }u_2^TWu_1 \\ (u_1 - Wu_1)^T(\tau I+u_1u_1^T)^{-1}u_2 & 1-u_2^TW(\tau I+u_1u_1^T)^{-1}u_2 \end{pmatrix}, \end{aligned}$$
(H9)

where

$$\begin{aligned} W = \begin{pmatrix} w_1 & & \\ & \ddots & \\ & & w_n \end{pmatrix} \quad \text {and}\quad w_i= \left\{ \begin{array}{ll} 1 & \text {if}~ |(\zeta (\alpha _j))_i|> \frac{\lambda }{\tau }, \\ 0 & \text {otherwise}, \end{array} \right. \end{aligned}$$

for \(i=1,\dots ,n\). Then, \(V_j\in \partial ^C {\mathcal {L}}(\alpha _j)\) holds.

Proof

For simplicity, we omit the subscript j and set

$$\begin{aligned} {\bar{u}}_1 = \frac{1}{\tau }u_1\quad \text {and}\quad {\bar{u}}_2 = (\tau I+u_1u_1^T)^{-1}u_2. \end{aligned}$$

Then, we can rewrite \(\zeta (\alpha )\) and \({\mathcal {L}}(\alpha ) \) as

$$\begin{aligned} \zeta (\alpha ) = {\bar{x}} - \alpha _1 {\bar{u}}_1 + \alpha _2 {\bar{u}}_2 \end{aligned}$$

and

$$\begin{aligned} {\mathcal {L}}(\alpha ) = \begin{pmatrix} \alpha _1 + u_1^T{\bar{x}} + \alpha _2(u_1^T{\bar{u}}_2) - u_1^T \textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta (\alpha ))\\ \alpha _2 + u_2^T{\bar{x}} - u_2^T \textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta (\alpha )) \end{pmatrix}, \end{aligned}$$

respectively. When \(h_1(x)=\lambda \Vert x\Vert _1\) \((\lambda >0)\), the proximal mapping is given by

$$\begin{aligned} \left( \textrm{Prox}_{\frac{1}{\tau }h_1}(\zeta (\alpha ))\right) _i = \left\{ \begin{array}{ll} (\zeta (\alpha ))_i- \frac{\lambda }{\tau } & \text {if}~(\zeta (\alpha ))_i \ge \frac{\lambda }{\tau }, \\ 0 & \text {if}~|(\zeta (\alpha ))_i |< \frac{\lambda }{\tau },\\ (\zeta (\alpha ))_i + \frac{\lambda }{\tau } & \text {if}~ (\zeta (\alpha ))_i \le -\frac{\lambda }{\tau }. \end{array} \right. \end{aligned}$$

We now consider \({\mathcal {D}} = \{\alpha \vert {\mathcal {L}}(\alpha ) \text { is differentiable}\}\). For any \(\alpha \in {\mathcal {D}}\), we have

$$\begin{aligned} \nabla {\mathcal {L}}(\alpha ) = \begin{pmatrix}\displaystyle 1 + \sum _{i=1}^n (u_1)_i ({\bar{u}}_1)_i {\bar{\omega }}_{i} & \displaystyle \sum _{i=1}^n({\bar{u}}_1)_i (u_2)_i {\bar{\omega }}_{i} \\ \displaystyle u_1^T{\bar{u}}_2 - \sum _{i=1}^n({u}_1)_i({\bar{u}}_2)_i {\bar{\omega }}_{i} & \displaystyle 1 - \sum _{i=1}^n(u_2)_i ({\bar{u}}_2)_i {\bar{\omega }}_{i} \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} {\bar{\omega }}_i = {\left\{ \begin{array}{ll} 1 & \text{ if }~ |(\zeta (\alpha ))_i|> \frac{\lambda }{\tau },\\ 0 & \text{ if }~ |(\zeta (\alpha ))_i |< \frac{\lambda }{\tau }. \end{array}\right. } \end{aligned}$$

Thus, the Clarke differential of \({\mathcal {L}}\) is given by

$$\begin{aligned} \partial ^C {\mathcal {L}}(\alpha ) = \left\{ \begin{pmatrix}\displaystyle 1 + \sum _{i=1}^n (u_1)_i ({\bar{u}}_1)_i {\hat{\omega }}_{i} & \displaystyle \sum _{i=1}^n({\bar{u}}_1)_i (u_2)_i {\hat{\omega }}_{i} \\ \displaystyle u_1^T{\bar{u}}_2 - \sum _{i=1}^n({u}_1)_i({\bar{u}}_2)_i {\hat{\omega }}_{i} & \displaystyle 1 - \sum _{i=1}^n(u_2)_i ({\bar{u}}_2)_i{\hat{\omega }}_{i} \end{pmatrix} ~\left| ~ {\hat{\omega }}_i \left\{ \begin{array}{ll} =1 & \text {if}~|(\zeta (\alpha ))_i|> \frac{\lambda }{\tau }, \\ \in [0,1] & \text {if}~ |(\zeta (\alpha ))_i|= \frac{\lambda }{\tau }, \\ =0 & \text {if}~ |(\zeta (\alpha ))_i |< \frac{\lambda }{\tau } \end{array} \right. \right. \right\} . \end{aligned}$$

Therefore, we obtain \(V\in \partial ^C {\mathcal {L}}(\alpha )\). \(\square \)
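Putting the pieces together, the following is a compact sketch of Algorithm 2: a semi-smooth Newton iteration on \({\mathcal {L}}(\alpha )=0\) with the line search (38), using the Clarke-differential element \(V_j\) from (H9). It reuses soft_threshold and make_system from the earlier sketches, starts from \(\alpha =0\) for simplicity, assumes \(B=\tau I+u_1u_1^T-u_2u_2^T\) is positive definite, and uses illustrative parameter values rather than the authors' settings.

```python
import numpy as np

def semismooth_newton_prox(xbar, u1, u2, tau, lam,
                           rho=0.5, sigma=0.25, tol=1e-12, max_iter=50):
    """Approximate Prox_{h_1}^B(xbar) for B = tau*I + u1 u1^T - u2 u2^T."""
    zeta, calL = make_system(xbar, u1, u2, tau, lam)
    ubar1 = u1 / tau
    ubar2 = u2 / tau - ((u1 @ u2) / (tau**2 + tau * (u1 @ u1))) * u1

    def psi(a):
        r = calL(a)
        return 0.5 * (r @ r)

    alpha = np.zeros(2)
    for _ in range(max_iter):
        L = calL(alpha)
        psi_k = 0.5 * (L @ L)
        if psi_k <= tol:
            break
        w = (np.abs(zeta(alpha)) > lam / tau).astype(float)  # diagonal of W
        V = np.array([[1.0 + u1 @ (w * ubar1), (w * ubar1) @ u2],
                      [(u1 - w * u1) @ ubar2, 1.0 - u2 @ (w * ubar2)]])
        p = -np.linalg.solve(V.T, L)      # p_j = -V_j^{-T} calL(alpha_j)
        t = 1.0                           # backtrack until (38) holds
        while psi(alpha + t * p) > (1.0 - 2.0 * sigma * t) * psi_k and t > 1e-16:
            t *= rho
        alpha = alpha + t * p
    return soft_threshold(zeta(alpha), lam / tau)   # formula (32)
```

By Theorem 6, the unit step is eventually accepted and the iteration converges superlinearly (quadratically in the strongly semi-smooth case).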

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nakayama, S., Narushima, Y. & Yabe, H. Inexact proximal DC Newton-type method for nonconvex composite functions. Comput Optim Appl 87, 611–640 (2024). https://doi.org/10.1007/s10589-023-00525-9
