
Mathematical Programming, Volume 166, Issue 1–2, pp 207–240

Folded concave penalized sparse linear regression: sparsity, statistical performance, and algorithmic theory for local solutions

  • Hongcheng Liu
  • Tao Yao
  • Runze Li
  • Yinyu Ye
Full Length Paper, Series A

Abstract

This paper concerns the folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses on local minimizers or on specific FCPSLR-based learning algorithms, two questions remain open: whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure the statistical performance, and whether that statistical performance can be non-contingent on the specific design of the computing procedure. To address these questions, this paper presents the following threefold results: (1) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (2) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S\(^3\)ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S\(^3\)ONC solution likely recovers the oracle solution. This result also explicates that the goal of improving the statistical performance is consistent with the optimization criterion of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (3) We apply (2) to the special case of FCPSLR with the minimax concave penalty (MCP) and show that, under the restricted eigenvalue condition, any S\(^3\)ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to that of the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to that of the Lasso in general. Furthermore, computing a solution that satisfies the S\(^3\)ONC admits an FPTAS.
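For concreteness, the FCPSLR program and the minimax concave penalty (MCP) discussed in the abstract take the following standard forms (our rendering of the usual notation, not transcribed from the paper itself; \(\lambda > 0\) is the regularization weight and \(\gamma > 1\) the concavity parameter):

\[
\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \sum_{j=1}^{p} P_\lambda(|\beta_j|),
\qquad
P_\lambda(t) = \begin{cases} \lambda t - \dfrac{t^2}{2\gamma}, & 0 \le t \le \gamma\lambda,\\[4pt] \dfrac{\gamma\lambda^2}{2}, & t > \gamma\lambda, \end{cases}
\]

where \(X \in \mathbb{R}^{n \times p}\) is the design matrix and \(y \in \mathbb{R}^n\) the response. The penalty is folded concave: it matches the Lasso penalty \(\lambda t\) near the origin and flattens to a constant beyond \(\gamma\lambda\), which removes the Lasso's bias on large coefficients at the price of non-convexity.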
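Result (3) suggests a simple, design-agnostic acceptance test: start from the Lasso and accept any S\(^3\)ONC point whose FCPSLR objective is no worse. A minimal sketch of that objective comparison appears below, in Python with NumPy and scikit-learn; the function names and the toy data are our own illustration, not code from the paper. (Note that sklearn's Lasso minimizes \(\frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1\), matching the loss scaling above.)

    import numpy as np
    from sklearn.linear_model import Lasso

    def mcp_penalty(beta, lam, gamma):
        # MCP applied coordinate-wise: lam*t - t^2/(2*gamma) for t <= gamma*lam,
        # and the constant gamma*lam^2/2 beyond that threshold.
        t = np.abs(beta)
        return np.where(t <= gamma * lam,
                        lam * t - t ** 2 / (2 * gamma),
                        0.5 * gamma * lam ** 2).sum()

    def fcpslr_objective(beta, X, y, lam, gamma):
        # Least-squares loss (1/(2n))||y - X beta||^2 plus the folded concave penalty.
        n = X.shape[0]
        return np.sum((y - X @ beta) ** 2) / (2 * n) + mcp_penalty(beta, lam, gamma)

    # Toy data (hypothetical): s-sparse truth, Gaussian design, Gaussian noise.
    rng = np.random.default_rng(0)
    n, p, s = 100, 200, 5
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:s] = 2.0
    y = X @ beta_true + 0.5 * rng.standard_normal(n)

    # Lasso initializer; any candidate beta with
    # fcpslr_objective(beta, ...) <= fcpslr_objective(beta_lasso, ...)
    # is the kind of solution covered by result (3).
    lam, gamma = 0.1, 3.0
    beta_lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    print("FCPSLR objective at the Lasso solution:",
          fcpslr_objective(beta_lasso, X, y, lam, gamma))

Any local-search routine that only ever decreases this objective from the Lasso initializer inherits the guarantees of result (3), which is the sense in which the statistical performance is non-contingent on the specific computing procedure.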

Keywords

Sparse recovery · Non-convex programming · NP-completeness · Folded concave penalty · Lasso

Mathematics Subject Classification

90C26 · 90C90 · 62J05 · 62J07 · 68Q25


Acknowledgements

The authors thank the AE and referees for their valuable comments, which significantly improved the paper. This work was supported by Penn State Grace Woodward Collaborative Research Grant, NSF grants CMMI 1300638 and DMS 1512422, NIH grants P50 DA036107 and P50 DA039838, Marcus PSU-Technion Partnership grant, Air Force Office of Scientific Research grant FA9550-12-1-0396, and Mid-Atlantic University Transportation Centers grant. This work was also partially supported by NNSFC grants 11690014 and 11690015. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF, the NIDA, the NIH, the AFOSR, the MAUTC or the NNSFC.

Supplementary material

Supplementary material 1: 10107_2017_1114_MOESM1_ESM.pdf (136 KB)


Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2017

Authors and Affiliations

  1. Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, USA
  2. Department of Statistics and the Methodology Center, The Pennsylvania State University, University Park, USA
  3. Department of Management Science and Engineering, Stanford University, Stanford, USA
