Safe Feature Elimination for Non-negativity Constrained Convex Optimization

  • James Folberth (corresponding author)
  • Stephen Becker


Inspired by recent work on safe feature elimination for 1-norm regularized least-squares, we develop strategies to eliminate features from convex optimization problems with non-negativity constraints. Our strategy is safe in the sense that it will only remove features/coordinates from the problem when they are guaranteed to be zero at a solution. To perform feature elimination, we use an accurate, but not optimal, primal–dual feasible pair, making our methods robust and able to be used on ill-conditioned problems. We supplement our feature elimination problem with a method to construct an accurate dual feasible point from an accurate primal feasible point; this allows us to use a first-order method to find an accurate primal feasible point and then use that point to construct an accurate dual feasible point and perform feature elimination. Under reasonable conditions, our feature elimination strategy will eventually eliminate all zero features from the problem. As an application of our methods, we show how safe feature elimination can be used to robustly certify the uniqueness of solutions to nonnegative least-squares problems. We give numerical examples on a well-conditioned synthetic nonnegative least-squares problem and on a set of 40,000 extremely ill-conditioned problems arising in a microscopy application.
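To make the idea of eliminating features from a primal–dual feasible pair concrete, the following is a minimal sketch for nonnegative least-squares, min over x ≥ 0 of ½‖Ax − b‖². It is not the authors' rule: it uses a generic duality-gap sphere test in the spirit of GAP safe screening for the lasso. With the dual written as max over ν of −½‖ν‖² − bᵀν subject to Aᵀν ≥ 0, any dual feasible ν lies within √(2·gap) of the dual solution, so coordinate i is zero at every solution whenever aᵢᵀν − ‖aᵢ‖·√(2·gap) > 0. The function names, the list-of-columns layout, and in particular the shift used to build a dual feasible point (which assumes A is entrywise nonnegative with no zero column) are illustrative assumptions and stand in for the construction developed in the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def residual(columns, b, x):
    """Return A x - b, where columns[i] is the i-th column a_i of A."""
    r = [-bj for bj in b]
    for a_i, x_i in zip(columns, x):
        for j, a_ij in enumerate(a_i):
            r[j] += a_ij * x_i
    return r


def dual_point_from_primal(columns, b, x):
    """Shift the residual by alpha * ones until A^T nu >= 0.

    ASSUMPTION (not from the paper): A is entrywise nonnegative and has
    no zero column, so A^T ones > 0 and a finite shift always exists.
    """
    if any(min(a) < 0.0 or sum(a) <= 0.0 for a in columns):
        raise ValueError("this sketch needs A >= 0 with no zero column")
    r = residual(columns, b, x)
    alpha = max(0.0, max(-dot(a, r) / sum(a) for a in columns))
    return [rj + alpha for rj in r]


def gap_safe_screen(columns, b, x, nu, tol=1e-12):
    """Return a list of booleans; True marks a coordinate certified zero."""
    if min(x) < 0.0:
        raise ValueError("x must be primal feasible (x >= 0)")
    corr = [dot(a, nu) for a in columns]  # entries of A^T nu
    if min(corr) < -tol:
        raise ValueError("nu must be dual feasible (A^T nu >= 0)")
    r = residual(columns, b, x)
    primal = 0.5 * dot(r, r)
    dual = -0.5 * dot(nu, nu) - dot(b, nu)
    # The dual objective is 1-strongly concave, so the dual solution lies
    # in a ball of this radius around nu.
    radius = math.sqrt(2.0 * max(primal - dual, 0.0))
    return [c - radius * math.sqrt(dot(a, a)) > 0.0
            for a, c in zip(columns, corr)]


# Tiny example: A = I, b = (1, -1); the exact solution is x* = (1, 0).
columns = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, -1.0]
x = [0.9, 0.0]  # accurate but not optimal primal point
nu = dual_point_from_primal(columns, b, x)
eliminated = gap_safe_screen(columns, b, x, nu)
print(eliminated)  # [False, True]
```

In this toy problem the second coordinate is certified zero even though x is not optimal, because the duality gap (0.01) is already small enough. If the columns that survive screening are linearly independent, the reduced problem is strictly convex, which is one way to read the uniqueness certificate mentioned in the abstract.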


Keywords

Feature elimination · Dimension reduction · Duality · NNLS

Mathematics Subject Classification

49N15 90C25 90C46 



Stephen Becker acknowledges the donation of a Tesla K40c GPU from NVIDIA.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Colorado at Boulder, Boulder, USA
