Advanced Optimization Topics

  • Kenneth Lange
Part of the Statistics and Computing book series (SCO)


Our final chapter on optimization provides a concrete introduction to several advanced topics. The first vignette describes classical penalty and barrier methods for constrained optimization [22, 37, 45]. Penalty methods operate on the exterior and barrier methods on the interior of the feasible region. Fortunately, it is fairly easy to prove global convergence for both methods under reasonable hypotheses.
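
The exterior/interior contrast can be seen on a one-dimensional example. The Python sketch below is a minimal illustration, not the chapter's implementation. Several pieces are hypothetical choices made only for this sketch: the test problem, the function names (`penalty_path`, `barrier_path`, `bisect_root`), the quadratic penalty, the logarithmic barrier, the bisection inner solver, and the parameter schedules. Penalty minimizers stay infeasible and approach the constrained optimum from outside as the penalty weight grows. Barrier minimizers stay strictly feasible and approach it from inside as the barrier weight shrinks.

```python
import math

# Hypothetical toy problem (not from the chapter):
#   minimize f(x) = (x - 2)^2   subject to   g(x) = x - 1 <= 0.
# The unconstrained minimizer x = 2 is infeasible, so the constrained
# minimizer x* = 1 lies on the boundary of the feasible region x <= 1.


def f_prime(x):
    """Derivative of the objective f(x) = (x - 2)^2."""
    return 2.0 * (x - 2.0)


def bisect_root(h, lo, hi, tol=1e-12):
    """Root of an increasing function h on [lo, hi], assuming h(lo) <= 0 < h(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def penalty_path(mus):
    """Exterior method: minimize f(x) + mu * max(0, g(x))^2 for each mu."""
    path = []
    for mu in mus:
        def h(x):
            # Derivative of the penalized objective; strictly increasing in x.
            return f_prime(x) + 2.0 * mu * max(0.0, x - 1.0)
        path.append(bisect_root(h, -10.0, 10.0))
    return path


def barrier_path(mus):
    """Interior method: minimize f(x) - mu * log(-g(x)) over x < 1 for each mu."""
    path = []
    for mu in mus:
        def h(x):
            # Derivative of the barrier objective; tends to +inf as x -> 1 from below.
            return f_prime(x) + mu / (1.0 - x)
        # Bracket strictly inside the feasible region so the barrier term stays finite.
        path.append(bisect_root(h, -10.0, 1.0 - 1e-12))
    return path


if __name__ == "__main__":
    # Penalty weights increase; barrier weights decrease toward 0.
    print(penalty_path([1.0, 10.0, 100.0, 1000.0]))  # infeasible, decreasing toward 1
    print(barrier_path([1.0, 0.1, 0.01, 0.001]))     # feasible, increasing toward 1
```

For this problem the subproblem minimizers have closed forms, x = (2 + μ)/(1 + μ) for the penalty and x = (3 - √(1 + 2μ))/2 for the barrier, which makes the numerical paths easy to check.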



References

  1. Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute value estimates for a simple linear regression problem. Appl Stat 27:363-366
  2. Boyle JP, Dykstra RL (1985) A method for finding projections onto the intersection of convex sets in Hilbert space. In: Advances in Order Restricted Statistical Inference, Lecture Notes in Statistics, Springer, New York, pp 28-47
  3. Bregman LM (1965) The method of successive projection for finding a common point of convex sets. Soviet Math Doklady 6:688-692
  4. Candes EJ, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Annals Stat 35:2313-2351
  5. Candes EJ, Wakin M, Boyd S (2007) Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl 14:877-905
  6. Censor Y, Reich S (1996) Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37:323-339
  7. Censor Y, Zenios SA (1992) Proximal minimization with D-functions. J Optimization Theory Appl 73:451-464
  8. Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20:33-61
  9. Claerbout J, Muir F (1973) Robust modeling with erratic data. Geophysics 38:826-844
  10. Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413-1457
  11. de Leeuw J, Lange K (2007) Sharp quadratic majorization in one dimension.
  12. Deutsch F (2001) Best Approximation in Inner Product Spaces. Springer, New York
  13. Donoho D, Johnstone I (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425-455
  14. Dykstra RL (1983) An algorithm for restricted least squares estimation. J Amer Stat Assoc 78:837-842
  15. Edgeworth FY (1887) On observations relating to several quantities. Hermathena 6:279-285
  16. Edgeworth FY (1888) On a new method of reducing observations relating to several quantities. Philosophical Magazine 25:184-191
  17. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Annals Stat 32:407-499
  18. Elsner L, Koltracht L, Neumann M (1992) Convergence of sequential and asynchronous nonlinear paracontractions. Numerische Mathematik 62:305-319
  19. Fang S-C, Puthenpura S (1993) Linear Optimization and Extensions: Theory and Algorithms. Prentice-Hall, Englewood Cliffs, NJ
  20. Fazel M, Hindi M, Boyd S (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proceedings American Control Conference 3:2156-2162
  21. Ferguson TS (1996) A Course in Large Sample Theory. Chapman & Hall, London
  22. Forsgren A, Gill PE, Wright MH (2002) Interior point methods for nonlinear optimization. SIAM Review 44:523-597
  23. Friedman J, Hastie T, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302-332
  24. Friedman J, Hastie T, Tibshirani R (2009) Regularized paths for generalized linear models via coordinate descent. Technical Report, Stanford University Department of Statistics
  25. Fu WJ (1998) Penalized regressions: the bridge versus the lasso. J Comp Graph Stat 7:397-416
  26. Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization and I-splines. Studies in Classification, Data Analysis, and Knowledge Organization, Lenz HJ, Decker R (eds), Springer, Heidelberg-Berlin, pp 149-161
  27. Hestenes MR (1981) Optimization Theory: The Finite Dimensional Case. Robert E Krieger Publishing, Huntington, NY
  28. Hunter DR, Lange K (2004) A tutorial on MM algorithms. Amer Statistician 58:30-37
  29. Hunter DR, Li R (2005) Variable selection using MM algorithms. Annals Stat 33:1617-1642
  30. Lange K (1994) An adaptive barrier method for convex programming. Methods Applications Analysis 1:392-402
  31. Lange K (2004) Optimization. Springer, New York
  32. Lange K, Wu T (2007) An MM algorithm for multicategory vertex discriminant analysis. J Computational Graphical Stat 17:527-544
  33. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788-791
  34. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems 13:556-562
  35. Levina E, Rothman A, Zhu J (2008) Sparse estimation of large covariance matrices via a nested lasso penalty. Ann Appl Stat 2:245-263
  36. Li Y, Arce GR (2004) A maximum likelihood approach to least absolute deviation regression. EURASIP J Applied Signal Proc 2004:1762-1769
  37. Luenberger DG (1984) Linear and Nonlinear Programming, 2nd ed. Addison-Wesley, Reading, MA
  38. Meng X-L, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Amer Stat Assoc 86:899-909
  39. Michelot C (1986) A finite algorithm for finding the projection of a point onto the canonical simplex in R^n. J Optimization Theory Applications 50:195-200
  40. Park MY, Hastie T (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30-50
  41. Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Applications 416:29-47
  42. Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12:279-300
  43. Santosa F, Symes WW (1986) Linear inversion of band-limited reflection seismograms. SIAM J Sci Stat Comput 7:1307-1330
  44. Silvey SD (1975) Statistical Inference. Chapman & Hall, London
  45. Ruszczynski A (2006) Nonlinear Optimization. Princeton University Press, Princeton, NJ
  46. Schölkopf B, Smola AJ (2002) Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA
  47. Taylor H, Banks SC, McCoy JF (1979) Deconvolution with the ℓ1 norm. Geophysics 44:39-52
  48. Teboulle M (1992) Entropic proximal mappings with applications to nonlinear programming. Math Operations Research 17:670-690
  49. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc, Series B 58:267-288
  50. Vapnik V (1995) The Nature of Statistical Learning Theory. Springer, New York
  51. Wang L, Gordon MD, Zhu J (2006) Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. Proceedings of the Sixth International Conference on Data Mining (ICDM’06). IEEE Computer Society, pp 690-700
  52. Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ (2006) Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2:148-159
  53. Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J Machine Learning Research 3:1439-1461
  54. Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2:224-244

Copyright information

© Springer New York 2010

Authors and Affiliations

Departments of Biomathematics, Human Genetics, and Statistics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, USA
