A generalized Newton algorithm for quantile regression models

Abstract

This paper formulates the quadratic penalty function for the dual of the linear program associated with the \(L_1\) constrained linear quantile regression model. We prove that the solution of the original linear program can be obtained by minimizing this quadratic penalty function, and we derive the corresponding formulas. The obtained quadratic penalty function is unconstrained, so it can be minimized efficiently by a generalized Newton algorithm with Armijo step size. The resulting algorithm is easy to implement and requires no sophisticated optimization package beyond a linear equation solver. With slight modification, the proposed approach generalizes to the quantile regression model in a reproducing kernel Hilbert space. Extensive experiments on simulated and real-world data show that the proposed Newton quantile regression algorithms achieve performance comparable to the state of the art.
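The recipe summarized above (an unconstrained quadratic penalty, a Newton direction obtained from a single linear solve, and Armijo backtracking) can be sketched on a toy problem. The objective below, a regularized squared plus-function penalty, and every name and constant in the code are illustrative assumptions of this example, not the paper's actual formulas.

```python
import numpy as np

def newton_armijo(f, grad, hess, x0, tol=1e-8, max_iter=100, delta=1e-4):
    """Generalized Newton iteration with an Armijo backtracking step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Newton direction: the only machinery needed is a linear solver
        d = np.linalg.solve(hess(x), -g)
        fx, t, slope = f(x), 1.0, g @ d
        # Armijo rule: halve the step until sufficient decrease is reached
        while f(x + t * d) > fx + delta * t * slope and t > 1e-12:
            t /= 2.0
        x = x + t * d
    return x

# Toy convex, piecewise-quadratic objective standing in for the penalty:
#   f(x) = 0.5 * ||(Ax - b)_+||^2 + 0.5 * eps * ||x||^2
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([-1.0, -0.5, -1.0])     # Ax <= b is infeasible by design
eps = 1e-3
plus = lambda v: np.maximum(v, 0.0)
f = lambda x: 0.5 * plus(A @ x - b) @ plus(A @ x - b) + 0.5 * eps * x @ x
grad = lambda x: A.T @ plus(A @ x - b) + eps * x
def hess(x):
    # Generalized Hessian: the plus function has no classical second
    # derivative at 0, so an indicator of the active rows is used
    D = np.diag((A @ x - b > 0).astype(float))
    return A.T @ D @ A + eps * np.eye(len(x))

x_star = newton_armijo(f, grad, hess, np.zeros(2))
```

On this toy problem the correct active set is already identified at the starting point, so a single full Newton step lands essentially at the minimizer; such finite termination on piecewise-quadratic objectives is part of the appeal of generalized Newton methods.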

Notes

  1. The quadratic penalty function was called the asymptotic exterior penalty function in Fung and Mangasarian (2004) and Mangasarian (2006). However, we find “quadratic penalty function” to be the more standard terminology; see Ruszczyński (2006, Sect. 6.2.2) and Bertsekas (1999, Sect. 4.2.1).

  2. The penalty function method in optimization absorbs the constraints into the objective, yielding an unconstrained optimization problem.

  3. This relation is easy to check: if \(a>0,\,(a)_+=a\) while \((-a)_+ = 0\); if \(a<0,\,(a)_+=0\) and \((-a)_+=-a\); if \(a=0\), both terms vanish. Therefore, for any \(a\in {\mathbb {R}}\), we always have \((a)_+ - (-a)_+=a\).
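The identity in this note is easy to confirm numerically; `plus` below is simply \(\max(a, 0)\):

```python
plus = lambda a: max(a, 0.0)  # the plus function (a)_+

# (a)_+ - (-a)_+ = a holds for positive, negative, and zero arguments
for a in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert plus(a) - plus(-a) == a
```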

  4. The matrix identity is \(\left( {\mathbf{A}}+{\mathbf{UCV}} \right) ^{-1} = {\mathbf{A}}^{-1} - {\mathbf{A}}^{-1}{\mathbf{U}} \left( {\mathbf{C}}^{-1}+{\mathbf{V}}{\mathbf{A}}^{-1}{\mathbf{U}} \right) ^{-1} {\mathbf{V}}{\mathbf{A}}^{-1}\), where \(\mathbf{A},\,\mathbf U,\,\mathbf C\), and \(\mathbf V\) denote matrices of appropriate sizes. If \({\mathbf{A}}^{-1}\) is easy to calculate and \(\mathbf C\) has a much smaller dimension than \(\mathbf A\), using this formula is more efficient than inverting \(\mathbf{A} + \mathbf{UCV}\) directly. See Higham (2002) for more details.
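This matrix identity (often called the Sherman–Morrison–Woodbury formula) can be checked numerically. The sketch below uses illustrative sizes and a diagonal \(\mathbf{A}\) chosen so that \({\mathbf{A}}^{-1}\) is trivial; only a small \(k\times k\) matrix is ever inverted on the right-hand side.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 4                        # C is k-by-k, far smaller than A
d = rng.uniform(1.0, 2.0, n)
A_inv = np.diag(1.0 / d)             # A is diagonal, so A^{-1} is trivial
U = rng.standard_normal((n, k))
C = np.eye(k)
V = rng.standard_normal((k, n))

# Right-hand side of the identity: only a k-by-k inverse is required
small = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)
woodbury = A_inv - A_inv @ U @ small @ V @ A_inv

# Compare against inverting A + U C V directly (an n-by-n inverse)
direct = np.linalg.inv(np.diag(d) + U @ C @ V)
```

The two results agree up to floating-point error, while the Woodbury route replaces one \(n\times n\) inversion with one \(k\times k\) inversion plus matrix products.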

  5. Algorithm 1 is similar in spirit to an algorithm for the support vector machine studied in Fung and Mangasarian (2004), which addresses the pattern recognition problem, while our proposed algorithm addresses the regression problem.

  6. In our implementation, which is based on the toolbox from Gunn (1997), we assign a very large value (e.g., 5,000) to the penalty parameter \(C\) in support vector regression.

  7. The IP-QReg and MM-QReg algorithms contain a step that inverts a \(p\times p\) matrix whose rank is at most \(n\), where \(p\) is the dimensionality of the data and \(n\) is the training set size. When the dimensionality exceeds the sample size (\(p>n\)), this matrix is singular, so the downloaded software packages for IP-QReg and MM-QReg report an error. Therefore, the performances of IP-QReg and MM-QReg are not provided.

References

  • Armijo L (1966) Minimization of functions having Lipschitz-continuous first partial derivatives. Pac J Math 16:1–3

  • Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont, MA

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York, NY

  • Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27

  • Cortez P, Cerdeira A, Almeida F, Matos T, Reis J (2009) Modeling wine preferences by data mining from physicochemical properties. Decis Support Syst 47(4):547–553

  • Fung G, Mangasarian OL (2004) A feature selection Newton method for support vector machine classification. Comput Optim Appl 28(2):185–202

  • Gunn SR (1997) Support vector machines for classification and regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton. http://users.ecs.soton.ac.uk/srg/publications/pdf/SVM

  • Higham N (2002) Accuracy and stability of numerical algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA

  • Hiriart-Urruty J-B, Strodiot J-J, Nguyen VH (1984) Generalized Hessian matrix and second-order optimality conditions for problems with \(C^{1,1}\) data. Appl Math Optim 11(1):43–56

  • Hunter DR, Lange K (2000) Quantile regression via an MM Algorithm. J Comput Graph Stat 9(1):60–77

  • Hwang C, Shim J (2005) A simple quantile regression via support vector machine. In: Lecture notes in computer science, vol 3610, pp 512–520

  • Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods—support vector learning. MIT Press, Cambridge, MA

  • Kimeldorf GS, Wahba G (1970) Some results on Tchebycheffian spline functions. J Math Anal Appl 33(1):82–95

  • Koenker R (2005) Quantile regression. Cambridge University Press, New York, NY

  • Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

  • Koenker R, Ng P, Portnoy S (1994) Quantile smoothing splines. Biometrika 81:673–680

  • Koenker R, Park BJ (1996) An interior point algorithm for nonlinear quantile regression. J Econom 71:265–283

  • Langford J, Oliveira R, Zadrozny B (2006) Predicting conditional quantiles via reduction to classification. In: Proceedings of the uncertainty in artificial intelligence. Cambridge, MA

  • Li C, Wei Y, Chappell R, He X (2011) Bent line quantile regression with application to an allometric study of land mammals’ speed and mass. Biometrics 67(1):242–249

  • Li Y, Liu Y, Zhu J (2007) Quantile regression in reproducing kernel Hilbert spaces. J Am Stat Assoc 102:255–268

  • Li Y, Zhu J (2008) \(L_1\)-norm quantile regression. J Comput Graph Stat 17(1):163–185

  • Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7:1517–1530

  • Mangasarian OL, Meyer RR (1979) Nonlinear perturbation of linear programs. SIAM J Control Optim 17(6):745–752

  • Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999

  • Ruszczyński A (2006) Nonlinear optimization. Princeton University Press, Princeton, NJ

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Sohn I, Kim S, Hwang C, Lee JW (2008) New normalization methods using support vector machine quantile regression approach in microarray analysis. Comput Stat Data Anal 52(8):4104–4115

  • Sohn I, Kim S, Hwang C, Lee JW, Shim J (2008) Support vector machine quantile regression for detecting differentially expressed genes in microarray analysis. Methods Inf Med 47(5):459–467

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58(1):267–288

  • Takeuchi I, Le QV, Sears TD, Smola AJ (2006) Nonparametric quantile estimation. J Mach Learn Res 7:1231–1264

  • Wu Y, Liu Y (2009) Variable selection in quantile regression. Stat Sin 19:801–817

  • Yuan M (2006) GACV for quantile smoothing splines. Comput Stat Data Anal 50(3):813–829

Acknowledgments

The author would like to extend his sincere gratitude to the anonymous reviewers and editors for their constructive suggestions and comments, which have greatly helped improve the quality of this paper. This work was supported by a Faculty Research Grant from Missouri State University.

Author information

Corresponding author

Correspondence to Songfeng Zheng.

Appendix: The proof of Eq. (25)

If \(a>\lambda >0\), then

$$\begin{aligned} a-\lambda >0, \quad -a-\lambda <0,\quad \text {and}\quad |a|-\lambda >0, \end{aligned}$$

thus, from the definition of \((\cdot )_*\),

$$\begin{aligned} (a-\lambda )_*+(-a-\lambda )_* = 1 = (|a|-\lambda )_*. \end{aligned}$$

If \(a<-\lambda <0\), we have

$$\begin{aligned} a-\lambda <0, \quad -a-\lambda >0,\quad \text {and}\quad |a|-\lambda >0, \end{aligned}$$

thus,

$$\begin{aligned} (a-\lambda )_*+(-a-\lambda )_* = 1 = (|a|-\lambda )_*. \end{aligned}$$

If \(-\lambda <a<\lambda \), then

$$\begin{aligned} a-\lambda <0, \quad -a-\lambda <0,\quad \text {and}\quad |a|-\lambda <0, \end{aligned}$$

thus,

$$\begin{aligned} (a-\lambda )_*+(-a-\lambda )_* = 0 = (|a|-\lambda )_*. \end{aligned}$$

In summary, every case satisfies \((a-\lambda )_*+(-a-\lambda )_* = (|a|-\lambda )_*\), which proves Eq. (25).
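The case analysis above can be spot-checked numerically. Here \((\cdot )_*\) is taken to be the step function that equals 1 on positive arguments and 0 otherwise, consistent with the values 0 and 1 appearing in the cases; the test grid deliberately avoids the boundary points \(|a| = \lambda \), which the cases do not treat.

```python
step = lambda a: 1.0 if a > 0 else 0.0   # (a)_* as used in the cases above
lam = 1.5
# covers a > lam, a < -lam, and -lam < a < lam, avoiding |a| == lam
for a in [-3.0, -1.0, 0.0, 0.4, 2.0, 5.0]:
    assert step(a - lam) + step(-a - lam) == step(abs(a) - lam)
```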

Cite this article

Zheng, S. A generalized Newton algorithm for quantile regression models. Comput Stat 29, 1403–1426 (2014). https://doi.org/10.1007/s00180-014-0498-x

Keywords

  • Linear programming
  • Quadratic penalty function
  • Armijo step
  • \(L_1\) constrained model