Abstract
This paper formulates the quadratic penalty function for the dual of the linear program associated with the \(L_1\) constrained linear quantile regression model. We prove that the solution of the original linear program can be obtained by minimizing the quadratic penalty function, and we derive the corresponding formulas. The resulting quadratic penalty function is unconstrained and can therefore be minimized efficiently by a generalized Newton algorithm with Armijo step size. The algorithm is easy to implement and requires no sophisticated optimization package beyond a linear equation solver. With slight modification, the proposed approach generalizes to the quantile regression model in a reproducing kernel Hilbert space. Extensive experiments on simulated and real-world data show that the proposed Newton quantile regression algorithms achieve performance comparable to the state of the art.
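The ingredients named in the abstract can be illustrated on a generic piecewise-quadratic convex function of the kind that arises from quadratic penalties. The sketch below is not the paper's specific penalty; the objective \(f(x)=\tfrac12\Vert (Ax-b)_+\Vert^2+\tfrac{\delta}{2}\Vert x\Vert^2\), the regularization \(\delta\), and all dimensions are illustrative assumptions. It shows the two key mechanisms: a generalized Hessian (the plus function is not twice differentiable) and an Armijo backtracking step size.

```python
import numpy as np

def generalized_newton(A, b, delta=1e-2, sigma=0.25, tol=1e-8, max_iter=50):
    """Minimize f(x) = 0.5*||(Ax - b)_+||^2 + 0.5*delta*||x||^2 by a
    generalized Newton method with Armijo step size.  The plus function
    makes f piecewise quadratic, so a generalized Hessian is used."""
    m, n = A.shape
    x = np.zeros(n)
    f = lambda z: 0.5 * np.sum(np.maximum(A @ z - b, 0.0)**2) + 0.5 * delta * z @ z
    for _ in range(max_iter):
        r = np.maximum(A @ x - b, 0.0)                 # (Ax - b)_+
        g = A.T @ r + delta * x                        # gradient (exists everywhere)
        if np.linalg.norm(g) < tol:
            break
        D = (r > 0).astype(float)                      # active-set indicator
        H = A.T @ (D[:, None] * A) + delta * np.eye(n) # generalized Hessian
        d = np.linalg.solve(H, -g)                     # Newton direction
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + sigma * t * (g @ d): # Armijo condition
            t *= 0.5
        x = x + t * d
    return x
```

Because the regularized generalized Hessian is positive definite, the Newton direction is a descent direction and the Armijo backtracking always terminates; on a piecewise-quadratic objective the iteration typically identifies the correct active set and then finishes in one exact step.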
Notes
The penalty function method in optimization absorbs the constraints into the objective, so that an unconstrained optimization problem is obtained.
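As a generic illustration (not the paper's specific formulation), a quadratic penalty replaces inequality constraints by squared plus-function terms in the objective:

```latex
\min_{x} f(x) \;\; \text{s.t.}\;\; g_i(x) \le 0,\; i=1,\dots,m
\qquad \longrightarrow \qquad
\min_{x}\; f(x) + \frac{C}{2} \sum_{i=1}^{m} \big( g_i(x) \big)_+^{2}
```

As the penalty parameter \(C\) grows, minimizers of the penalized problem approach the constrained solution; for linear programs, Mangasarian and Meyer (1979) show that a sufficiently large but finite \(C\) already yields an exact solution.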
This relation is easy to check: if \(a>0,\,(a)_+=a\) while \((-a)_+ = 0\); if \(a<0,\,(a)_+=0\) and \((-a)_+=-a\). Therefore, for any \(a\in {\mathbb {R}}\), we always have \((a)_+ - (-a)_+=a\).
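The case analysis in this note can be confirmed mechanically; the helper name `plus` is ours:

```python
def plus(a):
    """Plus function: (a)_+ = max(a, 0)."""
    return max(a, 0.0)

# The identity (a)_+ - (-a)_+ = a, checked on negative, zero, and positive values.
for a in (-2.5, 0.0, 3.7):
    assert plus(a) - plus(-a) == a
```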
The matrix identity is \(\left( {\mathbf{A}}+{\mathbf{UCV}} \right) ^{-1} = {\mathbf{A}}^{-1} - {\mathbf{A}}^{-1}{\mathbf{U}} \left( {\mathbf{C}}^{-1}+{\mathbf{V}}{\mathbf{A}}^{-1}{\mathbf{U}} \right) ^{-1} {\mathbf{V}}{\mathbf{A}}^{-1}\), where \(\mathbf{A},\,\mathbf U,\,\mathbf C\), and \(\mathbf V\) denote matrices of appropriate sizes. If \({\mathbf{A}}^{-1}\) is easy to calculate and \(\mathbf C\) has a much smaller dimension than \(\mathbf A\), using this formula is more efficient than inverting \(\mathbf{A} + \mathbf{UCV}\) directly. See Higham (2002) for more details.
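The identity (the Sherman–Morrison–Woodbury formula) is easy to verify numerically. The dimensions below are illustrative; \(\mathbf A\) is taken diagonal so that \({\mathbf{A}}^{-1}\) is trivial, and only a small \(k\times k\) system is inverted on the right-hand side:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2                                  # C is k-by-k, much smaller than A
A = np.diag(rng.uniform(1.0, 2.0, n))        # easy-to-invert A (diagonal)
U = rng.standard_normal((n, k))
C = np.eye(k)
V = rng.standard_normal((k, n))

# Left side: direct inversion of A + U C V.
direct = np.linalg.inv(A + U @ C @ V)

# Right side: the identity, inverting only the small k-by-k matrix.
Ainv = np.diag(1.0 / np.diag(A))
small = np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U)
woodbury = Ainv - Ainv @ U @ small @ V @ Ainv

assert np.allclose(direct, woodbury)
```

The saving is exactly the one the note describes: the only dense inversion performed is of the \(k\times k\) matrix \({\mathbf{C}}^{-1}+{\mathbf{V}}{\mathbf{A}}^{-1}{\mathbf{U}}\).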
Algorithm 1 is similar in spirit to an algorithm for the support vector machine studied in Fung and Mangasarian (2004); that algorithm addresses the pattern recognition problem, while ours addresses the regression problem.
In our implementation, which is based on the toolbox from Gunn (1997), we assign a very large value (e.g., 5,000) to the penalty parameter \(C\) in support vector regression.
In the IP-QReg and MM-QReg algorithms, one step inverts a \(p\times p\) matrix whose rank is at most \(n\), where \(p\) is the dimensionality of the data and \(n\) is the training set size. When the dimensionality exceeds the sample size (\(p>n\)), the matrix is singular, so the downloaded software packages for IP-QReg and MM-QReg give an error message. Therefore, the performances of IP-QReg and MM-QReg are not reported.
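The singularity in the \(p>n\) case is immediate: a \(p\times p\) matrix built from \(n\) samples, such as a Gram-type matrix \(X^{\top}X\), has rank at most \(n\). A small check with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 50                     # more features than samples
X = rng.standard_normal((n, p))
G = X.T @ X                       # p-by-p matrix, rank at most n

assert np.linalg.matrix_rank(G) <= n                 # rank-deficient: 10 < 50
# A singular matrix has a (numerically) zero smallest singular value.
assert np.linalg.svd(G, compute_uv=False)[-1] < 1e-8
```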
References
Armijo L (1966) Minimization of functions having Lipschitz-continuous first partial derivatives. Pac J Math 16:1–3
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont, MA
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York, NY
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27
Cortez P, Cerdeira A, Almeida F, Matos T, Reis J (2009) Modeling wine preferences by data mining from physicochemical properties. Decis Support Syst 47(4):547–553
Fung G, Mangasarian OL (2004) A feature selection Newton method for support vector machine classification. Comput Optim Appl 28(2):185–202
Gunn SR (1997) Support vector machines for classification and regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton. http://users.ecs.soton.ac.uk/srg/publications/pdf/SVM
Higham N (2002) Accuracy and stability of numerical algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia, PA
Hiriart-Urruty J-B, Strodiot J-J, Nguyen VH (1984) Generalized Hessian matrix and second-order optimality conditions for problems with \(C^{1,1}\) data. Appl Math Optim 11(1):43–56
Hunter DR, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9(1):60–77
Hwang C, Shim J (2005) A simple quantile regression via support vector machine. In: Lecture notes in computer science, vol 3610, pp 512–520
Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods—support vector learning. MIT Press, Cambridge, MA
Kimeldorf GS, Wahba G (1970) Some results on Tchebycheffian spline functions. J Math Anal Appl 33(1):82–95
Koenker R (2005) Quantile regression. Cambridge University Press, New York, NY
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, Ng P, Portnoy S (1994) Quantile smoothing splines. Biometrika 81:673–680
Koenker R, Park BJ (1996) An interior point algorithm for nonlinear quantile regression. J Econom 71:265–283
Langford J, Oliveira R, Zadrozny B (2006) Predicting conditional quantiles via reduction to classification. In: Proceedings of the uncertainty in artificial intelligence. Cambridge, MA
Li C, Wei Y, Chappell R, He X (2011) Bent line quantile regression with application to an allometric study of land Mammals’ speed and mass. Biometrics 67(1):242–249
Li Y, Liu Y, Zhu J (2007) Quantile regression in reproducing kernel Hilbert spaces. J Am Stat Assoc 102:255–268
Li Y, Zhu J (2008) \(L_1\)-norm quantile regression. J Comput Graph Stat 17(1):163–185
Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7:1517–1530
Mangasarian OL, Meyer RR (1979) Nonlinear perturbation of linear programs. SIAM J Control Optim 17(6):745–752
Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999
Ruszczyński A (2006) Nonlinear optimization. Princeton University Press, Princeton, NJ
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Sohn I, Kim S, Hwang C, Lee JW (2008) New normalization methods using support vector machine quantile regression approach in microarray analysis. Comput Stat Data Anal 52(8):4104–4115
Sohn I, Kim S, Hwang C, Lee JW, Shim J (2008) Support vector machine quantile regression for detecting differentially expressed genes in microarray analysis. Methods Inf Med 47(5):459–467
Takeuchi I, Le QV, Sears TD, Smola AJ (2006) Nonparametric quantile estimation. J Mach Learn Res 7:1231–1264
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58(1):267–288
Wu Y, Liu Y (2009) Variable selection in quantile regression. Stat Sin 19:801–817
Yuan M (2006) GACV for quantile smoothing splines. Comput Stat Data Anal 50(3):813–829
Acknowledgments
The author would like to extend his sincere gratitude to the anonymous reviewers and editors for their constructive suggestions and comments, which have greatly helped improve the quality of this paper. This work was supported by a Faculty Research Grant from Missouri State University.
Appendix: The proof of Eq. (25)
If \(a>\lambda >0\), then
\[ a-\lambda >0 \quad \text{and} \quad -a-\lambda <0; \]
thus, from the definition of \((\cdot )_*\), which vanishes for nonpositive arguments,
\[ (a-\lambda )_* + (-a-\lambda )_* = (a-\lambda )_* = (|a|-\lambda )_*, \]
since \(|a|=a\).
If \(a<-\lambda <0\), we have
\[ a-\lambda <0 \quad \text{and} \quad -a-\lambda >0; \]
thus,
\[ (a-\lambda )_* + (-a-\lambda )_* = (-a-\lambda )_* = (|a|-\lambda )_*, \]
since \(|a|=-a\).
If \(-\lambda<a<\lambda \), it implies that
\[ a-\lambda <0, \quad -a-\lambda <0, \quad \text{and} \quad |a|-\lambda <0; \]
thus,
\[ (a-\lambda )_* + (-a-\lambda )_* = 0 = (|a|-\lambda )_*. \]
In summary, it is checked for all the cases that \((a-\lambda )_*+(-a-\lambda )_* = (|a|-\lambda )_*\).
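The exact definition of \((\cdot )_*\) is given in the paper and not reproduced on this page; the identity above only requires that \((x)_*\) vanish for \(x\le 0\). The check below therefore uses the plus function as an assumed representative of \((\cdot )_*\):

```python
def star(x):
    """A representative (.)_* that vanishes for nonpositive arguments;
    here the plus function stands in for the paper's definition."""
    return max(x, 0.0)

# Check the identity (a - lam)_* + (-a - lam)_* = (|a| - lam)_*
# across all three cases of the proof, including the boundary a = lam.
lam = 1.5
for a in (-3.0, -1.5, -0.2, 0.0, 0.7, 1.5, 4.2):
    assert star(a - lam) + star(-a - lam) == star(abs(a) - lam)
```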
Zheng, S. A generalized Newton algorithm for quantile regression models. Comput Stat 29, 1403–1426 (2014). https://doi.org/10.1007/s00180-014-0498-x
Keywords
- Linear programming
- Quadratic penalty function
- Armijo step
- \(L_1\) constrained model