Abstract
The penalized maximum likelihood estimator (PMLE) has been widely used for variable selection in high-dimensional data. Various penalty functions have been employed for this purpose, e.g., the Lasso, the weighted Lasso, or the smoothly clipped absolute deviation (SCAD) penalty. However, the PMLE can be very sensitive to outliers in the data, especially to outliers in the covariates (leverage points). To overcome this drawback, the penalized maximum trimmed likelihood estimator (PMTLE) is proposed as a robust estimator of the unknown parameters. The PMTLE can be computed with the same algorithms used for the PMLE, the difference being that the estimation is based on subsamples of the data only. The breakdown point properties of the PMTLE are discussed using the notion of \(d\)-fullness. The performance of the proposed estimator is evaluated in a simulation study for the classical multiple linear and Poisson linear regression models.
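The trimming idea behind the PMTLE can be illustrated for the Gaussian linear model, where maximizing a penalized trimmed likelihood amounts to a sparse least trimmed squares fit (cf. Alfons et al. 2013): a penalized fit is computed on the \(h\) observations with the smallest residuals, and the subsample is then re-selected, in the style of the C-steps of the FAST-LTS algorithm. The following is a minimal sketch under these assumptions, not the authors' implementation; the function names (`lasso_cd`, `pmtle_gaussian`) and all tuning choices are illustrative.

```python
# Illustrative sketch: PMTLE for Gaussian linear regression (= sparse LTS).
# Alternates (a) a Lasso fit on the h observations with smallest residuals
# and (b) re-selection of those h observations (C-steps), from several
# small random starts as in FAST-LTS. Not the paper's implementation.
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent Lasso (soft thresholding), no intercept.
    Minimizes (1/(2n))||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]                 # remove j-th contribution
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]                 # add updated contribution
    return beta

def pmtle_gaussian(X, y, h, lam, n_starts=20, n_csteps=20, seed=0):
    """Penalized trimmed fit: Lasso on the h best-fitting observations."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_obj, best_beta, best_idx = np.inf, None, None
    for _ in range(n_starts):
        # elemental start: fit on a small random subset (FAST-LTS style)
        idx = rng.choice(n, size=min(p + 1, h), replace=False)
        for _ in range(n_csteps):
            beta = lasso_cd(X[idx], y[idx], lam)
            resid2 = (y - X @ beta) ** 2
            new_idx = np.argsort(resid2)[:h]       # C-step: keep h smallest
            if idx.size == new_idx.size and np.array_equal(
                    np.sort(idx), np.sort(new_idx)):
                break
            idx = new_idx
        # trimmed penalized objective, used to pick the best start
        obj = (np.sort((y - X @ beta) ** 2)[:h].sum() / (2 * h)
               + lam * np.abs(beta).sum())
        if obj < best_obj:
            best_obj, best_beta, best_idx = obj, beta, np.sort(idx)
    return best_beta, best_idx
```

Because each fit uses only the \(h\) retained observations, gross outliers (including leverage points) end up outside the subsample and cannot distort the estimate, while the penalty still performs variable selection within the subsample.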
Acknowledgments
The authors thank the Vienna University of Technology and the ESF (COST Action IC0702) for supporting the stay of N. Neykov and P. Neytchev in Vienna.
Cite this article
Neykov, N.M., Filzmoser, P. & Neytchev, P.N. Ultrahigh dimensional variable selection through the penalized maximum trimmed likelihood estimator. Stat Papers 55, 187–207 (2014). https://doi.org/10.1007/s00362-013-0516-z