Abstract
Outlier detection has become an important and challenging issue in high-dimensional data analysis because data contamination and high dimensionality often occur together. Most widely used penalized least squares methods are sensitive to outliers because they rely on the ℓ2 loss. In this paper, we propose a Robust Moderately Clipped LASSO (RMCL) estimator that performs simultaneous outlier detection, variable selection and robust estimation. The RMCL estimator can be computed efficiently using the coordinate descent algorithm within a concave-convex procedure. Our numerical studies demonstrate that the RMCL estimator performs better in both variable selection and outlier detection, and can therefore be advantageous in difficult prediction problems with data contamination.
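The abstract names the computational ingredients but does not give the objective. The following Python sketch only illustrates one way they could fit together and is not the authors' implementation. It assumes three things. First, a mean-shift model in which each observation i has its own shift parameter eta_i, and a nonzero eta_i flags observation i as an outlier. Second, an MCL penalty J whose slope at |t| is max(λ − |t|/a, γ), so that it behaves like MCP near zero and like a LASSO with the smaller rate γ for large coefficients. Third, a plain ℓ1 penalty on the shifts. The functions rmcl_sketch, soft and mcl_slope, and the parameters lam, gam, a and thr_out, are hypothetical. The outer loop applies the concave-convex procedure by linearizing the concave part J(t) − λ|t|. The inner loop runs coordinate descent on the resulting convex problem.

```python
import random


def soft(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def mcl_slope(t, lam, gam, a):
    """Assumed MCL penalty slope at |t|: MCP-like decay from lam, floored at gam."""
    return max(lam - abs(t) / a, gam)


def rmcl_sketch(X, y, lam, gam, a, thr_out, n_outer=50, n_inner=500, tol=1e-8):
    """Hypothetical RMCL-style fit of the mean-shift model y = X beta + eta + noise.

    Minimizes (1/2n)||y - X beta - eta||^2 + sum_j J(beta_j) + (thr_out/n)||eta||_1,
    with J the assumed MCL penalty (requires 0 <= gam <= lam and a > 0).
    Outer loop: concave-convex procedure, linearizing the concave part J(t) - lam*|t|.
    Inner loop: coordinate descent on the resulting convex problem.
    """
    n, p = len(X), len(X[0])
    beta, eta = [0.0] * p, [0.0] * n
    colsq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    res = list(y)  # running residuals y - X beta - eta
    for _ in range(n_outer):
        # Derivative of the concave part at the current beta (zero when beta_j == 0).
        c = [(mcl_slope(b, lam, gam, a) - lam) * ((b > 0) - (b < 0)) for b in beta]
        beta_prev = beta[:]
        for _ in range(n_inner):
            delta = 0.0
            for j in range(p):
                # Minimize 0.5*colsq_j*t^2 - z*t + lam*|t| + c_j*t over t.
                z = sum(X[i][j] * res[i] for i in range(n)) / n + colsq[j] * beta[j]
                new = soft(z - c[j], lam) / colsq[j]
                d = new - beta[j]
                if d:
                    for i in range(n):
                        res[i] -= X[i][j] * d
                    beta[j] = new
                    delta = max(delta, abs(d))
            for i in range(n):
                # eta_i absorbs the part of the residual beyond thr_out.
                new = soft(res[i] + eta[i], thr_out)
                d = new - eta[i]
                if d:
                    res[i] -= d
                    eta[i] = new
                    delta = max(delta, abs(d))
            if delta < tol:
                break
        if max(abs(b - q) for b, q in zip(beta, beta_prev)) < tol:
            break
    return beta, eta


# Usage: noiseless linear data with two grossly shifted responses.
random.seed(0)
n, p = 50, 4
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
beta_true = [3.0, -2.0, 0.0, 0.0]
y = [sum(x * b for x, b in zip(row, beta_true)) for row in X]
y[0] += 15.0
y[1] += 15.0
beta_hat, eta_hat = rmcl_sketch(X, y, lam=0.05, gam=0.01, a=3.0, thr_out=1.0)
flagged = [i for i, e in enumerate(eta_hat) if e != 0.0]
print([round(b, 2) for b in beta_hat], flagged)
```

This sketch uses an ℓ1 penalty on eta. Minimizing over each eta_i therefore turns the squared residual into Huber's loss with threshold thr_out, so the sketch is effectively a Huber regression with an MCL penalty on the coefficients. A nonconvex penalty on the shift vector would typically separate outliers more sharply, but it makes the optimization problem harder.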
Funding
Xiaoli Gao is partially supported by the Simons Foundation (SF359337 and SF854940).
Ethics declarations
Conflict of Interest
The corresponding author states that there is no conflict of interest.
About this article
Cite this article
Peng, Y., Luo, B. & Gao, X. Robust Moderately Clipped LASSO for Simultaneous Outlier Detection and Variable Selection. Sankhya B 84, 694–707 (2022). https://doi.org/10.1007/s13571-022-00279-0