
Robust Moderately Clipped LASSO for Simultaneous Outlier Detection and Variable Selection

Published in Sankhya B.

Abstract

Outlier detection has become an important and challenging issue in high-dimensional data analysis, owing to the coexistence of data contamination and high dimensionality. Most widely used penalized least squares methods are sensitive to outliers because of the ℓ2 loss. In this paper, we propose a Robust Moderately Clipped LASSO (RMCL) estimator that performs simultaneous outlier detection, variable selection, and robust estimation. The RMCL estimator can be solved efficiently by the coordinate descent algorithm within a convex-concave procedure. Our numerical studies demonstrate that the RMCL estimator performs well in both variable selection and outlier detection, and can therefore be advantageous in difficult prediction problems with data contamination.
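The page does not reproduce the RMCL objective or its algorithm. As an illustrative sketch of the general idea the abstract describes (jointly estimating a sparse coefficient vector and flagging outliers), the code below implements a simpler mean-shift robust LASSO in the spirit of She and Owen (2011): the response is modeled as y = Xβ + γ + ε, and the solver alternates coordinate-descent LASSO updates for β with soft-thresholding of the residuals for γ, so that observations with nonzero γᵢ are flagged as outliers. All function names, tuning parameters, and iteration counts here are assumptions for illustration, not the authors' RMCL implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_lasso(X, y, lam, n_iter=200):
    """Plain cyclic coordinate descent for the LASSO
    (1/2n)||y - Xb||^2 + lam * ||b||_1 (Friedman et al., 2007 style)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)      # per-coordinate curvature ||X_j||^2
    r = y - X @ beta                    # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]      # remove coordinate j from the fit
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
            r -= X[:, j] * beta[j]      # add the updated coordinate back
    return beta

def robust_lasso_mean_shift(X, y, lam_beta, lam_gamma, n_outer=20):
    """Alternate between a LASSO fit on the outlier-adjusted response
    and soft-thresholding of residuals; nonzero gamma_i flag outliers."""
    gamma = np.zeros(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_outer):
        beta = cd_lasso(X, y - gamma, lam_beta)
        gamma = soft_threshold(y - X @ beta, lam_gamma)
    return beta, gamma
```

On simulated data with a few grossly corrupted responses, the recovered γ is zero on clean observations and large on the corrupted ones, while β remains close to the true sparse coefficients; the convex ℓ1 penalty on γ here stands in for the nonconvex clipped penalty that motivates the RMCL.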


Figure 1



Funding

Xiaoli Gao is partially supported by the Simons Foundation (grants SF359337 and SF854940).

Author information

Corresponding author: Xiaoli Gao.

Ethics declarations

Conflict of Interest

The corresponding author states that there is no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article


Cite this article

Peng, Y., Luo, B. & Gao, X. Robust Moderately Clipped LASSO for Simultaneous Outlier Detection and Variable Selection. Sankhya B 84, 694–707 (2022). https://doi.org/10.1007/s13571-022-00279-0

