Abstract
We propose a novel double sparse regularized modelling paradigm for generalized linear models in high-dimensional settings, where the underlying population distribution may be heavy-tailed or heteroscedastic. Unlike existing approaches that seek an estimate robust to potential heterogeneity in the training samples via robustification mechanisms, the proposed paradigm identifies and quantifies heterogeneity-specific effects simultaneously, contributes to a more consistent and optimal estimate, and is flexible enough to accommodate latent heterogeneity at both the individual and subgroup levels. The proposed method has three popular applications: heterogeneity analysis, outlier detection, and image restoration. We devise an efficient learning algorithm, Proximal Alternating Linearized Minimization (PALM), to implement the proposed approach. The PALM algorithm proceeds by combining linearization with alternating minimization, and works well for general regularized (possibly nonconvex) bi-block optimization problems. We discuss the algorithmic generalization ability of PALM for regularized generalized linear regression, which demonstrates the asymptotic consistency of the iterative sequences to the true parameter vectors with a high-probability guarantee. Computational efficiency, estimation performance, and predictive validity are verified empirically on several simulations and real-data applications, and the results indicate that the proposed approach is competitive with existing state-of-the-art methods.
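To make the bi-block alternating scheme concrete, the following is a minimal sketch of a PALM-style iteration for a double sparse linear model with a mean-shift (outlier) block, i.e. minimizing 0.5·||y − Xβ − γ||² + λ_β||β||₁ + λ_γ||γ||₁. This is an illustrative instance only, not the paper's exact formulation: it assumes squared loss and ℓ₁ penalties for both blocks, and all function and parameter names (`palm_double_sparse`, `lam_beta`, `lam_gamma`) are hypothetical. Each block update is a linearized (proximal gradient) step with a block-specific Lipschitz step size, which is the core mechanism PALM alternates over.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: elementwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def palm_double_sparse(X, y, lam_beta, lam_gamma, n_iter=500):
    """Illustrative PALM sketch for
         min_{beta, gamma} 0.5 * ||y - X beta - gamma||^2
                           + lam_beta * ||beta||_1 + lam_gamma * ||gamma||_1.
    Alternates a linearized proximal step on each block (beta, then gamma)."""
    n, p = X.shape
    beta = np.zeros(p)    # sparse regression coefficients
    gamma = np.zeros(n)   # sparse per-observation shifts (outlier effects)
    # Blockwise Lipschitz constants of the gradient of the smooth part.
    L_beta = np.linalg.norm(X, 2) ** 2   # squared spectral norm of X
    L_gamma = 1.0
    for _ in range(n_iter):
        # Linearized proximal step on the beta block.
        r = X @ beta + gamma - y
        beta = soft_threshold(beta - (X.T @ r) / L_beta, lam_beta / L_beta)
        # Linearized proximal step on the gamma block.
        r = X @ beta + gamma - y
        gamma = soft_threshold(gamma - r / L_gamma, lam_gamma / L_gamma)
    return beta, gamma
```

In this toy form, a nonzero entry of `gamma` flags the corresponding observation as anomalous (its individual-level effect), while `beta` stays sparse in the covariates; swapping the penalties for nonconvex ones (SCAD, MCP) changes only the proximal maps, which is why PALM accommodates general regularized bi-block problems.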
Data Availability and Access
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank the editors and reviewers for their helpful suggestions and comments on this paper. We thank Dr. Yafei Wang, Prof. Xiaodong Yan, and Prof. Bei Jiang from the University of Alberta, who provided comments that greatly improved the manuscript. We would like to acknowledge the support of the National Natural Science Foundation of China (12071022), the 111 Project of China (B16002), the State Scholarship Fund from the China Scholarship Council (No: 202007090162), and the support and resources from the Center for High-Performance Computing at Beijing Jiaotong University (http://hpc.bjtu.edu.cn).
Author information
Authors and Affiliations
Contributions
Mei Li (First Author): Conceptualization, Methodology, Software, Data Curation, Investigation, Formal Analysis, Writing-Original Draft; Lingchen Kong: Methodology, Writing-Review & Editing, Funding Acquisition, Supervision; Bo Pan: Visualization, Investigation, Writing-Review & Editing; Linglong Kong: Resources, Supervision, Writing-Review & Editing.
Corresponding author
Ethics declarations
Competing Interests
The authors declare that they have no conflict of interest.
Ethical and Informed Consent for Data Used
Ethical approval and informed consent for the data used are not applicable to this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, M., Kong, L., Pan, B. et al. Algorithmic generalization ability of PALM for double sparse regularized regression. Appl Intell 53, 30566–30579 (2023). https://doi.org/10.1007/s10489-023-05031-3