Abstract
It is well documented that outliers and extreme observations can strongly affect spline smoothing. This work proposes an alternative for accommodating outliers in penalized splines, based on maximum penalized likelihood estimation under the class of scale mixtures of normal distributions. This family of distributions offers an attractive route to robust estimates while retaining the elegance and simplicity of maximum likelihood theory. The aim of this paper is to apply a variant of the EM algorithm to compute the penalized maximum likelihood estimates efficiently in the context of penalized splines. To highlight some aspects of the robustness of the proposed penalized estimators, we assess influential observations through case-deletion and local influence methods. Numerical experiments illustrate the good performance of the proposed technique.
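The estimation idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes one particular scale mixture of normals (Student-t errors with fixed degrees of freedom nu), a truncated-linear spline basis with a ridge penalty on the knot coefficients, a fixed smoothing parameter lam, and a penalty written as (lam/2) times the sum of squared knot coefficients. All function and variable names are hypothetical, and only the Python standard library is used.

```python
import math
import random

def basis_row(x, knots):
    """Truncated-linear spline basis: 1, x, (x - k)_+ for each knot k."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def fit_t_pspline(x, y, knots, lam=0.1, nu=4.0, iters=50):
    """EM-type (ECM) iteration for a penalized spline with Student-t errors.

    Assumptions of this sketch: nu and lam are fixed, and the penalty
    applies only to the truncated-line (knot) coefficients.
    """
    n = len(y)
    rows = [basis_row(xi, knots) for xi in x]
    p = len(rows[0])
    w = [1.0] * n          # start from the Gaussian fit (all weights equal)
    sigma2 = 1.0
    for _ in range(iters):
        # CM-step 1: weighted penalized least squares for the coefficients.
        a = [[sum(w[i] * rows[i][j] * rows[i][k] for i in range(n))
              for k in range(p)] for j in range(p)]
        for j in range(2, p):              # intercept and slope unpenalized
            a[j][j] += lam * sigma2
        b = [sum(w[i] * rows[i][j] * y[i] for i in range(n)) for j in range(p)]
        beta = solve(a, b)
        fitted = [sum(bj * rj for bj, rj in zip(beta, row)) for row in rows]
        res2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
        # CM-step 2: scale update given the new coefficients.
        sigma2 = max(sum(wi * r for wi, r in zip(w, res2)) / n, 1e-12)
        # E-step: conditional expectation of the mixing variable (Student-t).
        w = [(nu + 1.0) / (nu + r / sigma2) for r in res2]
    return fitted, w, sigma2

# Simulated data: a sine curve with Gaussian noise and one gross outlier.
rng = random.Random(0)
x = [i / 40 for i in range(41)]
truth = [math.sin(2 * math.pi * xi) for xi in x]
y = [t + rng.gauss(0.0, 0.1) for t in truth]
y[20] += 5.0
knots = [k / 11 for k in range(1, 11)]
fitted, weights, sigma2 = fit_t_pspline(x, y, knots)
```

In this sketch the weights returned by the E-step play the role of a simple diagnostic: observations with large standardized residuals receive small weights and therefore contribute little to the weighted penalized least-squares step. Setting nu to a very large value makes all weights close to one, which recovers an ordinary (non-robust) penalized fit for comparison.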
Acknowledgments
I would like to thank the reviewers for their constructive comments, which helped to substantially improve this manuscript. I am grateful to Victor Leiva for his careful reading and comments on an earlier version of this paper. I also thank Ronny Vallejos and Patricio Videla for their valuable suggestions. The author was partially supported by Grants CONICYT 791100007 and FONDECYT 1140580.
About this article
Cite this article
Osorio, F. Influence diagnostics for robust P-splines using scale mixture of normal distributions. Ann Inst Stat Math 68, 589–619 (2016). https://doi.org/10.1007/s10463-015-0506-0
DOI: https://doi.org/10.1007/s10463-015-0506-0