
Influence diagnostics for robust P-splines using scale mixture of normal distributions

Abstract

It is well documented that outliers and extreme observations can strongly affect spline smoothing. This work proposes an approach for accommodating outliers in penalized splines, based on maximum penalized likelihood estimation under the class of scale mixtures of normal distributions. This family of distributions offers an attractive route to robust estimates while retaining the elegance and simplicity of maximum likelihood theory. The aim of this paper is to apply a variant of the EM algorithm to compute the penalized maximum likelihood estimates efficiently in the context of penalized splines. To highlight some aspects of the robustness of the proposed penalized estimators, we assess influential observations through case-deletion and local influence methods. Numerical experiments illustrate the good performance of the proposed technique.
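As a rough illustration of the kind of computation the abstract describes, the following is a minimal sketch of an EM-type iteration for a penalized spline with Student-t errors, one member of the scale mixture of normal family. It is not the algorithm of the paper. The truncated linear basis, the ridge penalty on the knot coefficients, the fixed degrees of freedom `nu`, the fixed smoothing parameter `lam`, the scaling of the penalty by 1/sigma2, and all function names are assumptions made only for this example.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Design matrix [1, x, (x - k_1)_+, ..., (x - k_K)_+] (assumed basis)."""
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)


def fit_robust_pspline(x, y, knots, lam=0.01, nu=4.0, n_iter=100, tol=1e-8):
    """EM iteration for a penalized spline with Student-t errors (sketch).

    Assumed criterion: t log-likelihood minus lam / (2 * sigma2) * a' D a,
    with nu and lam held fixed.
    """
    B = truncated_linear_basis(x, knots)
    n = B.shape[0]
    # Penalize only the knot coefficients, not the intercept and slope.
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))

    # Start from the ordinary penalized least-squares fit.
    a = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    sigma2 = np.mean((y - B @ a) ** 2)
    w = np.ones(n)

    for _ in range(n_iter):
        # E-step: conditional expectation of the mixing variable.
        r = y - B @ a
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)

        # M-step: weighted penalized least squares, then the scale update.
        BtW = B.T * w  # scales column i of B.T (observation i) by w[i]
        a_new = np.linalg.solve(BtW @ B + lam * D, BtW @ y)
        r_new = y - B @ a_new
        sigma2 = (np.sum(w * r_new ** 2) + lam * a_new @ D @ a_new) / n

        converged = np.max(np.abs(a_new - a)) < tol
        a = a_new
        if converged:
            break

    # w holds the weights from the last E-step.
    return a, sigma2, w, B @ a


# Synthetic data (made up for this sketch): a sine curve with one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
y[50] += 5.0

knots = np.linspace(0.0, 1.0, 12)[1:-1]  # 10 interior knots
a, sigma2, w, fit = fit_robust_pspline(x, y, knots)
print("index of smallest weight:", int(np.argmin(w)))
```

The weights `w` from the E-step shrink toward zero for observations with large scaled residuals, which is how the heavy-tailed model limits the pull of an outlier on the fitted curve. Inspecting these weights gives an informal first look at which observations are being downweighted; it does not replace the case-deletion and local influence diagnostics developed in the paper.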



Acknowledgments

I would like to thank the reviewers for their constructive comments, which helped to substantially improve this manuscript. I am grateful to Victor Leiva for his careful reading and comments on an earlier version of this paper. I also thank Ronny Vallejos and Patricio Videla for their valuable suggestions. The author was partially supported by Grants CONICYT 791100007 and FONDECYT 1140580.

Author information

Corresponding author

Correspondence to Felipe Osorio.

Electronic supplementary material

Supplementary material 1 (pdf 875 KB)

About this article

Cite this article

Osorio, F. Influence diagnostics for robust P-splines using scale mixture of normal distributions. Ann Inst Stat Math 68, 589–619 (2016). https://doi.org/10.1007/s10463-015-0506-0

  • DOI: https://doi.org/10.1007/s10463-015-0506-0

Keywords

  • Cook distance
  • Local influence
  • Penalized EM algorithm
  • Scale mixtures of normal distributions