Influence diagnostics for robust P-splines using scale mixture of normal distributions

Abstract

It is well documented that outliers and extreme observations can strongly affect spline smoothing. This work proposes an alternative for accommodating outliers in penalized splines, based on maximum penalized likelihood estimation under the class of scale mixtures of normal distributions. This family of distributions offers an attractive route to robust estimates while retaining the elegance and simplicity of maximum likelihood theory. The aim of this paper is to apply a variant of the EM algorithm to compute the penalized maximum likelihood estimates efficiently in the context of penalized splines. To highlight some aspects of the robustness of the proposed penalized estimators, we assess influential observations through case-deletion and local influence methods. Numerical experiments illustrate the good performance of the proposed technique.
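
To make the estimation idea concrete, the following is a minimal sketch of an EM-type fit for a P-spline with Student-t errors, one member of the scale mixture of normals class. It is not the paper's implementation: the function names `bspline_basis` and `fit_t_pspline`, the fixed degrees of freedom `nu`, the fixed smoothing parameter `lam` (applied directly in the weighted normal equations), and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def bspline_basis(x, n_seg=20, degree=3):
    """Equally spaced B-spline basis on [min(x), max(x)] via Cox-de Boor."""
    lo, hi = x.min(), x.max()
    dx = (hi - lo) / n_seg
    knots = lo + dx * np.arange(-degree, n_seg + degree + 1)
    xc = x[:, None]
    # Degree 0: indicator of each knot interval.
    B = ((xc >= knots[:-1]) & (xc < knots[1:])).astype(float)
    for d in range(1, degree + 1):
        # Knots are equally spaced, so every denominator equals d * dx.
        left = (xc - knots[: -d - 1]) / (d * dx)
        right = (knots[d + 1:] - xc) / (d * dx)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B


def fit_t_pspline(x, y, lam=1.0, nu=4.0, n_seg=20, diff_order=2,
                  n_iter=100, tol=1e-8):
    """Hypothetical helper: EM iterations for a Student-t P-spline.

    Assumes nu and lam are known and fixed (a simplification).
    """
    B = bspline_basis(x, n_seg)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    P = lam * D.T @ D                      # difference penalty
    w = np.ones_like(y)                    # start from the Gaussian fit
    coef = np.zeros(B.shape[1])
    for _ in range(n_iter):
        # M-step: weighted penalized least squares, then scale update.
        BtW = B.T * w
        new_coef = np.linalg.solve(BtW @ B + P, BtW @ y)
        r = y - B @ new_coef
        sigma2 = np.mean(w * r**2)
        # E-step: conditional expectation of the mixing variable.
        w = (nu + 1.0) / (nu + r**2 / sigma2)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return B @ coef, w, sigma2


# Simulated example (assumed data): a sine curve with one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
truth = np.sin(2 * np.pi * x)
y = truth + 0.1 * rng.standard_normal(x.size)
y[50] += 5.0
fit, weights, sigma2 = fit_t_pspline(x, y)
```

In this sketch the E-step weights shrink toward zero for observations with large standardized residuals, so inspecting `weights` gives a rough first indication of which cases the heavy-tailed fit is discounting. This is only an informal complement to the case-deletion and local influence diagnostics discussed in the paper.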

Keywords

Cook distance; Local influence; Penalized EM algorithm; Scale mixtures of normal distributions

Supplementary material

Supplementary material 1 (pdf 875 KB)

Copyright information

© The Institute of Statistical Mathematics, Tokyo 2015

Authors and Affiliations

  1. Departamento de Matemática, Universidad Técnica Federico Santa María, Valparaíso, Chile