# Smoothing noisy data with spline functions

Estimating the correct degree of smoothing by the method of generalized cross-validation

## Summary

Smoothing splines are well known to provide nice curves which smooth discrete, noisy data. We obtain a practical, effective method for estimating the optimum amount of smoothing from the data. Derivatives can be estimated from the data by differentiating the resulting (nearly) optimally smoothed spline.
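
As a minimal sketch of how such an estimate can be computed, the Python code below evaluates the generalized cross-validation function \(V(\lambda ) = \frac{1}{n}\parallel (I - A(\lambda ))y\parallel ^2 /\left[ \frac{1}{n}\operatorname{Trace}(I - A(\lambda )) \right]^2\) of this summary on a grid of \(\lambda\) values and picks the minimizer. It is not the authors' implementation. For brevity, the spline penalty is replaced by a sum of squared second differences on a uniform mesh (an assumed discrete stand-in for the case *m* = 2), and the names `smoother_matrix` and `gcv_score`, the test function, the noise level, and the grid of \(\lambda\) values are all illustrative choices.

```python
import numpy as np


def smoother_matrix(n, lam, m=2):
    """Influence matrix A(lam) of a discrete penalized least-squares smoother.

    Assumption: the integral of (f^(m))^2 over [0, 1] is approximated by a
    sum of squared m-th divided differences on a uniform mesh, so this is a
    stand-in for the smoothing spline, not the spline itself.
    """
    h = 1.0 / n
    # (n - m) x n matrix of m-th divided differences.
    D = np.diff(np.eye(n), n=m, axis=0) / h**m
    penalty = h * D.T @ D  # quadratic form approximating the roughness integral
    # Minimizing (1/n)||f - y||^2 + lam * f^T penalty f gives
    # f = (I + n * lam * penalty)^{-1} y.
    return np.linalg.inv(np.eye(n) + n * lam * penalty)


def gcv_score(y, lam, m=2):
    """V(lam) = (1/n)||(I - A)y||^2 / [(1/n) Trace(I - A)]^2."""
    n = len(y)
    A = smoother_matrix(n, lam, m)
    resid = y - A @ y
    return (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2


# Synthetic data: n = 50 and m = 2 as in the Monte Carlo experiment;
# the function g and the noise level 0.2 are made up for illustration.
rng = np.random.default_rng(0)
n = 50
t = np.arange(1, n + 1) / n
g = np.sin(2 * np.pi * t)
y = g + 0.2 * rng.standard_normal(n)

lams = np.logspace(-9, -1, 41)  # hypothetical search grid
scores = np.array([gcv_score(y, lam) for lam in lams])
k = int(np.argmin(scores))
lam_hat = lams[k]

# Average square error R(lam); computable here only because g is known.
risk = np.array([np.mean((smoother_matrix(n, lam) @ y - g) ** 2) for lam in lams])
ratio = risk[k] / risk.min()
print("lambda_hat =", lam_hat)
print("R(lambda_hat) / min R(lambda) =", ratio)
```

With real data the true *g* is unknown, so only \(V(\lambda)\) can be evaluated. The final ratio is printed here just to show how the chosen \(\hat \lambda\) compares with the best value on the grid in a synthetic setting.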

We consider the model \(y_i = g(t_i) + \varepsilon_i\), \(i = 1, 2, \ldots, n\), \(t_i \in [0, 1]\), where \(g \in W_2^{(m)} = \{ f : f, f', \ldots, f^{(m-1)} \text{ abs. cont.}, f^{(m)} \in \mathcal{L}_2[0,1] \}\), and the \(\{\varepsilon_i\}\) are random errors with \(E\varepsilon_i = 0\), \(E\varepsilon_i \varepsilon_j = \sigma^2 \delta_{ij}\). The error variance \(\sigma^2\) may be unknown. As an estimate of \(g\) we take the solution \(g_{n,\lambda}\) to the problem: Find \(f \in W_2^{(m)}\) to minimize

$$\frac{1}{n}\sum\limits_{j = 1}^n {(f(t_j ) - y_j )^2 } + \lambda \int\limits_0^1 {(f^{(m)} (u))^2 du}.$$

The function \(g_{n,\lambda}\) is a smoothing polynomial spline of degree \(2m - 1\). The parameter \(\lambda\) controls the tradeoff between the "roughness" of the solution, as measured by \(\int\limits_0^1 {[f^{(m)} (u)]^2 du}\), and the infidelity to the data, as measured by \(\frac{1}{n}\sum\limits_{j = 1}^n {(f(t_j ) - y_j )^2 }\), and so governs the average square error \(R(\lambda; g) = R(\lambda)\) defined by

$$R(\lambda ) = \frac{1}{n}\sum\limits_{j = 1}^n {(g_{n,\lambda } (t_j ) - g(t_j ))^2 }.$$

We provide an estimate \(\hat \lambda\), called the generalized cross-validation estimate, for the minimizer of \(R(\lambda)\). The estimate \(\hat \lambda\) is the minimizer of \(V(\lambda)\) defined by

$$V(\lambda ) = \frac{1}{n}\parallel (I - A(\lambda ))y\parallel ^2 /\left[ {\frac{1}{n}\operatorname{Trace}(I - A(\lambda ))} \right]^2 ,$$

where \(y = (y_1, \ldots, y_n)^t\) and \(A(\lambda)\) is the \(n \times n\) matrix satisfying \((g_{n,\lambda}(t_1), \ldots, g_{n,\lambda}(t_n))^t = A(\lambda) y\). We prove that there exists a sequence of minimizers \(\tilde \lambda = \tilde \lambda (n)\) of \(EV(\lambda)\) such that, as the (regular) mesh \(\{t_i\}_{i=1}^n\) becomes finer, \(\mathop {\lim }\limits_{n \to \infty } ER(\tilde \lambda )/\mathop {\min }\limits_\lambda ER(\lambda ) \downarrow 1\).

A Monte Carlo experiment with several smooth \(g\)'s was tried with \(m = 2\), \(n = 50\) and several values of \(\sigma^2\), and typical values of \(R(\hat \lambda )/\mathop {\min }\limits_\lambda R(\lambda )\) were found to be in the range 1.01–1.4. The derivative \(g'\) of \(g\) can be estimated by \(g'_{n,\hat \lambda } (t)\). In the Monte Carlo examples tried, the minimizer of \(R_D (\lambda ) = \frac{1}{n}\sum\limits_{j = 1}^n {(g'_{n,\lambda } (t_j ) - g'(t_j ))^2 }\) tended to be close to the minimizer of \(R(\lambda)\), so that \(\hat \lambda\) was also a good value of the smoothing parameter for estimating the derivative.

## Subject Classifications

MOS: 65D10; CR: 5.17; MOS: 65D25


## Copyright information

© Springer-Verlag 1979