Numerische Mathematik, Volume 31, Issue 4, pp 377–403

Smoothing noisy data with spline functions

Estimating the correct degree of smoothing by the method of generalized cross-validation
  • Peter Craven
  • Grace Wahba

DOI: 10.1007/BF01404567

Cite this article as:
Craven, P. & Wahba, G. Numer. Math. (1978) 31: 377. doi:10.1007/BF01404567


Smoothing splines are well known to provide nice curves which smooth discrete, noisy data. We obtain a practical, effective method for estimating the optimum amount of smoothing from the data. Derivatives can be estimated from the data by differentiating the resulting (nearly) optimally smoothed spline.

We consider the model \(y_i = g(t_i) + \varepsilon_i\), \(i = 1, 2, \ldots, n\), \(t_i \in [0, 1]\), where \(g \in W_2^{(m)} = \{f : f, f', \ldots, f^{(m-1)} \text{ absolutely continuous}, f^{(m)} \in \mathcal{L}_2[0,1]\}\), and the \(\{\varepsilon_i\}\) are random errors with \(E\varepsilon_i = 0\), \(E\varepsilon_i\varepsilon_j = \sigma^2\delta_{ij}\). The error variance \(\sigma^2\) may be unknown. As an estimate of \(g\) we take the solution \(g_{n,\lambda}\) to the problem: Find \(f \in W_2^{(m)}\) to minimize \(\frac{1}{n}\sum\limits_{j=1}^n (f(t_j) - y_j)^2 + \lambda \int\limits_0^1 (f^{(m)}(u))^2\,du\). The function \(g_{n,\lambda}\) is a smoothing polynomial spline of degree \(2m-1\). The parameter \(\lambda\) controls the tradeoff between the “roughness” of the solution, as measured by \(\int\limits_0^1 [f^{(m)}(u)]^2\,du\), and the infidelity to the data, as measured by \(\frac{1}{n}\sum\limits_{j=1}^n (f(t_j) - y_j)^2\), and so governs the average square error \(R(\lambda; g) = R(\lambda)\) defined by
$$R(\lambda) = \frac{1}{n}\sum\limits_{j=1}^n (g_{n,\lambda}(t_j) - g(t_j))^2 .$$
We provide an estimate \(\hat\lambda\), called the generalized cross-validation estimate, for the minimizer of \(R(\lambda)\). The estimate \(\hat\lambda\) is the minimizer of \(V(\lambda)\) defined by \(V(\lambda) = \frac{1}{n}\parallel (I - A(\lambda))y \parallel^2 / \left[\frac{1}{n}\operatorname{Trace}(I - A(\lambda))\right]^2\), where \(y = (y_1, \ldots, y_n)^t\) and \(A(\lambda)\) is the \(n \times n\) matrix satisfying \((g_{n,\lambda}(t_1), \ldots, g_{n,\lambda}(t_n))^t = A(\lambda)y\). We prove that there exists a sequence of minimizers \(\tilde\lambda = \tilde\lambda(n)\) of \(EV(\lambda)\) such that, as the (regular) mesh \(\{t_i\}_{i=1}^n\) becomes finer, \(\mathop{\lim}\limits_{n \to \infty} ER(\tilde\lambda)/\mathop{\min}\limits_\lambda ER(\lambda) \downarrow 1\). A Monte Carlo experiment with several smooth \(g\)'s was tried with \(m = 2\), \(n = 50\) and several values of \(\sigma^2\), and typical values of \(R(\hat\lambda)/\mathop{\min}\limits_\lambda R(\lambda)\) were found to be in the range 1.01–1.4. The derivative \(g'\) of \(g\) can be estimated by \(g'_{n,\hat\lambda}(t)\). In the Monte Carlo examples tried, the minimizer of \(R_D(\lambda) = \frac{1}{n}\sum\limits_{j=1}^n (g'_{n,\lambda}(t_j) - g'(t_j))^2\) tended to be close to the minimizer of \(R(\lambda)\), so that \(\hat\lambda\) was also a good value of the smoothing parameter for estimating the derivative.
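
As a rough illustration of how \(V(\lambda)\) and \(R(\lambda)\) can be evaluated on a grid of \(\lambda\) values, the following Python sketch may help. It is not the authors' implementation: instead of the exact smoothing spline, it assumes a discrete stand-in in which the roughness integral is replaced by a sum of squared \(m\)-th differences on an equally spaced mesh, so that \(A(\lambda) = (I + n\lambda D^t D)^{-1}\) and \(\lambda\) is in arbitrary units. The function names (`hat_matrix`, `gcv_score`, `mean_square_error`), the test function, the noise level, and the \(\lambda\) grid are all hypothetical choices made for the example.

```python
import numpy as np

def hat_matrix(n, lam, m=2):
    """Influence matrix A(lam) of a discrete penalized smoother.

    Assumption: the integral of (f^(m))^2 is replaced by the sum of squared
    m-th differences, a stand-in for the smoothing spline of the abstract.
    """
    D = np.diff(np.eye(n), n=m, axis=0)  # (n - m) x n difference operator
    # Minimizing (1/n)||f - y||^2 + lam * ||D f||^2 gives f = A(lam) y with
    # A(lam) = (I + n * lam * D^T D)^{-1}.
    return np.linalg.inv(np.eye(n) + n * lam * (D.T @ D))

def gcv_score(A, y):
    """V(lam) = (1/n)||(I - A) y||^2 / [(1/n) Trace(I - A)]^2."""
    n = len(y)
    resid = y - A @ y
    return (resid @ resid / n) / (np.trace(np.eye(n) - A) / n) ** 2

def mean_square_error(A, y, g):
    """R(lam) = (1/n) sum_j (fit(t_j) - g(t_j))^2; needs the true g."""
    return np.mean((A @ y - g) ** 2)

# Synthetic data in the spirit of the Monte Carlo setup (m = 2, n = 50).
rng = np.random.default_rng(0)
n = 50
t = np.arange(1, n + 1) / n
g = np.sin(2 * np.pi * t)             # hypothetical smooth test function
y = g + 0.2 * rng.standard_normal(n)  # hypothetical noise level sigma = 0.2

lambdas = np.logspace(-4, 3, 60)      # hypothetical search grid
V_vals = np.array([gcv_score(hat_matrix(n, lam), y) for lam in lambdas])
R_vals = np.array([mean_square_error(hat_matrix(n, lam), y, g) for lam in lambdas])

lam_hat = lambdas[np.argmin(V_vals)]  # GCV choice, uses the data only
A_hat = hat_matrix(n, lam_hat)
ratio = R_vals[np.argmin(V_vals)] / R_vals.min()  # inefficiency R(lam_hat) / min R
print(f"lam_hat = {lam_hat:.3g}, R(lam_hat)/min R = {ratio:.3f}")
```

Note that `gcv_score` needs only \(y\) and \(A(\lambda)\), whereas `mean_square_error` needs the true \(g\) and is therefore available only in simulations such as this one. The grid search is a convenience for the sketch; any one-dimensional minimizer over \(\lambda\) could be used in its place.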

Copyright information

© Springer-Verlag 1979

Authors and Affiliations

  • Peter Craven
    • 1
  • Grace Wahba
    • 2
  1. The Computer Laboratory, The University of Liverpool, Liverpool, England
  2. Department of Statistics, University of Wisconsin, Madison, USA