Wavelet Shrinkage on Paths for Denoising of Scattered Data

We propose a new algorithm for denoising of multivariate function values given at scattered points in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbb{R}^{d}}$$\end{document} . The method is based on the one-dimensional wavelet transform that is applied along suitably chosen path vectors at each transform level. The idea can be seen as a generalization of the relaxed easy path wavelet transform by Plonka (Multiscale Model Simul 7:1474–1496, 2009) to the case of multivariate scattered data. The choice of the path vectors is crucial for the success of the algorithm. We propose two adaptive path constructions that take the distribution of the scattered points as well as the corresponding function values into account. Further, we present some theoretical results on the wavelet transform along path vectors in order to indicate that the wavelet shrinkage along path vectors can really remove noise. The numerical results show the efficiency of the proposed denoising method.


Introduction
Within the last years, wavelet threshold methods have been shown to be a suitable tool for denoising of functions and images. In particular, using a translation-invariant filter bank, the visual artifacts in the neighborhood of discontinuities are well supressed [6]. The basic idea of wavelet denoising methods is a suitable separation of frequencies of a given noisy signal f . Supposing that the noise corresponds to wavelet coefficients with a small amplitude, the application of a thresholding procedure to the wavelet expansion of f removes it from the signal. Compared to other approaches to image/signal denoising (as nonlinear diffusion or variational methods), one advantage of the wavelet denoising method is its numerical efficiency.
However, the traditional (tensor-product) wavelet transform only applies for equidistant grids. There have been several attempts to generalize the wavelet transform to functions that are sampled on non-equidistant or scattered points. In the one-dimensional case, some of these methods apply a certain approximation in a first step in order to get back the equidistant design situation [1,3,10,12]. Other approaches that particularly apply in two or higher dimensions are based on the construction of second generation wavelets [17] by the lifting scheme, see e.g. [2,7,11,19]. These wavelet constructions adaptively depend on the scattered points and the corresponding function values and therefore lose much of the simplicity and efficiency of the traditional wavelet transform.
In [13], the easy path wavelet transform (EPWT) has been introduced by one of the authors for sparse image approximation. The EPWT employs the one-dimensional wavelet transform along path vectors through the image values. The path vectors contain all indices (i, j) corresponding to the image values f (i, j) and are determined in a way such that neighboring indices in the path are also neighboring in the two-dimensional grid and that the corresponding image values are well correlated. Obviously, the EPWT can be simply transferred to the setting of scattered data x j ∈ R d with corresponding function values f (x j ) ∈ R, where we have just to generalize the notion of a neighborhood for x j . A similar idea has been already applied in [14] for data on the sphere. The recently proposed generalized tree-based wavelet transform (GTBWT) [16] is closely related to the EPWT [13] and generalizes the Haar-like transform in [9]. In particular, the tree-based wavelet transform has been combined with a suitable sub-image averaging scheme for image denoising.
In this paper, we want to propose a new adaptive wavelet threshold scheme for scattered data denoising. The basic idea of our scheme is very similar to the EPWT, namely to employ the usual one-dimensional wavelet transform along suitable path vectors at each level. But we need to determine the path vectors differently in order to obtain good denoising results. Further, for resembling the cycle spinning method [6], we apply the wavelet threshold scheme several times along different path vectors and compute the average of the results.
The paper is organized as follows. In Sect. 2 we describe the general denoising algorithm along given path vectors. In Sect. 3 we propose two different methods for determining the path vectors, a deterministic and a random path construction. The two methods are both adaptive; the choice of the next path component adaptively depends on the spatial distance of the involved points x j and the distance of the involved function values f (x j ) (resp. the low-pass values at further levels of the wavelet transform). In Sect. 4, we explore some theoretical results that support our approach for scattered data denoising. Finally, in Sect. 5 we present numerical results of our data denoising method in the two-dimensional case. For the special case of images, we also present some comparisons with other image denoising methods.

Description of the Algorithm
Let Γ = {x 1 , . . . , x N } be the data set with scattered data points x j ∈ R d , and let f : R d → R be a function sampled on Γ ⊂ Ω, where Ω is a connected subset of R d consisting of a finite union of convex domains. This means, the measured valuesf (x j ) are given, where we assume that and z j denotes additive Gaussian noise with zero mean and an (unknown) variance σ 2 . For the distribution of the scattered data set Γ, we suppose quasiuniformity, i.e., considering the maximal density we assume that δ(Γ) < C · μ(Γ), with a global constant C. Further, we suppose that f is a piecewise smooth function. Now, the basic idea of the new algorithm is the application of the classical wavelet shrinkage procedure along one-dimensional path vectors thereby generalizing the easy path wavelet transform (EPWT), see [13].
The EPWT was introduced for image approximation. Therefore, the path construction used in [13] is based on the strong correlation of function values corresponding to neighboring points in the path vector. A new point in the path vector is taken from the set of neighbor points, such that the difference of the corresponding function values is minimal. In case of noisy data, this approach for the path vector construction needs to be changed since the correlation of function values is now influenced by noise. In our algorithm, the construction of suitable path vectors turns out to be crucial for a good denoising performance.
Similarly to [9] and [16], we resemble the "cycle spinning" method [6] in order to improve the denoising procedure. For image denoising, it is usual to apply the tensor product wavelet shrinkage procedure not only to the noisy image itself but also to all images obtained by up to seven cyclic shifts in xand in y-direction. After application of the wavelet transform to each of the 64 images obtained in this way, an averaging is applied that greatly improves the denoising result. In our approach, we will just apply the path wavelet shrinkage procedure repeatedly along different path vectors and then average the result.
We summarize the complete algorithm before describing the path construction in more detail.
. . , N, N = 2 t , a biorthogonal wavelet filter bank with decomposition filtersh,g and reconstruction filters h, g Iteration: Perform the following 4 steps for l = 0, 1, . . . , L − 1 with L < t: 1. Find a suitable path vector p l ∈ R N/2 l consisting of a permutation of {1, . . . , N/2 l } that describes a fixed order of points x l p l (j) and of the corresponding function values c l p l (j) . 2. Apply the (periodic) low-pass filterh to (c l p l (j) ) N/2 l j=1 followed by downsampling by two to obtain the low-pass data (c l+1 j ) . Apply the (periodic) high-pass filterg to (c l p l (j) ) N/2 l j=1 followed by downsampling by two to obtain the vector of wavelet coefficients (d l+1 j ) Reconstruct the values f (x j ) by the following iteration, where (c L j ) . Iteration: Perform the following three steps for l = L, L − 1, . . . , 1: Vol. 62 (2012) Wavelet Shrinkage on Paths for Denoising of Scattered Data 341 5. Apply an upsampling by two first and then the low-pass filter h to (c l j ) N/2 l j=1 . 6. Apply an upsampling by two first and then the high-pass filter g to (d l j ) N/2 l j=1 . 7. Add the results of the previous two steps to obtain (c l−1 and apply the reverse permutation to get (c l−1 j ) . Output: (c 0 j ) N j=1 the smoothed function values at scattered points x j ∈ Γ.
As already remarked before, analogously to the cycle spinning approach, we apply the above algorithm several times and average the results in order to improve the denoising method.
1. Observe that, similarly as for the EPWT in [13], the considered algorithm is usually not just a one-dimensional transform since the path vectors change at each level of the wavelet transform. This point is crucial for the performance of the denoising method. In particular, the set of scattered data points related to the computed low-pass values changes at each level according to step 3 of the algorithm. For a one-dimensional wavelet transform at equidistant points x j on a line, step 3 of the algorithm just transforms the point set again into a (scaled) equidistant point set of half length on the line and needs therefore not to be considered.

Construction of Path Vectors
The main challenge in the above denoising algorithm is to construct path vectors through the point sets Γ l . As we will see in the sequel, the choice of the path vectors is crucial for the success of the denoising algorithm. For the path construction, we have to answer the following questions: • How should one determine the neighborhood of scattered points as a tool for the determination of the path vectors? • How should one determine the distance between scattered points and between the corresponding function values? How do we have to weight the distance x l j − x l k 2 and the difference of corresponding function/lowpass values |c l j − c l k |? • Should the path vectors be chosen in a deterministic way or randomly, based on the point sets Γ l and the corresponding data c l j ? In the remainder of this section, we shall propose two path vector constructions that we will use in our numerical examples.

Adaptive Deterministic Path Construction
In R d , let us apply a neighborhood definition of the form where C 1 depends on the distribution of the original point set Γ. Observe that the path vector at level l of the wavelet transform is a permutation of the indices of the points {x l j : j = 1, . . . , N/2 l }. In particular, each point has to occur exactly once in the path vector. In our numerical examples for d = 2, we take C 1 such that N (x 0 j ) with x 0 j := x j ∈ Γ contains at least 8 entries for each j ∈ {1, . . . , N}. The path vectors at level l of the wavelet transform are determined by the following procedure. 1. For a fixed shrinkage parameter θ, compute the subset of points that have not been used in the path vector yet and whose corresponding function/low-pass values differ at most by θ, i.e., 2. Choose the next component in the path vector as follows.
• If N θ (x l p l (k) ) is not empty, then choose the next path component p l (k + 1) such that where we put x l p l (1) − x l p l (0) 2 := 0 for k = 1. For k > 1, one may alternatively choose p l (k + 1) such that is the empty set, we randomly choose the next path component fromÑ (x l p l (k) ). IfÑ (x l p l (k) ) is also empty, we choose the next index in the path vector randomly from the remaining indices of the point set Output: Path vector p l = (p l (k)) N/2 l k=1 . Remark 3.2. The parameter θ that has to be chosen for determining the neighborhood N θ (x l p l (k) ) can be taken of the same size as the thresholding parameter for the wavelet transform in Algorithm 2.1.
Determining the p l (k + 1) by (3.3) forces that scattered points corresponding to neighboring components in the path have similar distance, while (3.4) ensures the path to be continued in a similar direction as before, i.e., the angle between x l p l (k−1) −x l p l (k) and x l p l (k) −x l p l (k+1) is as small as possible. Theoretically, these requirements are supported by the fact that the polynomial reproduction property of low-pass filters (and the vanishing moment property of wavelet filters) can be exploited best for equidistantly sampled functions on a line, see Sect. 4. Of course, the two conditions (3.3) and (3.4) can also be coupled.

Adaptive Random Path Construction
Instead of the adaptive path construction considered above, we may also apply a procedure, where the next path component is taken randomly from the remaining indices, and where the probability to choose the next index depends on the spatial distance of the points on the one hand and on the distance of the corresponding function values on the other hand.
For the path construction at the l-th level, we now consider the vectors ) and define a symmetric weight matrix W l = (w(y l i , y l j )) N/2 l i,j=1 , where the weights in the product w(y l i , y l j ) = w 1 (x l i , x l j ) · w 2 (c l i , c l j ) may be chosen differently, depending on the range of scattered points x l i and of the given (noisy) function values resp. low-pass values c l i . A possible weight function, also used in the context of bilateral filtering [18], is 344 D. Heinen and G. Plonka Results. Math.
where σ 1 and σ 2 need to be chosen appropriately. The normalization by 2 2l/d in the definition of the weight is due to the fact that each level of the wavelet transform involves a decimation of the number of scattered points by two, and the remaining points have larger distances of each other. Regarding the low-pass values, the range of these values grows by √ 2 due to the filter normalization l∈Zh l = √ 2. While the proposed weight in (3.5) leads to a huge fully occupied weight matrix, we can strongly reduce the numerical effort by cutting the spatial weight at a suitable distance, i.e.
where we have to choose D 1 appropriately to ensure a sufficiently large neighborhood for each point x l i . Similarly, the weight w l 2 (c l i , c l j ) may be just put to zero if the distance |c l i − c l j | is greater than 2 l/2 D 2 with a suitable constant D 2 . We propose now the following algorithm. =f (x j ) and the low-pass filtered values c l j and hence is adaptive. One may alternatively construct completely non-adaptive path vectors by taking into account only the neighborhoods of the given scattered points x l j at each level l (or only the spatial weights w 1 (x l i , x l j )) and not the data values c l j . A non-adaptive denoising procedure possesses the advantage that all path vectors can be computed beforehand, such that a wavelet shrinkage procedure is similarly fast as a tensor product wavelet shrinkage in the regular case. However, the results of the non-adaptive denoising approach are worse than the performance using the adaptively determined path vectors, see Sect. 5. 2. Our proposed adaptive path constructions significantly differ from the generalized tree constructions in [9] that depend only on the set of scattered points x j and are hence non-adaptive. 3. The idea for determining the distribution for the random path vector construction is slightly related with the ideas for graph Laplacian constructions, see [5].

Properties of Wavelet Transform on Paths
Usual wavelet filter banks possess a lot of favorable properties as polynomial reproduction of low-pass filters resp. vanishing moments of wavelet filters. For a given one-dimensional function f with a certain smoothness, a corresponding decay of wavelet coefficients d l k (f ) in the wavelet expansion can be shown. Within the last years, this property has been extensively exploited for function space characterizations by means of wavelet expansions.
Conversely, wavelet coefficients of high magnitude indicate discontinuities (or points of lower smoothness). The success of wavelet denoising algorithms by thresholding of wavelet coefficients is based on these properties, since for (piecewise) smooth functions, small wavelet coefficients are related to noise. Considering now the case of scattered data {x 1 , . . . , x N } in the multivariate case and the wavelet transform along certain path vectors, we may ask, whether the favorable properties of the wavelet transform at least partially transfer to this setting. Regarding the polynomial reproduction of the low-pass filters along path vectors of scattered data, we find Theorem 4.1. Let (h k ) k∈Z be the low-pass filter of the decomposition step in a wavelet filter bank of perfect reconstruction with k∈Zh k = 1. Further, let p 1 = (p 1 (j)) N j=1 be the path vector for ordering the scattered data points j=1 obtained by one decomposition step of the wavelet filter bank along the path p 1 reproduces f at the points ( Proof. Application of the filter (h k ) k∈Z to the sequence of function values along the path p 1 gives Hence the assertion of the theorem follows using the new sequence of scattered points x 1 j with where (x 1 j ) N/2 l=1 is obtained by (componentwisely) applying the low-pass filter h and downsampling to the sequence of scattered points (x p 1 (j) ) N j=1 ordered along the path p 1 .
We suppose that the wavelet filterg = (g k ) k∈Z in the decomposition step of the wavelet filter bank satisfies the moment conditions k∈Z g k = 0 and k∈Z kg k = 0. Considering the wavelet coefficients obtained by applying the wavelet filterg along the path vector p 1 followed by downsampling, we observe that a constant function f (x) = c, c ∈ R, yields the wavelet coefficients while for a linear polynomial f (x) = a T x + b with a ∈ R d and b ∈ R, the wavelet coefficients Regarding the decay of wavelet coefficients obtained by a wavelet transform along paths (using a different path at each level), we can apply the results of [15] for two-dimensional piecewise Hölder continuous functions of order α ∈ (0, 1] thereby supporting our procedure of data denoising via wavelet transforms along paths. Let where Ω i is assumed to be a connected subset of [0, 1] 2 for each i = 1, . . . , K. Further, let us assume that the bivariate function f satisfies a Hölder condition in each region for some α ∈ (0, 1] and C > 0 independent of i. Let the function f be uniformly sampled on Ω, i.e., we have given f (x j1,j2 ) with x j1,j2 ∈ I 2J := {( j1 2 J , j2 2 J ), j 1 , j 2 = 0, . . . 2 J − 1}, and N = 2 2J . We can perform up to L = 2J levels of the wavelet transform along path vectors (EPWT).
For such "cartoon-like" functions, it has been shown in [15] that the EPWT wavelet coefficients corresponding to one region Ω i satisfy the estimate where C is the Hölder constant and α the Hölder exponent. The constant D measures the uniformity of the scattered points after each level, where we assume that min |x l j − x l k | < D2 l/2 . This condition is described as "diameter condition" in [15] and can be interpreted as a condition on the choice of the path vectors. Moreover, it has been shown in [15] that for J → ∞, the EPWT leads to an asymptotically optimal N -term approximation f N of f satisfying the estimate where f N is the two-dimensional function that is reconstructed from the N most significant EPWT wavelet coefficients. The above optimal approximation result holds under the assumption that the path vectors p l at each level of the EPWT satisfy (besides the diameter condition) a second condition, termed "region condition" [15]. The region condition ensures that the next index in the path vector p l should be taken from the same region Ω i if possible, before crossing to another region. The two conditions on the path vectors required in [15] are both contained in our algorithm. In the adaptive deterministic path construction, the neighborhood definition in (3.1) ensures the diameter condition while the neighborhood definition N θ (x) in (3.2) provides the region condition.
In the random path construction, the two path conditions are included by the choice of the weight that produces higher probabilities for those indices to be the next path component whose spatial distance x l p l (k) − x l p l (k+1) 2 as well as distance of function values/low-pass values |c l p l (k) − c l p l (k+1) | are small. Hence, the N -term approximation result in [15] can be taken as a further evidence, that small wavelet coefficients are rather due to noise than to important signal structure.

Numerical Results
We want to apply the proposed scattered data wavelet denoising algorithm along path vectors to 256 × 256 pixel images. The noisy images in this subsection are generated by adding synthetic white Gaussian noise to the original natural images. The quality of the denoising results is measured by the peak signal to noise ratio (PSNR) given by PSNR = 10 log 10 max j=1,..., where f denotes the original andf the noisy image. In our experiments, we have taken max j=1,...,N f (x j ) = 255. In Fig. 1, we present the original pepper image (a), the noisy pepper image (b) with PSNR = 19.97, and the denoising results using several different algorithms. In particular, we consider the 7-9 biorthogonal tensor-product wavelet shrinkage (c) with a PSNR of 24.91, the 7-9 biorthogonal wavelet shrinkage with cycle spinning (d) with 64 shifts and a PSNR of 28.11, the four-pixel scheme by Welk et al. [20] (e) using 76 iterations and a step size τ = 0.001 providing a PSNR of 28.26, and the curvelet shrinkage (e) with best threshold parameter 80 yielding a PSNR of 26.36. The curvelet denoising result is obtained with the help of the CurveLab software that is available at http://www.curvelet.org, see also [4]. The results of our algorithm (using a 7-9 biorthogonal wavelet filter bank) with a deterministic path vector (g) and a random path vector (h) are given in Fig. 1g, h. For the tensor-product wavelet shrinkage we have used the shrinkage parameter θ = 74. For   .2) we have taken C 1 = 1.3 and shrinkage parameter θ = 89. For the random path vectors that perform slightly worse than the deterministic vectors, we have taken the weight with σ 1 = σ 2 = 0.2, D 1 = 5.0 and θ = 71. Further, Algorithm 2.1 has been applied 64 times and the results have been averaged. The parameters for the curvelet transform and for the four-pixel scheme have been taken such that they perform optimally. In Fig. 2, we present the denoising results for the noisy pepper image with a PSNR of 16.45. Again, we compare our denoising results with tensor-product wavelet shrinkage, the four-pixel scheme and with curvelet shrinkage. For the tensor-product case with the biorthogonal 7-9 filter bank we have used the optimal threshold θ = 140 (without cycle spinning) and θ = 115 (with cycle spinning). The four-pixel scheme works best with a time step of 0.001 and 124 iterations. The curvelet shrinkage uses the optimal threshold θ = 106. For our method with the deterministic path we have taken the threshold θ = 127 and C 1 = 1.1; for the random path the same threshold and a weight with σ 1 = σ 2 = 0.2 and with D 1 = 5.
The denoising results for the pepper image and for the cameraman image of size 256 × 256 are summarized in Table 1, where we have chosen optimal parameters for each method.
In Fig. 3, we illustrate the remaining scattered points obtained after different levels of our deterministic path algorithm that is applied to the cameraman image. The results show that the points are rather well distributed. These points are at a time used for constructing the next path vector.
Finally, we show a denoising experiment, where we no longer consider a rectangular region but a region with an L-form, see Fig. 4. Here we have used deterministic path vectors with threshold θ = 89 and with C 1 = 1.3. The main advantage of our new scattered data denoising algorithm is that we can apply it to denoise functions on any connected region, where we have to adjust step 3 of Algorithm 2.1 suitably as e.g. given in Remark 2.2.