Singular value decomposition of noisy data: noise filtering

The singular value decomposition (SVD) and proper orthogonal decomposition (POD) are widely used to decompose velocity field data into spatiotemporal modes. For noisy experimental data, the lower SVD modes remain relatively clean, which suggests the possibility of filtering the data by retaining only the lower modes. Herein, we provide a method to (1) estimate the noise level in a given noisy dataset, (2) estimate the root mean square error (rmse) of the SVD modes, and (3) filter the noise using only the SVD modes that have a low enough rmse. We show through both analytic and PIV examples that this method yields nearly the most accurate SVD-based reconstruction of the clean data possible. Moreover, we provide an analytic estimate of the accuracy of this reconstruction.

The SVD and POD are very attractive for de-noising experimental velocity field data, because no other rank-r approximation captures more of the kinetic energy in the data than the sum of the first r SVD modes (Schmidt 1907). However, when the data are corrupted by noise, it is a priori unclear how badly the SVD modes are corrupted and which modes to use when noise filtering.
Motivating questions for this article include the following: How effective is the SVD for filtering out noise and reconstructing the clean data? What method yields the most accurate reconstruction, and what are the limits to its accuracy? Can the magnitude of the noise be inferred from the noisy data themselves?
To define our goals mathematically, consider the following definitions: Let A ∈ ℝ^(T×D) be a matrix of clean data, with T time steps and D data sites (T < D). The reduced singular value decomposition of A is

(1) A = U S V^⊺ ,

where U ∈ ℝ^(T×T) and V ∈ ℝ^(D×T) each have orthonormal columns and S ∈ ℝ^(T×T) is diagonal. The kth SVD mode consists of a left singular vector (temporal mode shape) u_k ≡ U_(1:T,k), a singular value s_k ≡ S_kk, and a right singular vector (spatial mode shape) v_k ≡ V_(1:D,k). Let Ã ∈ ℝ^(T×D) be a matrix of noisy data:

(2) Ã = A + E ,

where E ∈ ℝ^(T×D) contains random noise with zero mean and standard deviation ε. A reconstruction of the clean data can be formed by summing the first r SVD modes:

(3) Ā_r = Σ_{k=1…r} ū_k s̄_k v̄_k^⊺ ,

where these barred variables could be the noisy (tilde) variables or any estimate of the clean ones. To evaluate such a reconstruction, we use the reconstruction loss

(4) ℒ_r ≡ ‖A − Ā_r‖²_F ,

where ‖·‖_F is the Frobenius norm, so (4) measures the difference between the clean data and the reconstruction.
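As a concrete illustration, definitions (1)-(4) can be sketched in a few lines of NumPy. This is our minimal sketch, not the authors' code, and the rank-5 clean matrix is an invented stand-in for velocity field data:

```python
# Rank-r SVD reconstruction (3) and the Frobenius-norm loss (4)
# for a T x D data matrix with T < D.
import numpy as np

rng = np.random.default_rng(0)
T, D, eps = 40, 100, 1e-3

A = rng.standard_normal((T, 5)) @ rng.standard_normal((5, D))  # rank-5 "clean" data
E = eps * rng.standard_normal((T, D))                          # i.i.d. Gaussian noise
A_noisy = A + E

# Reduced SVD: U is T x T, s has T entries, Vt is T x D
U, s, Vt = np.linalg.svd(A_noisy, full_matrices=False)

def reconstruct(U, s, Vt, r):
    """Sum of the first r SVD modes, Eq. (3)."""
    return U[:, :r] * s[:r] @ Vt[:r, :]

def loss(A_clean, A_bar):
    """Reconstruction loss, Eq. (4): squared Frobenius norm of the difference."""
    return np.linalg.norm(A_clean - A_bar, 'fro')**2

# Truncating at the true rank removes most of the noise: the loss is far
# below eps^2*T*D, the loss of the raw noisy data.
print(loss(A, reconstruct(U, s, Vt, 5)), eps**2 * T * D)
```
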
Herein, we consider being given the noisy data Ã but not knowing A or ε. Our goals are (1) to make an accurate estimate ε̄ of the measurement error ε; and (2) to form a minimum-loss estimate Ā_r of the clean data.

Existing noise filtering methods
A number of methods exist to choose which SVD modes to use for noise filtering. Epps and Techet (2010) reconstruct an estimate of the clean data using all noisy SVD modes that satisfy their criterion (5), where s̃_k is the singular value of noisy mode k, ε is the root mean square error in the data, and T and D are the number of timesteps and data sites, respectively. Raiola et al. (2015) use all noisy SVD modes for which

(6) F_k ≡ s²_k / s²_{k−1} < 0.999 .

Note that (6) cannot be used in practice, since one does not know the clean singular values (if only given a noisy dataset). However, (6) can be put into usable form: Raiola et al. (2015) derived (6) assuming s̃²_k ≈ s²_k + ε²D, so the spirit of (6) is equivalent to (s²_k + ε²D)/(s²_{k−1} + ε²D) < 0.999, i.e.

(7) F̃_k ≡ s̃²_k / s̃²_{k−1} < 0.999 .

Brindise and Vlachos (2017) do not sum noisy modes k = 1, …, r but instead choose the r "smoothest" modes, as identified by the Shannon entropy. Their so-called entropy line fit (ELF) method is a six-step procedure:
1. Compute the discrete cosine transform (DCT-II) of each right singular vector, for d = 1, …, D and k = 1, …, T.
2. Square and normalize the DCT coefficients.
3. Compute the Shannon entropy of each normalized DCT signal.
4. Sort the entropies into ascending order, Ȳ_k.
5. Use a "two-line fit" procedure to find the optimal reconstruction rank, r_B, where the sorted entropy Ȳ_k nearly reaches a plateau.
6. Use sorted modes 1, …, r_B for data reconstruction.
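For illustration, the usable criterion (7) can be evaluated directly from the noisy singular values. This is our sketch of the rule as described above; the sample singular values (a decaying signal followed by a flat noise tail) are invented:

```python
import numpy as np

def raiola_rank(s_noisy, threshold=0.999):
    """Largest mode number k (1-indexed) such that F_k = s_k^2 / s_{k-1}^2 < threshold.
    F_1 is taken as 0 so the first mode is always kept."""
    s2 = np.asarray(s_noisy, dtype=float)**2
    F = np.concatenate(([0.0], s2[1:] / s2[:-1]))
    keep = np.flatnonzero(F < threshold) + 1   # 1-indexed mode numbers passing (7)
    return int(keep.max())

# Rapidly decaying signal modes followed by a nearly flat noise tail,
# where F_k -> 1 and the criterion stops admitting modes:
s = np.array([10.0, 5.0, 2.0, 1.0, 0.01, 0.01, 0.01])
print(raiola_rank(s))   # the four signal modes plus the first tail mode
```
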
Note that while the Schmidt theorem guarantees that this reconstruction Ā_r will yield a higher loss with respect to the noisy data, ‖Ã − Ā_r‖²_F, than the reconstruction using unsorted modes 1, …, r, this reconstruction might (or might not) yield a lower loss with respect to the clean data, ‖A − Ā_r‖²_F. Shabalin and Nobel (2013) reconstruct the clean data using all of the modes, but they discount the singular values in such a way as to suppress the noisier modes.
Note that the above references (as well as this manuscript) assume a complete dataset with no large outliers. Other papers have considered POD for reconstruction of gappy data or data with extreme outliers (Venturi and Karniadakis 2004; Wang et al. 2015; Higham et al. 2016).

Proposed 'E15' noise filtering method
The present work builds upon (Epps and Krivitzky 2019), wherein we derived and validated a theoretical prediction (13) of the root mean square error of the SVD modes, which is defined as

rmse(ṽ_k) ≡ ‖ṽ_k − v_k‖ / √D = sqrt( (1/D) Σ_{d=1…D} (Ṽ_(d,k) − V_(d,k))² ) .

Further, we showed that for modes constituting random noise, the rmse(ṽ_k) reaches a maximum value of √(2/D). The key idea of our proposed noise filtering method is to use only the modes for which rmse(ṽ_k) is sufficiently below this noise ceiling.
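The √(2/D) ceiling can be checked numerically: two independent random unit vectors v and ṽ satisfy ⟨‖ṽ − v‖²⟩ ≈ 2, so rmse(ṽ) = ‖ṽ − v‖/√D ≈ √(2/D). This is our illustration, not part of the method:

```python
# Numerical check of the noise ceiling sqrt(2/D): a spatial mode saturated by
# noise is effectively a random unit vector unrelated to the clean mode.
import numpy as np

rng = np.random.default_rng(1)
D, N = 2000, 500

rmse = np.empty(N)
for i in range(N):
    v = rng.standard_normal(D); v /= np.linalg.norm(v)     # "clean" mode
    vt = rng.standard_normal(D); vt /= np.linalg.norm(vt)  # unrelated "noisy" mode
    rmse[i] = np.linalg.norm(vt - v) / np.sqrt(D)

print(rmse.mean(), np.sqrt(2 / D))   # the two values nearly coincide
```
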
Herein, we propose the following method for minimum-loss noise filtering:
1. Compute the SVD of the noisy data, Ã = Ũ S̃ Ṽ^⊺.
2. Fit a Marchenko–Pastur distribution to the tail of the noisy singular values (see "Appendix 1").
3. Estimate the noise level ε̄ from this best-fit distribution.
4. Predict the rmse(ṽ_k) of each spatial mode via (13), where λ̃_k ≡ s̃²_k. This formula was derived and validated in (Epps and Krivitzky 2019).
5. Estimate the rank for minimum-loss reconstruction as

(15) r̄_min ≡ maximum k such that t_k > 5% .

The parameter t_k in (14) quantifies the cleanliness of a mode: t_1 = 1 for the first (cleanest) mode, and t_k = 0 for modes at the noise ceiling (rmse(ṽ_k) = √(2/D)). Modes that are sufficiently below this noise ceiling (i.e. that have a large enough t_k) are deemed clean enough to be useful for data reconstruction. The threshold in (15) was set at 5%, because we empirically found that level to yield accurate and robust results.
6. Reconstruct an estimate of the clean singular values:

(16) s̄_k = sqrt( s̃²_k − (ε̂′ŝ_k)² ) for k < k_c , and s̄_k = 0 for k ≥ k_c ,

where ε̂′ŝ_k is the best-fit Marchenko–Pastur distribution (see "Appendix 1"), and k_c is the minimum index k such that s̃_k < ε̂′ŝ_k (Epps 2015). This cutoff index ensures that (16) yields a real number, and it sets s̄_k to zero for modes in the tail of the distribution, which are obliterated by noise. Equation (16) follows from the observation that s̃²_k ≈ s²_k + (ε̂′ŝ_k)².
7. Reconstruct an estimate of the clean data via

(17) Ā_r = Σ_{k=1…r} ũ_k s̄_k ṽ_k^⊺ ,

with r = r̄_min from (15) and s̄_k from (16). We will refer to (15)/(16)/(17) as the 'E15' reconstruction.
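A simplified sketch of steps 6-7 is given below. For brevity, the best-fit Marchenko-Pastur tail ε̂′ŝ_k is replaced by a user-supplied array `noise_sv` of estimated noise-only singular values (here taken from an independent noise realization, an assumption made for illustration only); everything else follows the structure of (16)-(17):

```python
import numpy as np

def e15_reconstruct(A_noisy, noise_sv, r_min):
    """Rank-r_min reconstruction with cleaned singular values, Eqs. (16)-(17)."""
    U, s, Vt = np.linalg.svd(A_noisy, full_matrices=False)
    # Eq. (16): s_bar_k = sqrt(s_k^2 - noise_sv_k^2) for k < k_c, else 0,
    # where k_c is the first index with s_k < noise_sv_k.
    below = np.flatnonzero(s < noise_sv)
    k_c = below[0] if below.size else s.size
    s_bar = np.zeros_like(s)
    s_bar[:k_c] = np.sqrt(s[:k_c]**2 - noise_sv[:k_c]**2)
    # Eq. (17): sum the first r_min modes with the cleaned singular values.
    return U[:, :r_min] * s_bar[:r_min] @ Vt[:r_min, :]

# Invented rank-1 test data with i.i.d. noise:
rng = np.random.default_rng(2)
T, D, eps = 60, 400, 1e-2
A = 5 * np.outer(np.sin(np.linspace(0, 6, T)), np.cos(np.linspace(0, 9, D)))
A_noisy = A + eps * rng.standard_normal((T, D))
# Stand-in for the fitted noise tail: singular values of a pure-noise matrix
noise_sv = np.linalg.svd(eps * rng.standard_normal((T, D)), compute_uv=False)
A_bar = e15_reconstruct(A_noisy, noise_sv, r_min=1)
```

The reconstruction error ‖A − Ā‖_F comes out well below the raw noise level ‖A − Ã‖_F, which is the point of the filter.
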

Illustrative example
To illustrate this procedure by way of example, consider a laminar, unsteady, 2D flow past a cylinder at Reynolds number Re = V∞d/ν = 100. A clean (noise-free) dataset A is extracted from CFD simulations, with D = 33,284 data sites (x and y velocities at 16,642 grid points) and T = 455 timesteps. The time-mean was not removed from these data. A noisy dataset Ã is created by adding independent, identically-distributed (i.i.d.) Gaussian noise with standard deviation ε = 10⁻⁴. Since the noise is i.i.d., w = f = 1. Figure 1 provides a snapshot in time of the velocity magnitude, showing the clean data, the noisy data, and a dataset formed by the 'E15' reconstruction procedure (15)/(16)/(17). This reconstruction filters out much of the noise and faithfully represents the clean data. Figure 2a illustrates the singular values of the clean and noisy datasets, as well as the 'E15' reconstruction (16) and a best-fit Marchenko–Pastur distribution. Equation (31) uses this best-fit Marchenko–Pastur distribution to predict ε̄ = 1.0019 × 10⁻⁴, just 0.19 percent greater than the true ε. Figure 2b shows the root mean square error of the spatial mode shapes. The theoretical prediction (13) agrees well with the numerically-computed rmse. The rmse is very low for the first mode and reaches √(2/D) for modes that are saturated with noise. The optimum reconstruction rank r̄_min = 9 from (15) is illustrated as the last index for which the rmse remains sufficiently below this noise ceiling (t_k > 5%), since modes with higher rmse constitute noise. Figure 2c presents the loss ℒ_r (4) versus rank r for several reconstruction methods, for example, the reconstruction Ā_r = Σ_{k=1…r} ũ_k s̃_k ṽ_k^⊺ using the noisy singular values and vectors. As expected, the loss for this 'ũ s̃ ṽ^⊺' reconstruction (red curve) has a pronounced minimum, because the lower modes contain most of the signal, and the higher modes contain mostly noise. The proposed reconstruction method (15)-(17) is labeled 'ũ s̄ ṽ^⊺ E15' in Fig. 2c.
Figure 2c shows the loss for the 'E15' reconstructions at all ranks (green dashed curve), but note that the 'E15' method selects rank r̄_min = 9 per (15). There are two key points to emphasize. First, the 'E15' loss at r̄_min is slightly lower than the minimum loss achievable using the standard 'ũ s̃ ṽ^⊺' reconstruction (red curve). Moreover, the 'E15' loss at r̄_min is only 1.7% higher than the minimum 'E15' loss (which actually occurs at r_min = 11).
Second, the main advantage of the 'ũ s̄ ṽ^⊺ E15' reconstruction, which uses the reconstructed singular values s̄ per (16), is that its loss is much less sensitive to the choice of rank r̄_min (for ranks greater than the actual r_min), because the singular values of the higher (noisier) modes are suppressed. In contrast, the loss of the standard 'ũ s̃ ṽ^⊺' approach increases (dramatically in some cases) for ranks r > r_min. Thus, we will see in Sect. 3 that the 'E15' method achieves lower loss and less variability than the standard 'ũ s̃ ṽ^⊺' method. Figure 2c also shows the loss of the Ā_r = Σ_{k=1…r} ũ_k s_(c,k) ṽ_k^⊺ reconstruction, which we show in "Appendix 2" to be the minimum possible loss if reconstructing with the noisy singular vectors. Note that the 'E15' loss nearly overlays on this theoretical minimum. Reconstruction methods from the prior literature result in higher losses than that achieved herein. Raiola et al. (2015) suggest the ũ s̃ ṽ^⊺ reconstruction at rank r_F = 12 (per (6)) or r_F̃ = 19 (per (7)). Epps and Techet (2010) suggest rank r = 3. The ELF method cannot be used in this example, because the CFD grid was unstructured, and the 2D DCT-II step in the ELF method requires data on a plaid grid.

Outline
The following sections provide examples that illustrate additional details of the proposed methods. In Sect. 3, we consider analytic examples in order to provide a "controlled environment" within which to present and validate some finer details of the present theory. In Sects. 4 and 5, we consider application to synthetic PIV data, which raises "real world" issues yet still affords us the ability to validate our results using clean data. Finally, in Sect. 6, we demonstrate the methods on real PIV data.

Analytic minimal working example
Consider data constructed via A = U S V^⊺ per (18), with T = 40 and D = 100. In this example, a Monte Carlo simulation of N = 10,000 trials was used to evaluate the performance of various noise filtering schemes. In each trial, we generated a noisy dataset Ã = A + E (with E containing i.i.d. Gaussian noise with ε = 10⁻³), made estimates ε̄, s̄_k, and Ā_r, and evaluated the reconstruction loss. Figure 3 shows the mean and standard deviation of the results for the N trials.
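A stripped-down version of such a Monte Carlo experiment (our own, with a smaller N and a simple truncated-SVD reconstruction in place of the full 'E15' pipeline) looks as follows:

```python
# N trials of noisy data; each trial records the clean-data loss of the
# rank-r truncated SVD of the noisy matrix.
import numpy as np

rng = np.random.default_rng(3)
T, D, eps, N, r = 40, 100, 1e-3, 200, 5

# Fixed rank-5 clean data shared by all trials (invented stand-in for (18))
A = rng.standard_normal((T, 5)) @ rng.standard_normal((5, D))

losses = np.empty(N)
for i in range(N):
    U, s, Vt = np.linalg.svd(A + eps * rng.standard_normal((T, D)),
                             full_matrices=False)
    A_bar = U[:, :r] * s[:r] @ Vt[:r, :]
    losses[i] = np.linalg.norm(A - A_bar, 'fro')**2

print(losses.mean(), losses.std())   # mean and std of the loss over N trials
```
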

Estimates of the clean singular values
First, consider estimates of the clean singular values. Figure 3a shows good agreement between the clean singular values (black curve) and the 'E15' reconstruction s̄_k (16) (green dashed curve). The reason for this good agreement is that the noisy singular values (red curve) are well approximated by s̃_k ≈ sqrt(s²_k + (ε̂′ŝ_k)²) (green dashed curve, labeled '⟨s̃_k⟩ E15'), which is the theoretical basis of the 'E15' reconstruction. This approximation works well here, because the clean singular values decay rapidly as compared to the noisy ones.

Root mean square error (rmse)
In order to help orient the reader as to the levels of noise in the modes, we find it useful to define three mode indices: k_F, k_2, and k_ε. Since rmse(ṽ_k) is (per (13)) approximately inversely proportional to s̃_k, these three indices correspond to modes with successively higher rmse levels. Index k_F marks the first mode that fails the (Epps and Techet 2010) criterion (5), k_2 provides a rough approximation for the minimum-loss reconstruction rank, and k_ε is the minimum index for which the noisy singular values overlay on the Marchenko–Pastur noise distribution.

[Fig. 3 caption: the loss estimate (22) is marked "ℒ̄_r approx."; the symbols marked r_B, r_F, and r_F̃ indicate the Brindise ELF method and the Raiola criteria (6) and (7), respectively.]

Figure 3b shows the root mean square error of the spatial modes, comparing the theoretical prediction (13) to numerical results. Here, the theory slightly over-predicts the rmse, because the rmse is proportional to the noise level ε̄, and the estimated ε̄ is too large (ε̄/ε = 1.18). Nevertheless, the form of the rmse is correct, and the optimum index r̄_min (15) is predicted within ±2 indices of what it would be if ε̄ were more accurate. Figure 3c shows the reconstruction loss for the 'E15' method (16)/(17) and the 'EK18' method (19)/(17) with and without c (41). For reference, we also show the

Reconstruction loss
'ũ s_c ṽ^⊺' and standard 'ũ s̃ ṽ^⊺' methods, which are the best- and worst-case scenario reconstructions. The variable s_c is derived in "Appendix 2", where we show that among all reconstructions that use the noisy singular vectors, the reconstruction that minimizes the loss ℒ_r is the one with s̄ = s_c, where s_(c,k) ≡ ũ_k^⊺ A ṽ_k. This c accounts for the projections of the noisy singular vectors in each of the clean singular vector directions. The inset of Fig. 3c shows that (for all five methods) the loss is very sensitive to the choice of rank for r < k_2 but is relatively insensitive to rank for r > k_2. The reason for this behavior is that modes k < k_2 contain most of the signal content, whereas modes k > k_2 contain mostly noise.
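The optimality of these coefficients can be verified numerically: for fixed noisy singular vectors, the loss is minimized by the least-squares projections c_k = ũ_k^⊺ A ṽ_k, and any other choice of coefficients (including the noisy singular values s̃) yields a loss at least as large. Our sketch, with invented test data:

```python
import numpy as np

rng = np.random.default_rng(4)
T, D, eps = 30, 80, 0.05

A = rng.standard_normal((T, 4)) @ rng.standard_normal((4, D))   # rank-4 clean data
U, s, Vt = np.linalg.svd(A + eps * rng.standard_normal((T, D)),
                         full_matrices=False)

# Optimal coefficients: project the clean data onto each noisy mode pair,
# c_k = u_k^T A v_k
c = np.einsum('kt,td,kd->k', U.T, A, Vt)

def loss_with(coeffs):
    """Clean-data loss of a reconstruction using the noisy singular vectors."""
    return np.linalg.norm(A - (U * coeffs) @ Vt, 'fro')**2

# Perturbing c, or using the noisy singular values s, cannot reduce the loss:
print(loss_with(c) <= loss_with(c + 0.01), loss_with(c) <= loss_with(s))
```

This works because the rank-one matrices ũ_k ṽ_k^⊺ are orthonormal in the Frobenius inner product, so the coefficient fit decouples mode by mode.
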
Note that for the 'ũ s̄ ṽ^⊺ E15' method, the rank r̄_min = 19 (from (15)) yields ℒ_r̄min = 2.60 × 10⁻³, which is just 2.5% larger than the minimum of the 'E15' loss curve, ℒ_rmin = 2.53 × 10⁻³, where r_min = 21. Also, note that at r̄_min the 'E15' loss is nearly equal to the loss of the 'ũ s_c ṽ^⊺' method, which is the theoretical minimum. As shown in Fig. 3d, the 'E15' method also has the least variation in loss, almost as low as the hypothetical 'ũ s_c ṽ^⊺' method.

Estimate of reconstruction loss
One motivating question of this article is: What is the limit to the accuracy of SVD-based data reconstruction? We can answer this question mathematically with the following estimate of the loss:

(22) ℒ̄_r = Σ_{k=r+1…T} s̄²_k + ε̄²Dr ,

where the s̄_k are the reconstructed singular values (16). Equation (22) is an approximation of the hypothetical loss prediction (61) derived in "Appendix 3". This approximation (22) has the correct asymptotic behavior at r ≈ 1 and correctly equals ε̄²TD at r = T (since ε²TD is the loss of the original noisy data). Figure 3c shows this "ℒ̄_r approx." is reasonable; at rank r̄_min, Eq. (22) predicts ℒ̄_r̄min = 2.89 × 10⁻³, which is just 11% higher than the actual 'E15' loss ℒ_r̄min.
Clearly, the loss predicted by Eq. (22) has a minimum, since Σ_{k=r+1…T} s̄²_k monotonically decreases with r whereas ε̄²Dr monotonically increases with r. Thus, the minimum of ℒ̄_r forms a rough limit to the accuracy of the data reconstruction. More usefully, ℒ̄_r̄min forms an estimate of the actual loss of the 'E15' data reconstruction.
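Assuming reconstructed singular values s̄_k and a noise estimate ε̄ are in hand, the estimate (22) and its minimizing rank take only a few lines (our sketch; the sample singular values are invented, with a zeroed tail mimicking the cutoff in (16)):

```python
import numpy as np

def estimated_loss_curve(s_bar, eps_bar, D):
    """Eq. (22): missing-signal term plus retained-noise term, for r = 1..T."""
    s2 = np.asarray(s_bar, dtype=float)**2
    r = np.arange(1, s2.size + 1)
    missing_signal = s2.sum() - np.cumsum(s2)   # sum over discarded modes k > r
    retained_noise = eps_bar**2 * D * r         # noise kept in the first r modes
    return missing_signal + retained_noise

s_bar = np.array([100.0, 40.0, 10.0, 2.0, 0.5, 0.0, 0.0, 0.0])
curve = estimated_loss_curve(s_bar, eps_bar=0.1, D=2000)
print(int(np.argmin(curve)) + 1)   # estimated minimum-loss rank
```

The decreasing tail sum and the linearly growing noise term produce the interior minimum discussed above.
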

Figure of merit for reconstruction accuracy
A figure of merit for the reconstruction can be formed by comparing the estimated reduction in loss to the loss of the original noisy data:

(23) M ≡ (ε̄²TD − ℒ̄_r̄min) / (ε̄²TD) .

This non-dimensional ratio can be used to gauge the improvement in accuracy. The parameter M can also be loosely interpreted as the fraction of noise that has been removed. For the cylinder example of Sect. 2.1, M = 98% (also ℒ̄_r̄min = 0.0034 and ℒ_r̄min = 0.0031). For the present analytic example (Sect. 3.1), M = 48%.
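A sketch of (23) as we read it (the normalization by ε̄²TD is our reading of the text above), using the cylinder-example numbers quoted there:

```python
def merit(loss_est, eps_bar, T, D):
    """Figure of merit (23): estimated fractional reduction in loss relative
    to the loss eps^2*T*D of the raw noisy data."""
    noisy_loss = eps_bar**2 * T * D
    return 1.0 - loss_est / noisy_loss

# Cylinder example of Sect. 2.1: eps_bar ~ 1.0019e-4, T = 455, D = 33284,
# estimated loss ~ 0.0034, giving M ~ 98% as reported in the text.
M = merit(0.0034, 1.0019e-4, 455, 33284)
print(round(M, 2))
```
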

Comparison to literature methods
Other reconstruction methods were examined as well. The (Brindise and Vlachos 2017) ELF method on average yields low loss, ⟨ℒ_r⟩ = 2.77 × 10⁻³, although it has a much larger variability in loss (σ_ℒ = 1.09 × 10⁻⁴) than the 'E15' method (see Fig. 4c, d). The reason for this larger variability is that the ELF method has a larger variability in rank (σ_rB = 0.905) than the 'E15' method (σ_r̄min = 0.494), so the ELF method more often chooses the rank too low, resulting in large losses.
The (Raiola et al. 2015) F and F̃ methods yield significantly higher ranks and losses than the present 'E15' method. For this example, the F̃_k method always predicted r_F̃ = 40, because the F̃_k parameter was always less than 0.999. The reason that F̃_k < 0.999 here is that F̃_k asymptotes to 1 only when the data matrix has a very high aspect ratio (D ≫ T) such that the tail of the noisy singular values is nearly constant.
Not shown in the figures are the results of the (Shabalin and Nobel 2013) method, since they nearly overlay on the present 'ũ s̄_c ṽ^⊺ EK18' method, although their theoretical development and final equations differ from those presented herein.

Effect of the distribution of singular values
In order to illustrate the effect of the distribution of singular values, consider constructing the analytic data using the singular vectors from (18) but now using a different set of singular values. The noise data were constructed by first drawing from a normal distribution with standard deviation √w·ε and then performing uniform spatial smoothing over a window of width w. This two-step process yields spatially-correlated noise with standard deviation ε. Results of a Monte Carlo simulation with ε = 10⁻⁵, w = 5, T = 200, D = 2000, and N = 1000 are shown in Fig. 4.
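The two-step noise generator described above can be sketched as follows (our illustration; the use of `np.convolve` with zero padding at the edges is an assumption about implementation details not stated in the text):

```python
# Draw i.i.d. Gaussian noise with standard deviation sqrt(w)*eps, then smooth
# with a uniform window of width w; smoothing divides the std by sqrt(w),
# restoring eps while introducing spatial correlation over ~w sites.
import numpy as np

rng = np.random.default_rng(5)
T, D, eps, w = 200, 2000, 1e-5, 5

raw = np.sqrt(w) * eps * rng.standard_normal((T, D))
kernel = np.ones(w) / w
# Uniform spatial smoothing along the data-site axis ('same' keeps the width)
E = np.apply_along_axis(lambda row: np.convolve(row, kernel, mode='same'),
                        1, raw)

print(E.std())   # ~ eps, but now spatially correlated
```
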
The results in Fig. 4 generally agree with those in Fig. 3. One key difference in the results of these examples is that here the 'ũ s̄_c ṽ^⊺ EK18' reconstruction has unacceptably poor performance in both loss (Fig. 4c) and variability (Fig. 4d). The reason is that the estimated c attenuates modes 56 and higher much more than the exact c does, because the root mean square errors of these modes are large (see Fig. 4b). Since using c might lead to a more lossy reconstruction than the original noisy data, we recommend not using c in data reconstruction. Although the two 'EK18' methods have a more theoretically-sophisticated foundation than the 'E15' method, they again are found to be less accurate and more variable than the 'E15' method; therefore, the 'EK18' methods are not recommended.

Figures 3 and 4 show that the 'E15' method (15)/(16)/(17) yields the least mean loss ⟨ℒ_r⟩ and the least variation σ_ℒ of all the practical methods shown. Moreover, the loss from the 'E15' method is very close to that from the theoretical minimum-loss method 'ũ s_c ṽ^⊺'. Here, the optimum reconstruction rank is predicted to be r̄_min = 66, while the actual minimum of the 'E15' loss curve occurs at r_min = 69. However, the 'E15' loss curve is very flat, and the loss ℒ_r̄min = 1.50 × 10⁻⁵ is just 0.9% higher than ℒ_rmin = 1.48 × 10⁻⁵. Again, Eq. (22) provides an acceptable estimate of the loss. Here, (22) predicts ℒ̄_r̄min = 2.08 × 10⁻⁵, about 40% larger than the actual ℒ_r̄min. Here, the reconstruction merit (23) is M = 32%.

Generation of datasets
In this section, we consider a synthetic PIV dataset generated using the CFD data of Sect. 2.1. As illustrated in Fig. 5, a clean image pair was created for each time step, with particles placed randomly in the initial image and then advected with the local flowfield (as interpolated from the clean CFD data) for the second image. The noisy image color intensity was obtained by reducing that of the corresponding clean image and adding uniformly-distributed noise to each pixel. PIV processing was performed in DaVis 8.3.1, using several multi-pass approaches as summarized in Table 1. For the figures, PIV processing was performed with an initial interrogation window size of 64 × 64 px with 50% overlap, then three passes on a final window size of 32 × 32 px with 75% overlap. Since the noisy data at the edge of the PIV domain have non-Gaussian error, these data were cropped prior to SVD analysis. The resulting vector field contained 126 × 84 vectors (D = 21,168) and T = 455 timesteps. Further details regarding generation and processing of the synthetic PIV images are given in (Epps and Krivitzky 2019).
Here, we introduce the following notation: A = clean CFD data, A* = clean PIV data, Ã* = noisy PIV data, and Ā* = reconstructed PIV data (which should agree with A*). We also redefine the reconstruction loss from (4) as ℒ_r ≡ ‖A* − Ā*_r‖²_F, i.e. the loss is now measured with respect to the clean PIV data.

Results overview
As in previous examples, the proposed methods prove successful:
• The error estimation procedure of "Appendix 1" produces an accurate estimate ε̄ = 1.85 × 10⁻⁵ of the actual rms error ε = 1.90 × 10⁻⁵, which is the rms difference between the clean and noisy PIV data.
• Equation (16) yields an accurate reconstruction of the clean singular values for k < r̄_min, which are the important modes (see Fig. 6a).
• The 'E15' noise filtering method (16)/(17) yields losses nearly as low as the theoretical minimum-loss 'ũ s_c ṽ^⊺' curve (see Fig. 6c).
• The Brindise method yields rank and loss similar to those of the present approach, whereas the Raiola methods yield much higher rank and loss.
• The estimated loss ℒ̄_r̄min = 3.0 × 10⁻⁴ from (22) is within a factor of 2 of the actual ℒ_r̄min = 1.6 × 10⁻⁴.
• Here, the reconstruction merit (23) is M = 91%.

Figure 7 shows a snapshot of the vorticity field of the clean PIV data, the noisy PIV data, the 'E15' reconstruction at r̄_min = 16, and the standard 'ũ s̃ ṽ^⊺' reconstruction at r = k_F − 1 = 7.
Note that while the original noisy-data vorticity field contains significant noise, the character of the vorticity field is restored by the 'E15' reconstruction. For example, the 'E15' reconstruction recovers the thin blue 'stripe' in the middle of the frame and the green 'spike' near the right side. Choosing a lower rank, such as r = 7, results in additional smoothing but loses some of these fine details of the flowfield.

On the use of w = 1 to predict r̄_min
The important new feature of this example is that it uses PIV data, which contain spatially-correlated noise. Consequently, the noisy singular values in Fig. 6a are best fit by a Marchenko–Pastur distribution that has spatial-correlation parameter f > 1 (here, f = 7.1). Inserting this f = 7.1 into (12) yields an effective smoothing window width of w = 13.7, which is reasonable considering that the data were smoothed three times using the 9 nearest neighbors. Also, the data contain spatial correlation due to being acquired in overlapping interrogation windows.

[Table 1 legend: ε = actual rms error between noisy and clean PIV data; ε̄ = error estimate ("Appendix 1"); r_min and ℒ_rmin = actual rank and value of minimum 'E15' loss; r̄_min and ℒ_r̄min = estimated optimum rank (15) (with w = 1 in (13)) and resulting loss; r_B and ℒ_rB = Brindise ELF rank and loss; r_F and ℒ_rF = Raiola F rank and loss; r_F̃ and ℒ_rF̃ = Raiola F̃ rank and loss.]

A question arises as to whether the value w = 13.7 should be used to estimate the root mean square error (rmse) of the SVD modes (and subsequently the optimal reconstruction rank). Figure 6b compares the rmse predictions (13) using both w = 13.7 and w = 1, as well as the numerically-computed rmse (which is that between the noisy and clean PIV modes). The optimal reconstruction rank (15) predicted using the w = 1 rmse is marked r̄_min = 16, and an alternative rank (predicted using the w = 13.7 rmse) is marked r_alt = 13. In this case, both ranks are nearly the same as the actual rank r_min = 15 that yields minimum 'E15' loss. Moreover, the difference in loss between these three ranks is negligible, so the use of w = 13.7 or 1 is somewhat of a moot point in this example.
However, in other examples (Sects. 5, 6), we have found that using the rmse predicted with w = 1 yields a more accurate estimate of the optimal reconstruction rank. Using w = 1 drives the rmse down, which drives the w = 1 rank r̄_min lower than the w > 1 rank r_alt. Considering the shape of the loss curve (see Fig. 6c), it is much better to estimate the rank too high than too low. Therefore, it is recommended that w = 1 be used to predict the rmse and subsequently r̄_min.

Effect of PIV processing scheme
Another question arises as to the efficacy of the proposed noise filtering method with respect to different PIV processing schemes. Table 1 compares results for several PIV processing approaches. Interestingly, the minimum loss ℒ_rmin decreases as the final interrogation window size increases. This result is explained by the fact that ℒ_rmin is some fraction of ε²TD, and ε also decreases with increasing interrogation window size.
For all cases in Table 1, the tail fit method of "Appendix 1" produces an accurate estimate ε̄ of the actual error ε, and the rank r̄_min (predicted using w = 1) is very close to the actual optimum rank r_min. Moreover, the resulting loss ℒ_r̄min is nearly as low as the minimum loss ℒ_rmin for most cases. Again, the Brindise ELF method yields slightly higher losses, and the Raiola F and F̃ methods yield significantly higher losses.

In the next example, we consider synthetic PIV of a much more complex flowfield, fully-developed turbulent channel flow. This example provides further validation of the proposed methods, and it allows us to illustrate the effects of the sampling timestep and the number of snapshots on the proposed methods.

Generation of data sets
Synthetic PIV images were generated as in Sect. 4, with particles placed randomly in the initial image and advected with the local flowfield for the second image. Here, the flowfield was interpolated from direct numerical simulation (DNS) data of turbulent channel flow (Kim et al. 1987;Lee et al. 2013) obtained from the Johns Hopkins turbulence database (Li et al. 2008;Graham et al. 2016). The image size was 1024 × 1024 px, with 20,480 particles ( ≈ 20 particles per 32 × 32 px interrogation window). Image pairs were generated corresponding to DNS domain 0 ≤ x ≤ 1 , −1 ≤ y ≤ 0 and the first 2000 timesteps of the DNS database. PIV processing was performed in DaVis 8.3.1, using an initial interrogation window size of 64 × 64 px with 50% overlap, then three passes on a final window size of 32 × 32 px with 75% overlap. After cropping the edges, the resulting vector field contained 126 × 126 vectors ( D = 31, 752 ) and 2000 timesteps.

Baseline case (δt, T = 500)
In this subsection, we consider only the first T = 500 timesteps of PIV data (spaced δt apart, where δt is the DNS database timestep). Here, the rms error between the noisy and clean PIV data is ε = 0.0105, and the Marchenko–Pastur tail fit predicts ε̄ = 0.0109 and f = 7.9 (w = 15.3).
The key difference between this turbulent flow example and the previous ones is that here the clean singular values decay much more slowly. As a result, there is less separation between the clean and noisy PIV singular values (see Fig. 8a). For example, here s̃*_T/s*_T = 7.7, whereas in the PIV cylinder example s̃*_T/s*_T = 15.3, and in the analytic examples of Sects. 3.2 and 3.1, s̃_T/s_T = 32 and 49, respectively.
This reduced separation between the clean and noisy singular values makes it challenging to correctly estimate the error level ε̄ and the effective smoothing window width w. As a result, the rmse predictions end up too large when using w > 1 (cf. the w = 15.3 theory curve and the numeric rmse curve in Fig. 8b). Instead, the rmse should be estimated using w = 1, which provides a conservative estimate of the rmse that is useful for predicting the optimum reconstruction rank r̄_min.

Figure 8c shows the reconstruction loss for this channel flow case: The estimated reconstruction rank r̄_min = 105 yields a loss of ℒ_r̄min = 497, which is nearly as low as the minimum 'E15' loss ℒ_rmin = 495 (at r_min = 114). As in previous examples, the Brindise ELF method has slightly higher loss (ℒ_rB = 569 with r_B = 110), and the Raiola methods have higher losses (ℒ_rF̃ = 857 at r_F̃ = 180). Equation (22) again provides a very good loss estimate: ℒ̄_r̄min = 494 (just 0.8% lower than ℒ_r̄min). The reconstruction merit (23) is M = 74%.

Effect of timestep and number of snapshots
Two important questions to address at this point are: (1) how do the various noise filtering methods perform as the separation between the noisy and clean singular values varies? and (2) how do the timestep and the number of frames in the dataset affect the singular values? To address these questions, we performed the SVD on six datasets subsampled with various timesteps (δt, 2δt, and 4δt, where δt is the DNS database timestep) and numbers of snapshots (T = 500, 1000, 2000). Figure 9a shows the clean singular values for these six cases. When plotted versus normalized mode index k/T, the cases with the same timestep (but different T) overlay, indicating that the number of snapshots T has little effect on the clean singular values. Rather, the shape (slope and magnitude) of the tail of the clean singular values is dictated by the timestep.
By contrast, the noisy singular values are relatively insensitive to the timestep (Fig. 9b), so a larger timestep causes there to be less separation between the clean and noisy singular values. Figure 9c shows the loss for the three T = 500 cases (with the δt case carried over from Sect. 5.2). As expected, a larger timestep results in more loss and less of a "bucket" below the noisy-dataset loss ε²TD within which to advantageously filter noise. Consequently, the reconstruction merit (23) decreases with increasing timestep; for the δt, 2δt, and 4δt cases, we find M = 74%, 54%, and 9%, respectively. In all three cases, the present methods yield reasonable predictions of the optimal reconstruction rank r_min. The ELF method performs well at δt and 2δt but not 4δt. In contrast, the Raiola method performs best at 4δt.
For reference, the rmse for the {2δt, T = 500} case are shown in Fig. 9d. Here, the effect of using w > 1 versus w = 1 is very pronounced, with the w = 1 rmse predictions yielding a much better optimal rank estimate, r̄_min = 164. The w > 1 rmse predictions yield r_alt = 111, which results in significantly more loss than using r̄_min.

Comment regarding optimal rank criterion
Some readers might find it controversial to use the root mean square error of the spatial modes, rmse(ṽ_k), to predict the optimal reconstruction rank. The concern is that modes with very similar energy might switch order, so their computed rmse might be very large, even if the modes themselves did not change.
Regarding modes with similar energy (i.e. a small gap between neighboring singular values), there are two important cases to consider. First, consider "paired modes", which are modes with similar energy that are well below the noise ceiling. Equation (13) shows that the rmse of each of these modes will indeed be higher than the rmse of "isolated modes" (i.e. those with well-separated singular values). The reason is that paired modes can and do mix upon perturbation. However, Wedin's theorem ensures that the subspace spanned by these modes, as a group, is only slightly perturbed. Thus, paired modes below the noise ceiling should be and are used in our data reconstruction method.
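This paired-mode behavior can be demonstrated with a constructed example (ours, not from the paper): a perturbation that couples two equal-energy modes rotates the individual spatial modes strongly, while the projector onto their two-dimensional subspace is unchanged, as the Wedin-type bound promises:

```python
import numpy as np

rng = np.random.default_rng(6)
T, D, s0, eps = 100, 500, 1.0, 1e-2

U0, _ = np.linalg.qr(rng.standard_normal((T, 2)))
V0, _ = np.linalg.qr(rng.standard_normal((D, 2)))
u1, u2 = U0.T
v1, v2 = V0.T

A = s0 * (np.outer(u1, v1) + np.outer(u2, v2))    # two equal-energy "paired" modes
E = eps * (np.outer(u1, v2) + np.outer(u2, v1))   # perturbation coupling the pair

_, sv, Vt = np.linalg.svd(A + E, full_matrices=False)
w1 = Vt[0]   # perturbed leading spatial mode, analytically +/-(v1 + v2)/sqrt(2)

# The individual mode rotates strongly (by 45 degrees within the pair)...
mode_change = min(np.linalg.norm(w1 - v1), np.linalg.norm(w1 + v1))

# ...but the projector onto the 2-mode subspace is essentially unchanged.
P0 = np.outer(v1, v1) + np.outer(v2, v2)
P1 = np.outer(Vt[0], Vt[0]) + np.outer(Vt[1], Vt[1])
proj_change = np.linalg.norm(P1 - P0, 'fro')

print(mode_change, proj_change)   # O(1) mode change, ~machine-zero subspace change
```
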
The second case to consider is "noise modes", which are modes with rmse at the noise ceiling (rmse(ṽ_k) ≈ √(2/D)). In (Epps and Krivitzky 2019), we show that modes with rmse(ṽ_k) ≈ √(2/D) constitute random noise. Thus, these modes should be discarded when noise filtering.
A subtle point is that Eq. (15) keeps all the paired modes and only discards the noise modes, because r̄_min is set as the "maximum k such that t_k > 5%". In other words, there might be some paired modes with t_k < 5%, but as long as there are some isolated modes with t_k > 5% at larger k, then those paired modes are kept. For example, note in Fig. 9d that modes k = 134-135 and 142-143 have rmse above the 5% cutoff, but these paired modes are kept, because there are other isolated modes between those and r̄_min = 164, which is the last mode with rmse below the 5% cutoff.

Laminar jet PIV example
In this final example, we apply the present noise filtering methods to a real PIV dataset. Here, we consider a publicly-available PIV dataset that captures the Kelvin-Helmholtz instabilities of a laminar jet in a quiescent fluid (Neal et al. 2015). This set of PIV images is unique in that it is accompanied by 'clean' vector fields that have been obtained using high-dynamic-range PIV (PIV-HDR), which has higher accuracy than standard PIV because it uses multiple cameras with very high image resolution (Neal et al. 2015).

Generation of datasets
A sample PIV image is shown in Fig. 10; the red solid rectangle highlights the cropped region used for PIV processing, and the green dash-dot rectangle represents the region where the PIV-HDR data are available. Since the PIV and PIV-HDR data are not available on the same grid, the SVDs of these two datasets cannot be compared directly, and the reconstruction loss must be computed only over the PIV-HDR region; for this purpose, the PIV-HDR data are down-sampled to the standard PIV grid locations. PIV processing was performed using DaVis 8.3.1, with a final window size of 16 × 16 px with 75% overlap, leading to a 43 × 84 vector grid. Figure 10 displays a sample vector field and corresponding vorticity contours; flow is left to right.

Results and discussion
Similar to the previous example, the singular values of the noisy data (Fig. 11a) are very well fit by a Marchenko-Pastur distribution with f > 1 (here f = 8.1, w = 15.7). In this example, a long 'transition region' k_F < k < k_ε exists, wherein the modes transition from fairly clean (k < k_F = 3) to complete noise (k > k_ε = 125). With the 'E15' reconstruction (16), the singular values in this transition region are reduced. Although 'clean' data exist, their singular values are not directly comparable to those of the PIV dataset due to the different region sizes, so the 'clean' singular values are not shown.
Reconstruction loss (Fig. 11c) was determined by sampling both the 'clean' PIV-HDR data and the PIV reconstruction datasets at points within the HDR region. The results generally follow the previous examples: The standard 'ũ s̃ ṽ⊺' reconstruction has a minimum (here, ℒ_r = 377 at r = 18) and then rises for higher ranks. The ELF method selects r_B = 15 modes for reconstruction, resulting in a loss of ℒ_{r_B} = 395. The Raiola method selects rank r_F = 43, yielding loss ℒ_{r_F} = 430. The 'E15' reconstruction selects rank r_min = 30 and has loss ℒ_{r_min} = 360, which happens to be the minimum E15 loss. Of the three methods, again the E15 reconstruction has the minimum loss. A reconstruction using the clean singular values was not applicable, because the clean singular values were unknown for this real PIV dataset. The loss estimate (22) and reconstruction merit (23) also are not applicable here, because the loss is not computed over the entire PIV domain. Snapshots of vorticity for the noisy data compared to the 'E15' reconstruction (at r_min = 30) are shown on the left of Fig. 12 at two different times for the entire PIV region. The time-mean velocity field has been removed, revealing significant noise in the measurements of the unsteady components. The E15 reconstruction filters out much of the noise, leaving a much clearer view of the important flow structures. The three large panels of Fig. 12 show the noisy data, the E15 reconstruction, and the 'clean' PIV-HDR data on the HDR region. The E15 reconstruction matches the prominent features of the 'clean' data quite well.

Conclusions
This paper addresses several questions regarding noise filtering via the singular value decomposition. How effective is the SVD for filtering out the noise and reconstructing the clean data?
The effectiveness of the SVD for noise filtering depends on the decay rate of the clean singular values and the noise level of the data. Recall that the reconstruction loss was reasonably estimated by Eq. (22), which led to the prediction (25). Considering (25), it is clear that the lowest losses can be achieved in problems where the optimum reconstruction rank r_min is small and the clean singular values in the tail k > r_min are negligible. The examples from Sects. 2.1, 4, and 5.2 have such character, so their reconstruction loss ℒ_{r_min} was much less than the loss of the original noisy data, ε²TD.
The non-dimensional reconstruction merit (23) can be used to gauge the improvement in loss. Values greater than zero suggest that the reconstruction is more accurate than the original noisy data, with values approaching unity suggesting a highly accurate reconstruction. The character of having negligible clean singular values and a low reconstruction rank is akin to having a large separation between the tails of the clean and noisy singular values. With large separation, the filtering capacity of the SVD is quite good. However, in cases with little separation, an SVD-based reconstruction might filter some noise but not necessarily improve the accuracy (loss) over that of the original noisy data (cf. the 4Δt curve in Fig. 9c). In Sect. 5.3, we showed that the separation between the clean and noisy singular values can be increased by using a decreased timestep.
What method yields the most accurate data reconstruction, and what are the limits to its accuracy?
While a number of practical reconstruction approaches were investigated, the 'E15' method (15)/(16)/(17) proved to be the most robust, with the lowest mean loss ⟨ℒ_r⟩ and the least variation in ℒ_r. Moreover, choosing the rank r_min yields nearly the minimum loss possible when reconstructing with the noisy singular vectors.
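Assuming the merit takes the form M = 1 − ℒ_r/(ε²TD) — an assumption consistent with the behavior described above (M > 0 when the reconstruction beats the noisy data, M → 1 when highly accurate), since Eq. (23) itself is not reproduced here — a minimal sketch:

```python
def reconstruction_merit(loss, eps, T, D):
    # M = 1 - loss / (eps^2 * T * D). The denominator is the expected
    # loss of the raw noisy data; M > 0 means the reconstruction beats
    # the noisy data, and M -> 1 indicates a highly accurate reconstruction.
    return 1.0 - loss / (eps**2 * T * D)
```

For example, a perfect reconstruction (loss = 0) gives M = 1, while a reconstruction whose loss equals the noisy-data loss ε²TD gives M = 0.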
In "Appendix 3", we show that even more accurate reconstructions could be formed if the clean singular vectors were known (or correctly estimated somehow). Thus, one focus of future work is to develop a practical method for estimating the clean singular vectors from the noisy ones.
Can the magnitude of the noise be inferred from the noisy data themselves? A method to estimate the RMS noise was presented in "Appendix 1". The approach is to fit a Marchenko-Pastur distribution to the tail of the singular value distribution, with the noise level and the choice of index defining the tail determined so as to minimize the least-squares error of this fit. This method is sufficiently robust so as to enable noise estimation (and subsequently noise filtering) in an automated data processing code.
Collectively, this body of work provides a thorough analysis of the effects of noise on the SVD of noisy data, the potential for noise estimation using the SVD, and the capabilities of the SVD for noise filtering.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: Estimation of the 'measurement error' ε̄, 'spatial correlation factor' f, and 'effective smoothing window width' w
Here we describe a procedure that can be used to estimate the measurement error ε̄, 'spatial correlation factor' f, and 'effective smoothing window width' w for a given noisy dataset. The procedure involves fitting the tail of the noisy singular values with a Marchenko-Pastur distribution, ε′ŝ_k. The inputs to the procedure are the noisy singular values s̃_k and the size, T and D, of the dataset. The outputs are ε̄, f, and w, as well as the noise level of the fit, ε′, and the index k_ε marking the start of the fit.
In (Epps and Krivitzky 2019), we provide the formula for the Marchenko-Pastur distribution ε′ŝ_k. We use the notation ε′ŝ_k to emphasize that these singular values are linearly proportional to the noise level ε′ (such that ŝ_k corresponds to a unit-noise distribution). These ŝ_k are a function of T, D, and f.
In (Epps and Krivitzky 2019), we showed that the tail of the noisy singular values s̃_k follows a Marchenko-Pastur distribution ε′ŝ_k. That is, s̃_k ≈ ε′ŝ_k in the tail k ≥ k_ε, where k_ε is defined as the minimum index for which s̃_k < ε√D. This suggests a noise estimation procedure: fit the tail of the data (s̃_k for k = k_ε, …, T) with ε′ŝ_k, and then, upon setting ε′ŝ_{k_ε} = ε̄√D at the tail-start index k_ε, find the estimate ε̄ = ε′ŝ_{k_ε}/√D. One difficulty with this approach is that k_ε is unknown a priori, because it depends on ε. Therefore, a slightly more elaborate procedure is required.
Given a guess of the tail-start index k, we use least squares to fit a Marchenko-Pastur distribution ε′ŝ_k to the tail of the noisy singular values s̃_k (k, …, T). The mean square error between log10(ε′ŝ_k) and log10(s̃_k) is

L(k) = (1/(T − k + 1)) Σ_{j=k}^{T} [log10(ε′ŝ_j) − log10(s̃_j)]²    (26)

The ε′ that yields the minimum L is found by solving dL/d(log10 ε′) = 0 for ε′, which yields

log10 ε′ = (1/(T − k + 1)) Σ_{j=k}^{T} [log10(s̃_j) − log10(ŝ_j)]    (27)

Thus, for each choice of tail-start index k, Eq. (27) can be used to find the best fit ε′(k), and (26) can be used to evaluate the associated error L(k).
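The closed-form fit (27) is straightforward to implement; a minimal numpy sketch (with `s_hat` the unit-noise Marchenko-Pastur singular values, assumed given):

```python
import numpy as np

def fit_noise_level(s, s_hat, k):
    """Fit eps' * s_hat to the singular-value tail s[k:] by least
    squares in log10 space. The minimizer of (26) has the closed
    form (27): log10(eps') is the mean log-ratio over the tail.
    k is a 0-based tail-start index; s_hat is the unit-noise
    Marchenko-Pastur distribution (assumed given)."""
    d = np.log10(s[k:]) - np.log10(s_hat[k:])
    log_eps = d.mean()                     # Eq. (27)
    L = np.mean((log_eps - d)**2)          # Eq. (26), at the optimum
    return 10.0**log_eps, L
```

As a sanity check, if the tail is exactly proportional to the unit distribution, the fit recovers the proportionality constant with zero error.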
The actual tail-start index k_ε can then be estimated as the index for which the fit error L(k) is a minimum. This assumption is reasonable, because k_ε marks a rapid departure of the noisy singular values from the fitted Marchenko-Pastur distribution.
For example, Fig. 13a shows a distribution of noisy singular values and the resulting best-fit Marchenko-Pastur distribution. Figure 13b shows the mean square error L(k) (26) and the fitted noise level log10 ε′(k) (27) versus the tail-start index k. Observe that the mean square error L(k) has a clear minimum (at the index marked k_ε), which suggests that the overall best fit is formed with k = k_ε. Moreover, the inferred noise level ε′(k) has very low sensitivity to the particular choice of index k, so this procedure provides a robust way to determine ε′.
For i.i.d. noise, w = f = 1, and the above methods can be used to find ε′ and k_ε. For spatially-correlated noise, w and f need to be determined as part of the fit. In our procedure, we determine the best fit using (26)/(27) for a series of f, building a table of L(k, f). The best fit is that corresponding to the k and f for which the fit error L(k, f) is a minimum.

Spatially-correlated noise
Spatially-correlated noise can occur in experimental data that are spatially smoothed during collection or processing. For example, PIV data are typically collected from overlapping interrogation windows, and processing typically includes smoothing by a weighted average over the nine nearest neighbors. Such a dataset effectively has fewer than D independent data sites. Thus, it is reasonable to expect that the singular values of spatially-correlated random data still follow a Marchenko-Pastur distribution, but with parameter y = T/D replaced by fT/D (see Epps and Krivitzky 2019, Appendix D). Indeed, we have empirically found this approximation to work well when D/T ≳ 20 and D/(fT) ≳ 5. The 'spatial-correlation factor' f represents the ratio of the actual number of data sites to the effective number of independent data sites, so f > 1 indicates effectively fewer independent data sites due to spatial correlation.
For example, consider random data with spatial correlation that is produced by taking a moving average of i.i.d. random data. Such an average could have either uniform weighting 1/w or Gaussian weighting (see the Matlab gausswin function), where w is the width of the smoothing window. Figure 14a shows the singular values of random data with uniform spatial smoothing. The w = 1 curve corresponds to the original i.i.d. data (no smoothing) and is well represented by the original Marchenko-Pastur distribution (f = 1). Clearly, the 'spatial-correlation factor' f increases with increasing 'smoothing window width' w. Figure 14b shows these f versus w data for uniform smoothing and Gaussian smoothing, as well as the f versus w data from the synthetic PIV example for several PIV processing schemes. These w data were determined in (Epps and Krivitzky 2019) so as to match the theoretical prediction of rmse(ũ_1) to the numerical value. Equation (12) is a curve fit to these synthetic PIV data and provides an empirical relationship between w and f.
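The effect of smoothing on the spectrum of random data can be reproduced with a few lines of numpy (a sketch of the uniform-weighting experiment, not the exact processing behind Fig. 14):

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, w = 50, 2000, 5

E = rng.standard_normal((T, D))                  # i.i.d. noise (w = 1)
kernel = np.ones(w) / w                          # uniform weighting 1/w
E_smooth = np.apply_along_axis(
    lambda row: np.convolve(row, kernel, mode='same'), 1, E)

s_iid = np.linalg.svd(E, compute_uv=False)
s_sm = np.linalg.svd(E_smooth, compute_uv=False)

# Smoothing reduces the effective number of independent data sites
# (f > 1): the spectrum shrinks, and its relative spread widens,
# consistent with replacing y = T/D by fT/D.
print(s_iid[0] / s_iid[-1], s_sm[0] / s_sm[-1])
```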
The precise value of w is not critically important for noise filtering, since it is recommended that w = 1 be used in (13) to evaluate rmse(ṽ_k) for PIV data. However, it is conceivable that one might be interested in knowing the precise value of w for a particular flow problem or PIV processing scheme. Under those circumstances, it is recommended to use a known dataset to determine the f-w relation for the new PIV processing scheme before post-processing the target data.

Summary of error estimation procedure
To summarize, the following procedure is used to determine f, k_ε, ε′, ε̄, and w:
1. For each f = 1.0, 1.1, …, floor(D/T) (for i.i.d. data, use only f = 1) and each tail-start index k = 1, …, 0.8T (k is capped at 0.8T in order to leave sufficient singular values in the tail to yield a reliable and meaningful tail fit), compute the unit Marchenko-Pastur distribution ŝ, evaluate (27) to determine the best-fit noise level ε′(f, k), and compute the associated mean square error L(f, k) via (26).
2. Find the f and index k for which L(f, k) is a minimum; these define the best fit.
3. Form a preliminary estimate ε″ of the measurement error via Eq. (29). Equation (29) follows from the observation that the noisy singular values s̃_k rapidly depart from the Marchenko-Pastur distribution ε′ŝ near the value ε′ŝ = ε√D. Note, however, that typically ε′ is greater than the true ε while ε″ is less than ε, so a weighted average of ε′ and ε″ typically forms the best estimate of ε.
4. Find the index k_ε.
5. Estimate the measurement error ε̄ as a weighted average of ε′ and ε″.
6. Using the best fit f, evaluate (12) to find w.
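Steps 1-2 of this procedure can be sketched as follows (a skeleton only: `s_hat_of_f`, which maps f to the unit-noise Marchenko-Pastur singular values, is assumed given, and the subsequent steps producing ε″, k_ε, ε̄, and w are omitted):

```python
import numpy as np

def estimate_noise(s, s_hat_of_f, T, D):
    """Grid over the spatial-correlation factor f and the tail-start
    index k; keep the (f, k) pair minimizing the log-space fit error
    L of (26), with the best-fit noise level eps' from (27)."""
    best = None
    for f in np.arange(1.0, max(np.floor(D / T), 1.0) + 0.05, 0.1):
        s_hat = s_hat_of_f(f)
        for k in range(int(0.8 * T)):      # cap k to keep a meaningful tail
            d = np.log10(s[k:]) - np.log10(s_hat[k:])
            log_eps = d.mean()             # Eq. (27)
            L = np.mean((log_eps - d)**2)  # Eq. (26)
            if best is None or L < best[0]:
                best = (L, f, k, 10.0**log_eps)
    L, f, k, eps = best
    return f, k, eps, L
```

With a synthetic singular-value family that is exactly proportional to the unit distribution at one particular f, the grid search recovers that f and the proportionality constant.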
This approach has proven to be straightforward and robust across an array of singular value distributions, error levels, and datasets. It also provides an alternative to the traditional "scree test" (Cattell 1966), which can be foiled by closely-spaced singular values, such as those in the Sect. 3.2 example.

Appendix 2: Derivation of optimum reconstruction when using noisy modes
Consider reconstruction using the noisy mode shapes {ũ_k, ṽ_k} and some optimum singular values s̄_k to be determined (33). We now derive the s̄_k that minimize the reconstruction loss ℒ_r ≡ ‖A − Ā_r‖²_F (4). In general, the loss can be written as (34), with implied summation over k = 1, …, T, ℓ = 1, …, r, i = 1, …, T, and j = 1, …, D. The minimum loss can be found by taking partial derivatives of (34) with respect to each s̄_ℓ and setting them to zero, now with no implied sum over ℓ. Clearly, the optimum s̄_ℓ is s̄_ℓ = Σ_k s_k c_kℓ, with c_kℓ defined as c_kℓ ≡ (u_k⊺ũ_ℓ)(v_k⊺ṽ_ℓ); equivalently, s̄_ℓ = ũ_ℓ⊺ A ṽ_ℓ.
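As a numerical check (a hypothetical sketch, not from the paper): since the noisy singular vectors are orthonormal, the loss decouples per mode and is quadratic in each s̄_ℓ, so setting the derivative to zero gives s̄_ℓ = ũ_ℓ⊺ A ṽ_ℓ; using these values can never increase the loss relative to any other choice, including the noisy singular values s̃_ℓ.

```python
import numpy as np

rng = np.random.default_rng(2)
T, D, r = 15, 40, 5

A = rng.standard_normal((T, D))                   # stand-in "clean" data
A_noisy = A + 0.3 * rng.standard_normal((T, D))
u_t, s_t, vt_t = np.linalg.svd(A_noisy, full_matrices=False)

# Loss-minimizing singular values for the fixed noisy vectors:
# s_bar_l = u~_l^T A v~_l (from setting dLoss/ds_bar_l = 0).
s_bar = np.array([u_t[:, l] @ A @ vt_t[l] for l in range(r)])

recon_opt = (u_t[:, :r] * s_bar) @ vt_t[:r]
recon_tilde = (u_t[:, :r] * s_t[:r]) @ vt_t[:r]   # noisy singular values

loss_opt = np.linalg.norm(A - recon_opt)**2
loss_tilde = np.linalg.norm(A - recon_tilde)**2
print(loss_opt <= loss_tilde)                     # True by construction
```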
In general, the reconstruction Ā_r = Σ_{ℓ=1}^{r} ū_ℓ s̄_ℓ v̄_ℓ⊺ from (3) depends on the choice of rank, as well as on the estimates of the clean singular values and vectors. In this section, we consider hypothetical reconstruction methods that use various combinations of the noisy and clean variables. Obviously, the methods that use the clean variables (u_k, s_k, v_k) are only hypothetical, because these clean variables are unknown if one is just given Ã; nevertheless, these methods are interesting, because they illustrate the best-case scenarios for reconstruction losses. Comparing these hypothetical data reconstruction methods allows us to answer two important questions. Which has more of an effect on the reconstruction loss: perturbations to the singular values or to the singular vectors? Figure 15 shows that the reconstruction loss is affected much more by perturbations to the singular vectors than by perturbations to the singular values. For example, the reconstruction loss due to perturbed singular vectors, 'ũ s ṽ⊺' (blue curve), is much larger than that due to perturbed singular values, 'u s̃ v⊺' (green curve).

How important to the reconstruction accuracy is it to estimate the clean singular values?
Observe that the 'ũ s̃ ṽ⊺' reconstruction (red curve) has a local minimum (near r = k_2), whereas the 'ũ s ṽ⊺' reconstruction (blue curve) has nearly constant loss for r > k_2. Thus, the advantage of estimating the clean singular values is that the resulting reconstruction loss is much less sensitive to the choice of rank.

Assessment of reconstruction error predictions from perturbation theory
In this section, we use perturbation theory to make theoretical predictions of the expected reconstruction losses ⟨ℒ_r⟩ (4) for each of the hypothetical reconstruction schemes shown in Fig. 15. This analysis is relegated to the appendix, because these perturbation theory predictions are relatively poor. Indeed, Fig. 15 shows poor agreement between the mean reconstruction loss determined from Monte Carlo simulations (solid curves) and the perturbation theory predictions derived herein (dashed curves; formulae in Table 2). The reason for the poor agreement is that the perturbation theory predictions are not very accurate for k > k_2, which is the region of interest in Fig. 15. One interesting result is that perturbation theory predicts that most of the loss is due to the perturbations of the singular vectors, not the singular values. Indeed, comparing Eqs. (61) and (63), we find that if the clean singular values are used for reconstruction (instead of the noisy ones), then the loss is only improved by ε²r.
Table 2 summarizes the perturbation theory predictions of the expected loss ⟨ℒ_r⟩ ≡ ⟨‖A − Ā_r‖²_F⟩ from reconstructions of the form Ā_r = Σ_{ℓ=1}^{r} ū_ℓ s̄_ℓ v̄_ℓ⊺. The entries of Table 2 are derived as follows. In general, the expected loss is (46), where λ_k = s_k², and summation is implied over k = 1, …, T, ℓ = 1, …, r, i = 1, …, T, and j = 1, …, D. The crux of estimating ⟨ℒ_r⟩ is in evaluating ⟨A_ij Ā_ij⟩ = ⟨U_ik Ū_iℓ s_k s̄_ℓ V_jk V̄_jℓ⟩ using perturbation theory. For method (56), Ā_r = Σ_{ℓ=1}^{r} u_ℓ s_ℓ v_ℓ⊺, we have ⟨Ā⟩ = Ā and ⟨A_ij Ā_ij⟩ = ⟨U_ik U_iℓ s_k s_ℓ V_jk V_jℓ⟩ = Σ_{ℓ=1}^{r} λ_ℓ. Trivially, the loss (46) then is ⟨ℒ_r⟩ = Σ_{k=r+1}^{T} λ_k (57). This is the minimum possible loss for a rank-r reconstruction, per the Schmidt (1907) theorem.
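The minimal-loss result (57) is easy to verify numerically (a sketch with random stand-in data): the loss of the rank-r truncation built from the clean modes equals the sum of the discarded λ_k = s_k².

```python
import numpy as np

rng = np.random.default_rng(3)
T, D, r = 12, 30, 4

A = rng.standard_normal((T, D))
u, s, vt = np.linalg.svd(A, full_matrices=False)

A_r = (u[:, :r] * s[:r]) @ vt[:r]    # rank-r truncation with clean modes
loss = np.linalg.norm(A - A_r)**2

# Eq. (57): the loss equals the sum of the discarded lambda_k = s_k^2,
# the minimum possible for any rank-r approximation (Schmidt 1907).
print(np.isclose(loss, np.sum(s[r:]**2)))   # True
```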