Singular value decomposition of noisy data: mode corruption

Epps, Brenden P.; Krivitzky, Eric M.

doi:10.1007/s00348-019-2761-y


Research Article
Published: 08 July 2019

Volume 60, article number 121, (2019)
Experiments in Fluids

Brenden P. Epps¹ &
Eric M. Krivitzky¹


Abstract

Although the singular value decomposition (SVD) and proper orthogonal decomposition have been widely used in fluid mechanics, Venturi (J Fluid Mech 559:215–254, 2006) and Epps and Techet (Exp Fluids 48:355–367, 2010) were among the first to consider how noise in the data affects the results of these decompositions. Herein, we extend those studies using perturbation theory to derive formulae for the 95% confidence intervals of the singular values and vectors, as well as formulae for the root mean square error (rmse) of each noisy SVD mode. Moreover, we show that the rmse is well approximated by $\epsilon /\tilde{s}_k$ (where $\epsilon$ is the rms noise and $\tilde{s}_k$ is the singular value), which provides a useful estimate of the overall uncertainty in each mode.

Notes

1. The SVD is related to the biorthogonal decomposition (Aubry 1991) and the method of empirical orthogonal functions (Lorenz 1956). The POD (Berkooz et al. 1993; Holmes et al. 1996, 1997) is related to the Karhunen–Loève transform (Karhunen 1946; Loève 1978), principal components analysis (Pearson 1901), the method of empirical eigenfunctions, and the method of snapshots (Sirovich 1987).
2. The number of data sites D is the number of individual pieces of data at each time step. For example, if two-component velocity data are sampled on an $I \times J$ grid of field points, then $D = 2IJ$ is the total number of data sites.
3. Ideally, $\mathbf{E}$ contains i.i.d. noise drawn from a Gaussian distribution, but herein we also consider $\mathbf{E}$ containing spatially correlated noise, as occurs in PIV data.
4. In terms of the POD eigenvalues ${\tilde{\lambda }}_k = \tilde{s}_k^2$, the threshold criterion (3) requires ${\tilde{\lambda }}_k > \epsilon ^2 TD$.
5. Note that the reconstructed singular values $\bar{s}_k$ could be used in place of the noisy ones $\tilde{s}_k$, but we find this makes little difference in the predicted rmse.
6. Proof: since $\mathbf{U}$ is orthogonal ($\mathbf{U}\mathbf{U}^\intercal = \mathbf{I}$), we can write (41) as $\mathbf{H}= \mathbf{U}{\varvec{{\Lambda }}} \mathbf{U}^\intercal$. At the same time, using $\mathbf{V}^\intercal \mathbf{V} = \mathbf{I}$, $\mathbf{H}= \mathbf{A}\mathbf{A}^\intercal = \mathbf{U}\mathbf{S}\mathbf{V}^\intercal \mathbf{V}\mathbf{S} \mathbf{U}^\intercal = \mathbf{U}\mathbf{S}^2 \mathbf{U}^\intercal$. Comparing the two expressions gives ${\varvec{{\Lambda }}} = \mathbf{S}^2$, that is, $\lambda_k = s_k^2$.
7. Kato uses the notation $\chi$, $\mathbf{T}$, and $\mathbf{T}(\chi )$ for $\epsilon$, $\mathbf{H}$, and ${\tilde{\mathbf{H}}}$, respectively. Kato and Venturi use $\mathbf{S}$ for $\mathbf{Q}$.
8. If $\mathbf{H}$ has repeated eigenvalues, then Eq. (84) represents the weighted mean of those eigenvalues, and the present theory must be modified (via Kato’s reduction theory). These modifications complicate the analysis and prevent one from simplifying the results into forms as simple as, for example, Eq. (87).
9. Although matrix $\mathbf{W}^{(1)}$ refers to mode k, we have omitted the subscript k to facilitate referring to its $im\text {th}$ element as $W^{(1)}_{im}$. The $i\text {th}$ element of vector ${\tilde{\mathbf{u}}}_k$ is $\tilde{U}_{ik}$.
10. Note that all odd “powers” of $\mathbf{E}$ average to zero, so $\langle W^{(1)}_{im}U_{mk} \rangle = \langle W^{(3)}_{im}U_{mk} \rangle = \big \langle ( W^{(1)}_{im}U_{mk} ) \, ( W^{(2)}_{in}U_{nk} ) \big \rangle = \dots = 0$.
11. Note that, again, all odd “powers” of $\mathbf{E}$ average to zero, so $\langle N^{(1)}_{im}V_{mk} \rangle = \langle N^{(3)}_{im}V_{mk} \rangle = \dots = 0$.
12. The author prefers to interpolate using a piecewise cubic Hermite interpolating polynomial (pchip), because it provides continuity of the function and its first derivative without the overshoots to which a cubic spline is susceptible. In Matlab, $g' = \texttt {pchip}(x,g, x')$ returns the interpolant of $g(x)$ evaluated at $x'$.
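A minimal Python sketch of the interpolation choice described in the last note, assuming SciPy is available. Here SciPy's `pchip_interpolate(x, g, x')` stands in for Matlab's `pchip(x, g, x')`; we have not verified that the two give bit-identical results. The step-shaped sample data are hypothetical and only illustrate that the pchip interpolant stays within the range of monotone data, whereas an ordinary cubic spline overshoots.

```python
import numpy as np
from scipy.interpolate import CubicSpline, pchip_interpolate

# Hypothetical monotone, step-like samples g(x)
x = np.arange(6.0)
g = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

x_fine = np.linspace(x[0], x[-1], 501)

# Shape-preserving piecewise cubic Hermite interpolation (analogue of Matlab pchip)
g_pchip = pchip_interpolate(x, g, x_fine)

# Ordinary C2 cubic spline through the same points, for comparison
g_spline = CubicSpline(x, g)(x_fine)

print("pchip  range:", g_pchip.min(), g_pchip.max())
print("spline range:", g_spline.min(), g_spline.max())
```

The pchip curve remains in $[0, 1]$, while the spline rises above 1 and dips below 0 near the step, which is the overshoot the note refers to.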

References

Aubry N (1991) On the hidden beauty of the proper orthogonal decomposition. Theor Comput Fluid Dyn 2:339–352
Beltrami E (1873) Sulle funzioni bilineari. English translation by D. Boley is available as Technical Report 90-37, University of Minnesota Department of Computer Science, Minneapolis, MN, 1990
Benaych-Georges F, Nadakuditi RR (2011) The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv Math 227:494–521
Berkooz G, Holmes P, Lumley JL (1993) The proper orthogonal decomposition in the analysis of turbulent flows. Annu Rev Fluid Mech 25:539–575
Breuer K, Sirovich L (1991) The use of the Karhunen–Loève procedure for the calculation of linear eigenfunctions. J Comput Phys 96:277–296
Brindise MC, Vlachos PP (2017) Proper orthogonal decomposition truncation method for data denoising and order reduction. Exp Fluids 58(4):28
Cagney N, Balabani S (2013) On multiple manifestations of the second response branch in streamwise vortex-induced vibrations. Phys Fluids 25(7):075110
Charonko JJ, King CV, Smith BL, Vlachos PP (2010) Assessment of pressure field calculations from particle image velocimetry measurements. Meas Sci Technol 21(10):105401
Cohen K, Siegel S, McLaughlin T, Gillies E (2003) Feedback control of a cylinder wake low-dimensional model. AIAA J 41(7):1389–1391
Davis C, Kahan W (1970) The rotation of eigenvectors by a perturbation. III. SIAM J Numer Anal 7(1):1–46
Dawson STM, Hemati MS, Williams MO, Rowley CW (2016) Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition. Exp Fluids 57(3):42
Dopico FM (2000) A note on $\sin \theta$ theorems for singular subspace variations. BIT 40(2):395–403
Druault P, Bouhoubeiny E, Germain G (2012) POD investigation of the unsteady turbulent boundary layer developing over porous moving flexible fishing net structure. Exp Fluids 53:277–292
Epps BP, Krivitzky EM (2019) Singular value decomposition of noisy data: noise filtering. Exp Fluids (accepted)
Epps BP, Techet AH (2010) An error threshold criterion for singular value decomposition modes extracted from PIV data. Exp Fluids 48:355–367
Feng LH, Wang JJ, Pan C (2011) Proper orthogonal decomposition analysis of vortex dynamics of a circular cylinder under synthetic jet control. Phys Fluids 23(1):014106
Gandhi V, Bryant DB, Socolofsky SA, Stoesser T, Kim JH (2015) Concentration-based decomposition of the flow around a confined cylinder in a UV disinfection reactor. J Eng Mech 141(12):04015050
Holden D, Socha JJ, Cardwell ND, Vlachos PP (2014) Aerodynamics of the flying snake Chrysopelea paradisi: how a bluff body cross-sectional shape contributes to gliding performance. J Exp Biol 217(3):382–394
Holmes P, Lumley JL, Berkooz G (1996) Turbulence, coherent structures, dynamical systems and symmetry. Cambridge University Press, Cambridge
Holmes PJ, Lumley JL, Berkooz G, Mattingly JC, Wittenberg RW (1997) Low-dimensional models of coherent structures in turbulence. Phys Rep 287:337–384
Jordan C (1874a) Mémoire sur les formes bilinéaires. J Math Pures Appl 19:35–54
Jordan C (1874b) Sur la réduction des formes bilinéaires. Comptes Rend Acad Sci 78:614–617
Karhunen K (1946) Zur Spektraltheorie stochastischer Prozesse. Ann Acad Sci Fennicae A1:34
Kato T (1976) Perturbation theory for linear operators. Springer, Berlin
Kourentis L, Konstantinidis E (2012) Uncovering large-scale coherent structures in natural and forced turbulent wakes by combining PIV, POD, and FTLE. Exp Fluids 52:749–763
Kriegseis J, Dehler T, Pawlik M, Tropea C (2009) Pattern-identification study of the flow in proximity of a plasma actuator. In: 47th AIAA aerospace sciences meeting, p 1001
Li RC (1998) Relative perturbation theory: (i) eigenvalue and singular value variations. SIAM J Matrix Anal Appl 19(4):956–982
Loève M (1978) Probability theory. Springer, Berlin
Lorenz EN (1956) Empirical orthogonal functions and statistical weather prediction. Tech. rep., MIT
Marchenko VA, Pastur LA (1967) Distribution of eigenvalues for some sets of random matrices. Mat Sbornik 114(4):507–536
Marié S, Druault P, Lambaré H, Schrijer F (2013) Experimental analysis of the pressure-velocity correlations of external unsteady flow over rocket launchers. Aerosp Sci Technol 30:83–93
Mokhasi P, Rempfer D (2004) Optimized sensor placement for urban flow measurement. Phys Fluids 16(5):1758–1764
Neal DR, Sciacchitano A, Smith BL, Scarano F (2015) Collaborative framework for PIV uncertainty quantification: the experimental database. Meas Sci Technol 26(7):074003. http://stacks.iop.org/0957-0233/26/i=7/a=074003
Nguyen TD, Wells JC, Mokhasi P, Rempfer D (2010) POD-based estimations of the flowfield from PIV wall gradient measurements in the backward-facing step flow. In: Proceedings of ASME 2010 3rd joint US-European fluids engineering summer meeting and 8th international conference on nanochannels, microchannels, and minichannels
Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. Lond Edinburgh Dublin Philos Mag J Sci 2(11):559–572
Rajaee M, Karlsson S, Sirovich L (1994) Low-dimensional description of free-shear-flow coherent structures and their dynamical behaviour. J Fluid Mech 258:1–29
Rowley CW, Mezić I, Bagheri S, Schlatter P, Henningson D (2009) Spectral analysis of nonlinear flows. J Fluid Mech 641:115–127
Schmidt E (1907) Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Teil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener. Math Annal 63:433–476
Sirovich L (1987) Turbulence and the dynamics of coherent structures. Part 1: coherent structures, Part 2: symmetries and transformations, Part 3: dynamics and scaling. Q Appl Math 45:561–590
Stewart GW (1978) A note on the perturbations of singular values. Tech. Rep. TR-720, University of Maryland
Stewart GW (1990) Perturbation theory for the singular value decomposition. Technical Report UMIACS-TR-90-124, CS-TR 2539, University of Maryland
Stewart GW (1993) On the early history of the singular value decomposition. SIAM Rev 35(4):551–566
Strang G (2009) Introduction to linear algebra, 4th edn. Wellesley-Cambridge Press, Wellesley
Sylvester JJ (1889a) A new proof that a general quadric may be reduced to its canonical form (that is, a linear function of squares) by means of a real orthogonal substitution. Messenger Math 19:1–5
Sylvester JJ (1889b) On the reduction of a bilinear quantic of the nth order to the form of a sum of n products by a double orthogonal substitution. Messenger Math 19:42–46
Tu JH, Rowley CW, Luchtenburg DM, Brunton SL, Kutz JN (2013) On dynamic mode decomposition: theory and applications. arXiv:1312.0041
Utturkar Y, Zhang B, Shyy W (2005) Reduced-order description of fluid flow with moving boundaries by proper orthogonal decomposition. Int J Heat Fluid Flow 26:276–288
Venturi D (2006) On proper orthogonal decomposition of randomly perturbed fields with applications to flow past a cylinder and natural convection over a horizontal plate. J Fluid Mech 559:215–254
Venturi D, Karniadakis GE (2004) Gappy data and reconstruction procedures for flow past a cylinder. J Fluid Mech 519:315–336
Wedin PA (1972) Perturbation bounds in connection with singular value decomposition. BIT 12:99–111
Weyl H (1912) Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Math Annal 71:441–479
Yildirim B, Chryssostomidis C, Karniadakis G (2009) Efficient sensor placement for ocean measurements using low-dimensional concepts. Ocean Model 27:160–173

Acknowledgements

B. P. Epps dedicates this paper to Julian D. Cole, who likely would have enjoyed working on this problem. B. P. Epps is grateful to Ben Southworth for his initial analytical studies.

Author information

Authors and Affiliations

Thayer School of Engineering, Dartmouth College, Hanover, NH, 03755, USA
Brenden P. Epps & Eric M. Krivitzky

Corresponding author

Correspondence to Brenden P. Epps.

Appendices

Appendix 1: Relation between the SVD and POD

For discrete data the POD and SVD yield identical results. In fact, we now show that the $\mathbf{U}$, $\mathbf{S}$ and $\mathbf{V}$ obtained via the POD are identical to those obtained via the SVD. Given real-valued data $\mathbf{A}\in \mathbb {R}^{T\times D}$ with $T<D$, the POD procedure of Sirovich (1987) is as follows:

1. Form the temporal autocorrelation matrix $\mathbf {H} \in \mathbb {R}^{T\times T}$ via:
$$\begin{aligned} \mathbf {H} = \mathbf {A} \mathbf {A}^\intercal \,. \end{aligned}$$
(40)
2. Solve the matrix eigenvalue problem
$$\begin{aligned} \mathbf {H} \mathbf {U} = \mathbf {U} \varvec{\Lambda }. \end{aligned}$$
(41)
This eigenvector matrix $\mathbf {U} \in \mathbb {R}^{T\times T}$ is identical to the SVD left singular vector matrix, and the eigenvalue matrix $\mathbf {\Lambda } \in \mathbb {R}^{T\times T}$ contains the squares of the T singular values of $\mathbf {A}$, $\lambda _k = s_k^2$.^{Footnote 6}
3. Find the scaled POD modes, $\mathbf {M}$, by projecting the eigenvectors onto the data set
$$\begin{aligned} \mathbf {M} = \mathbf {U}^\intercal \mathbf {A}. \end{aligned}$$
(42)
Using (1), Eq. (42) is equivalent to $\mathbf {M} = \mathbf {U}^\intercal \mathbf {U} \mathbf {S} \mathbf {V}^\intercal$, which simplifies to $\mathbf {M} = \mathbf {S} \mathbf {V}^\intercal$.
4. Rescale each mode to unit norm, and record the amplitudes. In other words, separate $\mathbf {S} \mathbf {V}^\intercal$ into $\mathbf {S}$ and $\mathbf {V}^\intercal$.

Thus, the end results of the POD procedure are the same $\mathbf {U}$, $\mathbf {S}$, and $\mathbf {V}$ that are given by the SVD.
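The four steps above can be checked numerically. The sketch below assumes only NumPy and uses a random stand-in for the data matrix $\mathbf{A}$ (not data from this study). It forms $\mathbf{H}$, solves the eigenvalue problem, projects, rescales, and compares the result with NumPy's economy SVD. Because singular vectors are determined only up to sign, the columns are sign-aligned before comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 50                       # T snapshots, D data sites, with T < D
A = rng.standard_normal((T, D))    # random stand-in for the data matrix

# Step 1: temporal autocorrelation matrix, Eq. (40)
H = A @ A.T

# Step 2: eigenvalue problem, Eq. (41); eigh returns ascending order, so sort descending
lam, U_pod = np.linalg.eigh(H)
order = np.argsort(lam)[::-1]
lam, U_pod = lam[order], U_pod[:, order]

# Step 3: scaled POD modes, Eq. (42): M = U^T A (= S V^T)
M = U_pod.T @ A

# Step 4: split M into amplitudes (row norms) and unit-norm modes
s_pod = np.linalg.norm(M, axis=1)
Vt_pod = M / s_pod[:, None]

# Reference: economy SVD of A
U_svd, s_svd, Vt_svd = np.linalg.svd(A, full_matrices=False)

# Align the arbitrary signs of each mode before comparing
signs = np.sign(np.sum(U_pod * U_svd, axis=0))
U_pod *= signs
Vt_pod *= signs[:, None]

print("max |s_pod - s_svd|  :", np.max(np.abs(s_pod - s_svd)))
print("max |U_pod - U_svd|  :", np.max(np.abs(U_pod - U_svd)))
print("max |Vt_pod - Vt_svd|:", np.max(np.abs(Vt_pod - Vt_svd)))
```

Forming $\mathbf{A}\mathbf{A}^\intercal$ squares the condition number, so for ill-conditioned data a direct SVD is numerically preferable; the two routes agree here to round-off.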

Appendix 2: Perturbation theory derivations

In this section, we use perturbation theory to derive expressions for the expected values and standard deviations of the noisy singular values and vectors ($\langle {\tilde{\mathbf{u}}}_k \rangle$, $\langle \tilde{s}_k \rangle$, $\langle {\tilde{\mathbf{v}}}_k \rangle$, $\sigma _{{\tilde{\mathbf{u}}}_k}$, $\sigma _{\tilde{s}_k}$, $\sigma _{{\tilde{\mathbf{v}}}_k}$). The present derivation builds upon the work of Kato (1976) and Venturi (2006): Kato (1976) derived the perturbations of the eigenvalues and eigenvectors of a noisy matrix, and Venturi (2006) made the key realization that because the SVD of ${\tilde{\mathbf{A}}}$ is related to the eigendecomposition of ${\tilde{\mathbf{A}}}{\tilde{\mathbf{A}}}^\intercal$, Kato’s theory can be used to describe the perturbations of the SVD. Herein, we extend these results in two ways: (1) by assuming unique singular values, we are able to greatly simplify the perturbation theory results into useful formulae for the experimentalist; and (2) we extend the theory to account for spatially correlated noise, which occurs in PIV.

Before we can derive the desired quantities, we need to establish several definitions. The temporal autocorrelation matrix $\mathbf {H} \in \mathbb {R}^{T\times T}$ for the analytic data is defined as:

$$\begin{aligned} \mathbf{H}\equiv \mathbf{A}\mathbf{A}^\intercal \,. \end{aligned}$$

(43)

The temporal autocorrelation matrix ${\tilde{\mathbf{H}}}$ for the noisy data ${\tilde{\mathbf{A}}}=\mathbf{A}+\mathbf{E}$ is thus defined as

$$\begin{aligned} {\tilde{\mathbf{H}}}\equiv {\tilde{\mathbf{A}}}{\tilde{\mathbf{A}}}^\intercal&= \mathbf{A}\mathbf{A}^\intercal + \mathbf{A}\mathbf{E}^\intercal + \mathbf{E}\mathbf{A}^\intercal + \mathbf{E}\mathbf{E}^\intercal \,, \end{aligned}$$

(44)

$$\begin{aligned}&\equiv \mathbf{H}+ \epsilon \, {\hat{\mathbf{H}}}^{(1)}+ \epsilon ^2\, {\hat{\mathbf{H}}}^{(2)}\,, \end{aligned}$$

(45)

where ${\hat{\mathbf{H}}}^{(1)}\equiv \mathbf{A}{\hat{\mathbf{E}}}^\intercal + {\hat{\mathbf{E}}}\mathbf{A}^\intercal$ and ${\hat{\mathbf{H}}}^{(2)}\equiv {\hat{\mathbf{E}}}{\hat{\mathbf{E}}}^\intercal$. Each element of ${\hat{\mathbf{E}}}\equiv \mathbf{E}/\epsilon$ is distributed as $\mathcal{N}(0,1)$. These error data can be either independent or spatially correlated, as discussed in “Appendix 2.1” and “Appendix 2.2” below.
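Because ${\tilde{\mathbf{A}}}$ is linear in $\mathbf{E}$, the expansion (44)–(45) of ${\tilde{\mathbf{H}}}$ terminates exactly at second order in $\epsilon$. A short NumPy check of that identity, using arbitrary random matrices as hypothetical stand-ins for $\mathbf{A}$ and ${\hat{\mathbf{E}}}$:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, eps = 5, 40, 0.1
A = rng.standard_normal((T, D))        # hypothetical clean data
E_hat = rng.standard_normal((T, D))    # hypothetical unit-variance noise, E = eps * E_hat
A_noisy = A + eps * E_hat

H = A @ A.T                            # Eq. (43)
H1_hat = A @ E_hat.T + E_hat @ A.T     # first-order term H^(1)
H2_hat = E_hat @ E_hat.T               # second-order term H^(2)
H_noisy = A_noisy @ A_noisy.T          # Eq. (44)

# Eq. (45) has no higher-order terms, so any mismatch is round-off only
residual = np.max(np.abs(H_noisy - (H + eps * H1_hat + eps**2 * H2_hat)))
print(f"max |H~ - (H + eps H1 + eps^2 H2)| = {residual:.1e}")
```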

Herein, we assume that all eigenvalues of $\mathbf{H}$ are unique (a.k.a. simple, multiplicity of one), which implies that the eigennilpotent matrix $(\mathbf{H}- \lambda _k \mathbf{I})\mathbf{P}_k$ is zero for all modes. This allows us to simplify the presentation considerably relative to Kato (1976) and Venturi (2006). The basic idea is that repeated (a.k.a. degenerate) eigenvalues split and become unique under the perturbation. Thus, although $\mathbf{H}$ might have repeated eigenvalues in some cases, ${\tilde{\mathbf{H}}}$ will have unique eigenvalues (with probability one for continuously distributed noise). Since the experimentalist knows only ${\tilde{\mathbf{H}}}$ and can never know whether the original eigenvalues were unique, it suffices for our purposes to consider only the case of unique eigenvalues.

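Continuing with the required definitions, now define the projection matrix $\mathbf{P}\equiv \mathbf{A}(\mathbf{A}^\intercal \mathbf{A})^{+}\mathbf{A}^\intercal = \mathbf{U}\mathbf{U}^\intercal$, where $(\cdot)^{+}$ denotes the Moore–Penrose pseudoinverse (needed because $\mathbf{A}^\intercal \mathbf{A}\in \mathbb {R}^{D\times D}$ has rank at most $T<D$ and is therefore singular), and define the mode k projection matrix by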

$$\begin{aligned} \mathbf{P}_k \equiv \mathbf{u}_k \mathbf{u}_k^\intercal \,. \end{aligned}$$

(46)

Given vector $\mathbf {x}$, the projection of $\mathbf {x}$ onto the column space of $\mathbf{A}$ is $\mathbf{P}\mathbf {x}$, and the projection of $\mathbf {x}$ onto mode k is $\mathbf{P}_k \mathbf {x}$. Some useful identities are these: $\mathbf{P}^2 = \mathbf{P}$, $\mathbf{P}_k^2 = \mathbf{P}_k$, $\mathbf{P}_j\mathbf{P}_k = \mathbf{P}_k \delta _{jk}$ ($\delta _{jk}$ is the Kronecker delta). Note that since $\mathbf{H}= \mathbf{U}{\varvec{{\Lambda }}} \mathbf{U}^\intercal = \sum _{k=1}^T \lambda _k \mathbf{P}_k$, the following additional results are true: $\mathbf{P}\mathbf{H}= \mathbf{H}\mathbf{P}= \mathbf{P}\mathbf{H}\mathbf{P}= \mathbf{H}$, and $\mathbf{H}\mathbf{P}_k = \mathbf{P}_k\mathbf{H}= \mathbf{P}_k\mathbf{H}\mathbf{P}_k = \lambda _k \mathbf{P}_k$.

Kato’s theory makes use of the resolvent matrix, $\mathbf{R}(\zeta ) \equiv (\mathbf{H}- \zeta \mathbf{I})^{-1}$. The eigenvalues of $\mathbf{H}$ are the singularities of $\mathbf{R}(\zeta )$. Since $\mathbf{H}\mathbf{P}_k = \lambda _k \mathbf{P}_k$, it is natural to consider the reduced resolvent matrix^{Footnote 7}

$$\begin{aligned} \mathbf{Q}_k \equiv \sum \limits _{j=1 \atop j\ne k}^T \frac{\mathbf{P}_j}{\lambda _j - \lambda _k} \,. \end{aligned}$$

(47)

Since $\mathbf{P}_j\mathbf{P}_k = \mathbf{P}_k \delta _{jk}$, observe that $\mathbf{P}_k\mathbf{Q}_k = \mathbf{Q}_k\mathbf{P}_k = 0$ and $\mathbf{Q}_k^2 = \sum _{j\ne k} \frac{\mathbf{P}_j}{(\lambda _j - \lambda _k)^2}$.
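The projector identities and the properties of the reduced resolvent $\mathbf{Q}_k$ can be verified on a small random example. In this NumPy sketch (random stand-in data; the mode index `k` is 0-based) we also check $(\mathbf{H}- \lambda_k \mathbf{I})\mathbf{Q}_k = \mathbf{I} - \mathbf{P}_k$, which follows from (47) together with $\sum_j \mathbf{P}_j = \mathbf{U}\mathbf{U}^\intercal = \mathbf{I}$ when $\mathbf{U}$ is square.

```python
import numpy as np

rng = np.random.default_rng(2)
T, D = 6, 30
A = rng.standard_normal((T, D))              # random stand-in for the clean data
U, s, Vt = np.linalg.svd(A, full_matrices=False)
lam = s**2                                   # eigenvalues of H
H = A @ A.T
I = np.eye(T)

k = 2                                        # mode of interest (0-based index)
P = [np.outer(U[:, j], U[:, j]) for j in range(T)]                 # P_j = u_j u_j^T, Eq. (46)
Q_k = sum(P[j] / (lam[j] - lam[k]) for j in range(T) if j != k)    # reduced resolvent, Eq. (47)
Q_k_sq = sum(P[j] / (lam[j] - lam[k]) ** 2 for j in range(T) if j != k)

# Each residual should vanish up to round-off
checks = {
    "P_k P_k - P_k": P[k] @ P[k] - P[k],
    "P_k Q_k": P[k] @ Q_k,
    "Q_k Q_k - Q_k_sq": Q_k @ Q_k - Q_k_sq,
    "H P_k - lam_k P_k": H @ P[k] - lam[k] * P[k],
    "H - sum_j lam_j P_j": H - sum(lam[j] * P[j] for j in range(T)),
    "(H - lam_k I) Q_k - (I - P_k)": (H - lam[k] * I) @ Q_k - (I - P[k]),
}
for name, R in checks.items():
    print(f"{name:32s} max |residual| = {np.max(np.abs(R)):.1e}")
```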

The expected value calculations are a tedious but straightforward application of the above definitions and identities. It is convenient to use index notation with the stipulation that summation over i, j, and k is not implied. We will define the $ij{\text{th}}$ element of several matrices, many of which are defined specifically for the $k{\text{th}}$ mode. Repeated indices other than i, j, or k indicate implied summation over $1,\ldots ,T$ or $1,\ldots ,D$.

In this section, we consider the expected values of various combinations of ${\hat{\mathbf{H}}}^{(1)}$ and ${\hat{\mathbf{H}}}^{(2)}$ with $\mathbf{P}_k$, $\mathbf{Q}_k$, and $\mathbf{Q}_k^2$. Using index notation (but with no implied sum over k), we have

$$\begin{aligned} A_{ij}&\equiv U_{im}S_{mn}V_{jn} = U_{im}V_{jm}s_m, \end{aligned}$$

(48)

$$\begin{aligned} H_{ij}&\equiv A_{im}A_{jm} = U_{im}U_{jm}s_m^2 , \end{aligned}$$

(49)

$$\begin{aligned} \hat{H}^{(1)}_{ij}&\equiv A_{im}\hat{E}_{jm} + \hat{E}_{im}A_{jm}, \end{aligned}$$

(50)

$$ = (U_{in}\hat{E}_{jm} + \hat{E}_{im}U_{jn}) V_{mn}s_n , $$

(51)

$$\begin{aligned} \hat{H}^{(2)}_{ij}&\equiv \hat{E}_{im}\hat{E}_{jm}, \end{aligned}$$

(52)

$$\begin{aligned} (\mathbf{P}_k)_{ij}&\equiv U_{ik}U_{jk}, \end{aligned}$$

(53)

$$\begin{aligned} U_{mk}U_{mk}&= 1, \end{aligned}$$

(54)

$$\begin{aligned} (\mathbf{Q}_k)_{ij}&\equiv U_{im} U_{jm} (\lambda _m - \lambda _k )^{-1} (1-\delta _{mk}) . \end{aligned}$$

(55)

2.1 Spatially independent error data

Assuming each element of $\mathbf{E}\in \mathbb {R}^{T\times D}$ is independent and identically distributed as $\mathcal{N}(0,\epsilon ^2)$, the expected value of any product of an odd number of elements of $\mathbf{E}$ is zero, for any combination of indices: $\langle E_{mn} \rangle = 0$, $\langle E_{mn}E_{pq}E_{rs} \rangle = 0$, and so on. Some even-power terms are as follows:

$$\begin{aligned} \langle E_{mn}E_{pq} \rangle&= \epsilon ^2 \, \delta _{mp}\delta _{nq}, \end{aligned}$$

(56)

$$\begin{aligned} \langle E_{mn}E_{pn} \rangle&= \epsilon ^2 D \, \delta _{mp} , \end{aligned}$$

(57)

$$\begin{aligned} \langle E_{mn}E_{pq}E_{rs}E_{tu} \rangle&= \epsilon ^4 \big [ (\delta _{mp}\delta _{nq})(\delta _{rt}\delta _{su})\nonumber \\&\quad + (\delta _{mr}\delta _{ns})(\delta _{pt}\delta _{qu})\nonumber \\&\quad + (\delta _{mt}\delta _{nu})(\delta _{pr}\delta _{qs}) \big ] . \end{aligned}$$

(58)

2.2 Spatially correlated error data

Error data with spatial correlation can be modeled as that produced by uniform smoothing (i.e. a moving average, as in the Matlab smooth function) of i.i.d. random data as follows:

$$\begin{aligned} E_{mn} = \frac{1}{w} \sum _{q=1}^w \bar{E}_{m,q-1+n-(w-1)/2}, \end{aligned}$$

(59)

where w is the window width. It is easy to show that if ${\bar{\mathbf{E}}}$ is composed of i.i.d. random data with a normal distribution $\mathcal{N}(0,\bar{\epsilon }^2)$, then $\mathbf{E}$ also has a normal distribution, $\mathcal{N}(0,\epsilon ^2)$, but with standard deviation $\epsilon = \bar{\epsilon }/ \sqrt{w}$ and now spatial correlation [introduced by the spatial smoothing in (59)]. For proof, consider $\langle E_{mn}^2 \rangle$, with no implied sum over m or n. Note that for the original i.i.d. data, $\langle \bar{E}_{mn} \bar{E}_{pq} \rangle = \bar{\epsilon }^2 \, \delta _{mp}\delta _{nq}$, so

$$\begin{aligned} \epsilon ^2 = \langle E_{mn}^2 \rangle&= \left\langle \left( \frac{1}{w} \sum _{n=1}^w \bar{E}_{mn} \right) \left( \frac{1}{w} \sum _{q=1}^w \bar{E}_{mq} \right) \right\rangle \nonumber \\&= \frac{1}{w^2} \sum _{n=1}^w \sum _{q=1}^w ( \bar{\epsilon }^2 \, \delta _{nq}) = \frac{w \bar{\epsilon }^2}{w^2} = \frac{\bar{\epsilon }^2}{w} \,. \end{aligned}$$

(60)

Now consider the expected values of other useful $\mathbf{E}$ terms. Clearly, the odd powers are still expected to be zero: $\langle E_{mn} \rangle = 0$, $\langle E_{mn}E_{pq}E_{rs} \rangle = 0$, and so on. Also, the data are still uncorrelated in time, so the even power terms still involve delta functions of the first index, such as $\langle E_{mn} E_{pq} \rangle \sim \delta _{mp}$.

For perturbation theory, we are interested in sums across entire rows, such as $\sum _{q=1}^D \langle E_{mn} E_{pq} \rangle$. Such a “cross-sum” involves terms like $\langle E_{mn} E_{p,n+1} \rangle$ and $\langle E_{mn} E_{p,n+2} \rangle$, which are evaluated as follows:

$$\begin{aligned} \langle E_{mn} E_{p,n+1} \rangle&= \left\langle \left( \frac{1}{w} \sum _{n=1}^w \bar{E}_{mn} \right) \left( \frac{1}{w} \sum _{q=2}^{w+1} \bar{E}_{pq} \right) \right\rangle \nonumber \\&= \frac{1}{w^2} \sum _{n=1}^w \sum _{q=2}^{w+1} ( \bar{\epsilon }^2 \, \delta _{mp} \delta _{nq}) = \frac{w-1}{w} \epsilon ^2 \, \delta _{mp} \,, \end{aligned}$$

(61)

$$\begin{aligned} \langle E_{mn} E_{p,n+2} \rangle&= \frac{1}{w^2} \sum _{n=1}^w \sum _{q=3}^{w+2} ( \bar{\epsilon }^2 \, \delta _{mp} \delta _{nq}) = \frac{w-2}{w} \epsilon ^2 \, \delta _{mp} \,. \end{aligned}$$

(62)

Thus, the desired cross-sum can be computed as follows:

$$\begin{aligned} \begin{aligned} \sum _{q=1}^D \langle E_{mn} E_{pq} \rangle \,=\,&\left( \frac{1}{w} + \frac{2}{w} + \dots + \frac{w-1}{w} + 1 \right. \\&\left. + \frac{w-1}{w} + \dots + \frac{1}{w} \right) \epsilon ^2 \, \delta _{mp} = w \epsilon ^2 \, \delta _{mp}, \end{aligned} \end{aligned}$$

(63)
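A Monte Carlo sketch of (60)–(63), assuming NumPy and illustrative values of T, D, w, and the raw noise level $\bar{\epsilon}$ chosen by us. It smooths i.i.d. Gaussian noise with a uniform moving average along the data-site index, as in (59), then estimates the variance, the lagged covariances $\langle E_{mn} E_{m,n+\text{lag}} \rangle$, and the cross-sum. The raw noise is padded with $w-1$ extra columns so that every smoothed entry uses a full window; that edge treatment is our choice.

```python
import numpy as np

rng = np.random.default_rng(3)
T, D, w = 20000, 40, 5        # many rows (time steps) to average over; odd window width w
eps_bar = 1.0                 # std of the raw i.i.d. noise

# Raw i.i.d. noise, padded so each smoothed entry averages exactly w raw entries
E_bar = eps_bar * rng.standard_normal((T, D + w - 1))

# Uniform moving average along the data-site index, as in Eq. (59)
E = sum(E_bar[:, q:q + D] for q in range(w)) / w

def lag_cov(E, lag):
    """Sample estimate of <E_mn E_m,n+lag>, pooled over all rows m and columns n."""
    return np.mean(E[:, : E.shape[1] - lag] * E[:, lag:])

eps2 = lag_cov(E, 0)                                      # Eq. (60): expect eps_bar^2 / w
c = np.array([lag_cov(E, lag) for lag in range(2 * w)])   # expect (w - lag)/w * eps^2, then 0
cross_sum = c[0] + 2 * c[1:].sum()                        # Eq. (63): expect w * eps^2

print("eps^2     :", eps2, " predicted", eps_bar**2 / w)
print("lag covs  :", np.round(c, 3))
print("cross-sum :", cross_sum, " predicted", w * eps_bar**2 / w)
```

The lagged covariances fall off linearly to zero at lag $w$, and their two-sided sum recovers $w\epsilon^2$, which is the quantity the lumping approximation below preserves.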

To make the perturbation theory analysis tractable, we make the following “lumping” approximation:

$$\begin{aligned} \langle E_{mn} E_{pq} \rangle = w \epsilon ^2 \, \delta _{mp} \delta _{nq}, \end{aligned}$$

(64)

Equation (64) reproduces both the row sum (63) and, for $w=1$, the i.i.d. result (56). The approximation in (64) is to set all the $q\ne n$ terms in (63) to zero and instead to lump their contributions into the $q=n$ term. As a consequence, in the perturbation theory analysis, various $A_{pq}$ terms are evaluated only at $q=n$; in other words, $A_{pq}$ takes the value $A_{pn}$ over the entire smoothing window. Since $A_{pq}$ is expected to vary only slightly over the smoothing window (i.e., the window is narrow relative to the spatial scales of the data), this approximation is justified.

With the lumping approximation (64), the even power terms are now given as follows:

$$\begin{aligned} \langle E_{mn}E_{pq} \rangle&= w \epsilon ^2 \, \delta _{mp}\delta _{nq}, \end{aligned}$$

(65)

$$\begin{aligned} \langle E_{mn}E_{pn} \rangle&= \epsilon ^2 D \, \delta _{mp} , \end{aligned}$$

(66)

$$\begin{aligned} \big \langle \Vert \mathbf{E}\Vert _F \big \rangle&= \langle \sqrt{ E_{mn} E_{mn} } \rangle = \epsilon \sqrt{TD} , \end{aligned}$$

(67)

$$\begin{aligned} \langle E_{mn}E_{pq}E_{rs}E_{tu} \rangle&= \epsilon ^4 w^2 \big [ (\delta _{mp}\delta _{nq})(\delta _{rt}\delta _{su}) \nonumber \\&\quad + (\delta _{mr}\delta _{ns})(\delta _{pt}\delta _{qu})\nonumber \\&\quad + (\delta _{mt}\delta _{nu})(\delta _{pr}\delta _{qs}) \big ] . \end{aligned}$$

(68)

Note the key difference between expressions (65) and (68) versus (66) and (67). In (65), a sum is implied across $q = 1,\ldots ,D$, and this cross-sum leads to the appearance of w, as in (63). In (68), cross-sums are implied across each $\{q,s,u\}$, which leads to the appearance of $w^2$. By contrast, the weight w does not appear in (66) and (67), because the second index (n) is the same for both terms, and so no cross-sum is implied (but rather just a regular sum over all n).

Consider now other terms that appear in the perturbation theory derivations. Since all odd powers of $\mathbf{E}$ terms are expected to be zero, so too are odd “powers” of $\mathbf{H}^{(n)}$ terms. That is, $\langle H_{mn}^{(1)} \rangle = \langle H_{mn}^{(1)}H_{pq}^{(2)} \rangle = \langle H_{mn}^{(1)}H_{pq}^{(1)}H_{rs}^{(1)} \rangle = \dots = 0$. Some even “power” terms are these:

$$\begin{aligned}&\langle H_{mn}^{(1)}H_{pq}^{(1)} \rangle = \langle (A_{md}E_{nd}+E_{md}A_{nd})(A_{pe}E_{qe}+E_{pe}A_{qe}) \rangle \nonumber \\&\quad = \langle A_{md}A_{pe} E_{nd}E_{qe} + A_{md}A_{qe} E_{nd}E_{pe} \nonumber \\&\qquad + A_{nd}A_{pe} E_{md}E_{qe} + A_{nd}A_{qe} E_{md}E_{pe}\rangle \nonumber \\&\quad = w \epsilon ^2 ( H_{mp}\delta _{nq} + H_{mq}\delta _{np} \nonumber \\&\qquad\quad + H_{np}\delta _{mq} + H_{nq}\delta _{mp} ) \quad \text {(since only the}\,e=d\,\text {term survived)}, \end{aligned}$$

(69)

$$\begin{aligned}&\langle H_{mn}^{(2)} \rangle = \langle E_{md}E_{nd} \rangle \nonumber \\&\quad~ \quad = \epsilon ^2 D \, \delta _{mn} \quad \quad \text {(since implied sum over}\,d=1,\ldots ,D\text {)}, \end{aligned}$$

(70)

$$\begin{aligned}&\langle H_{mn}^{(2)}H_{pq}^{(2)} \rangle \nonumber = \langle E_{md}E_{nd} E_{pe}E_{qe} \rangle \\&\quad = \epsilon ^4 D^2 \, \delta _{mn}\delta _{pq} \nonumber + w^2 \epsilon ^4 D (\delta _{mp}\delta _{nq} + \delta _{mq}\delta _{np}), \end{aligned}$$

(71)

$$\begin{aligned}&\langle H_{mn}^{(1)}H_{pq}^{(1)} H_{rs}^{(2)} \rangle \nonumber \\&\quad = \langle (A_{md}E_{nd}+E_{md}A_{nd})(A_{pe}E_{qe}+E_{pe}A_{qe}) (E_{rg}E_{sg}) \rangle \nonumber \\&\quad = \langle A_{md}A_{pe} E_{nd}E_{qe}E_{rg}E_{sg} + A_{md}A_{qe} E_{nd}E_{pe}E_{rg}E_{sg} \nonumber \\&\qquad + A_{nd}A_{pe} E_{md}E_{qe}E_{rg}E_{sg} + A_{nd}A_{qe} E_{md}E_{pe}E_{rg}E_{sg}\rangle \nonumber \\&\quad = \epsilon ^4 [ H_{mp}(w D\, \delta _{nq} \delta _{rs} + w^2 \delta _{nr} \delta _{qs} + w^2 \delta _{ns} \delta _{qr}) \nonumber \\&\qquad + H_{mq}(w D\, \delta _{np} \delta _{rs} + w^2 \delta _{nr} \delta _{ps} + w^2 \delta _{ns} \delta _{pr}) \nonumber \\&\qquad + H_{np}(w D\, \delta _{mq} \delta _{rs} + w^2 \delta _{mr} \delta _{qs} + w^2 \delta _{ms} \delta _{qr}) \nonumber \\&\qquad + H_{nq}(w D\, \delta _{mp} \delta _{rs} + w^2 \delta _{mr} \delta _{ps} + w^2 \delta _{ms} \delta _{pr}) ]. \end{aligned}$$

(72)
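For i.i.d. noise ($w = 1$), the second-order moments (70) and (71) can be spot-checked by sampling. The NumPy sketch below, with sizes chosen only for illustration, compares sample moments of $\mathbf{H}^{(2)} = \mathbf{E}\mathbf{E}^\intercal$ against three index patterns of (71): $m=n=p=q$, then $m=n \ne p=q$, then $m=p \ne n=q$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, D, eps = 40000, 3, 10, 1.0      # N independent noise realizations (illustrative sizes)

E = eps * rng.standard_normal((N, T, D))
H2 = E @ np.swapaxes(E, 1, 2)         # H^(2) = E E^T for each realization, shape (N, T, T)

mean_H2 = H2.mean(axis=0)                    # Eq. (70): expect eps^2 D I
m_0000 = np.mean(H2[:, 0, 0] ** 2)           # Eq. (71), m=n=p=q: expect eps^4 (D^2 + 2D)
m_0011 = np.mean(H2[:, 0, 0] * H2[:, 1, 1])  # Eq. (71), m=n, p=q, m!=p: expect eps^4 D^2
m_0101 = np.mean(H2[:, 0, 1] ** 2)           # Eq. (71), m=p, n=q, m!=n: expect eps^4 D

print(np.round(mean_H2, 2))
print(m_0000, eps**4 * (D**2 + 2 * D))
print(m_0011, eps**4 * D**2)
print(m_0101, eps**4 * D)
```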

The trace operation is $\text {tr}(\mathbf{X}) \equiv X_{mm}$, with implied sum over $m=1,\ldots ,T$. Also note the identities $\text {tr}(\mathbf{X} + \mathbf{Y}) = \text {tr}(\mathbf{X}) + \text {tr}(\mathbf{Y})$ and $\langle \text {tr}(\mathbf{X})\rangle = \text {tr}(\langle \mathbf{X}\rangle )$. Thus,

$$\begin{aligned} \big \langle \text {tr}\big ({\hat{\mathbf{H}}}^{(2)}\big )\big \rangle&= \langle \hat{E}_{mn}\hat{E}_{mn} \rangle = TD , \end{aligned}$$

(73)

$$\begin{aligned} \big \langle \text {tr}\big ({\hat{\mathbf{H}}}^{(2)}\mathbf{P}_k \big )\big \rangle&= \langle \hat{E}_{mn}\hat{E}_{pn} U_{pk} U_{mk} \rangle = D\, \delta _{mp} U_{pk} U_{mk} = D , \end{aligned}$$

(74)

$$\begin{aligned} \hat{\lambda }^{(1)}_k&= \text {tr}\left[ {\hat{\mathbf{H}}}^{(1)}\mathbf{P}_k \right] \nonumber \\&\quad = [(U_{pn}\hat{E}_{qm} + \hat{E}_{pm}U_{qn}) V_{mn}s_n ]\, [U_{qk}U_{pk} ]\nonumber \\&\quad = 2 s_k U_{qk}\hat{E}_{qm} V_{mk}, \end{aligned}$$

(75)

$$\begin{aligned} \langle \hat{\lambda }^{(1)}_k \rangle&= 0 \end{aligned}$$

(76)

$$\begin{aligned} \langle \hat{\lambda }^{(2)}_k\rangle&= \big \langle \text {tr}\left[ {\hat{\mathbf{H}}}^{(2)}\mathbf{P}_k - ({\hat{\mathbf{H}}}^{(1)}\mathbf{Q}_k)({\hat{\mathbf{H}}}^{(1)}\mathbf{P}_k) \right] \big \rangle \nonumber \\& = D - \big \langle \hat{H}^{(1)}_{mn} (\mathbf{Q}_k)_{np} \hat{H}^{(1)}_{pq}(\mathbf{P}_k)_{qm} \big \rangle \nonumber \\&= D - w ( H_{mp}\delta _{nq} + H_{mq}\delta _{np} + H_{np}\delta _{mq}\nonumber \\&\quad + H_{nq}\delta _{mp} ) ( U_{n\ell } U_{p\ell } (\lambda _\ell - \lambda _k)^{-1} (1-\delta _{\ell k}) ) (U_{qk}U_{mk}) \nonumber \\&= D - w (\lambda _\ell + \lambda _k) (\lambda _\ell - \lambda _k)^{-1} (1-\delta _{\ell k}) . \end{aligned}$$

(77)

$$\begin{aligned} \langle (\hat{\lambda }^{(1)}_k)^2 \rangle&= 4 w s_k^2 = 4 w \lambda _k . \end{aligned}$$

(78)

Computing the expected values of $\langle \big ( W^{(1)}_{im}U_{mk} \big )^2 \rangle$, $\langle W^{(2)}_{im}U_{mk} \rangle$, $\langle \big ( N^{(1)}_{im}V_{mk} \big )^2 \rangle$, and $\langle N^{(2)}_{im}V_{mk} \rangle$ is laborious but no more complicated than the examples shown here. The final results are as follows:

$$\begin{aligned} \left\langle \big ( W^{(1)}_{im}U_{mk} \big )^2 \right\rangle&= w \frac{ \lambda _m + \lambda _k }{(\lambda _m - \lambda _k)^2} U_{im}^2 (1-\delta _{mk}), \end{aligned}$$

(79)

$$\begin{aligned} \left\langle W^{(2)}_{im}U_{mk} \right\rangle&= -\frac{w}{2} \frac{\lambda _m+\lambda _k}{(\lambda _m-\lambda _k)^2} (1-\delta _{mk}) U_{ik}, \end{aligned}$$

(80)

$$\begin{aligned} \left\langle \big ( N^{(1)}_{im}V_{mk} \big )^2 \right\rangle&= \frac{1 - w V_{ik}^2}{ \lambda _k} + w\frac{\lambda _m(3\lambda _k-\lambda _m)}{\lambda _k(\lambda _m-\lambda _k)^2} V_{im}^2 (1-\delta _{mk}), \end{aligned}$$

(81)

$$\begin{aligned} \left\langle N^{(2)}_{im}V_{mk} \right\rangle&= -\frac{1}{2 \lambda _k} \left[ D - w + w \frac{\lambda _m(3\lambda _k-\lambda _m)}{(\lambda _m - \lambda _k)^2} (1-\delta _{mk}) \right] V_{ik}, \end{aligned}$$

(82)

with $(1-\delta _{mk})$ indicating a sum over $m=1,\ldots ,T$ but $m\ne k$.

Recall that the lumping approximation was used in formulae (65)–(82), so these formulae are also valid for the case of i.i.d. error data upon setting $w=1$.

1.3 Perturbed eigenvalues

Both for completeness and because the singular values are the square roots of the eigenvalues $\tilde{s}_k = \sqrt{{\tilde{\lambda }}_k}$, we first consider the perturbed eigenvalues ${\tilde{\lambda }}_k$. The $k{\text{th}}$ eigenvalue of ${\tilde{\mathbf{H}}}$ can be written as a perturbation expansion (Kato [II-2.21])

$$\begin{aligned} \tilde{\lambda }_k = \lambda _k + \epsilon \, \hat{\lambda }^{(1)}_k + \epsilon ^2\, \hat{\lambda }^{(2)}_k + \cdots , \end{aligned}$$

(83)

where, assuming the eigenvalues of $\mathbf{H}$ are unique^{Footnote 8} (Kato [II-2.33])

$$\begin{aligned} \hat{\lambda }^{(1)}_k&= \text {tr}\left[ {\hat{\mathbf{H}}}^{(1)}\mathbf{P}_k \right] , \end{aligned}$$

(84a)

$$\begin{aligned} \hat{\lambda }^{(2)}_k&= \text {tr}\left[ {\hat{\mathbf{H}}}^{(2)}\mathbf{P}_k - ({\hat{\mathbf{H}}}^{(1)}\mathbf{Q}_k)({\hat{\mathbf{H}}}^{(1)}\mathbf{P}_k) \right] . \end{aligned}$$

(84b)

Note that subscript k refers to the mode number, and no summation over k is implied. The expected value and standard deviation of $\tilde{\lambda }_k$ are (Venturi 2006)

$$\begin{aligned} \langle \tilde{\lambda }_k\rangle&= \lambda _k + \epsilon ^2\, \langle \hat{\lambda }^{(2)}_k \rangle + \mathcal{O}(\epsilon ^4), \end{aligned}$$

(85)

$$\begin{aligned} \sigma _{\tilde{\lambda }_k}&\equiv \sqrt{\Big \langle \big (\tilde{\lambda }_k - \langle \tilde{\lambda }_k\rangle \big )^2 \Big \rangle } =\epsilon \, \sqrt{\Big \langle \big (\hat{\lambda }^{(1)}_k\big )^2 \Big \rangle } + \mathcal{O}(\epsilon ^2) \,. \end{aligned}$$

(86)

Equation (85) follows from the fact that $\hat{\lambda }^{(1)}_k, \hat{\lambda }^{(3)}_k, \ldots$ are “odd” in ${\hat{\mathbf{E}}}$, so their expected values are zero; this is also why the remainder in (85) is $\mathcal{O}(\epsilon ^4)$ rather than $\mathcal{O}(\epsilon ^3)$. The evaluation of $\langle \hat{\lambda }^{(2)}_k \rangle$ and $\big \langle \big (\hat{\lambda }^{(1)}_k\big )^2 \big \rangle$ for i.i.d. error data is detailed by Venturi (2006). Herein, we simplify his analysis by assuming unique eigenvalues, and we extend his results to the case of spatially correlated error. Substituting our Eqs. (77) and (78) into (85) and (86), we find

$$\begin{aligned} \langle \tilde{\lambda }_k\rangle&= \lambda _k + \epsilon ^2\, \Bigg ( D - w\sum \limits _{\begin{array}{c} m=1 \\ m\ne k \end{array}}^T \frac{\lambda _m + \lambda _k}{\lambda _m - \lambda _k} \Bigg ) + \mathcal{O}(\epsilon ^4) , \end{aligned}$$

(87)

$$\begin{aligned} \sigma _{\tilde{\lambda }_k}&= 2 s_k \,\epsilon \sqrt{w} + \mathcal{O}(\epsilon ^2) . \end{aligned}$$

(88)
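The Monte Carlo sketch below is not part of the original analysis. It offers an illustrative check of Eqs. (87) and (88), through their ingredients (77) and (78), against exact eigenvalues of $\tilde{\mathbf{H}}$ for i.i.d. noise ($w = 1$). The matrix sizes, singular values and seed are our own assumptions. So is the antithetic pairing, which evaluates each noise draw as $\pm\hat{\mathbf{E}}$ so that odd- and even-order terms of (83) separate. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, eps, n_pairs = 3, 50, 1e-3, 8000
s = np.array([3.0, 2.0, 1.0])            # hypothetical clean singular values
lam = s**2

# Clean data A = U S V^T with random orthonormal U (T x T) and V (D x T)
U, _ = np.linalg.qr(rng.standard_normal((T, T)))
V, _ = np.linalg.qr(rng.standard_normal((D, T)))
A = U @ np.diag(s) @ V.T

# i.i.d. unit-variance noise directions, each used twice as +E_hat and -E_hat
E_hat = rng.standard_normal((n_pairs, T, D))
lam_plus = np.linalg.svd(A + eps * E_hat, compute_uv=False) ** 2
lam_minus = np.linalg.svd(A - eps * E_hat, compute_uv=False) ** 2

# Even orders of Eq. (83) cancel in the difference, odd orders in the sum
lam1_hat = (lam_plus - lam_minus) / (2 * eps)               # lambda^(1)_k + O(eps^2)
lam2_hat = (lam_plus + lam_minus - 2 * lam) / (2 * eps**2)  # lambda^(2)_k + O(eps^2)

# Theory for w = 1: Eq. (78) and Eq. (77)
num = lam[:, None] + lam[None, :]        # [m, k] -> lambda_m + lambda_k
den = lam[:, None] - lam[None, :]        # [m, k] -> lambda_m - lambda_k
np.fill_diagonal(den, np.inf)            # drops the m == k term
lam2_theory = D - (num / den).sum(axis=0)
var1_theory = 4 * lam

print("<(lam1)^2> / (4 lam_k):", (lam1_hat**2).mean(axis=0) / var1_theory)
print("<lam2>, Monte Carlo:", lam2_hat.mean(axis=0))
print("<lam2>, Eq. (77):   ", lam2_theory)
```

The first printed ratio should be close to one, and the two estimates of $\langle \hat{\lambda }^{(2)}_k\rangle$ should be close to each other. Equation (87) then follows as $\lambda_k + \epsilon^2 \langle \hat{\lambda }^{(2)}_k\rangle$, and Eq. (88) as $\epsilon\,(4\lambda_k)^{1/2} = 2 s_k \epsilon$.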

1.4 Perturbed singular values

We now use the above results to find the perturbed singular values. Since the singular values of ${\tilde{\mathbf{A}}}$ are the square roots of the eigenvalues of ${\tilde{\mathbf{H}}}$, we have

$$\begin{aligned} \tilde{s}_k = \sqrt{\tilde{\lambda }_k} = \sqrt{\lambda _k + \epsilon \, \hat{\lambda }^{(1)}_k + \epsilon ^2\, \hat{\lambda }^{(2)}_k + \cdots } . \end{aligned}$$

(89)

If the error is small, then (89) can be expanded in a Taylor series about $\lambda _k$,

$$\begin{aligned} \begin{aligned} \tilde{s}_k&= \lambda_k^{\frac{1}{2}} + \frac{1}{2}\frac{1}{\lambda _k^\frac{1}{2}} (\epsilon \, \hat{\lambda }^{(1)}_k + \epsilon ^2\, \hat{\lambda }^{(2)}_k + \cdots ) \\&\quad -\, \frac{1}{2!}\frac{1}{4}\frac{1}{\lambda _k^\frac{3}{2}} (\epsilon \, \hat{\lambda }^{(1)}_k + \epsilon ^2\, \hat{\lambda }^{(2)}_k + \dots )^2 + \cdots , \end{aligned} \end{aligned}$$

(90)

and upon substituting $s_k = \sqrt{\lambda _k}$, we have

$$\begin{aligned} \tilde{s}_k = s_k + \epsilon \left( \frac{\hat{\lambda }^{(1)}_k}{2 s_k} \right) + \epsilon ^2 \left( \frac{\hat{\lambda }^{(2)}_k}{2 s_k} - \frac{(\hat{\lambda }^{(1)}_k)^2}{8 s_k^3} \right) + \mathcal{O}(\epsilon ^3) \,. \end{aligned}$$

(91)

The expected value and standard deviation of $\tilde{s}_k$ are

$$\begin{aligned} \langle \tilde{s}_k \rangle&= s_k + \epsilon ^2 \left( \frac{ \langle \hat{\lambda }^{(2)}_k \rangle }{2 s_k} - \frac{\left\langle (\hat{\lambda }^{(1)}_k)^2 \right\rangle }{8 s_k^3} \right) + \mathcal{O}(\epsilon ^4), \end{aligned}$$

(92)

$$\begin{aligned} \sigma _{\tilde{s}_k}&\equiv \sqrt{ \left\langle \left( \tilde{s}_k - \langle \tilde{s}_k \rangle \right) ^2 \right\rangle } = \epsilon \, \sqrt{ \left\langle \left( \frac{\hat{\lambda }^{(1)}_k}{2 s_k} \right) ^2 \right\rangle } + \mathcal{O}(\epsilon ^2) \,. \end{aligned}$$

(93)

Inserting (77) and (78) into (92) and (93) yields

$$\begin{aligned}&\boxed {\langle \tilde{s}_k \rangle = s_k + \frac{\epsilon ^2}{2 s_k} \Bigg ( D - w - w\sum \limits _{\begin{array}{c} m=1 \\ m\ne k \end{array}}^T \frac{\lambda _m + \lambda _k}{\lambda _m - \lambda _k}\Bigg ) + \mathcal{O}(\epsilon ^4) } \end{aligned}$$

(5)

$$\begin{aligned}&\boxed {\sigma _{\tilde{s}_k} =\epsilon \sqrt{w} + \mathcal{O}(\epsilon ^2) }\,. \end{aligned}$$

(6)

Note that to first order, $\sigma _{\tilde{\lambda }_k} = 2s_k\, \sigma _{\tilde{s}_k}$, which makes sense since $\lambda _k = s_k^2$, so $d\lambda _k = 2 s_k \, ds_k$.
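Here is a minimal NumPy sketch that evaluates the boxed results (5) and (6), together with the 95% interval $\langle \tilde{s}_k\rangle \pm 1.96\,\sigma_{\tilde{s}_k}$. It assumes the clean singular values are known and distinct, and the function name `singular_value_stats` is hypothetical. In practice only noisy values are available, so substituting $\tilde{s}_k$ for $s_k$ would be a further approximation. The expansion is also only meaningful while $\epsilon$ is small relative to the $s_k$ involved.

```python
import numpy as np

def singular_value_stats(s, eps, D, w=1.0):
    """Eqs. (5)-(6): expected noisy singular values, their standard
    deviation, and the 95% interval <s_k> +/- 1.96 sigma.

    s   : clean singular values (distinct)
    eps : rms noise level
    D   : number of spatial points
    w   : correlation width (w = 1 for i.i.d. noise)
    """
    s = np.asarray(s, dtype=float)
    lam = s**2
    num = lam[:, None] + lam[None, :]   # [m, k] -> lambda_m + lambda_k
    den = lam[:, None] - lam[None, :]   # [m, k] -> lambda_m - lambda_k
    np.fill_diagonal(den, np.inf)       # exclude m == k (distinct lambdas assumed)
    pair_sum = (num / den).sum(axis=0)  # sum over m != k, one value per k
    mean = s + eps**2 / (2 * s) * (D - w - w * pair_sum)
    sigma = np.full_like(s, eps * np.sqrt(w))
    return mean, sigma, (mean - 1.96 * sigma, mean + 1.96 * sigma)

mean, sigma, (lo, hi) = singular_value_stats([2.0, 1.0], eps=0.1, D=100)
print(mean)   # -> [2.25166667 1.48666667]
```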

1.5 Perturbed left singular vectors

The $k{\text{th}}$ left singular vector of ${\tilde{\mathbf{A}}}$ is (Kato [II-4.24])^{Footnote 9}

$$\begin{aligned} {\tilde{\mathbf{u}}}_k&= \mathbf{u}_k + \epsilon \, \mathbf{W}^{(1)}\mathbf{u}_k + \epsilon ^2\,\mathbf{W}^{(2)}\mathbf{u}_k + \cdots \end{aligned}$$

(94)

$$\begin{aligned} \tilde{U}_{ik}&= U_{ik} + \epsilon \, W^{(1)}_{im}U_{mk} + \epsilon ^2\,W^{(2)}_{im}U_{mk} + \cdots \,, \end{aligned}$$

(95)

with implied summation over $m=1,\ldots ,T$, where (Kato [II-4.23])

$$\begin{aligned} \mathbf{W}^{(1)}&= - \mathbf{Q}_k{\hat{\mathbf{H}}}^{(1)}\mathbf{P}_k \end{aligned}$$

(96a)

$$\begin{aligned} \mathbf{W}^{(2)}&= -\, \mathbf{Q}_k{\hat{\mathbf{H}}}^{(2)}\mathbf{P}_k +(\mathbf{Q}_k{\hat{\mathbf{H}}}^{(1)})^2\mathbf{P}_k \nonumber \\&\quad -\,\mathbf{Q}_k^2({\hat{\mathbf{H}}}^{(1)}\mathbf{P}_k)^2 -\tfrac{1}{2}\mathbf{P}_k{\hat{\mathbf{H}}}^{(1)}\mathbf{Q}_k^2{\hat{\mathbf{H}}}^{(1)}\mathbf{P}_k \,. \end{aligned}$$

(96b)

The expected value and standard deviation of $\tilde{U}_{ik}$ are (Venturi 2006)^{Footnote 10}

$$\begin{aligned} \langle \tilde{U}_{ik} \rangle&= U_{ik} + \epsilon ^2 \langle W^{(2)}_{im} U_{mk} \rangle + \mathcal{O}(\epsilon ^4) \end{aligned}$$

(97)

$$\begin{aligned} \sigma _{\tilde{U}_{ik}}&\equiv \left[ \left\langle \big (\tilde{U}_{ik} - \langle \tilde{U}_{ik}\rangle \big )^2 \right\rangle \right] ^{\frac{1}{2}} \nonumber \\&= \epsilon \left[ \left\langle \big ( W^{(1)}_{im}U_{mk} \big )^2 \right\rangle \right] ^{\frac{1}{2}} + \mathcal{O}(\epsilon ^2) \,. \end{aligned}$$

(98)

Extending Venturi’s results, with some effort to evaluate $\langle \big ( W^{(1)}_{im}U_{mk} \big )^2 \rangle$ and $\langle W^{(2)}_{im} U_{mk} \rangle$ [see Eqs. (79) and (80)], we find

$$\begin{aligned}&\boxed {\langle \tilde{U}_{ik} \rangle = \Bigg ( 1 - \epsilon ^2 \frac{w}{2} \sum \limits _{m=1 \atop m\ne k}^T \frac{ \lambda _m + \lambda _k }{(\lambda _m - \lambda _k)^2} \Bigg ) U_{ik} + \mathcal{O}(\epsilon ^4)} \end{aligned}$$

(7)

$$\begin{aligned}&\boxed {\sigma _{\tilde{U}_{ik}} = \epsilon \sqrt{w} \, \Bigg [ \sum \limits _{m=1 \atop m\ne k}^T \frac{ \lambda _m + \lambda _k }{(\lambda _m - \lambda _k)^2} U_{im}^2 \Bigg ]^{\frac{1}{2}} + \mathcal{O}(\epsilon ^2) } \,. \end{aligned}$$

(8)
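The next sketch evaluates Eqs. (7) and (8) for all entries of the left singular vectors at once. It assumes the clean $\mathbf{U}$ ($T\times T$) and distinct singular values are given, and the function name `left_vector_stats` is hypothetical. The matrix `C` holds $(\lambda_m+\lambda_k)/(\lambda_m-\lambda_k)^2$ with the $m=k$ entry removed.

```python
import numpy as np

def left_vector_stats(U, s, eps, w=1.0):
    """Eqs. (7)-(8): <U~_ik> and sigma_{U~_ik}, both returned as (T, T) arrays.

    U : (T, T) clean left singular vectors (columns)
    s : (T,) clean, distinct singular values
    """
    lam = np.asarray(s, dtype=float) ** 2
    den = (lam[:, None] - lam[None, :]) ** 2   # [m, k] -> (lambda_m - lambda_k)^2
    np.fill_diagonal(den, np.inf)              # removes the m == k term
    C = (lam[:, None] + lam[None, :]) / den    # C[m, k]
    mean = (1 - eps**2 * w / 2 * C.sum(axis=0)) * U   # Eq. (7): scales column k
    sigma = eps * np.sqrt(w) * np.sqrt(U**2 @ C)       # Eq. (8): sum_m U_im^2 C[m, k]
    return mean, sigma

mean, sigma = left_vector_stats(np.eye(2), [2.0, 1.0], eps=0.1)
```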

1.6 Perturbed right singular vectors

The $k\text{th}$ right singular vector of ${\tilde{\mathbf{A}}}$ is (Venturi 2006)

$$\begin{aligned} {\tilde{\mathbf{v}}}_k&= {{\mathbf{v}}}_k + \epsilon \, \mathbf{N}^{(1)}{{\mathbf{v}}}_k + \epsilon ^2\,\mathbf{N}^{(2)}{{\mathbf{v}}}_k + \cdots \end{aligned}$$

(99)

$$\begin{aligned} \tilde{V}_{ik}&= V_{ik} + \epsilon \, N^{(1)}_{im}V_{mk} + \epsilon ^2\,N^{(2)}_{im}V_{mk} + \cdots , \end{aligned}$$

(100)

with implied summation over $m=1,\ldots ,D$. Venturi (2006) gives $\mathbf{N}^{(1)}$, and we derive $\mathbf{N}^{(2)}$ in “Appendix 3”.

$$\begin{aligned} \mathbf{N}^{(1)}&= \frac{1}{\lambda _k} ( \mathbf{A}^\intercal \mathbf{W}^{(1)}\mathbf{A}+ {\hat{\mathbf{E}}}^\intercal \mathbf{A}) + \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k} \right) \mathbf{I}, \end{aligned}$$

(101a)

$$\begin{aligned} \mathbf{N}^{(2)}&= \frac{1}{\lambda _k} ( \mathbf{A}^\intercal \mathbf{W}^{(2)}\mathbf{A}+ {\hat{\mathbf{E}}}^\intercal \mathbf{W}^{(1)}\mathbf{A}) \nonumber \\&\quad + \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k^2} \right) ( \mathbf{A}^\intercal \mathbf{W}^{(1)}\mathbf{A}+ {\hat{\mathbf{E}}}^\intercal \mathbf{A}) \nonumber \\&\quad + \left( - \frac{1}{2} \frac{\hat{\lambda }^{(2)}_k}{\lambda _k} + \frac{3}{8} \frac{(\hat{\lambda }^{(1)}_k)^2}{\lambda _k^2} \right) \mathbf{I}\,. \end{aligned}$$

(101b)

The expected value and standard deviation of $\tilde{V}_{ik}$ are (Venturi 2006)^{Footnote 11}

$$\begin{aligned} \langle \tilde{V}_{ik} \rangle&= V_{ik} + \epsilon ^2 \langle N^{(2)}_{im} V_{mk} \rangle + \mathcal{O}(\epsilon ^4), \end{aligned}$$

(102)

$$\begin{aligned} \sigma _{\tilde{V}_{ik}}&\equiv \left[ \left\langle \big (\tilde{V}_{ik} - \langle \tilde{V}_{ik}\rangle \big )^2 \right\rangle \right] ^{\frac{1}{2}} \nonumber \\&= \epsilon \, \left[ \left\langle \big ( N^{(1)}_{im}V_{mk} \big )^2 \right\rangle \right] ^{\frac{1}{2}} + \mathcal{O}(\epsilon ^2) \,. \end{aligned}$$

(103)

Extending Venturi’s results, with some effort to evaluate $\langle \big ( N^{(1)}_{im}V_{mk} \big )^2 \rangle$ and $\langle N^{(2)}_{im} V_{mk} \rangle$ [see Eqs. (81) and (82)], we find

$$\begin{aligned}&\boxed {\langle \tilde{V}_{ik} \rangle = \Bigg ( 1 - \tfrac{\epsilon ^2}{\lambda _k} \Bigg [ \tfrac{D-w}{2} + \tfrac{w}{2} \sum\limits_{m=1 \atop m\ne k}^T \frac{ \lambda _m(3\lambda _k - \lambda _m) }{(\lambda _m - \lambda _k)^2} \Bigg ] \Bigg ) V_{ik} } \end{aligned}$$

(9)

$$\begin{aligned}&\boxed {\sigma _{\tilde{V}_{ik}} = \frac{\epsilon }{s_k} \, \Bigg [ 1 - w V_{ik}^2 + w \sum \limits _{m=1 \atop m\ne k}^T \frac{ \lambda _m(3\lambda _k - \lambda _m) }{(\lambda _m - \lambda _k)^2} V_{im}^2 \Bigg ]^{\frac{1}{2}} } \,, \end{aligned}$$

(10)

with $O(\epsilon ^4)$ and $O(\epsilon ^2)$ accuracy, respectively.
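A companion sketch for the right singular vectors follows, evaluating Eqs. (9) and (10). It assumes $\mathbf{V}$ holds the first $T$ clean right singular vectors ($D\times T$) and that the singular values are distinct. The function name `right_vector_stats` is hypothetical. The factor $\lambda_m(3\lambda_k-\lambda_m)$ can be negative, so a negative bracket in (10) would signal that the expansion is being used outside its range of validity; NumPy then returns `nan`.

```python
import numpy as np

def right_vector_stats(V, s, eps, D, w=1.0):
    """Eqs. (9)-(10): <V~_ik> and sigma_{V~_ik}, both returned as (D, T) arrays.

    V : (D, T) clean right singular vectors (columns)
    s : (T,) clean, distinct singular values
    """
    s = np.asarray(s, dtype=float)
    lam = s**2
    den = (lam[:, None] - lam[None, :]) ** 2
    np.fill_diagonal(den, np.inf)                                 # drop m == k
    G = lam[:, None] * (3 * lam[None, :] - lam[:, None]) / den    # G[m, k]
    mean = (1 - eps**2 / lam * ((D - w) / 2 + w / 2 * G.sum(axis=0))) * V   # Eq. (9)
    bracket = 1 - w * V**2 + w * (V**2 @ G)                       # bracket of Eq. (10)
    sigma = eps / s * np.sqrt(bracket)                            # Eq. (10)
    return mean, sigma

V = np.eye(4)[:, :2]
mean, sigma = right_vector_stats(V, [2.0, 1.0], eps=0.1, D=4)
```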

Appendix 3: Derivation of $\mathbf{N}^{(1)}$ and $\mathbf{N}^{(2)}$

The $k\text {th}$ right singular vector of ${\tilde{\mathbf{A}}}$ is (Venturi 2006)

$$\begin{aligned} {\tilde{\mathbf{v}}}_k&= {{\mathbf{v}}}_k + \epsilon \, \mathbf{N}^{(1)}{{\mathbf{v}}}_k + \epsilon ^2\,\mathbf{N}^{(2)}{{\mathbf{v}}}_k + \cdots \,\,. \end{aligned}$$

(104)

To derive $\mathbf{N}^{(1)}$ and $\mathbf{N}^{(2)}$, recall the definition of the SVD ${\tilde{\mathbf{A}}}= {\tilde{\mathbf{U}}}{\tilde{\mathbf{S}}}{\tilde{\mathbf{V}}}^\intercal$ and thus ${\tilde{\mathbf{A}}}^\intercal {\tilde{\mathbf{u}}}_k = \tilde{s}_k{\tilde{\mathbf{v}}}_k$. Also recall $\tilde{s}_k = \sqrt{{\tilde{\lambda }}_k}$, so

$$\begin{aligned} {\tilde{\mathbf{v}}}_k = \frac{1}{\sqrt{{\tilde{\lambda }}_k}} {\tilde{\mathbf{A}}}^\intercal {\tilde{\mathbf{u}}}_k \,. \end{aligned}$$

(105)

Expanding in a Taylor series about $\lambda _k$,

$$\begin{aligned} \frac{1}{\sqrt{{\tilde{\lambda }}_k}}&= \frac{1}{ \sqrt{ \lambda _k + \epsilon \, \hat{\lambda }^{(1)}_k + \epsilon ^2\, \hat{\lambda }^{(2)}_k + \cdots } } \nonumber \\&= \frac{1}{\sqrt{\lambda _k}} + \epsilon \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k^\frac{3}{2}} \right) \nonumber \\&\quad + \epsilon ^2 \left( - \frac{1}{2} \frac{\hat{\lambda }^{(2)}_k}{\lambda _k^\frac{3}{2}} + \frac{1}{2!}\frac{3}{4} \frac{(\hat{\lambda }^{(1)}_k)^2}{\lambda _k^\frac{5}{2}} \right) + \cdots \end{aligned}$$

(106)

so that

$$\begin{aligned} {\tilde{\mathbf{v}}}_k&= \left[ \frac{1}{\sqrt{\lambda _k}}+ \epsilon \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k^\frac{3}{2}} \right) \right. \nonumber \\&\quad \,\left. +\, \epsilon ^2 \left( - \frac{1}{2} \frac{\hat{\lambda }^{(2)}_k}{\lambda _k^\frac{3}{2}} + \frac{3}{8} \frac{(\hat{\lambda }^{(1)}_k)^2}{\lambda _k^\frac{5}{2}} \right) + \cdots \right] \left( \mathbf{A}^\intercal + \epsilon {\hat{\mathbf{E}}}^\intercal \right) \nonumber \\&\quad \times \, \left[ \mathbf{u}_k + \epsilon \, \mathbf{W}^{(1)}\mathbf{u}_k+ \epsilon ^2\,\mathbf{W}^{(2)}\mathbf{u}_k + \cdots \right] \,. \end{aligned}$$

(107)

Grouping terms of order 1, $\epsilon$, and $\epsilon ^2$, and comparing the results with (104), we have:

$$\begin{aligned} {{\mathbf{v}}}_k&= \frac{1}{\sqrt{\lambda _k}} \mathbf{A}^\intercal \mathbf{u}_k \quad \text {as expected, and} \end{aligned}$$

(108)

$$\begin{aligned} \mathbf{N}^{(1)}{{\mathbf{v}}}_k&= \frac{1}{\sqrt{\lambda _k}} ( \mathbf{A}^\intercal \mathbf{W}^{(1)}\mathbf{u}_k + {\hat{\mathbf{E}}}^\intercal \mathbf{u}_k ) + \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k^\frac{3}{2}} \right) \mathbf{A}^\intercal \mathbf{u}_k \end{aligned}$$

(109)

$$\begin{aligned} \mathbf{N}^{(2)}{{\mathbf{v}}}_k&= \frac{1}{\sqrt{\lambda _k}} ( \mathbf{A}^\intercal \mathbf{W}^{(2)}\mathbf{u}_k + {\hat{\mathbf{E}}}^\intercal \mathbf{W}^{(1)}\mathbf{u}_k )\nonumber \\&\quad + \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k^\frac{3}{2}} \right) ( \mathbf{A}^\intercal \mathbf{W}^{(1)}\mathbf{u}_k + {\hat{\mathbf{E}}}^\intercal \mathbf{u}_k ) \nonumber \\&\quad + \left( - \frac{1}{2} \frac{\hat{\lambda }^{(2)}_k}{\lambda _k^\frac{3}{2}} + \frac{3}{8} \frac{(\hat{\lambda }^{(1)}_k)^2}{\lambda _k^\frac{5}{2}} \right) \mathbf{A}^\intercal \mathbf{u}_k \,. \end{aligned}$$

(110)

Upon factoring out ${{\mathbf{v}}}_k$ using $\mathbf{u}_k = \mathbf{A}{{\mathbf{v}}}_k/\sqrt{\lambda _k}$ and $\mathbf{A}^\intercal \mathbf{u}_k = {{\mathbf{v}}}_k \sqrt{\lambda _k}$, we obtain

$$\begin{aligned} \mathbf{N}^{(1)}&= \frac{1}{\lambda _k} ( \mathbf{A}^\intercal \mathbf{W}^{(1)}\mathbf{A}+ {\hat{\mathbf{E}}}^\intercal \mathbf{A}) + \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k} \right) \mathbf{I}, \end{aligned}$$

(111a)

$$\begin{aligned} \mathbf{N}^{(2)}&= \frac{1}{\lambda _k} ( \mathbf{A}^\intercal \mathbf{W}^{(2)}\mathbf{A}+ {\hat{\mathbf{E}}}^\intercal \mathbf{W}^{(1)}\mathbf{A})\nonumber \\&\quad + \left( -\frac{1}{2} \frac{\hat{\lambda }^{(1)}_k}{\lambda _k^2} \right) ( \mathbf{A}^\intercal \mathbf{W}^{(1)}\mathbf{A}+ {\hat{\mathbf{E}}}^\intercal \mathbf{A})\nonumber \\&\quad + \left( - \frac{1}{2} \frac{\hat{\lambda }^{(2)}_k}{\lambda _k} + \frac{3}{8} \frac{(\hat{\lambda }^{(1)}_k)^2}{\lambda _k^2} \right) \mathbf{I}\,. \end{aligned}$$

(111b)
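The first-order terms of this derivation can be checked numerically. The sketch compares $\mathbf{u}_k + \epsilon\,\mathbf{W}^{(1)}\mathbf{u}_k$ (Eq. (96a)) and $\mathbf{v}_k + \epsilon\,\mathbf{N}^{(1)}\mathbf{v}_k$ (Eq. (109), with $\hat{\lambda}^{(1)}_k$ from Eq. (75)) against an exact SVD of $\tilde{\mathbf{A}} = \mathbf{A} + \epsilon\hat{\mathbf{E}}$. The sizes, singular values and seed are hypothetical. If the formulae are right, the first-order errors should scale like $\epsilon^2$ and lie orders of magnitude below the zeroth-order errors, which scale like $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, eps = 4, 6, 1e-6
s = np.array([4.0, 3.0, 2.0, 1.0])          # hypothetical, distinct singular values
lam = s**2
U, _ = np.linalg.qr(rng.standard_normal((T, T)))
V, _ = np.linalg.qr(rng.standard_normal((D, T)))
A = U @ np.diag(s) @ V.T
E_hat = rng.standard_normal((T, D))

# Exact SVD of A~ = A + eps*E_hat; flip u~_k and v~_k together to align with u_k
Ut, _, Vth = np.linalg.svd(A + eps * E_hat, full_matrices=False)
sign = np.sign(np.sum(Ut * U, axis=0))
Ut, Vt = Ut * sign, Vth.T * sign

H1 = A @ E_hat.T + E_hat @ A.T              # O(eps) part of H~ = A~ A~^T
err = np.zeros((T, 4))
for k in range(T):
    u, v = U[:, k], V[:, k]
    # Reduced resolvent Q_k = sum_{l != k} u_l u_l^T / (lambda_l - lambda_k)
    inv_gap = np.array([0.0 if l == k else 1.0 / (lam[l] - lam[k]) for l in range(T)])
    Qk = (U * inv_gap) @ U.T
    W1u = -Qk @ H1 @ u                       # Eq. (96a) applied to u_k
    lam1 = 2 * s[k] * (u @ E_hat @ v)        # Eq. (75)
    N1v = (A.T @ W1u + E_hat.T @ u) / s[k] - 0.5 * lam1 / lam[k] * v   # Eq. (109)
    err[k] = [np.linalg.norm(Ut[:, k] - u),
              np.linalg.norm(Ut[:, k] - (u + eps * W1u)),
              np.linalg.norm(Vt[:, k] - v),
              np.linalg.norm(Vt[:, k] - (v + eps * N1v))]

# Columns: u_k zeroth- and first-order error, v_k zeroth- and first-order error
print(err)
```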

Appendix 4: Accuracy of RMSE estimates

In Sect. 3.1, we derived the following back-of-the-envelope formulae for the rmse (valid for modes $k<k_F$):

$$\begin{aligned} \text {rmse}({\tilde{\mathbf{u}}}_k)&\approx \sqrt{w}\epsilon /s_k \\ \text {rmse}({\tilde{\mathbf{v}}}_k)&\approx \epsilon /s_k \, . \end{aligned}$$

(31)

It is worthwhile to ask: under what conditions are these formulae accurate?

Comparing (31) with the full expression for $\text {rmse}({\tilde{\mathbf{v}}}_k)$, Eq. (19),

$$\begin{aligned} \text {rmse}({\tilde{\mathbf{v}}}_k) \approx \frac{\epsilon }{s_k} \, \Bigg [ \frac{D-w}{D} + \frac{w}{D} \sum \limits _{m=1 \atop m\ne k}^T \frac{ \lambda _m(3\lambda _k - \lambda _m) }{(\lambda _m - \lambda _k)^2} \Bigg ]^{\frac{1}{2}} \,, \end{aligned}$$

(19)

it is clear that (19) reduces to (31) if the summation term in (19) is negligible (given that $w \ll D$, so that $(D-w)/D \approx 1$). Moreover, it is straightforward to show that if the summation term in (19) is negligible, then the summation term in (17) is roughly equal to one, and (17) reduces to (31) as well. But under what conditions does this occur? Recall that Eq. (17) reads

$$\begin{aligned} \text {rmse}({\tilde{\mathbf{u}}}_k) \approx \frac{\sqrt{w}\, \epsilon }{s_k} \, \Bigg [ \frac{1}{T} \sum \limits _{m=1 \atop m\ne k}^T \frac{ \lambda _k (\lambda _m + \lambda _k) }{(\lambda _m - \lambda _k)^2} \Bigg ]^{\frac{1}{2}} \,. \end{aligned}$$

(17)

Assuming $\lambda _m \gg \lambda _k$ for ${\scriptstyle m=1,\ldots ,k-2 }$ and $\lambda _m \ll \lambda _k$ for ${\scriptstyle m=k+2,\ldots ,T}$, the summation term in (17) can be approximated as:

$$\begin{aligned} \begin{aligned} \frac{1}{T}&\left[ \sum \limits _{m=1}^{k-2} \frac{ \lambda _k }{\lambda _m}\right] + \frac{1}{T} \left[ \sum \limits _{m=k+2}^T 1 \right] \, \\&+\frac{1}{T} \frac{ \lambda _k (\lambda _{k-1} + \lambda _k) }{(\lambda _{k-1} - \lambda _k)^2} + \frac{1}{T} \frac{ \lambda _k (\lambda _{k+1} + \lambda _k) }{(\lambda _{k+1} - \lambda _k)^2} \end{aligned} \end{aligned}$$

(112)

The first sum is negligible, because these $\lambda _m \gg \lambda _k$. The second sum yields $1 - (k+1)/T$, which is roughly 1 for $k \ll T$. The third and fourth terms can be combined (in approximation), taking the numerators both as $2\lambda _k^2$ and the denominators both as $(\text {gap}(\lambda _k))^2$. With these approximations, we have

$$\begin{aligned} \text {rmse}({\tilde{\mathbf{u}}}_k) \approx \frac{\sqrt{w}\epsilon }{s_k} \, \Bigg [ 1 + \frac{4}{T} \frac{ \lambda _k^2 }{(\text {gap}(\lambda _k))^2} \Bigg ]^{\frac{1}{2}} \,. \end{aligned}$$

(113)

Equation (113) now predicts that $\text {rmse}({\tilde{\mathbf{u}}}_k) \approx \sqrt{w} \, \epsilon /s_k$ only when the relative gap is larger than

$$\begin{aligned} \boxed { \frac{\text {gap}(\lambda _k)}{ \lambda _k } > \sqrt{\frac{4}{T}} } \,. \end{aligned}$$

(114)

To validate Eq. (114), consider the following example with analytic data $\mathbf{A}$ generated using the singular vectors from (26) and (28), as in Sect. 3.1, and new singular values

$$\begin{aligned} \log _{10}(s_k) = -5 \frac{k-1}{T-1} - 0.1 \sin \left( 12\pi \frac{k-1}{T-1}\right) \,. \end{aligned}$$

(115)

Here, the noise data $\mathbf{E}$ were constructed by first drawing from a normal distribution with standard deviation $\sqrt{w}\epsilon$ and then performing uniform spatial smoothing over a window of width w. This two-step process yields spatially correlated noise with standard deviation $\epsilon$. Results of a Monte Carlo simulation with $\epsilon = 10^{-5}$, $w = 5$, $T = 200$, $D = 2000$, and $N=1000$ are shown in Fig. 19.

One interesting feature of this example is that (115) gives the singular values “flat spots” at $k=5{-}13, 22{-}29,$ and so on, where the relative gap between the singular values becomes very small (see Fig. 19a, b). These “flat spots” significantly increase the rmse of the modes (see Fig. 19c). Because of the “flat spots”, the summations in rmse predictions (17) and (19) contribute significantly.
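To explore this example numerically, the sketch below evaluates the singular values (115) for $T = 200$. It also computes the bracketed sum of Eq. (17) and flags modes that satisfy criterion (114). The nearest-neighbour definition of $\text{gap}(\lambda_k)$ used here is our assumption, and the main-text definition may differ. The sketch does not reproduce the singular vectors (26) and (28) or the Monte Carlo results of Fig. 19.

```python
import numpy as np

T = 200
k = np.arange(1, T + 1)
x = (k - 1) / (T - 1)
s = 10.0 ** (-5 * x - 0.1 * np.sin(12 * np.pi * x))     # Eq. (115)
lam = s**2

# Bracket of Eq. (17): (1/T) sum_{m != k} lambda_k (lambda_m + lambda_k) / (lambda_m - lambda_k)^2
den = (lam[:, None] - lam[None, :]) ** 2
np.fill_diagonal(den, np.inf)
bracket17 = (lam[None, :] * (lam[:, None] + lam[None, :]) / den).sum(axis=0) / T

# Relative gap to the nearest neighbouring eigenvalue (assumed definition of gap)
d = np.abs(np.diff(lam))
gap = np.minimum(np.r_[np.inf, d], np.r_[d, np.inf])
rel_gap = gap / lam
satisfies_114 = rel_gap > np.sqrt(4 / T)

print("modes violating (114):", np.flatnonzero(~satisfies_114) + 1)
```

Values of `bracket17` near one indicate modes for which the back-of-the-envelope estimate (31) should hold.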

Inspecting Fig. 19b, c, it is evident that the rmse “lifts off” the $\epsilon /s_k$ curve when the relative gap is small enough that (114) is violated. This example shows that Eq. (114) must hold for the back-of-the-envelope formulae (31) to be accurate.

The other interesting feature of this example is that the noise data were generated using spatial smoothing, so this example validates the rmse theory (17)/(19) and estimates (31) for the case of $w>1$. Here, we find that the theoretical rmse predictions closely overlay the numerically computed rmse for nearly all modes.

Appendix 5: Singular values of random data

The Marchenko–Pastur distribution (Marchenko and Pastur 1967) is related to the distribution of singular values of a matrix of i.i.d. Gaussian noise. Here, we provide this distribution, and we show that for $D \gg T$, the Marchenko–Pastur distribution can also be modified so as to represent random data that has spatial correlation.

1.1 Spatially independent random data

Consider the expected distribution $\mathbb {E}[\acute{s}(\mathbf{E})]$ of singular values of a matrix $\mathbf{E} \in \mathbb {R}^{T \times D}$. Let $\mathbf{E}$ be composed of independent, identically distributed (i.i.d.) random data drawn from a Gaussian (normal) distribution with zero mean and standard deviation $\epsilon$, $\mathcal{N}(0,\epsilon ^2)$. Without loss of generality, assume $T<D$. The Marchenko–Pastur law describes the expected distribution $\mathbb {E}[\lambda (\mathbf{Y})]$ of eigenvalues of the matrix $\mathbf{Y} = (\mathbf{E}\mathbf{E}^\intercal )/(D\epsilon ^2)$. Recall that the singular values of $\mathbf{E}$ are the square roots of the eigenvalues of $\mathbf{E}\mathbf{E}^\intercal$, and thus equal $\sqrt{(D \epsilon ^2) (\text {eigenvalues of } \mathbf{Y}) }$. Therefore,

$$\begin{aligned} \mathbb {E}[\acute{s}(\mathbf{E})] = \sqrt{(D \epsilon ^2) \mathbb {E}[\lambda (\mathbf{Y})] } \,. \end{aligned}$$

(116)

The Marchenko–Pastur Law states that in the limit $T\rightarrow \infty$ with constant $y \equiv T/D$, the distribution of eigenvalues $\lambda (\mathbf{Y})$ converges to that defined by the following probability density function (Marchenko and Pastur 1967)

$$\begin{aligned} p(z) = \frac{1}{2\pi y} \frac{ \sqrt{(b-z)(z-a)} }{z} \,, \end{aligned}$$

(117)

where

$$\begin{aligned} \begin{aligned} b&= (1 + \sqrt{y} )^2 \\ a&= (1 - \sqrt{y})^2, \end{aligned} \end{aligned}$$

(118)

are the largest and smallest eigenvalues of $\mathbf{Y}$ (the edges of the support of $p(z)$), respectively.

To derive the associated distribution of singular values, consider the cumulative distribution function $P(z) = \int _a^z p(z') \, {\text {d}}z'$, which is the fraction of eigenvalues $\lambda$ in the range $a \le \lambda \le z$. By definition $P(z = a) = 0$ and $P(z = b) = 1$. Carrying out the integration, we find

$$\begin{aligned} P(z)&= \frac{1}{2\pi y} \Bigg \{ 2\sqrt{ab} \left[ \tan ^{-1}\left( \sqrt{\frac{a(b-z)}{b(z-a)}} \right) - \frac{\pi }{2} \right] \nonumber \\&\quad + \frac{a+b}{2} \left[ \tan ^{-1}\left( \frac{ z-(a+b)/2}{\sqrt{(b-z)(z-a)}} \right) + \frac{\pi }{2} \right] \nonumber \\&\quad + \sqrt{(b-z)(z-a)} \Bigg \} \end{aligned}$$

(119)

For a matrix $\mathbf{E}$ of finite size ($T < \infty$), Eq. (119) can be used to estimate the T expected singular values of $\mathbf {E}$ as follows. First, find the T eigenvalues $\lambda _k$ $\scriptstyle (k=1,\ldots ,T)$ corresponding to $P(\lambda _k) = 1 - (k-1)/(T-1)$, such that $P(\lambda _k)$ takes on T evenly spaced values between $P(\lambda _1=b)=1$ and $P(\lambda _T=a)=0$. Practically, this can be accomplished by evaluating P(z) on a fine grid $z \in [a,b]$ and interpolating^{Footnote 12} to find

$$\begin{aligned} \lambda _k = {\texttt{pchip}}\left( P(z), z, \, 1 - \tfrac{k-1}{T-1} \, \right) \,. \end{aligned}$$

(120)

The expected singular values of $\mathbf{E}$ are given by Eq. (116), which becomes

$$\begin{aligned} \acute{s}_k = \epsilon \sqrt{ D \lambda _k } \,. \end{aligned}$$

(121)

For example, the largest $\acute{s}_1$ and smallest $\acute{s}_T$ singular values of $\mathbf{E}$ are

$$\begin{aligned} \begin{aligned} \acute{s}_1&= \epsilon \sqrt{D b} = \epsilon \left( \sqrt{D}+\sqrt{T}\right) \\ \acute{s}_T&= \epsilon \sqrt{D a} = \epsilon \left( \sqrt{D}-\sqrt{T}\right) \,. \end{aligned} \end{aligned}$$

(122)

For shorthand in this manuscript, we say “Marchenko–Pastur distribution” to refer to the expected singular values computed via (119), (120), and (121).

Note that the singular values in (121) are proportional to the error level $\epsilon$, so it is useful to define a ‘unit-error’ singular value distribution

$$\begin{aligned} {\hat{s}}_k \equiv \sqrt{ D \lambda _k } \,, \end{aligned}$$

(123)

such that $\acute{s}_k = \epsilon {\hat{s}}_k$.
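The procedure (119)–(121) can be sketched with NumPy and SciPy as follows. SciPy's `pchip_interpolate` stands in for the Matlab `pchip` call of Eq. (120). Writing the arctangents of (119) with `arctan2` is our own choice; it is mathematically equivalent and avoids dividing by zero at $z=a$ and $z=b$. The optional factor `f` implements the substitution $y = fT/D$ described in the next subsection, and the function name is hypothetical.

```python
import numpy as np
from scipy.interpolate import pchip_interpolate

def marchenko_pastur_singular_values(T, D, eps=1.0, f=1.0, n_grid=20001):
    """Expected singular values of a T x D noise matrix via Eqs. (117)-(121).

    f is the spatial-correlation factor (f = 1 for i.i.d. data); it enters
    only through y = f*T/D, which must be below one.
    """
    y = f * T / D
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2      # Eq. (118)
    z = np.linspace(a, b, n_grid)
    bz, za = np.clip(b - z, 0.0, None), np.clip(z - a, 0.0, None)
    r = np.sqrt(bz * za)
    # Eq. (119), written with arctan2 so the endpoints z = a and z = b need no division
    P = (2 * np.sqrt(a * b) * (np.arctan2(np.sqrt(a * bz), np.sqrt(b * za)) - np.pi / 2)
         + (a + b) / 2 * (np.arctan2(z - (a + b) / 2, r) + np.pi / 2)
         + r) / (2 * np.pi * y)
    P = (P - P[0]) / (P[-1] - P[0])    # clean up round-off so P(a) = 0 and P(b) = 1
    k = np.arange(1, T + 1)
    lam = pchip_interpolate(P, z, 1 - (k - 1) / (T - 1))     # Eq. (120)
    return eps * np.sqrt(D * lam)                            # Eq. (121)

s_hat = marchenko_pastur_singular_values(T=100, D=2000)       # unit-error values, Eq. (123)
print(s_hat[0], np.sqrt(2000) + np.sqrt(100))                  # largest value vs. Eq. (122)
```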

1.2 Spatially correlated random data

Spatially correlated noise can occur in experimental data that are spatially smoothed during collection or processing. For example, PIV data are typically collected from overlapping interrogation windows, and processing typically includes smoothing by a weighted average over the 9 nearest neighbors. Such a dataset effectively has fewer than D independent data sites. Thus, it is reasonable to expect that the singular values of spatially correlated random data still follow a Marchenko–Pastur distribution, but with $y=T/D$ replaced by $y=fT/D$ in Eqs. (117)–(119). Indeed, we have empirically found this approximation to work well when $D/T \gtrsim 20$ and $D/fT \gtrsim 5$. The ‘spatial-correlation factor’ f represents the ratio $f = (\text {actual } D)/(\text {effective } D)$, so $f=1$ represents uncorrelated data, and $f > 1$ indicates effectively-fewer independent data sites due to spatial correlation.

For example, consider random data with spatial correlation that is produced by taking a moving average of i.i.d. random data. Such an average can use either uniform weighting 1/w (as in Eq. (59)) or Gaussian weighting:

$$\begin{aligned} g_i = \frac{ e^{-i^2/[2(h/2.5)^2] } }{ \sum _{i=-h}^h e^{-i^2/[2(h/2.5)^2] } }, \end{aligned}$$

(124)

for $i = -h, \ldots , h$, where $h = (w-1)/2$ and w is the (odd) window width. These are the weights of the Matlab gausswin function with its default parameter $\alpha = 2.5$, normalized to unit sum.

Figure 20 shows the singular values of random data with either uniform or Gaussian spatial smoothing. The $w=1$ curve corresponds to the original i.i.d. data (no smoothing) and is well represented by the original Marchenko–Pastur distribution ($f=1$). Clearly, the ‘spatial-correlation factor’ f increases with increasing width w of the smoothing window. As f increases, the modified Marchenko–Pastur distribution becomes steeper: with the $y = fT/D$ substitution, the largest and smallest singular values in (122) become

$$\begin{aligned} \begin{aligned} \acute{s}_1&= \epsilon \left( \sqrt{D}+\sqrt{fT}\right) \\ \acute{s}_T&= \epsilon \left( \sqrt{D}-\sqrt{fT}\right) \,. \end{aligned} \end{aligned}$$

(125)
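The next sketch generates spatially correlated noise of the kind described above, using either the uniform weights or the Gaussian weights (124). It scales the raw draws so that the smoothed field has rms $\epsilon$. For uniform weights, $\sum_i g_i^2 = 1/w$, so this reproduces the raw standard deviation $\sqrt{w}\,\epsilon$ used in Appendix 4. Two choices here are assumptions for illustration: smoothing runs along the spatial index only (real PIV fields would use a 2-D window), and the helper names are hypothetical. The factor $f$ for such data is not computed here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def smoothing_weights(w, kind="uniform"):
    """Uniform weights 1/w, or the Gaussian weights of Eq. (124); w must be odd."""
    if w % 2 != 1:
        raise ValueError("window width w must be odd")
    if kind == "uniform" or w == 1:
        return np.full(w, 1.0 / w)
    h = (w - 1) // 2
    i = np.arange(-h, h + 1)
    g = np.exp(-i**2 / (2 * (h / 2.5) ** 2))
    return g / g.sum()

def correlated_noise(T, D, eps, weights, rng):
    """T x D noise with rms eps, smoothed along the spatial (column) index only."""
    w = len(weights)
    # Raw std chosen so the smoothed field has std eps; each output value is a
    # weighted sum of w i.i.d. draws, whose variance scales by sum(g**2)
    raw = rng.normal(0.0, eps / np.sqrt(np.sum(weights**2)), size=(T, D + w - 1))
    return sliding_window_view(raw, w, axis=1) @ weights

rng = np.random.default_rng(0)
E = correlated_noise(200, 2000, 1e-5, smoothing_weights(5), rng)
print(E.shape, E.std())
```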

Appendix 6: Theoretical perturbation bounds

Classical perturbation theory provides bounds for the singular values and canonical angles (Stewart 1990; Dopico 2000). In this Appendix, we show that these bounds are much looser than the 95% confidence intervals derived in Sect. 2.

1.1 Perturbation bounds for singular values

Weyl’s theorem provides a bound on the perturbation of each singular value (Weyl 1912)

$$\begin{aligned} |\tilde{s}_k - s_k|&\le \Vert \mathbf{E}\Vert _2 , \end{aligned}$$

(126)

where $\Vert \mathbf{E}\Vert _2 = \acute{s}_1 \approx \epsilon (\sqrt{fT}+\sqrt{D})$. Weyl’s theorem implies that large singular values are relatively unaffected by the measurement error; that is, $\tilde{s}_k \approx s_k$ for $\tilde{s}_k \gg \Vert \mathbf{E}\Vert _2$. However, small singular values ($\tilde{s}_k < \Vert \mathbf{E}\Vert _2$) are quite dubious. Assuming $\tilde{s}_k > s_k$, as is typical for smaller singular values, Weyl’s theorem (126) only guarantees

$$\begin{aligned} \tilde{s}_k \lesssim s_k + \epsilon (\sqrt{fT}+\sqrt{D}). \end{aligned}$$

(127)

We now show that (127) is a much looser bound than the 95% confidence interval predicted by our perturbation theory results. Assuming well-separated singular values, the summation in Eq. (5) can be ignored. For i.i.d. noise ($w=1$, so that $D-w \approx D$), and with the aid of (6), the 95% confidence interval $\langle \!\langle \tilde{s}_k \rangle \!\rangle = \langle \tilde{s}_k \rangle \pm 1.96 \sigma _{\tilde{s}_k}$ provides the upper bound

$$\begin{aligned} \tilde{s}_k \lesssim s_k + \frac{\epsilon ^2 D}{2 s_k} + 1.96 \epsilon . \end{aligned}$$

(128)

Clearly, (128) is a tighter bound than (127) for all $s_k > \epsilon D / ( 2 [ (\sqrt{fT}+\sqrt{D}) - 1.96])$, which is approximately $\epsilon \sqrt{D}/2$ when $D \gg fT$. In other words, (128) is a tighter bound than (127) for all $s_k$ of interest, since lower modes with $s_k < \epsilon \sqrt{D}/2$ are mostly noise anyway.
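The comparison between (127) and (128) can be tabulated with a few lines of NumPy. The helper names and parameter values below are hypothetical. As a separate sanity check, the sketch also confirms Weyl's inequality (126) on one random matrix.

```python
import numpy as np

def weyl_upper(s_k, eps, D, T, f=1.0):
    """Eq. (127): s_k + ||E||_2, with ||E||_2 ~ eps (sqrt(fT) + sqrt(D))."""
    return s_k + eps * (np.sqrt(f * T) + np.sqrt(D))

def ci_upper(s_k, eps, D):
    """Eq. (128): upper end of the 95% interval (summation in (5) neglected)."""
    return s_k + eps**2 * D / (2 * s_k) + 1.96 * eps

def crossover(eps, D, T, f=1.0):
    """Value of s_k above which (128) is tighter than (127)."""
    return eps * D / (2 * (np.sqrt(f * T) + np.sqrt(D) - 1.96))

eps, D, T = 1e-3, 2000, 200
for s_k in (0.01, 0.1, 1.0):
    print(s_k, weyl_upper(s_k, eps, D, T), ci_upper(s_k, eps, D))

# Weyl's theorem (126) itself, checked on one random example
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 1000))
E = 1e-2 * rng.standard_normal((50, 1000))
shift = np.abs(np.linalg.svd(A + E, compute_uv=False) - np.linalg.svd(A, compute_uv=False))
print(shift.max() <= np.linalg.norm(E, 2))
```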

1.2 Perturbation bounds for singular vectors

Wedin’s theorem provides a bound on the canonical angles of the singular vectors (Davis and Kahan 1970; Wedin 1972)

$$\begin{aligned} \sqrt{\sin ^2\phi _k + \sin ^2\theta _k} \le \frac{\sqrt{ \Vert \mathbf{E}^\intercal {\tilde{\mathbf{u}}}_k\Vert ^2_F + \Vert \mathbf{E}{\tilde{\mathbf{v}}}_k\Vert ^2_F }}{\text {gap}(\tilde{s}_k)} \,, \end{aligned}$$

(129)

where

$$\begin{aligned} \text {gap}(\tilde{s}_k) = \min \big ( \min \limits _{j\ne k}|\tilde{s}_k - s_j| \, , \, \tilde{s}_k\big ) \,. \end{aligned}$$

(130)

In words, a singular vector is extremely sensitive to perturbations if it corresponds to a singular value that has nearby neighbors.

Wedin’s theorem can also be stated for a space $\tilde{\mathcal{S}}$ spanned by a collection of singular vectors $\{{\tilde{\mathbf{u}}}_k, k \in \mathcal{K}\}$, where $\mathcal K$ is a subset of the integers $1,\ldots ,T$. This is useful for the case when the gaps between singular values within the set $\{\tilde{s}_k, k \in \mathcal{K}\}$ are small but the gap between this set and any other singular value, $\text{gap}(\tilde{s}_{\mathcal{K}}) = \min ( \min \nolimits _{k \in {\mathcal{K}} \atop j \not \in {\mathcal{K}} } |\tilde{s}_k - s_j|\, , \, \tilde{s}_k)$, is large. In this scenario, the individual vectors ${\tilde{\mathbf{u}}}_{k\in \mathcal{K}}$ may be ill-defined, but the space $\tilde{\mathcal{S}}$ may be well defined.

Using our perturbation theory results, we can evaluate the left-hand side (LHS) and right-hand side (RHS) terms of Wedin’s theorem (129). Equation (32) prescribes the LHS terms of (129) as $\sin \phi _k \approx \epsilon \sqrt{wT}/s_k$ and $\sin \theta _k \approx \epsilon \sqrt{D}/s_k$, so for i.i.d. noise ($w=1$)

$$\begin{aligned} \langle LHS \rangle \approx \epsilon \sqrt{T+D}/s_k \end{aligned}$$

(131)

The expected value of the right hand side can be evaluated as follows using the methods in “Appendix 2”,

$$\begin{aligned} \langle RHS \rangle&= \left\langle \Bigg [ \sum _{j=1}^D E_{mj}^2U_{mk}^2 + \sum _{i=1}^T E_{im}^2V_{mk}^2 \Bigg ]^\frac{1}{2}/\text {gap}(\tilde{s}_k) \right\rangle \nonumber \\&= \epsilon \sqrt{T+D}/\text {gap}(\tilde{s}_k) \,. \end{aligned}$$

(132)

Figure 21 compares the LHS and RHS terms in Wedin’s theorem for the example in Sect. 3.1. Clearly, expressions (131) and (132) well approximate the left hand side (LHS) and right hand side (RHS) terms of Wedin’s theorem.

This example shows that Wedin’s theorem (129) is not an especially useful tool here, because the bound is too loose (i.e., the right-hand side of (129) is much larger than the left-hand side). On the other hand, our approximations (32) describe the canonical angles well and thus are much better estimates of the quantities of interest than Wedin’s theorem.

For cases where the gap between singular values is not large, we expect approximations (32) to fail. For such cases with smaller $\text {gap}(\tilde{s}_k)$, we expect Wedin’s theorem to ensure

$$\begin{aligned} \frac{\epsilon \sqrt{T+D}}{s_k} \lesssim \sqrt{\sin ^2\phi _k + \sin ^2\theta _k} \lesssim \frac{\epsilon \sqrt{T+D}}{\text {gap}(\tilde{s}_k)} .\end{aligned}$$

(133)
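Both sides of Wedin's theorem (129) can also be computed directly on a synthetic example, alongside the estimates (131) and (132). The spectrum, sizes, noise level and seed below are hypothetical. They do not reproduce the example of Sect. 3.1, whose singular vectors (26) and (28) are not given here. The sine of each canonical angle is computed as the norm of the component of the noisy vector orthogonal to the clean one.

```python
import numpy as np

rng = np.random.default_rng(2)
T, D, eps = 20, 400, 1e-4
s = np.linspace(2.0, 0.1, T)              # hypothetical spectrum with uniform gaps
U, _ = np.linalg.qr(rng.standard_normal((T, T)))
V, _ = np.linalg.qr(rng.standard_normal((D, T)))
A = U @ np.diag(s) @ V.T
E = eps * rng.standard_normal((T, D))
Ut, st, Vth = np.linalg.svd(A + E, full_matrices=False)
Vt = Vth.T

lhs, rhs, lhs_est, rhs_est = (np.zeros(T) for _ in range(4))
for k in range(T):
    u, v, ut, vt = U[:, k], V[:, k], Ut[:, k], Vt[:, k]
    sin_phi = np.linalg.norm(ut - (u @ ut) * u)     # sine of angle between u_k and u~_k
    sin_theta = np.linalg.norm(vt - (v @ vt) * v)   # sine of angle between v_k and v~_k
    gap = min(np.min(np.abs(st[k] - np.delete(s, k))), st[k])   # Eq. (130)
    lhs[k] = np.hypot(sin_phi, sin_theta)
    rhs[k] = np.hypot(np.linalg.norm(E.T @ ut), np.linalg.norm(E @ vt)) / gap   # Eq. (129)
    lhs_est[k] = eps * np.sqrt(T + D) / s[k]        # Eq. (131)
    rhs_est[k] = eps * np.sqrt(T + D) / gap         # Eq. (132)

print(np.c_[lhs, lhs_est, rhs, rhs_est])
```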


About this article

Cite this article

Epps, B.P., Krivitzky, E.M. Singular value decomposition of noisy data: mode corruption. Exp Fluids 60, 121 (2019). https://doi.org/10.1007/s00348-019-2761-y


Received: 22 January 2019
Revised: 17 April 2019
Accepted: 04 June 2019
Published: 08 July 2019
DOI: https://doi.org/10.1007/s00348-019-2761-y
