Randomized Signal Processing with Continuous Frames

This paper focuses on signal processing tasks in which the signal is transformed from the signal space to a higher dimensional coefficient space (also called phase space) using a continuous frame, processed in the coefficient space, and synthesized to an output signal. We show how to approximate such methods, termed phase space signal processing methods, using a Monte Carlo method. As opposed to standard discretizations of continuous frames, based on sampling discrete frames from the continuous system, the proposed Monte Carlo method is directly a quadrature approximation of the continuous frame. We show that the Monte Carlo method allows working with highly redundant continuous frames, since the number of samples required for a certain accuracy is proportional to the dimension of the signal space, and not to the dimension of the phase space. Moreover, even though the continuous frame is highly redundant, the Monte Carlo samples are spread uniformly, and hence represent the coefficient space more faithfully than standard frame discretizations.


Introduction
We consider signal processing tasks based on continuous frames [1,46]. In the general setting, an input signal s, from the Hilbert space of signals H, is first analyzed into a coefficient space representation V f [s] via the frame analysis operator V f . Then, V f [s] is manipulated in the coefficient space by first applying a pointwise nonlinearity r • V f [s] and then a linear operator T , to produce T (r • V f [s]), which is finally synthesized back to the signal space via V * f . The end-to-end pipeline reads We call pipelines of the form (1) phase space signal processing. The linear operator T models a global change of the signal in the feature space, while the nonlinearity r allows modifying the feature coefficients term-by-term with respect to their values. Some examples of continuous frames are the 1D continuous wavelet transform -CWT [26,13], the short time Fourier transform -STFT [25], the Shearlet transform [28] and the Curvelet transform [7]. Signal processing tasks of the form (1) are used in a multitude of applications. In multipliers [37,40,2,47,3], T is a multiplicative operator and the nonlinearity is trivial r(x) = x. Multipliers have applications, for example, in audio analysis [4] and increasing signal-to-noise ratio [36]. In signal denoising, e.g., wavelet shrinkage denoising [15,14], and Shearlet denoising [29], the linear operator is trivial T = I and r is a nonlinearity that attenuates low values. The same is true in shearlet based image enhancement [21,Section 4]. In phase vocoder, T is a dilation operator and r is a so-called phase correction nonlinearity [44,12,52,32,34,16,41,45]. In analysis-based iterative thresholding algorithms for inverse problems, the sparsification step in each iteration can be written as (1) with T = I and r a thresholding nonlinearity [22,30].
We note that the theory presented in this paper works also when the analysis and synthesis in (1) are done by two different frames sharing the same feature space. An example application is shearlet or curevelet based Radon transform inversion [11,20,9], where T is a multiplicative operator, r is thresholding, analysis is done by the curvelet/shearlet frame, and synthesis is done by some modified curvelet/shearlet frame. For simplicity of the presentation, we stick to the same frame for analysis and synthesis. More accurately, in (5) and (6) we extend (1) to a pipeline based on a frame and its canonical dual.
In this paper we study quadrature discretizations of (1) based on random samples.

Quadrature vs. discrete frame discretizations of continuous frames
An analysis operator of a continuous frame V f has the form where G is a measure space called phase space, that usually has some physical interpretation (e.g., in the STFT G is the time-frequency plane), {f g } g∈G is the continuous frame, and L 2 (G) is called the coefficient space. Accordingly, the synthesis operator V * f has the form [46,Theorem 2.6] As evident from the above description, phase space signal processing involves integrals, and thus some form of discretization is required. The common approach is to sample points from G to construct a discrete frame version {f g k } ∞ k=1 of the continuous frame (e.g., as in [25,13]). For the discrete frame, the synthesis operator reads, for (F k ) k ∈ l 2 , Note that (4) looks like a quadrature approximation of (3). However, in the standard continuousto-discrete frame approach, the points g k are not chosen for approximating (3). Rather, g k are chosen so that the discrete system {f g k } ∞ n=1 satisfies the discrete frame inequality. Hence, the discrete frame is related to the continuous one by the fact that f g k are sampled from f (·) , not by some approximation requirement. In this paper we take a different route to discretization, requiring that (4) approximates (3). Let us call the latter discretization approach the quadrature approach.
There is an advantage in the quadrature approach over the discrete frame approach when working with highly redundant continuous frames. To illustrate the idea, consider the STFT, where G = R 2 is the time-frequency plane. Suppose that we extend G by adding to the time and frequency axes t, ω a third axis c that controls the time width of the window. To discretize the resulting continuous frame f t,ω,c to a discrete frame, one may simply choose to fix the window width axis to one single value c = c 0 , and sample a time-frequency 2D grid (t n , ω m , c 0 ) n,m . Indeed, such an approach would result in a standard discrete STFT, which is a discrete frame. However, the information along the third axis is lost in this discretization. In fact, nothing in the continuousto-discrete frame approach forces the discretization to faithfully represent the whole continuous phase space. In the quadrature approach, our goal is to discretize the continuous frames more faithfully, sampling uniformly all the feature directions in phase space.

Randomized quadrature approximations of continuous frames
Our motivation is hence to discretize highly redundant continuous frames in the quadrature approach. In such situations, the dimensionality of phase space is higher than that of the signal space, and hence using a randomized discretization makes sense. Our approach is motivated by randomized methods in finite-dimensional numerical linear algebra, which are a prominent approach for dealing with high dimensional data [35,50,51,17,18,10]. The goal in this paper is to develop a similar randomized theory in an infinite dimensional setting in general separable Hilbert spaces, namely, in the phase space signal processing setting.
Randomized algorithms in a context of continuous frames were presented in the past. Relevant sampling is a line of work in which integral transforms are randomly discretized [5,23,49,42]. While the goal in our approach is to approximate the continuous frame with a quadrature sum, the goal in relevant sampling is to sample discrete frames from continuous frames.
We summarize our construction as follows.
Signal processing in phase space. Let V f : H → L 2 (G) be the analysis operator of a continuous frame, and is the identity I, we consider the following two formulations of signal processing in phase space. Synthesis phase space signal processing is defined by the pipeline and analysis phase space signal processing is defined by Here, T is a bounded operator in L 2 (G) and r : C → C is a nonlinearity. The pipelines (5) and (6) can be seen as working with the canonical dual frame S −1 f f [46] either in the analysis or in the synthesis step. We suppose that S f has an efficient discretization in the signal space H. Hence, we would like to find an efficient discretization of the rest of the pipeline, namely, of Mote Carlo signal processing in phase space. We study a Monte Carlo approximation of signal processing in phase space based on the pipelines (5) and (6). We first consider a Monte Carlo approximation of synthesis. For F ∈ L 2 (G), under certain assumptions given in Subsection 3.5, we consider the approximation of the synthesis operator by Here, {g k } K k=1 ⊂ G is a finite set of independent random sample points, and C is a normalization constant.
Using this approximation, in Section 4 we study the approximation rate of stochastic signal processing in phase space (5) and (6). The first version of the approximation reads, for the synthesis and analysis formulations respectively, Under some general assumptions, we also approximate the signal processing pipelines (5) and (6) when T in an integral operator defined by where R : G 2 → C (Definition 6). The synthesis and analysis approximations in this case read respectively Here, {g k } K k=1 , {y m } L m=1 ⊂ G are two finite sets of independent random sample points. Methods (10) and (11) are useful for integral operators. Methods (8) and (9) are useful when the samples T r • V f [s] (g k ) can be computed using some other samples , which is the case for multiplicative and diffeomorphism operators (see Subsection 6.2).
Summary of our main results.
• We prove the convergence of the Monte Carlo methods (8)-(11) as the number as samples increase, and also introduce non-asymptotic error bounds. When considering discrete signals of resolution/dimension M , embedded in the continuous signal space, the error in the stochastic method is of order O( M/K), where K is the number of Monte Carlo samples.
• The computational complexity of our method does not depend on the dimension of phase space for a rich class of signal processing pipelines. This allows approximating highly redundant continuous frames efficiently using sample points which are well spread in all directions in phase space.
• As a toy application of the theory, we show how to increase the expressive capacity of the 2D time-frequency phase space by adding a third axis, and use the construction in a phase vocoder scheme.
The proofs in this paper are inspired by the constructions in standard Monte Carlo theory (see, e.g., [6, Section 2]), adapted to infinite dimensional Hilbert spaces.

Background: harmonic analysis in phase space
In this section we review the theory of continuous frames, and give the two examples: STFT and CWT. By convention, all Hilbert spaces in this paper are assumed to be separable. The Fourier transform F is defined with the following normalization We denote the norm of a vector v in a Banach space B by v B . For a measure space {G, µ}, we denote interchangeably by the p norm of the signal f ∈ L p (G), where 1 ≤ p < ∞, and denote interchangeably . We note that an equality between two L p functions is by definition an almosteverywhere equality. We denote the induced operator norm of operators in Banach spaces using the subscript of the Banach space, e.g., for a bounded linear operator T : L p (G) → L p (G), we denote T p . When we want to emphasize that the norm is an operator norm, we also denote T p→p .

Examples
An important class of Parseval frames are wavelet transforms based on square integrable representations, which we call in this paper simply wavelet transforms. We refer the reader to [24, Chapters 2.3-2.5], and the classical papers [19,27]. The wavelet system in the general theory is generated by fixing one signal f ∈ H, that is typically called the mother wavelet or the window function, and applying on it a set of transformations {π(g)f | g ∈ G}, parameterized by a locally compact topological group G. The Haar measure is taken in G, and π : G → U(H) is assumed to be a square integrable representation, where U(H) is the group of unitary operators in H.
The wavelet transform is defined by For any two mother wavelets f 1 and f 2 , the reconstruction formula of the wavelet transform is given by Here, A is a special positive operator in H, called the Duflo-Moore operator, uniquely defined for every square integrable representation π, that determines the normalization of windows.
The short time Fourier transform. The following construction is taken from [25]. Consider the signal space L 2 (R). Let T : R → U(L 2 (R)) be the translation in For a normalized window f , the mapping is a Parseval continuous frame, with the standard Lebesgue measure of the phase space R 2 . The resulting transform V f [s](x, ω) = s, π(x, ω)f is called the Short Time Fourier Transform (STFT).

The 1D continuous wavelet transform
The following construction is taken from [26,13]. Consider the signal space L 2 (R), and the translation T as in the STFT. Let D : R \ {0} → U(L 2 (R)) be the dilation in L 2 (R), defined for is closed under compositions. We can treat A as a group of tuples R × (R \ {0}), with group product derived from the compositions of operators in (18). The group A is called the 1D affine group. The mapping where F is the Fourier transform. The resulting wavelet transform is called the Continuous Wavelet Transform (CWT). Next, we show how the CWT atoms are interpreted as time-frequency atoms, and the CWT is interpreted as a time-frequency transform. Here, by changing variable ω = 1 τ , we obtain the Parseval frame based on the representation π ′ (x, ω) = T (x)D(ω −1 ). The parameter ω is interpreted as frequency. The mapping π ′ is a representation of the 1D affine group with the new parameterization ω = 1 τ , in which the Haar measure is the standard Lebesgue measure of R × (R \ {0}).

Elements of stochastic signal processing in phase space
In this section we develop basic approximation results that will be used later in the paper to bound the approximation error between the signal processing pipelines (5) and (6), and their stochastic approximations (7)-(11).

Phase space operators
We start by defining integral operators in the coefficient space.
Definition 6. Let T be a bounded linear operator in L 2 (G), where G is a locally compact topological space with σ-finite Borel measures.
1. We call T a phase space integral operator (PSI operator) if there exists a measurable function R : G × G → C with R(·, g) ∈ L 2 (G) for almost every g ∈ G, such that for every F ∈ L 2 (G) Example 7. The Gramian operator Q f of a continuous frame is a phase space operator by Propo-

Sampling in phase space
Let F ∈ L 2 (G), and let f be a continuous frame. The phase space G in general does not have finite measure, and thus uniform sampling is not defined on G. However, when G has infinite measure, functions F ∈ L 2 (G) must decay in some sense "at infinity", so it is possible to restrict our sampling to a compact domain in G, in which F has most of its energy. More accurately, since G is σ-finite, it is the disjoint union of at most countably many sets of finite measure. Namely, there are disjoint measurable sets X n of finite measure, with n∈N X n = G, such that for every Denote G n = n j=1 X n , and note that n∈N G n = G. Now, (21) is equivalent to Thus, for every ǫ > 0, there exists an indicator function ψ ǫ (that depends on F ) of a measurable set of finite measure ψ ǫ 1 , such that In our analysis, we allow more general forms of envelopes ψ ǫ .
Given an envelope ψ, samples can be drawn from G according to the probability density ψ(g) ψ 1 . In the following analysis we fix an envelope ψ ǫ = ψ independently of a specific function F ∈ L 2 (G). This is the common approach in classical signal processing, where a compact frequency band [a, b] is predefined independently of a specific signal. It is implicit that we can only treat signals having most of their frequency energy in [a, b]. Any frequency information outside of [a, b] is lost or projected into the band. In Section 5, and specifically Definition 25, we study the support of ψ required to capture most of the energy of discrete signals.

Input sampling in phase space operators
Given a PSI operator T with kernel R, in this subsection we sample the input variable g of R(g ′ , g), and keep the output variable g ′ continuous. In Subsection 4 we show that sampling the output variable g ′ is a special case of the framework developed in this subsection. Let ψ be an envelope, and g ∈ G be a random sample according to the probability distribution ψ(g) ψ 1 . Define the random rank one operator T ψ,1 , applied on F ∈ L 2 (G), by We also denote T ψ,1 F (g ′ ; g) = ψ 1 R(g ′ , g)F (g), where g ′ is the variable of the output function T ψ,1 F . Next, we define the Monte Carlo approximation of T F as a sum of independent T ψ,1 F vectors.
We call T ψ,K F the Monte Carlo phase space integral operator applied on F and based on K samples, approximating T F .
When the envelope ψ is fixed throughout the analysis, we often denote interchangeably T K F = T ψ,K F . In the following we fix an envelope ψ.
Remark 10. Note that T k F is a random variable -a function with ({g k } K k=1 , g ′ ) as the variable. Thus, we can sample L 2 (G) vectors in (23), which are equivalence classes of functions, without requiring any continuity assumption on R(·, ··)F (··). Indeed, T k F is defined up to a set of tuples We define the variance V(T K F ) as the integral Given an envelope ψ, by abuse of notation, we also denote by ψ the multiplicative operator Proposition 11. Let T be a PSI operator, and F ∈ L 2 (G). Then, 2. if T is a uniformly square integrable PSI operator, with bound D, then and The proof of Item 1 of Proposition 11 follows directly from the definition of PSI operators (Definition 6). Indeed, for T 1 F , and for T K we use linearity. The proof of Item 2 of Proposition 11 is in the next subsection. We next bound the average square error in approximating T [ψF ] by T k F .
Proposition 12. Let f be a continuous frame, and T a uniformly square integrable PSI operator with bound D. Then Proof. By the Fubini-Tonelli theorem, and Proposition 11, The expected error in Proposition 12 is pointwise in F . We note that an operator expected error bound of the form "E T K − T ψ(·) 2 2 = O( ψ 1 K )" like in the finite dimensional matrix operator case [48] is not possible. Indeed, for any sample set We thus focus in this paper on pointwise error estimates.
The following is an important special case of Propositions 11 and 12.
be the Gramian operator of a bounded continuous frame with f g H ≤ C and upper frame bound B. Let K f be the frame kernel. Then, we have and . Moreover, Q f is a uniformly square integrable PSI operator with bound B 1/2 C, and Proof. By Example 7, Q f is a uniformly square integrable PSI operator with bound B 1/2 C. The rest of the results follow from Propositions 11 and 12.

Proof of Proposition 11
The proof is based on the following lemma.
Lemma 14. Let T be a PSI operator with kernel R(g ′ , g), let ψ be an envelope, and let F ∈ L 2 (G). Then the following holds.
1. The expected value of T 1 F satisfies 2. If T is a uniformly square integrable PSI operator, then V(T 1 F ) ∈ L 1 (G), and , and expected value is with respect to the random variable g.
3. If T is a uniformly square integrable PSI operator, with bound D, then Proof. Part 1 was shown in the discussion below Proposition 11. For parts 2 and 3, we write We first use the Fubini-Tonelli theorem to prove integrability with respect to g ′ of the first term of (30). By the fact that T is uniformly square integrable (Definition 6.2), and by ψ ∞ ≤ 1, This leads to (28). By the non-negativity of the integrand in the definition of V( (28) and (31), we get (29).
Proof of Proposition 11. Part 1 was shown in the discussion below Proposition 11. Next we show Part 2. By the Fubini-Tonelli theorem, we have When expanding the product in (32), we have the term and mixed terms, for k = k ′ , which are equal to zero, since Here, the fact that [T 1 probability measures, justify the above use of Fubini's theorem.
We thus have, by part 3 of Lemma 14,

Monte Carlo synthesis
In this subsection, we use the results of Subsection 3.3 to define and analyze the Monte Carlo approximation of synthesis.
Definition 15. Let ψ be an envelope and {g k } K k=1 random samples as in Definition 9. Given F ∈ L 2 (G), the Monte Carlo synthesis V * ψ,K f F is the random variable G K → H defined as When the envelope ψ is constant in the analysis, we often denote the Monte Carlo synthesis in short by V * K f . The following proposition formulates the Monte Carlo synthesis using the Monte Carlo PSI operator Q K f approximating the Gramian operator Q f (Corollary 13), and the frame operator S f .
By linearity, it is enough to prove for K = 1. By the fact that . Proposition 17 (Synthesis Monte Carlo approximation rate). Let f be a bounded continuous frame with frame bounds A, B, and f g H ≤ C. Let ψ ∈ L 1 (G) be an envelope. Then Proof. By Proposition 16 and Lemma 35 of Appendix B Indeed, by the frame bound Now, the result follows from Corollary 13.

Stochastic phase space signal processing of continuous signals
In this subsection we formulate and analyze the Monte Carlo approximations (8)-(11) of the signal processing in phase space pipelines (5) and (6). In Subsection 4.2 we bound the expected value of the error, and in Subsection 4.3 we bound the concentration of measure of the error.

Definition of stochastic phase space signal processing
For Parseval frames (Definition 1.6), the frame operator is S f = I, and hence signal processing in phase space takes the form P f,T,r := V * f T r • V f . We call P f,T,r Parseval signal processing in phase space even if f is non-Parseval. For non-Parseval frames, synthesis and analysis signal processing in phase space involve the multiplications P f,T,r S −1 f and S −1 f P f,T,r respectively. Since it is enough to bound the error entailed by randomly approximating P f,T,r , and then multiply the bound by S −1 f 2 for non-Parseval pipelines. We hence focus only on Parseval signal processing in our analysis.
As discussed in Subsection 3.2, sampling in phase space requires enveloping. We hence formulate the following list of signal processing pipelines.
Definition 18. Let f be a continuous frame over the phase space G, T be a bounded operator in L 2 (G), r : C → C, and ψ, η ∈ L 1 (G) two envelopes. Let s ∈ H denote a generic signal.

A signal processing pipeline is defined by
2. An output enveloped signal processing pipeline is defined by 3. An input-output enveloped signal processing pipeline is defined by The following list of Monte Carlo approximations correspond to the pipelines of Definition 18.

Definition 19
. Let P f,T,r be a signal processing pipeline, ψ and η two envelopes, and K, L ∈ N. Let s ∈ H denote a generic signal.
1. The output stochastic signal processing pipeline is defined by 2. For a phase space integral operator T , the input-output stochastic signal processing pipeline is defined by We typically fix f , T , and r, in which case we omit them from the pipeline notation and denote P, P ψ , P ψ,K etc. Equations (8)- (11) give explicit formulas for the synthesis and analysis formulations of [Ps] ψ,K;η,L and [Ps] ψ,K , based on the samples in phase space. As noted in Subsection 1.2, the pipeline (37) is useful for multipliers, shrinkage, and phase vocoder, and the pipeline (38) is useful for PSI operators.

Expected error in stochastic phase space signal processing
In the following, we estimate the error of the stochastic methods.
Theorem 20. Let f be a bounded continuous frame with bounds A, B, and f g H ≤ C. Let T be a bounded operator in L 2 (G), and r : C → C satisfy |r(x)| ≤ E |x| for some E ≥ 0 and every x ∈ C. Let ψ and η be two envelopes, and K, L ∈ N. Then, for every signal s ∈ H, the following two properties hold.
1. Output enveloped signal processing stochastic approximation: 2. Input-output enveloped signal processing stochastic approximation: if T is a uniformly bounded PSI operator with bound D, then We use the following simple observation to prove Theorem 20.
Lemma 21. Let Z 1 , Z 2 , Z 3 be non-negative real-valued random variables such that Z 1 ≤ Z 2 + Z 3 pointwise in the sample set. Then, Proof. We have Z 1 ≤ 2 max{Z 2 , Z 3 }, where the maximum is pointwise in the sample space. Therefore, Z 2 1 ≤ 4 max{Z 2 2 , Z 2 3 } ≤ 4Z 2 2 + 4Z 2 3 , and (39) follows. Proof of Theorem 20. We prove 2, and note that 1 is simpler and uses similar techniques. Denote by g = {g 1 , . . . , g K } the output samples underlying V * ψ,K f , and by y = {y 1 , . . . , y L } the input samples underlying T η,L . Denote F = r(V f [s]) ∈ L 2 (G). By the triangle inequality, and by the When calculating the conditional expected value of V , with respect to a fixed g (denoted here by E( · |g)), we use Lemma 21 and Proposition 12 to get Note that Fubini-Tonelli theorem is satisfied in the computation of as a repeated integral of y and g, since the integrand is positive and the measure is σ-finite.
Next, by Proposition 17, so by Lemma 21, by Proposition 12, and by the fact that 0 ≤ η(y) ≤ 1, Altogether, The claim now follows from F

Concentration of error in stochastic phase space signal processing
Propositions 12 and 17, and Theorem 20 estimate the average square error of the stochastic approximations. In this subsection, we formulate the approximation results as bounds on the error that hold in high probability. We show how to apply the classical concentration of measure estimates, Markov's inequality, and Bernstein's inequality, in our setting.

A Bernstein inequality in Hilbert spaces
In the following version of Bernstein's inequality, we define expected values of weakly integrable random vectors v over the sample set G using the weak integral (Definition 3) as E w (v) = w G v(g)dµ(g). The following version of Bernstein's inequality is a direct result of [8, Theorem 2.6], and is proved in Appendix A.

Theorem 22 (Hilbert space Bernstein's inequality).
Let H be a separable Hilbert space, and G a probability space. Let {v k } K k=1 : G K → H K be a finite sequence of independent random weakly integrable vectors. Suppose that for every k = 1, . . . , K, E w (v k ) = 0 and v k H ≤ B a.s. and assume that ρ 2 We note that existing variants of Bernstein's inequality in infinite dimensional Hilbert spaces are not adequate for us. For example, the operator Bernstein's inequality of [38] is limited to trace class operators, and thus does not even include the identity.

Concentration of error results
Markov type concentration of error results can be derived from Propositions 12 and 17, and Theorem 20, by multiplying the error bound by δ −1 , and replacing the expected value with an event that has probability at least (1 − δ). For example, the Markov type concentration of error version of Proposition 17 reads The following proposition summarizes the Markov and Bernstein types concentration of error bounds in output stochastic signal processing.
Theorem 23 (Output signal processing concentration of error). Let f be a bounded continuous frame with frame bounds A and B, and with f g H ≤ C, let ψ be an envelope, and K ∈ N. Let T be a bounded operator in L 2 (G), and r : C → C satisfy |r(x)| ≤ E |r(x)| for every x ∈ C, where E > 0. Let s ∈ H and 0 < δ < 1. Then, with probability more than 1 − δ, we have where κ(δ) can be chosen as one of the following two options.

2.
Bernstein type error bound: κ(δ) = 2 √ 2 ln 1 δ + 1 4 in case T ∞ < ∞ and K satisfies Proof. We prove 2 and note that 1 is simpler and based on Markov's inequality. Denote F = T r(V f [s]). Below, we use the following bounds and where We use Theorem 22 as follows. Define the independent random vectors . Therefore, by (45), Moreover, by Proposition 5, Example 7, and (46), for every Hence, by Theorem 22, for every Now, set This gives, in probability at least (1 − δ), whenever k satisfies (43). Last, using Proposition 16 and (33), we get

Discrete signals and linear volume discretization of frames
We treat discrete signals as embedded in the Hilbert space of signals H. A discrete signal is an element of a finite dimensional subspace of H. On the one hand, we can analyze discrete signals directly in H. On the other hand, discrete signals are determined by a finite number of scalars, so they are well adapted to numerical analysis. In our analysis, we sometimes restrict ourselves to a class of signals R ⊂ H which need not be a linear space. We typically consider R defined by imposing a restriction on signals in H which is natural for real life signals of some type.

Definition 24.
Let H be a Hilbert space that we call the signal space. A class of signals R ⊂ H is a (possibly non-linear) subset of H. A sequence of discretizations of R is a sequence of (generally non-linear) subspaces {V M ⊂ H} ∞ M=1 that satisfies the following condition: for every s ∈ R there is a sequence The idea in discretizing a continuous frame is to find an envelope ψ M for each discrete space such that for any s M ∈ V M , 2. For a linear volume discretizable continuous frame f with respect to R and {V M } ∞ M=1 , and a fixed tolerance ǫ > 0 with a corresponding fixed C ǫ and envelope sequence {ψ M } ∞ M=1 satisfying (48) and (49), we call f together with R, {V M } ∞ m=1 , and {ψ M } ∞ M=1 , an ǫ-linear volume discretization (ǫ-LVD) of f .

Error in discrete stochastic phase space signal processing
Next, we study the error in discrete stochastic phase space signal processing of LVD frames. Since the energy of V f [s M ] may be shifted after applying an operator T on V f [s M ], we first introduce the following definition.
Definition 26. Let G be a phase space, T a bounded linear operator in L 2 (G), ψ and η two envelopes, and ǫ > 0. We say that T maps the energy of η to ψ up to ǫ, if The next theorem summarizes the expected approximation error in stochastic signal processing with ǫ-LVD frames.
Let T be a bounded operator on L 2 (G) that maps the energy of η M to ψ M up to ǫ. Then, the following two bounds are satisfied for every s M ∈ V M . 1.
2. If T is a uniformly square integrable PSI operator with bound D, then Proof. We first prove (53). By Lemma 21, Next, we bound the second term of (54). [ For the second term of the last line of (55), by the LVD property, To conclude, (54) together with (55), (56), and Theorem 20, give (53). Next, we prove (52). As before, We bound the second term of (57) using (50) and (56) by This, together with (57) and Theorem 20, leads to (52).
Next, we formulate concentration of error results for LVD frames. A Markov type concentration of error result can be derived directly from Theorem 27. For a Bernstein type error bound, we offer the following theorem only for the output stochastic signal processing pipeline (Definition 19.1).

Discrete stochastic time-frequency signal processing
In this subsection we present a discretization under which the STFT is LVD. In the companion paper [33] we present a discretization under which the CWT is linear volume discretizable. We analyze time signals s : R → C by decomposing them to compact time interval sections. Without loss of generality, we suppose that each signal segment is supported in [−1/2, 1/2]. Focusing on one segment, we take the signal class R as Let W > 0. For each M ∈ N, we consider the following phase space domain G M ⊂ G, where G is the STFT time-frequency plane, The area of G M in the time-frequency plane is Denote We consider ω > 0 and z > 0, and note that the other cases are similar. For each value of ω > M W , we decompose the integral (64) along z into the two integrals in z ∈ (0, (M + ω)/2) and z ∈ ((M + ω)/2, ∞). For z ∈ (0, (M + ω)/2), since ω ≥ M W and z ≤ (M + ω)/2, so |z − ω| −2κ obtains its maximum at z = (M + ω)/2. Thus, by (60), Integrating the bound (65) for ω ∈ (W M, ∞) gives For z ∈ ((M + ω)/2, ∞),q decays like Now, by (64) and (67), Integrating the bound (68) for Last, the bounds (66) and (69) are combined to give ( This means that given ǫ > 0, we may choose W large enough to guarantee < ǫ, and also guarantee that for every M ∈ N, ψ M 1 ≤ C ǫ M , with C ǫ = 2W (1 + 2S) by (62).

Applications of stochastic signal processing of continuous signals
In this section, we introduce two applications of the theory developed in this paper: integration of continuous frames and stochastic phase space diffeomorphism.

Integration of linear volume discretizable frames
Here, we show how to integrate a set of LVD continuous frames into one continuous LVD frame, while retaining all stochastic approximation bounds of a single LVD frame. We first show how to integrate frames.
Proposition 30. Let G and U be two topological spaces with σ-finite Borel measures µ G and µ U respectively, with µ U (U ) = 1. Let A, B, C > 0 and H be a Hilbert space. For each u ∈ U , let f ·,u : g → f g,u be a bounded continuous frame over the phase space G and the signal space H, with frame constants A, B and bound f g,u H ≤ C. Suppose that the mapping f : (g, u) → f g,u is continuous. Then f is a bounded continuous frame over the phase space G × U , with frame constants A, B and bound f g,u H ≤ C.
Proof. Consider the mapping V f that maps s ∈ H to the function V f [s] : (g, u) → s, f g,u .
By continuity of (g, u) → f g,u , V f [s] is continuous for every s ∈ H. Indeed Thus, for every s ∈ H, V f [s] : G × U → C is a measurable function. For each u ∈ U , denote by V f·,u the analysis operator corresponding to the continuous frame f ·,u . By Fubini-Tonelli theorem, for every signal s ∈ H Therefore, and {f g,u } (g,u)∈G×U is a continuous frame with frame bounds A, B.
Next, we show that the LVD property is retained under integration of frames.
Proposition 31. Consider the setting of Proposition 30. Let R ⊂ H be a signal class, and V M a discretization of R. Suppose that for every u ∈ U , f ·,u is an LVD frame. Let ǫ > 0 and {ψ M ∈ L 1 (G)} ∞ M=1 a sequence of envelopes. Suppose that for every u ∈ U , f ·,u is ǫ-LVD with respect to the envelopes {ψ M ∈ L 1 (G)} and the constant C ǫ . For each M , denote by ψ M ∈ L 1 (G × U ) the envelope (g, u) → ψ M (g). Then, f is an ǫ-LVD frame with respect to the envelopes {ψ M ∈ L 1 (G × U )} and the bound C ǫ .

Proof. By Fubini-Tonelli theorem
Last, we show how to integrate operators in phase space, and show that mapping the energy between envelopes up to ǫ (Definition 26) is preserved under integration.
Proposition 32. Consider the setting of Proposition 30. Let R ⊂ H be a signal class, and V M a discretization of R. Suppose that for every u ∈ U , f ·,u is an LVD frame. Let ǫ > 0 and {η M ∈ L 1 (G)} ∞ M=1 and {ψ M ∈ L 1 (G)} two sequences of envelopes. Suppose that for every u ∈ U , f ·,u is ǫ-LVD with respect to the envelopes {η M ∈ L 1 (G)} and the constant C ǫ . For each u ∈ U , let T u be a bounded operator in L 2 (G) with T u L 2 (G) ≤ C T . Suppose that for every M ∈ N and a.e. u ∈ U , T u maps the energy of η M to ψ M up to ǫ. Let T be the operator in Then T is bounded with T L 2 (G×U) ≤ C T , and maps the energy of Under the assumptions of Proposition 32, f and T satisfy the conditions of Theorems 27 and 28. This means that the number of random samples in the stochastic method in f , required for a given accuracy, is comparable to the number of samples required for each {f ·,u } u∈U . Namely, the addition of the new feature direction U to the phase space G does not entail an increase in computational complexity, and the approximation of the continuous method by the discrete Monte ). The above procedure of integrating continuous frames can be carried out when the definition of a certain continuous frame depends on some free parameters u. For example, in the STFT and the CWT, the window function and mother wavelet are free parameters. Instead of fixing the window function, we may consider a parametric space of window functions, parameterized by u, sharing the same linear volume discretization, and add the parameter u as additional dimensions to phase space. For example, in the CWT we may choose as u the spread of the mother wavelet. Integration of frames is the basis on which we construct the LTFT in Definition 33 below.

Stochastic diffeomorphism operator and highly redundant phase vocoder
In this subsection, we study the signal processing pipeline when T is a diffeomorphism operator, and propose a potential application in audio signal processing. Let d : G → G be a diffeomorphism (invertible smooth mapping with smooth inverse), with Jacobian J d ∈ L ∞ (G). Consider the diffeomorphism operator T , defined for any F ∈ L 2 (G) by Note that T 2 = J d ∞ . Let r : C → C satisfy |r(x)| ≤ E |x|. The signal processing pipeline based on the diffeomorphism T is defined to be P f,T,r S −1 f s for the synthesis pipeline, and S −1 f P f,T,r s for the analysis pipeline. The stochastic approximations of these pipelines are given on s M ∈ V M , up to the application of S −1 f from the right or from the left, by In (72), the points {g k } K k=1 are sampled from the envelope η M . This means that the points {d(g k )} K k=1 are sampled from the envelope ψ M (g) = η M d(g) J d (g) with ψ M 1 = η M d(·) J d (·) 1 = η M 1 . We can use either Theorem 27 or Theorem 28 to bound the stochastic approximation error, and in either case we obtain an error of order O( √ dim(VM ) √ K ).

Integer time dilation phase vocoder
A time stretching phase vocoder is an audio effect that slows down an audio signal without dilating its frequency content. In the classical definition, G is the time frequency plane, and V f is the STFT. Phase vocoder can be formulated as phase space signal processing in case the signal is dilated by an integer [52,Section 7.4.3]. For an integer ∆, we consider the diffeomorphism operator T with d(g 1 , g 2 ) = (∆g 1 , g 2 ), and consider the nonlinearity r, defined by for a ∈ R + and θ ∈ R. The phase vocoder is defined to be . Note that since the STFT is a Parseval frame, there is no difference between analysis and synthesis signal processing.
Next, we replace the STFT frame in the phase vocoder with a highly redundant time-frequency representation, based on a 3D phase space.

The localizing time-frequency transform
Here, we construct an example redundant time frequency transform based on a combination of CWT atoms and STFT atoms. The CWT is better than the STFT at isolating transient high frequency events, since middle to high frequency wavelet atoms have shorter time supports than LTFT atoms. On the other hand, low frequency events are smeared by the CWT, since low frequency wavelet atoms have large supports. We thus use STFT atoms to represent low frequencies, and CWT atoms to represent middle frequencies. High frequencies are represented again by STFT atoms with narrow time supports. This is done to potentially avoid false positive detection of very short transient events by very short wavelet atoms.
We then add to this 2D time-frequency system a third axis that controls the number of oscillations in the CWT atoms. We motivate this as follows. Time-frequency atoms are subject to the uncertainty principle. The more accurately a time-frequency atom measures frequency, the less accurately it measures time. Different signal features call for a different balance between the time and the frequency measurement accuracy. In polyphonic audio signals we expect a range of such appropriate balances, which means that no choice of window is appropriate for all features. Hence, the addition of the number of oscillations axis may be useful for representing a variety of features in polyphonic audio signals.
Definition 33 (The localizing time-frequency continuous frame). Consider the above setting. The LTFT atoms are defined for (x, ω, τ ) ∈ R 2 × [τ 1 , τ 2 ], where x represents time, ω frequencies, and τ the number of wavelet oscillations, by In the companion paper [33] we prove that the LTFT is an LVD continuous frame. This is natural in view of Subsection 6.1, since, up to the low and high frequency truncation, the LTFT is based on integrating LVD wavelet transforms.

LTFT-based phase vocoder
In this subsection we offer a toy example application of the LTFT, namely, integer time stretching phase vocoder based on the implementation of Subsection 6.2.2. The integer time dilation phase modification r of (73) is said to preserve the horizontal phase coherence, and deals well with slowly varying instantaneous frequencies [39]. When the instantaneous frequency of a component in a signal changes rapidly, the phase modification r is not sufficient, and methods for "locking" the phase to a frequency bin outside the horizontal line are used in modern implementations [31,32]. Nevertheless, in this toy application we consider horizontal phase lock. One potential motivation for using this simplistic model is that phasiness artifacts 1 may be alleviated by using CWT atoms, since CWT atoms have shorter time supports than STFT atoms.
An example implementation of stochastic LTFT phase vocoder (72) is given in https://github.com/RonLevie/LTFT-Phase-Vocoder. In the companion paper [33], we prove that the total number of scalar operations in the phase vocoder LTFT method is O(ZM + M log(M )), when the number of Monte Carlo samples is K = ZM . In future work, we will study more modern phase vocoder implementations based on the LTFT, akin to [34,16,45,41].
By the fact that projections reduce norms, χ j (g) ≤ χ(g) for every g ∈ G K 0 . Moreover, χ j is a pointwise monotone sequence of measurable functions. By the strong convergence of the projections P j to I, we have ∀g ∈ G K 0 . lim j→∞ χ j (g) = χ(g).

B Pseudo inverse of frame analysis and synthesis operators
For an injective linear operator with close range B : V → W between the Hilbert spaces V and W, we define the pseudo inverse