Equivalent Norms for Modulation Spaces from Positive Cohen’s Class Distributions

We give a new class of equivalent norms for modulation spaces by replacing the window of the short-time Fourier transform by a Hilbert–Schmidt operator. The main result is applied to Cohen’s class of time-frequency distributions, Weyl operators and localization operators. In particular, any positive Cohen’s class distribution with Schwartz kernel can be used to give an equivalent norm for modulation spaces. We also obtain a description of modulation spaces as time-frequency Wiener amalgam spaces. The Hilbert–Schmidt operator must satisfy a nuclearity condition for these results to hold, and we investigate this condition in detail.

Together, $\psi$ and $\hat{\psi}$ describe the behaviour of $\psi$ as a function of time and frequency, respectively, and give us different approaches to studying properties of $\psi$. For instance, smoothness of $\psi$ is related to decay of $\hat{\psi}$. But although $\hat{\psi}$ shows which frequencies $\omega$ contribute to $\psi$ (those for which $|\hat{\psi}(\omega)|$ is large), it does not indicate when, i.e. for which $t \in \mathbb{R}^d$, the frequency contributes to $\psi$. In time-frequency analysis one therefore looks for time-frequency distributions $Q(\psi)$, which should be functions on $\mathbb{R}^{2d}$ such that the size of $Q(\psi)(x, \omega)$ describes the contribution of frequency $\omega$ at time $x$ in $\psi$.
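The existence of an ideal time-frequency distribution $Q$ is prohibited by various uncertainty principles, but a common choice in time-frequency analysis is the short-time Fourier transform (STFT)
$$V_\varphi \psi(z) = \langle \psi, \pi(z)\varphi \rangle_{L^2},$$
where the window $\varphi$ is a function on $\mathbb{R}^d$ well-localized in time and frequency, and $\pi(z)$ denotes the time-frequency shift for $z = (x, \omega)$ given by
$$\pi(z)\varphi(t) = e^{2\pi i \omega \cdot t} \varphi(t - x).$$
The modulation spaces $M^{p,q}_m(\mathbb{R}^d)$ are then defined, for $1 \le p, q \le \infty$ and a weight function $m$ on $\mathbb{R}^{2d}$, by the norm
$$\|\psi\|_{M^{p,q}_m} = \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} |V_{\varphi_0}\psi(x,\omega)|^p \, m(x,\omega)^p \, dx \right)^{q/p} d\omega \right)^{1/q}, \qquad (1)$$
where $\varphi_0(t) = 2^{d/4} e^{-\pi |t|^2}$ and the integrals are replaced by suprema for $p, q = \infty$. By our interpretation of $V_{\varphi_0}\psi(x, \omega)$ as a time-frequency distribution, we see that $\|\psi\|_{M^{p,q}_m}$ measures how localized $\psi$ is in the time-frequency plane. More precisely, $L^p$ measures the decay of $\psi$ in time, and $L^q$ the decay of $\psi$ in frequency, i.e. the decay of $\hat{\psi}$, or the smoothness of $\psi$. The fact that $\|\psi\|_{M^{p,q}_m}$ is finite is therefore a statement on the decay and smoothness of $\psi$.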
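A useful result on modulation spaces from [17] is that, upon replacing the window $\varphi_0$ in (1) by another window $\varphi$ with good time-frequency localization, we obtain an equivalent norm on $M^{p,q}_m(\mathbb{R}^d)$:
$$\|\psi\|_{M^{p,q}_m} \asymp \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} |V_{\varphi}\psi(x,\omega)|^p \, m(x,\omega)^p \, dx \right)^{q/p} d\omega \right)^{1/q}. \qquad (2)$$
The main result of this contribution is an extension of this fact: we show that the window can even be replaced by a Hilbert-Schmidt operator $S$ on $L^2(\mathbb{R}^d)$. To explain this transition from function-windows to operator-windows, we fix an arbitrary $\xi \in L^2(\mathbb{R}^d)$ with $\|\xi\|_{L^2} = 1$ and consider the rank-one operator $S = \xi \otimes \varphi$ defined by
$$S(\psi) = \langle \psi, \varphi \rangle_{L^2} \, \xi. \qquad (3)$$
It is easy to see that $\|S\pi(z)^*\psi\|_{L^2} = |V_\varphi\psi(z)|$, hence we may reformulate (2) as
$$\|\psi\|_{M^{p,q}_m} \asymp \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} \|S\pi(z)^*\psi\|_{L^2}^p \, m(z)^p \, dx \right)^{q/p} d\omega \right)^{1/q}, \qquad z = (x, \omega). \qquad (4)$$
Our main result in Theorem 5.1 states that this holds not only for rank-one $S$ as in (3), but for all Hilbert-Schmidt operators $S$ having good time-frequency localization, a statement that itself will need elaboration. By choosing different $S$ we will see that we obtain equivalent norms for the modulation spaces that express quite different properties from those expressed in (1), hence giving new insights into the structure of modulation spaces.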
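The identity $\|S\pi(z)^*\psi\|_{L^2} = |V_\varphi\psi(z)|$ for rank-one $S = \xi \otimes \varphi$ can be checked numerically in a finite model. The sketch below is only an illustration under stated assumptions: it replaces $\mathbb{R}^d$ by the cyclic group $\mathbb{Z}_N$, uses the discrete time-frequency shifts $(\pi(k,l)f)[n] = e^{2\pi i l n/N} f[n-k]$, and all names (`tf_shift`, `apply_rank_one`, `N`, the random seed) are hypothetical choices that do not come from the paper.

```python
import cmath
import random

N = 8  # size of the cyclic group Z_N; an illustrative choice, not from the paper

def inner(f, g):
    """Inner product on C^N, linear in f and antilinear in g."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

def norm(f):
    return inner(f, f).real ** 0.5

def tf_shift(k, l, f):
    """(pi(k,l) f)[n] = exp(2 pi i l n / N) * f[n - k], indices mod N."""
    return [cmath.exp(2j * cmath.pi * l * n / N) * f[(n - k) % N] for n in range(N)]

def tf_shift_adjoint(k, l, f):
    """(pi(k,l)^* f)[n] = exp(-2 pi i l (n + k) / N) * f[n + k], indices mod N."""
    return [cmath.exp(-2j * cmath.pi * l * (n + k) / N) * f[(n + k) % N] for n in range(N)]

def stft(psi, phi, k, l):
    """Discrete STFT: V_phi psi(k, l) = <psi, pi(k,l) phi>."""
    return inner(psi, tf_shift(k, l, phi))

def apply_rank_one(xi, phi, f):
    """Rank-one operator xi (tensor) phi applied to f, i.e. <f, phi> xi."""
    c = inner(f, phi)
    return [c * x for x in xi]

rng = random.Random(0)

def random_vector():
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]

psi, phi, xi = random_vector(), random_vector(), random_vector()
xi_norm = norm(xi)
xi = [x / xi_norm for x in xi]  # normalise so that ||xi|| = 1

# Largest deviation between ||S pi(z)^* psi|| and |V_phi psi(z)| over all z in Z_N x Z_N.
max_gap = max(
    abs(norm(apply_rank_one(xi, phi, tf_shift_adjoint(k, l, psi))) - abs(stft(psi, phi, k, l)))
    for k in range(N) for l in range(N)
)
print(max_gap)
```

The printed deviation should sit at the level of floating-point rounding, reflecting that $S\pi(z)^*\psi = \langle \pi(z)^*\psi, \varphi\rangle\,\xi$ and $\|\xi\| = 1$.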
Comparing (1) and (4), we see that the STFT $|V_\varphi\psi(z)|$ is replaced by $\|S\pi(z)^*\psi\|_{L^2}$. This suggests that we replace the STFT by the function $V_S\psi : \mathbb{R}^{2d} \to L^2(\mathbb{R}^d)$ given by
$$V_S\psi(z) = S\pi(z)^*\psi.$$
In Sect. 4 we show that $V_S$ actually behaves like the usual STFT $V_\varphi$, by showing that it satisfies an isometry property and an inversion formula. This insight allows us to prove (4) in Sect. 5 using methods similar to those used to prove that the modulation spaces are independent of the window function in [23]. Sections 6, 7 and 8 are then devoted to examples and reinterpretations of the main result. First we consider Weyl operators in Sect. 6. The reformulation of (4) in Theorem 6.1 generalizes a result by Gröchenig and Toft [26] that identifies certain modulation spaces with function spaces introduced by Bony and Chemin [9].
In Sect. 7 we turn our attention to Cohen's class of time-frequency distributions. As there is no ideal time-frequency distribution, Cohen's class was introduced by Cohen in [10] as the time-frequency distributions $Q_a$ given by
$$Q_a(\psi) = a * W(\psi),$$
where $a$ is some function (or distribution) on $\mathbb{R}^{2d}$ and $W(\psi)$ is the Wigner distribution, see (26) for its definition. By varying $a$ one obtains time-frequency distributions with different properties. An important example of a Cohen's class distribution is the spectrogram $Q(\psi)(z) = |V_{\varphi_0}\psi(z)|^2$. Then (1) shows that the modulation space norm of $\psi$ is given by the $L^{p,q}_m$-norm of (the square root of) $Q(\psi)$. We might therefore ask whether this is true if we replace the spectrogram by another Cohen's class distribution $Q_a$. Using a description of Cohen's class in terms of bounded operators given in [34] together with (4), we are able to give in Theorem 7.1 a set of Cohen's class distributions whose $L^{p,q}_m$ norms define the modulation space norms. The question of characterizing these Cohen's class distributions $Q_a$ in terms of $a$ seems to be a difficult problem in general. However, using a result from [31] we are able to prove the following in Theorem 7.4: Let $1 \le p, q \le \infty$ and assume that the weight $m$ grows at most polynomially. If $a$ is a Schwartz function on $\mathbb{R}^{2d}$ and $Q_a(\psi)$ is a positive function for each $\psi \in L^2(\mathbb{R}^d)$, then
$$\|\psi\|_{M^{p,q}_m} \asymp \left\| \sqrt{Q_a(\psi)} \right\|_{L^{p,q}_m}.$$
Finally, we let $S$ in (4) be a localization operator in Sect. 8. This leads to a characterization of modulation spaces as time-frequency Wiener amalgam spaces in Theorem 8.1, which is a continuous version of results by Dörfler, Feichtinger and Gröchenig [14,15], see also [16,36], much like the fact that the standard Wiener amalgam spaces have both a continuous and a discrete description. We mention that [1,8,26,27] also use localization operators to get equivalent norms for modulation spaces, but their approach and results are different from those we consider. Before ending this introduction, we wish to point out that sufficient conditions on $S$ for (4) to hold will be a recurring theme throughout the paper. The most general sufficient condition on $S$ is that its Hilbert space adjoint must be a nuclear operator from $L^2(\mathbb{R}^d)$ to $M^1_v(\mathbb{R}^d)$. In some ways this is a very natural condition: if applied to the rank-one operator in (3) it means that $\varphi \in M^1_v(\mathbb{R}^d)$, which is the standard condition for windows for modulation spaces. As we see in Sect. 3, this nuclearity condition is also easy to handle when working with localization operators. From other perspectives, such as the Weyl calculus, the condition is more mysterious, and we will therefore also study stronger sufficient conditions on $S$ for (4) to hold.

Notation and Conventions
If $X$ is a Banach space, we denote by $X'$ its dual space, and the action of $y \in X'$ on $x \in X$ is denoted by the bracket $\langle y, x \rangle_{X',X}$, where the bracket is antilinear in the second coordinate to be compatible with the notation for inner products in Hilbert spaces. This means that we are identifying the dual space $X'$ with the antilinear functionals on $X$. For two Banach spaces $X, Y$ we denote by $\mathcal{L}(X, Y)$ the Banach space of bounded linear operators $S : X \to Y$, and if $X = Y$ we simply write $\mathcal{L}(X)$. For brevity we often write $\mathcal{L}(L^2)$ for $\mathcal{L}(L^2(\mathbb{R}^d))$. For $p \in [1, \infty]$, $p'$ denotes the conjugate exponent, i.e. $\frac{1}{p} + \frac{1}{p'} = 1$. The notation $P \lesssim Q$ means that there is some $C > 0$ such that $P \le C \cdot Q$, and $P \asymp Q$ means that $Q \lesssim P$ and $P \lesssim Q$. For $\Omega \subset \mathbb{R}^{2d}$, $\chi_\Omega$ is the characteristic function of $\Omega$. $\mathcal{S}(\mathbb{R}^d)$ denotes the Schwartz space, and $\mathcal{S}'(\mathbb{R}^d)$ its dual space of tempered distributions.

Time-Frequency Analysis
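As we have seen in the introduction, our main results are phrased in terms of the time-frequency shifts $\pi(z) \in \mathcal{L}(L^2)$ for $z = (x, \omega) \in \mathbb{R}^{2d}$, defined by
$$\pi(z)\psi(t) = e^{2\pi i \omega \cdot t} \psi(t - x). \qquad (5)$$
The time-frequency shifts are unitary on $L^2(\mathbb{R}^d)$, and they satisfy
$$\pi(x, \omega)\pi(x', \omega') = e^{-2\pi i x \cdot \omega'} \pi(x + x', \omega + \omega') \qquad (6)$$
for $x, x', \omega, \omega' \in \mathbb{R}^d$. Closely related to the time-frequency shifts is the short-time Fourier transform (STFT) $V_\varphi\psi \in L^2(\mathbb{R}^{2d})$, given by
$$V_\varphi\psi(z) = \langle \psi, \pi(z)\varphi \rangle_{L^2} \quad \text{for } \psi, \varphi \in L^2(\mathbb{R}^d),\ z \in \mathbb{R}^{2d}. \qquad (7)$$
The function $\varphi$ is often referred to as the window of the STFT $V_\varphi\psi$.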
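The unitarity of the time-frequency shifts and their composition rule can be illustrated in a finite model. The following is a minimal sketch, assuming the cyclic group $\mathbb{Z}_N$ in place of $\mathbb{R}^d$ and the discrete shifts $(\pi(k,l)f)[n] = e^{2\pi i l n/N} f[n-k]$, for which the composition rule reads $\pi(k,l)\pi(k',l') = e^{-2\pi i k l'/N}\pi(k+k', l+l')$. The function and variable names are hypothetical and not taken from the paper.

```python
import cmath
import random

N = 6  # size of the cyclic group Z_N; an illustrative choice

def tf_shift(k, l, f):
    """(pi(k,l) f)[n] = exp(2 pi i l n / N) * f[n - k], indices mod N."""
    return [cmath.exp(2j * cmath.pi * l * n / N) * f[(n - k) % N] for n in range(N)]

def sq_norm(f):
    return sum(abs(c) ** 2 for c in f)

rng = random.Random(1)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]

# Composition rule: pi(k,l) pi(k2,l2) f versus the phase-corrected single shift.
max_err = 0.0
for k in range(N):
    for l in range(N):
        for k2 in range(N):
            for l2 in range(N):
                lhs = tf_shift(k, l, tf_shift(k2, l2, f))
                phase = cmath.exp(-2j * cmath.pi * k * l2 / N)
                rhs = [phase * c for c in tf_shift(k + k2, l + l2, f)]
                max_err = max(max_err, max(abs(a - b) for a, b in zip(lhs, rhs)))

# Unitarity (checked here only as norm preservation on one vector).
norm_drift = max(
    abs(sq_norm(tf_shift(k, l, f)) - sq_norm(f))
    for k in range(N) for l in range(N)
)
print(max_err, norm_drift)
```

Both printed numbers should be of the order of rounding error; the phase factor is the only obstruction to the shifts forming a group representation.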
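In particular, Moyal's identity $\|V_\varphi\psi\|_{L^2(\mathbb{R}^{2d})} = \|\varphi\|_{L^2}\|\psi\|_{L^2}$ shows that for a fixed window $\varphi$ with $\|\varphi\|_{L^2} = 1$ the map $\psi \mapsto V_\varphi\psi$ is an isometry from $L^2(\mathbb{R}^d)$ to $L^2(\mathbb{R}^{2d})$.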

Admissible Weight Functions and Weighted, Mixed L p Spaces
A submultiplicative weight function $v$ on $\mathbb{R}^{2d}$ is a non-negative function $v : \mathbb{R}^{2d} \to \mathbb{R}$ such that
$$v(z_1 + z_2) \le v(z_1) v(z_2) \quad \text{for } z_1, z_2 \in \mathbb{R}^{2d}.$$
Whenever we refer to a submultiplicative weight function $v$ we will assume that $v$ is continuous and satisfies $v(-z) = v(z)$; these assumptions do not lead to a loss of generality as any submultiplicative weight function is equivalent in a natural sense to a weight satisfying these assumptions, see [23,25]. Furthermore, these assumptions imply that if $v$ is not identically 0, then $v(z) \ge 1$ for all $z \in \mathbb{R}^{2d}$. The assumptions above are satisfied by standard examples such as the polynomial weights $v_s(z) = (1 + |z|^2)^{s/2}$ for $s \ge 0$, but also by the exponential weights $v_a(z) = e^{a|z|}$ for $a \ge 0$. A non-negative weight function $m$ on $\mathbb{R}^{2d}$ is said to be $v$-moderate if $v$ is a submultiplicative weight function and there exists some constant $C^m_v > 0$ such that
$$m(z_1 + z_2) \le C^m_v \, v(z_1) m(z_2) \quad \text{for } z_1, z_2 \in \mathbb{R}^{2d}.$$
We refer the reader to the survey [25] for more examples and motivation for these assumptions. For any $v$-moderate weight $m$ and $1 \le p, q \le \infty$ we may define the Banach space $L^{p,q}_m(\mathbb{R}^{2d})$ to be the equivalence classes of Lebesgue measurable functions $F : \mathbb{R}^{2d} \to \mathbb{C}$ such that
$$\|F\|_{L^{p,q}_m} := \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} |F(x,\omega)|^p \, m(x,\omega)^p \, dx \right)^{q/p} d\omega \right)^{1/q} < \infty.$$
If $p = \infty$ or $q = \infty$, the corresponding integral is replaced by an essential supremum.

Modulation Spaces
Throughout the rest of the paper, we will let $\varphi_0 \in L^2(\mathbb{R}^d)$ denote the normalized Gaussian, i.e. $\varphi_0(t) = 2^{d/4} e^{-\pi |t|^2}$.
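For a submultiplicative weight $v$, we define the space
$$M^1_v(\mathbb{R}^d) = \left\{ \psi \in L^2(\mathbb{R}^d) : \|\psi\|_{M^1_v} := \int_{\mathbb{R}^{2d}} |V_{\varphi_0}\psi(z)| \, v(z) \, dz < \infty \right\}.$$
This will serve as our space of test functions. Since $v$ is submultiplicative, $M^1_v(\mathbb{R}^d)$ is non-empty as it contains $\varphi_0$ [25,Lem. 4.4], and for weights $v$ of polynomial growth it contains the Schwartz functions $\mathcal{S}(\mathbb{R}^d)$ [23,Prop. 11.3.4]. For more general weights $M^1_v(\mathbb{R}^d)$ will not necessarily contain $\mathcal{S}(\mathbb{R}^d)$ and might be quite small. The time-frequency shifts $\pi(z)$ are bounded on $M^1_v(\mathbb{R}^d)$ [23,Thm. 11.3.5] with
$$\|\pi(z)\psi\|_{M^1_v} \le v(z) \|\psi\|_{M^1_v}, \qquad (8)$$
and hence the STFT can be defined by modifying the inner product in the definition (7) to a duality bracket:
$$V_\phi\psi(z) = \langle \psi, \pi(z)\phi \rangle_{(M^1_v)', M^1_v} \quad \text{for } \phi \in M^1_v(\mathbb{R}^d),\ \psi \in (M^1_v(\mathbb{R}^d))'.$$
For any $v$-moderate weight $m$ and $1 \le p, q \le \infty$, we then define the modulation space $M^{p,q}_m(\mathbb{R}^d)$ to be the set of $\psi \in (M^1_v(\mathbb{R}^d))'$ such that
$$\|\psi\|_{M^{p,q}_m} := \|V_{\varphi_0}\psi\|_{L^{p,q}_m} < \infty.$$
When $p = q$ we will write $M^p_m(\mathbb{R}^d)$ for $M^{p,p}_m(\mathbb{R}^d)$, and when $m \equiv 1$ we write $M^{p,q}(\mathbb{R}^d)$. Some properties of the modulation spaces are summarized below; proofs may be found in the monograph [23].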

Proposition 2.2 Let m be a v-moderate weight and
Remark 1 (a) As a particular case of part c), we may identify the dual space of $M^1_v(\mathbb{R}^d)$ with $M^\infty_{1/v}(\mathbb{R}^d)$, which we will do for the rest of the paper. The reader should also note that the duality extends the inner product on $L^2(\mathbb{R}^d)$.
(b) Recall that $\mathcal{S}(\mathbb{R}^d) \subset M^1_v(\mathbb{R}^d)$ when $v$ grows polynomially, so in this case we may identify $M^\infty_{1/v}(\mathbb{R}^d)$ with a subspace of the tempered distributions. This is not true for more general weights, hence we need to work with the abstract space $M^\infty_{1/v}(\mathbb{R}^d)$ defined as the dual space of our test functions $M^1_v(\mathbb{R}^d)$.
The property of modulation spaces that is our main focus is the fact that changing the window for the STFT leads to an equivalent norm [23,Prop. 11.3.2].

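Theorem 2.3 Let $m$ be a $v$-moderate weight function and let $0 \neq \phi \in M^1_v(\mathbb{R}^d)$. Then, for $1 \le p, q \le \infty$, $\|\psi\|_{M^{p,q}_m} \asymp \|V_\phi\psi\|_{L^{p,q}_m}$.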
Our main result is that we also obtain equivalent norms for $M^{p,q}_m(\mathbb{R}^d)$ when $\phi$ is replaced by an operator $S$ satisfying certain conditions, after modifying the definition of the STFT correspondingly. To prove this, we will use the precise statement of the upper bound $\|V_\phi\psi\|_{L^{p,q}_m} \lesssim \|\psi\|_{M^{p,q}_m}$; it follows from equation (11.33) in [23].

Classes of Operators for Time-frequency Analysis
Our main result rests upon properties of certain classes of operators, all of which may be described as integral operators.

Hilbert-Schmidt Operators
Given a function $k \in L^2(\mathbb{R}^{2d})$, we define the (necessarily bounded) integral operator $T_k : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d)$ by
$$T_k\psi(x) = \int_{\mathbb{R}^d} k(x, y)\psi(y) \, dy.$$
We call $k$ the integral kernel of the operator $T_k$. When equipped with the inner product
$$\langle T_{k_1}, T_{k_2} \rangle_{HS} = \langle k_1, k_2 \rangle_{L^2(\mathbb{R}^{2d})},$$
the set of integral operators $T_k$ with integral kernels $k \in L^2(\mathbb{R}^{2d})$ forms a Hilbert space of compact operators called the Hilbert-Schmidt operators, which we will denote by $HS$. Given $T \in HS$, we will sometimes denote its integral kernel by $k_T$, which means that $T = T_{k_T}$. An important subspace of $HS$ is the space $\mathcal{S}^1$ of trace class operators, consisting of those $T \in HS$ such that
$$\sum_{n=1}^\infty \langle |T| e_n, e_n \rangle_{L^2} < \infty,$$
where $\{e_n\}_{n=1}^\infty$ is any orthonormal basis of $L^2(\mathbb{R}^d)$ and $|T|$ is the positive part in the polar decomposition of $T$. If $T$ is a trace class operator, we may therefore define its trace $\mathrm{tr}(T)$ by
$$\mathrm{tr}(T) = \sum_{n=1}^\infty \langle T e_n, e_n \rangle_{L^2},$$
which can be shown to be independent of the orthonormal basis. For our part, we will need that if $S, T \in HS$, then $ST$ is a trace class operator. In particular, this allows us to express the inner product on $HS$ without reference to the kernels as integral operators, as one may show (see [13,Thm. 269]) that $\langle S, T \rangle_{HS} = \mathrm{tr}(ST^*)$.
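The relation $\langle S, T \rangle_{HS} = \mathrm{tr}(ST^*)$ has an elementary finite-dimensional counterpart, which the sketch below checks numerically. It assumes that operators on $\mathbb{C}^N$ are represented by their matrices, which play the role of integral kernels; the helper names and the choice of `N` are hypothetical.

```python
import random

N = 4  # dimension of the model space C^N; an illustrative choice

rng = random.Random(2)

def random_matrix():
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(N)]

def kernel_inner(A, B):
    """Inner product of the 'kernels': sum of A[i][j] * conj(B[i][j])."""
    return sum(A[i][j] * B[i][j].conjugate() for i in range(N) for j in range(N))

def adjoint(A):
    """Conjugate transpose."""
    return [[A[j][i].conjugate() for j in range(N)] for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def trace(A):
    return sum(A[i][i] for i in range(N))

S, T = random_matrix(), random_matrix()

# <S, T>_HS computed from the kernels versus tr(S T^*).
gap = abs(kernel_inner(S, T) - trace(matmul(S, adjoint(T))))
print(gap)
```

The agreement is exact up to rounding, since the $i$-th diagonal entry of $ST^*$ is $\sum_j S_{ij}\overline{T_{ij}}$.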

A Space of Nuclear Operators
Both Hilbert-Schmidt and trace class operators will often be too large spaces for our purposes. We therefore introduce a Banach subspace of HS more adapted to the needs of time-frequency analysis. Let v be a submultiplicative weight function. The space we will need is the space N (L 2 ; M 1 v ) consisting of all nuclear operators is said to be nuclear [37] if it has an expansion of the form where φ ⊗ ψ denotes the rank-one operator becomes a Banach space with norm given by where the infimum is taken over all decompositions as in (9). It can be shown that if hence the expansion in (9) converges absolutely in N ( and . We will need the following simple property. by (8) and the fact that π(z) is unitary on L 2 (R d ). The norm inequality then follows from the definition (10) of the nuclear norm.
To be more precise, the class of operators we will be interested in are those S ∈ HS such that S * ∈ N (L 2 , M 1 v ), where S * is the Hilbert space adjoint of S. We can give a much more concrete description of this condition by noting that if Furthermore, given an expansion of S of the form (13), this extension satisfies where the sum converges absolutely in L 2 (R d ).
Proof The definition (14) simply means thatS is the Banach space adjoint of S * : we see thatS extends S. The absolute convergence of the sum in (15) follows directly from (13). To show that the decomposition into rank-one operators still holds forS, we need to show that for which is a straightforward calculation using the expansion of S * in (9) and the fact that all expansions converge absolutely in an appropriate Banach space, so that we may take the duality brackets inside the sum. The details are left for the reader.
In what follows we will simply denote the extension $\tilde{S}$ by $S$.
The fact that we use the Hilbert space L 2 (R d ) is not strictly necessary. We could have considered any separable Hilbert space H, and required that S ∈ L(L 2 , H) with S * ∈ N (H, M 1 v ). The result above would still hold, as would the main result of this paper. Our reason for considering H = L 2 (R d ) is that it gives us easier access to nontrivial examples, as it allows us to formulate our results in terms of integral operators as we explain in detail in the next subsection.

The Projective Tensor Product
The theory of nuclear operators is closely related to the projective tensor product of Banach spaces, as explained for instance in [37], which leads to a useful connection to integral operators. Abstractly, the projective tensor product X⊗Y of two Banach spaces X , Y is the completion of the algebraic tensor product X ⊗ Y with respect to the norm One can show (see [37,Prop. 2.8]) that X⊗Y consists precisely of elements ∞ n=1 x n ⊗ y n such that ∞ n=1 x n X y n Y < ∞. When X and Y are function spaces on R d , which is the case we will consider, we identify the elementary tensors x ⊗ y for x ∈ X and y ∈ Y with the function By definition, this means that we have a decomposition where x n ⊗ y n now denotes a rank-one operator. Hence if we apply this to X = M 1 v (R d ) and Y = L 2 (R d ) (since all function spaces we consider are invariant under complex conjugation, we need not pay any attention to the fact that y n appears in place of y n ), we see that .
. Surjectivity and boundedness follow from above. Injectivity is not too difficult to show in this case, but for more general Banach spaces X and Y the injectivity of the natural map from X⊗Y * onto N (Y , X ) boils down to the approximation property for Banach spaces [37,Cor. 4.8].
The slightly awkward condition T * ∈ N (L 2 , M 1 v ) may similarly be reformulated as requiring , this is essentially the content of (15). This condition cannot be reformulated as nuclearity of T , which is why we have opted for phrasing it as T * ∈ N (L 2 , M 1 v ). We also mention that there is a natural isomorphism . Formulating our assumption on T by requiring k T to belong to some projective tensor product makes it possible to relate T * ∈ N (L 2 , M 1 v ) to other spaces of operators. For instance, we may identify the trace class operators S 1 as the operators S ∈ HS such that k S belongs to the projective tensor product have also been studied recently in [38], where this space of operators is denoted by B v⊗v . It follows by [2,Thm. 5 with equivalent norms, where v⊗v( The particular case B := B 1⊗1 corresponding to v ≡ 1 has been studied in several other sources, see for instance [20,21]. We summarize this discussion, which essentially amounts to prodding the definitions in various ways, in a proposition.

Proposition 3.3 Given T ∈ HS and a submultiplicative weight function v, then
At the level of k T we have the inclusions which at the operator level leads to the inclusions The same inclusion holds when N (

Examples of Nuclear Operators
The connection to the projective tensor product allows us to write down some examples of S * ∈ N (L 2 , M 1 v ). [31]. If the submultiplicative weight v grows at most polynomially, then so does the weight function v⊗v( Example 3.5 (The Feichtinger algebra and the inner kernel theorem) By Proposition . This class of operators was recently studied in [38], where the reader may find a proof that T belongs to this space if and only if its Hilbert space adjoint T * does.
The unweighted case has been studied by several sources [20,21,32,39]. We mention in particular that [20,21] give a characterization of such operators that is independent of their kernel as an integral operator: sending weak* convergent sequences to norm-convergent sequences.
We now consider finite rank operators. By choosing S of the form in this example, we will be able to recover Theorem 2.3 from our main result, see also Example 4.2.

Example 3.6 (Finite rank operators) For
. This S is just a convenient way of storing the functions φ n in an operator -by applying S to ξ m for 1 ≤ m ≤ N we recover φ m .

Localization Operators
We also have some methods for producing new examples of operators in N ( is a normed space we may of course take linear combinations, but a more interesting method is to use the quantum convolutions introduced by Werner [43]. Given f ∈ L 1 (R 2d ) and a trace class operator S ∈ S, the convolution of f with S is defined to be the trace class operator f S given by the Bochner integral In particular, if we pick S to be a rank-one operator ϕ 2 ⊗ ϕ 1 for ϕ 1 , ϕ 2 ∈ L 2 (R d ), we find that is the time-frequency localization operator [11,12] given by

This integral converges as a Bochner integral in
The result for A (11).
It is easy to check that the Hilbert space adjoint of $f \star S$ is $\overline{f} \star S^*$. Hence we immediately obtain the following.

Underspread Operators
When operators between function spaces are used to model communication channels, the resulting operators will typically be (at least approximately) underspread [40]. An underspread operator T ∈ HS is of the form where the support of F is contained in is called the spreading function of T , and one can show that any T ∈ HS has a spreading function in L 2 (R 2d ), as long as the integral in (18) is interpreted appropriately [21]. In quantum harmonic analysis the spreading function is considered a Fourier transform of the operator [43]. The next lemma shows that underspread trace class operators belong to B, i.e. have integral kernel in M 1 (R 2d ) ⊂ N (L 2 ; M 1 ). This is an operator-version of the well-known fact that band-limited integrable functions belong to M 1 (R d ) [22,Cor. 3.2.7]. The proof is moved to an appendix, as it requires the introduction of several results from quantum harmonic analysis that will not be needed later in the paper.

Proposition 3.9
If the spreading function of $T \in \mathcal{S}^1$ has compact support, then $T \in \mathcal{B} \subset N(L^2; M^1)$.

Time-Frequency Analysis with Operators as Windows
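A fundamental object in time-frequency analysis is the short-time Fourier transform (STFT) $V_\phi\psi$ with window $\phi$. The goal of this section is to define an STFT where the window $\phi$ is replaced by an operator $S$, and to show that the basic properties of the STFT remain true for this generalized STFT. As a first step, we will need the Hilbert space $L^2(\mathbb{R}^{2d}; L^2)$ of equivalence classes of strongly Lebesgue measurable functions $\Psi : \mathbb{R}^{2d} \to L^2(\mathbb{R}^d)$ such that
$$\|\Psi\|_{L^2(\mathbb{R}^{2d}; L^2)} := \left( \int_{\mathbb{R}^{2d}} \|\Psi(z)\|_{L^2}^2 \, dz \right)^{1/2} < \infty.$$
The equivalence relation on such functions is given by $\Psi \sim \Phi$ if $\Psi(z) = \Phi(z)$ for a.e. $z \in \mathbb{R}^{2d}$. We then define a version of the short-time Fourier transform with operators as windows. For $S \in HS$ and $\psi \in L^2(\mathbb{R}^d)$ we let
$$V_S\psi(z) = S\pi(z)^*\psi \quad \text{for } z \in \mathbb{R}^{2d}.$$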

Remark 3 When S is a localization operator A
ϕ,ϕ f , the short-time Fourier transform above is closely related to the vector-valued analysis operator introduced by Romero in [36] to obtain equivalent norms for modulation spaces (and several other spaces) from certain discrete expressions. See (36) for the precise expression.
We obtain a generalization of Moyal's identity. It shows that V S is a linear isometry from L 2 (R d ) to L 2 (R 2d ; L 2 ).
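A finite-dimensional analogue of this generalized Moyal identity can be tested directly. The sketch below is an illustration under assumptions that are not part of the paper: $\mathbb{R}^d$ is replaced by $\mathbb{Z}_N$, the integral over $\mathbb{R}^{2d}$ by a sum over $\mathbb{Z}_N \times \mathbb{Z}_N$, and $S$ by an arbitrary $N \times N$ matrix. With these (unnormalized) conventions the identity takes the form $\sum_z \|S\pi(z)^*\psi\|^2 = N\,\|S\|_{HS}^2\,\|\psi\|^2$, where the factor $N$ comes from the counting measure.

```python
import cmath
import random

N = 6  # size of the cyclic group Z_N; an illustrative choice

def tf_shift_adjoint(k, l, f):
    """(pi(k,l)^* f)[n] = exp(-2 pi i l (n + k) / N) * f[n + k], indices mod N."""
    return [cmath.exp(-2j * cmath.pi * l * (n + k) / N) * f[(n + k) % N] for n in range(N)]

def apply(S, f):
    """Matrix-vector product S f."""
    return [sum(S[i][j] * f[j] for j in range(N)) for i in range(N)]

def sq_norm(f):
    return sum(abs(c) ** 2 for c in f)

rng = random.Random(3)
S = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(N)]
psi = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]

# Left-hand side: squared norm of V_S psi over the whole discrete phase space.
lhs = sum(
    sq_norm(apply(S, tf_shift_adjoint(k, l, psi)))
    for k in range(N) for l in range(N)
)

# Right-hand side: N * ||S||_HS^2 * ||psi||^2.
hs_sq = sum(abs(S[i][j]) ** 2 for i in range(N) for j in range(N))
rhs = N * hs_sq * sq_norm(psi)
print(lhs, rhs)
```

Up to the normalization constant, this shows the operator-window STFT preserving norms in the same way as the ordinary STFT.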

Example 4.2
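To see that $V_S$ actually generalizes the usual STFT, consider $\phi \in L^2(\mathbb{R}^d)$ and let $\xi \in L^2(\mathbb{R}^d)$ be any function satisfying $\|\xi\|_{L^2} = 1$. Then let $S = \xi \otimes \phi$. For any $\psi \in L^2(\mathbb{R}^d)$ we then have
$$V_S\psi(z) = S\pi(z)^*\psi = \langle \pi(z)^*\psi, \phi \rangle_{L^2} \, \xi = V_\phi\psi(z) \cdot \xi,$$
which contains precisely the same information as $V_\phi\psi(z)$ given that we know $\xi$.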
In particular, it is easy to show that S 2 It is well-known that the STFT z → V ξ ψ(z) is continuous for any ξ ∈ L 2 (R d ), in particular for ξ = S * φ, hence the map is Lebesgue measurable.
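We then define, for $\Psi \in L^2(\mathbb{R}^{2d}; L^2)$,
$$V_S^*(\Psi) = \int_{\mathbb{R}^{2d}} \pi(z) S^* \Psi(z) \, dz. \qquad (19)$$
The integral (19) is interpreted in a weak sense: we will see that the map
$$\phi \mapsto \int_{\mathbb{R}^{2d}} \langle \pi(z) S^* \Psi(z), \phi \rangle_{L^2} \, dz \qquad (20)$$
is a bounded antilinear functional on $L^2(\mathbb{R}^d)$, so it follows from the Riesz representation theorem for Hilbert spaces that there must exist an element in $L^2(\mathbb{R}^d)$, which we denote by $\int_{\mathbb{R}^{2d}} \pi(z) S^* \Psi(z) \, dz$, such that for $\phi \in L^2(\mathbb{R}^d)$
$$\left\langle \int_{\mathbb{R}^{2d}} \pi(z) S^* \Psi(z) \, dz, \phi \right\rangle_{L^2} = \int_{\mathbb{R}^{2d}} \langle \pi(z) S^* \Psi(z), \phi \rangle_{L^2} \, dz. \qquad (21)$$
The next lemma shows that the integral in (19) is well-defined in this sense.
Proof Let $\Psi \in L^2(\mathbb{R}^{2d}; L^2)$ and let $\phi \in L^2(\mathbb{R}^d)$. We need to show (20); as mentioned, (21) then defines an element $\int_{\mathbb{R}^{2d}} \pi(z) S^* \Psi(z) \, dz$ of $L^2(\mathbb{R}^d)$ by Riesz' representation theorem. We find that
$$\int_{\mathbb{R}^{2d}} |\langle \pi(z) S^* \Psi(z), \phi \rangle_{L^2}| \, dz = \int_{\mathbb{R}^{2d}} |\langle \Psi(z), V_S\phi(z) \rangle_{L^2}| \, dz \le \|\Psi\|_{L^2(\mathbb{R}^{2d}; L^2)} \|V_S\phi\|_{L^2(\mathbb{R}^{2d}; L^2)} = \|S\|_{HS} \|\Psi\|_{L^2(\mathbb{R}^{2d}; L^2)} \|\phi\|_{L^2}$$
by Lemma 4.1. It is clear that $V_S^*$ is linear, and the estimate also shows that it is bounded from $L^2(\mathbb{R}^{2d}; L^2(\mathbb{R}^d))$ to $L^2(\mathbb{R}^d)$. A simple calculation shows that it is the adjoint of $V_S$. The second part states that $V_S^* V_S \psi = \|S\|_{HS}^2 \psi$ for $\psi \in L^2(\mathbb{R}^d)$.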
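The inversion formula $V_S^* V_S \psi = \|S\|_{HS}^2 \psi$ can also be observed in a finite model. The following is a sketch under assumptions that are not part of the paper: $\mathbb{R}^d$ is replaced by $\mathbb{Z}_N$, integrals by sums over $\mathbb{Z}_N \times \mathbb{Z}_N$, and $S$ by a random $N \times N$ matrix, so that the formula becomes $\sum_z \pi(z) S^* S \pi(z)^* \psi = N\,\|S\|_{HS}^2\,\psi$. All function names are hypothetical.

```python
import cmath
import random

N = 6  # size of the cyclic group Z_N; an illustrative choice

def tf_shift(k, l, f):
    """(pi(k,l) f)[n] = exp(2 pi i l n / N) * f[n - k], indices mod N."""
    return [cmath.exp(2j * cmath.pi * l * n / N) * f[(n - k) % N] for n in range(N)]

def tf_shift_adjoint(k, l, f):
    """(pi(k,l)^* f)[n] = exp(-2 pi i l (n + k) / N) * f[n + k], indices mod N."""
    return [cmath.exp(-2j * cmath.pi * l * (n + k) / N) * f[(n + k) % N] for n in range(N)]

def apply(S, f):
    return [sum(S[i][j] * f[j] for j in range(N)) for i in range(N)]

def adjoint(S):
    return [[S[j][i].conjugate() for j in range(N)] for i in range(N)]

rng = random.Random(4)
S = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(N)]
psi = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
S_adj = adjoint(S)

# Synthesis: sum over z of pi(z) S^* (S pi(z)^* psi), the discrete analogue of V_S^* V_S psi.
recon = [0j] * N
for k in range(N):
    for l in range(N):
        coeff = apply(S, tf_shift_adjoint(k, l, psi))   # V_S psi at z = (k, l)
        term = tf_shift(k, l, apply(S_adj, coeff))      # pi(z) S^* V_S psi(z)
        recon = [r + t for r, t in zip(recon, term)]

hs_sq = sum(abs(S[i][j]) ** 2 for i in range(N) for j in range(N))
scale = N * hs_sq  # the factor N reflects the unnormalised counting measure
max_err = max(abs(r / scale - p) for r, p in zip(recon, psi))
print(max_err)
```

Dividing the synthesized vector by $N\,\|S\|_{HS}^2$ recovers $\psi$ up to rounding, so any nonzero matrix $S$ serves as a window for analysis and reconstruction in this model.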

Equivalent Norms for Modulation Spaces
The generalized Moyal identity in Lemma 4.1 shows that the norm of $V_S(\psi)$ in $L^2(\mathbb{R}^{2d}; L^2)$ is equivalent to the norm of $\psi$ in $L^2(\mathbb{R}^d)$. We will now generalize Theorem 2.3 by showing that if $S$ satisfies some extra assumptions, the same is true if $L^2(\mathbb{R}^d)$ is replaced by $M^{p,q}_m(\mathbb{R}^d)$ and $L^2(\mathbb{R}^{2d}; L^2)$ by $L^{p,q}_m(\mathbb{R}^{2d}; L^2)$, where $1 \le p, q \le \infty$ and $m$ is some $v$-moderate weight. As before, $v$ always denotes a submultiplicative weight function on $\mathbb{R}^{2d}$.
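We start by defining $L^{p,q}_m(\mathbb{R}^{2d}; L^2)$. For $1 \le p, q \le \infty$ and any $v$-moderate weight $m$, the Banach space $L^{p,q}_m(\mathbb{R}^{2d}; L^2)$ consists of the equivalence classes of strongly Lebesgue measurable functions $\Psi : \mathbb{R}^{2d} \to L^2(\mathbb{R}^d)$ such that
$$\|\Psi\|_{L^{p,q}_m(\mathbb{R}^{2d}; L^2)} := \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} \|\Psi(x,\omega)\|_{L^2}^p \, m(x,\omega)^p \, dx \right)^{q/p} d\omega \right)^{1/q} < \infty,$$
where $\Psi \sim \Phi$ if $\Psi(z) = \Phi(z)$ for a.e. $z \in \mathbb{R}^{2d}$. When $p = \infty$ or $q = \infty$ the definition is modified in the usual way by replacing integrals by essential suprema. With this definition in place, we are ready to state our main result.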
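Theorem 5.1 Let $0 \neq S \in HS$ be such that $S^* \in N(L^2; M^1_v)$. For any $1 \le p, q \le \infty$ and $v$-moderate weight $m$, we have
$$\|\psi\|_{M^{p,q}_m} \asymp \|V_S\psi\|_{L^{p,q}_m(\mathbb{R}^{2d}; L^2)} = \left( \int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} \|S\pi(z)^*\psi\|_{L^2}^p \, m(z)^p \, dx \right)^{q/p} d\omega \right)^{1/q}.$$
Our proof will follow the same structure as the usual proof that $M^{p,q}_m$ is independent of the window function [23]: we will show that $V_S$ is bounded from $M^{p,q}_m(\mathbb{R}^d)$ to $L^{p,q}_m(\mathbb{R}^{2d}; L^2)$, that $V_S^*$ is bounded from $L^{p,q}_m(\mathbb{R}^{2d}; L^2)$ to $M^{p,q}_m(\mathbb{R}^d)$, and that the inversion formula remains valid. Before we start, we make sure that there is no ambiguity in interpreting $V_S\psi(z) = S\pi(z)^*\psi$ for $\psi \in M^\infty_{1/v}(\mathbb{R}^d)$: this makes sense by Lemma 3.2, as $S$ extends to a bounded operator from $M^\infty_{1/v}(\mathbb{R}^d)$ to $L^2(\mathbb{R}^d)$.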

Lemma 5.2 Let m be a v-moderate weight. For any
Proof Throughout the proof we will use the expansion in (15) to write Then This implies that hence the triangle inequality for L p,q We then apply Proposition 2.4 to get Using the definition of S * N from (10) we get that In order to give a sensible definition of V * S ( ) for ∈ L p,q m (R 2d ; L 2 ), we will need Hölder's inequality for the mixed-norm spaces L p,q m (R 2d ) [3,23]: For any ∈ L p,q m (R 2d ; L 2 ) we then define V * S ( ) as an element of M ∞ 1/v (R d ) by duality: To see that this actually defines a bounded linear functional on where the last inequality uses that for all 1 ≤ p, q ≤ ∞ and all v-moderate weights m. The reader should observe that this definition agrees with our original definition (19) when ∈ L 2 (R 2d ; L 2 ).

Lemma 5.3 Let m be a v-moderate weight. For any
Proof As a short preparation, we consider V S (π(z)φ) for φ ∈ L 2 (R d ). By definition With z = (x, ω) and z = (x , ω ), we find using (5) and (6) that z). (24) Recall that ϕ 0 is the L 2 -normalized Gaussian on R d , and that the norm on M p,q We therefore calculate that which in light of (25) gives where we have used Lemma 5.2 in the last step. The reader should also note that V S (ϕ 0 ) L 1 v (R 2d ;L 2 ) = G L 1 v is a straightforward computation, but relies on our assumption that v(−z) = v(z).
Finally, we also need that the inversion formula $V_S^* V_S \psi = \|S\|_{HS}^2 \psi$ from Lemma 4.3 remains valid on the other modulation spaces.
As a preliminary step, we rewrite the left hand side of this expression in a way that involves explicitly the action of ψ as a functional: (22).
Hence it suffices to show that , this holds by Lemma 4.3. To proceed, we will use that for any 1/v such that ψ n converges to ψ in the weak* topology of M ∞ 1/v (R d ) as n → ∞; a construction of such a sequence may be found in the proof of [15,Cor. 7]. Let us define Using the upper expression for n above, we have that n → S 2 as n → ∞ by the weak* convergence of ψ n to ψ. Using the lower expression, we find -assuming for now that the limit may be taken inside the integral -that Hence we have shown that which means that we are done once the interchange of the limit and integral has been justified. For each n we may bound the integrand by where we use (8) and (12) to move to the second line. Since φ ∈ M 1 v (R d ), it follows by Lemma 5.2 that z → v(z) · V S (φ)(z) L 2 is an integrable function. Hence we may apply the dominated convergence theorem.
The proof of Theorem 5.1 is now straightforward.

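Proof of Theorem 5.1 The upper bound $\|V_S\psi\|_{L^{p,q}_m(\mathbb{R}^{2d}; L^2)} \lesssim \|\psi\|_{M^{p,q}_m}$ is the content of Lemma 5.2. By using the inversion formula and Lemma 5.3 we obtain
$$\|\psi\|_{M^{p,q}_m} = \frac{1}{\|S\|_{HS}^2} \|V_S^* V_S \psi\|_{M^{p,q}_m} \lesssim \|V_S\psi\|_{L^{p,q}_m(\mathbb{R}^{2d}; L^2)},$$
which implies the lower bound.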

Remark 5
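A different proof of a lower bound, more in line with the arguments in the proof of [26, Prop. 2.2] (see Sect. 6 for more on this result), is to use that $S$ has a singular value decomposition
$$S = \sum_{n=1}^\infty \lambda_n \, \eta_n \otimes \xi_n,$$
where $\lambda_n$ is a summable sequence of non-negative numbers, ordered so that $\lambda_1 > 0$ is the largest, and $\{\eta_n\}_{n=1}^\infty$, $\{\xi_n\}_{n=1}^\infty$ are orthonormal sequences in $L^2(\mathbb{R}^d)$. It is easy to check that since $S^*$ is bounded from $L^2(\mathbb{R}^d)$ to $M^1_v(\mathbb{R}^d)$, we have $\xi_1 = \lambda_1^{-1} S^* \eta_1 \in M^1_v(\mathbb{R}^d)$. Then we find that
$$\|S\pi(z)^*\psi\|_{L^2}^2 = \sum_{n=1}^\infty \lambda_n^2 \, |V_{\xi_n}\psi(z)|^2.$$
Hence $\|S\pi(z)^*\psi\|_{L^2} \ge \lambda_1 |V_{\xi_1}\psi(z)|$, which leads to a lower bound by Theorem 2.3.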
We have chosen to prove the lower bound in terms of V * S to emphasize the interpretation of our results as an STFT with operators as windows.
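The pointwise lower bound $\|S\pi(z)^*\psi\|_{L^2} \ge \lambda_1 |V_{\xi_1}\psi(z)|$ from Remark 5 can be illustrated numerically. The sketch below is not the paper's construction: it assumes the finite model $\mathbb{Z}_N$, builds $S$ directly from a chosen singular value decomposition (the $\xi_n$ are the orthonormal discrete Fourier basis vectors, the $\eta_n$ are the standard basis vectors, and the singular values in `lam` are arbitrary), and all names are hypothetical.

```python
import cmath
import random

N = 6  # size of the cyclic group Z_N; an illustrative choice

def inner(f, g):
    """Inner product on C^N, linear in f and antilinear in g."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

def tf_shift(k, l, f):
    """(pi(k,l) f)[n] = exp(2 pi i l n / N) * f[n - k], indices mod N."""
    return [cmath.exp(2j * cmath.pi * l * n / N) * f[(n - k) % N] for n in range(N)]

def tf_shift_adjoint(k, l, f):
    """(pi(k,l)^* f)[n] = exp(-2 pi i l (n + k) / N) * f[n + k], indices mod N."""
    return [cmath.exp(-2j * cmath.pi * l * (n + k) / N) * f[(n + k) % N] for n in range(N)]

# Assumed singular values (decreasing) and right singular vectors xi_n (Fourier basis).
lam = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
xi = [[cmath.exp(2j * cmath.pi * n * m / N) / N ** 0.5 for m in range(N)] for n in range(N)]

def apply_S(f):
    """S = sum_n lam[n] * (e_n tensor xi_n), so (S f)[n] = lam[n] * <f, xi_n>."""
    return [lam[n] * inner(f, xi[n]) for n in range(N)]

rng = random.Random(5)
psi = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]

min_slack = float("inf")     # smallest value of ||S pi(z)^* psi|| - lam_1 |V_{xi_1} psi(z)|
max_identity_gap = 0.0       # deviation in ||S pi(z)^* psi||^2 = sum_n lam_n^2 |V_{xi_n} psi(z)|^2
for k in range(N):
    for l in range(N):
        Sf = apply_S(tf_shift_adjoint(k, l, psi))
        sq = sum(abs(c) ** 2 for c in Sf)
        stfts = [abs(inner(psi, tf_shift(k, l, xi[n]))) for n in range(N)]
        expansion = sum((lam[n] * stfts[n]) ** 2 for n in range(N))
        max_identity_gap = max(max_identity_gap, abs(sq - expansion))
        min_slack = min(min_slack, sq ** 0.5 - lam[0] * stfts[0])
print(min_slack, max_identity_gap)
```

The slack stays non-negative (up to rounding) at every point of the discrete phase space, which is the pointwise domination used to deduce the lower bound from Theorem 2.3.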
As a first example we make sure that our result includes the well-known window independence from Theorem 2.3 as a special case.
By the orthonormality of the ξ n 's we therefore have It follows by Theorem 5.1 that In particular, if N = 1 we recover Theorem 2.3 in the form and it is easy to show that in this case

The Weyl Calculus and Bony-Chemin Spaces
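In Sect. 3.1 we defined Hilbert-Schmidt operators as integral operators, but any Hilbert-Schmidt operator can also be described as a Weyl operator. To define Weyl operators, we first introduce the cross-Wigner distribution of $\phi, \psi \in L^2(\mathbb{R}^d)$, which is the function
$$W(\phi, \psi)(x, \omega) = \int_{\mathbb{R}^d} \phi\left(x + \tfrac{t}{2}\right) \overline{\psi\left(x - \tfrac{t}{2}\right)} e^{-2\pi i \omega \cdot t} \, dt. \qquad (26)$$
When $\psi = \phi$ we write $W(\psi) = W(\psi, \psi)$. Given $a \in L^2(\mathbb{R}^{2d})$, we can define the Weyl operator $L_a \in HS$ by requiring that
$$\langle L_a \psi, \phi \rangle_{L^2} = \langle a, W(\phi, \psi) \rangle_{L^2(\mathbb{R}^{2d})} \quad \text{for all } \psi, \phi \in L^2(\mathbb{R}^d).$$
The operator $L_a$ is called the Weyl transform of $a$, and $a$ is the Weyl symbol of $L_a$. It is well-known that the Weyl transform $a \mapsto L_a$ is unitary from $L^2(\mathbb{R}^{2d})$ to $HS$. In particular, every $T \in HS$ has a unique Weyl symbol $a \in L^2(\mathbb{R}^{2d})$ such that $T = L_a$. An interesting property of the Weyl symbol is its interaction with the time-frequency shifts. In fact, we have by [33,Lem. 3.2] that
$$\pi(z) L_a \pi(z)^* = L_{T_z a},$$
where $T_z a(z') = a(z' - z)$. We may therefore reformulate Theorem 5.1 in terms of the Weyl transform.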
The above theorem generalizes a result by Gröchenig and Toft in [26,Prop. 2.2], who showed that the middle expression above defines an equivalent norm on $M^2_m(\mathbb{R}^d)$ under the assumptions that $m$ is of polynomial growth and $a$ is a Schwartz function (stronger conditions are stated in [26], but their proof uses only that $a \in \mathcal{S}(\mathbb{R}^{2d})$). In fact, it is shown in [26] that the space of $\psi \in \mathcal{S}'(\mathbb{R}^d)$ such that the right hand side of (27) is finite coincides with a space $H(m, g)$ introduced by Bony and Chemin [9, Def. 5.1] when $g$ is the standard Euclidean metric on $\mathbb{R}^{2d}$. Hence (27) states that $H(m, g) = M^2_m(\mathbb{R}^d)$ with equivalent norms. Theorem 6.1 extends (27) in several directions. It extends from $p = q = 2$ to any $1 \le p, q \le \infty$ and from polynomial weights to general $v$-moderate weights. Our requirements on the Weyl symbol $a$ are also weaker, although this is slightly obscured by the mysterious requirement that $(L_a)^* \in N(L^2; M^1_v)$. By Proposition 3.3 the condition $S^* \in N(L^2; M^1_v)$ means that the integral kernel $k_S$ belongs to the projective tensor product $L^2(\mathbb{R}^d) \hat{\otimes} M^1_v(\mathbb{R}^d)$, and the Weyl symbol $a$ and $k_S$ are related by [29]
$$k_S(x, y) = \int_{\mathbb{R}^d} a\left(\tfrac{x + y}{2}, \omega\right) e^{2\pi i \omega \cdot (x - y)} \, d\omega. \qquad (28)$$
Understanding the condition $(L_a)^* \in N(L^2; M^1_v)$ thus boils down to understanding what assumptions we need on $a$ to ensure that the kernel $k_S$ in (28) belongs to $L^2(\mathbb{R}^d) \hat{\otimes} M^1_v(\mathbb{R}^d)$.

Polynomial Weights
By restricting our attention to polynomial weights v s (z) = (1 + |z| 2 ) s/2 for s ≥ 0, we obtain some sufficient conditions for (L a ) * ∈ N (L 2 , M 1 v s ), so that Theorem 6.1 holds.

Example 6.2 (Schwartz symbols)
If v = v s for s ≥ 0, we know from Example 3.4 that the Schwartz operators S, i.e. operators T with k T ∈ S (R 2d ), form a subspace of N (L 2 , M 1 v ). Furthermore, the space S is closed under taking adjoints, and may equivalently be described as the Weyl operators L a with a ∈ S (R 2d ) [31]. Taken together, this means that a ∈ S (R 2d ) implies (L a ) * ∈ S ⊂ N (L 2 , M 1 v s ). Thus Theorem 6.1 applies for all Schwartz functions a.
We then prove a slightly more refined result. Below we denote by v 4d s the weight function on R 4d given by v 4d One easily checks that v s⊗ v s v 4d 2s , which implies by part b) of Propositions 2.2 and 3.3 that By [29,Prop. 7.4 . By the chain of inclusions above, it follows that k L a ∈ When s = 0 the condition above is rather weak, as M 1 (R 2d ) even contains nondifferentiable functions.

Cohen's Class
Another interesting interpretation of Theorem 5.1 is in terms of Cohen's class of time-frequency distributions introduced by Cohen in [10]. Typically the definition of the Cohen's class distribution $Q_a$ associated with $a \in \mathcal{S}'(\mathbb{R}^{2d})$ is that [23]
$$Q_a(\psi) = a * W(\psi) \quad \text{for any } \psi \in \mathcal{S}(\mathbb{R}^d). \qquad (29)$$
One can show that $\psi \in \mathcal{S}(\mathbb{R}^d)$ implies that $W(\psi) \in \mathcal{S}(\mathbb{R}^{2d})$, so (29) is well-defined as the convolution of a tempered distribution with a Schwartz function. All our examples will satisfy $a \in L^2(\mathbb{R}^{2d})$, and in this case $Q_a(\psi)$ is defined by (29) for any $\psi \in L^2(\mathbb{R}^d)$, as a slight modification of Moyal's identity gives that $W(\psi) \in L^2(\mathbb{R}^{2d})$, so (29) is well-defined by Young's inequality.
In [34] we have given an alternative description of Cohen's class. Given a Hilbert-Schmidt operator T ∈ HS, we define the Cohen's class distribution Q T associated with T by Any Cohen class distribution Q a for a ∈ L 2 (R 2d ) can equivalently be described using (30), since it follows from [34,Prop. 7 where L denotes the Weyl transform andǎ(z) = a(−z). From now on we will therefore write Cohen's class distributions in the form Q T for T ∈ HS rather than using (29).
In light of (30) we clearly have the relation $Q_{S^*S}(\psi)(z) = \|V_S\psi(z)\|_{L^2}^2$, and we see that another reinterpretation of Theorem 5.1 is the following.

Example 7.2 (Spectrograms)
To see why the square root appears in Theorem 7.1, it is worth recalling the simple case of $S = \xi \otimes \varphi$ for some $0 \neq \varphi \in M^1_v(\mathbb{R}^d)$ and $\|\xi\|_{L^2} = 1$. Then $S^*S = \varphi \otimes \varphi$, and one may check that

$$Q_{S^*S}(\psi)(z) = |V_\varphi\psi(z)|^2.$$

This is the so-called spectrogram of $\psi$ with window $\varphi$, and we know from Theorem 2.3 that $\|\psi\|_{M^{p,q}_m} \asymp \|V_\varphi\psi\|_{L^{p,q}_m}$, hence we need the square root in Theorem 7.1.

Remark 6
We have skipped one technical detail in Theorem 7.1 above, namely how to interpret $Q_T(\psi)$ for $\psi \in M^\infty_{1/v}(\mathbb{R}^d)$. This is certainly not immediately covered by (29) or (30). We solve this issue by rewriting $Q_T(\psi)$ as

$$Q_T(\psi)(z) = \langle \pi(z)^*\psi, T^*\pi(z)^*\psi \rangle_{L^2}$$

and then replacing the bracket by duality:

$$Q_T(\psi)(z) = \langle \pi(z)^*\psi, T^*\pi(z)^*\psi \rangle_{M^\infty_{1/v}, M^1_v}; \tag{32}$$

see [38, Prop. 4.1] for a proof. It is straightforward to check that (30) and (32) agree when $\psi \in L^2(\mathbb{R}^d)$, and that $Q_T(\psi)(z) = \|V_S(\psi)(z)\|_{L^2}^2$ when $T = S^*S$.
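The spectrogram identity of Example 7.2 can be sanity-checked numerically. The following is a minimal sketch in a finite-dimensional model, which is an assumption made purely for illustration: signals are vectors in $\mathbb{C}^N$, and $\pi(x,\omega)$ is replaced by a cyclic translation followed by a discrete modulation. The helper names `tf_shift`, `cohen_class` and `stft` are hypothetical and not taken from the paper.

```python
import numpy as np

def tf_shift(x, omega, N):
    """Discrete time-frequency shift on C^N (illustrative model):
    (pi(x, omega) psi)[t] = exp(2*pi*i*omega*t/N) * psi[(t - x) mod N]."""
    t = np.arange(N)
    translation = np.roll(np.eye(N), x, axis=0)      # psi -> psi[(t - x) mod N]
    modulation = np.diag(np.exp(2j * np.pi * omega * t / N))
    return modulation @ translation

def cohen_class(T, psi):
    """Q_T(psi)(z) = <T pi(z)^* psi, pi(z)^* psi> for every z = (x, omega)."""
    N = len(psi)
    Q = np.empty((N, N), dtype=complex)
    for x in range(N):
        for omega in range(N):
            u = tf_shift(x, omega, N).conj().T @ psi  # pi(z)^* psi
            Q[x, omega] = np.vdot(u, T @ u)           # <T u, u>
    return Q

def stft(psi, phi):
    """V_phi psi(z) = <psi, pi(z) phi> for every z = (x, omega)."""
    N = len(psi)
    V = np.empty((N, N), dtype=complex)
    for x in range(N):
        for omega in range(N):
            V[x, omega] = np.vdot(tf_shift(x, omega, N) @ phi, psi)
    return V

rng = np.random.default_rng(0)
N = 16
psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
phi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
xi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
xi /= np.linalg.norm(xi)                              # ||xi|| = 1

S = np.outer(xi, phi.conj())                          # S = xi (x) phi, S f = <f, phi> xi
T = S.conj().T @ S                                    # S^* S = phi (x) phi

Q = cohen_class(T, psi)
spectrogram = np.abs(stft(psi, phi)) ** 2
print(np.allclose(Q, spectrogram))
```

In this model the Cohen's class distribution of $S^*S$ coincides with the squared modulus of the STFT, which mirrors why a square root is needed to compare $Q_{S^*S}(\psi)$ with $|V_\varphi\psi|$.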

On Positive Cohen Class Distributions
The reader will not fail to notice that the Cohen class distributions to which Theorem 7.1 applies are of a particular kind, namely of the form $Q_T$ for $T = S^*S$ with $S^* \in \mathcal{N}(L^2, M^1_v)$. The condition $S^* \in \mathcal{N}(L^2, M^1_v)$ may be interpreted as requiring a certain time-frequency localization for $Q_{S^*S}$, as one can show that S * ∈ N ( , which we know from Example 7.2 corresponds to choosing the window $\varphi$ for the modulation spaces, then $S^*S = \varphi \otimes \varphi$, which has integral kernel in Hence requiring $S^* \in \mathcal{N}(L^2, M^1_v)$ seems like a natural generalization of the assumption in Theorem 2.3 that windows $\varphi$ for modulation spaces need to satisfy $\varphi \in M^1_v(\mathbb{R}^d)$. In addition, the fact that $T = S^*S$ means that $T$ is a positive operator. By [34,Prop. 7.3], this is equivalent to $Q_T(\psi)$ being a non-negative function for each $\psi \in L^2(\mathbb{R}^d)$. This assumption cannot simply be replaced by considering $|Q_T(\psi)|$, as the following example shows.

Example 7.3
Let $\varphi_1$ and $\varphi_2$ be compactly supported functions in $\mathcal{S}(\mathbb{R}^d)$ such that their supports do not overlap. Define $T = \varphi_1 \otimes \varphi_2$. Then the integral kernel (or equivalently the Weyl symbol) of $T$ belongs to $\mathcal{S}(\mathbb{R}^{2d})$, and $T$ has good time-frequency localization in this sense. However, $T$ is not a positive operator as $\varphi_1 \neq \varphi_2$, and Theorem 7.1 fails when replacing $Q_{S^*S}$ by $|Q_T|$: for instance, one easily finds using (32) that when $\delta$ is the Dirac delta distribution

An obvious question is whether the positivity and good time-frequency properties exhibited by $Q_{S^*S}$ when $S^* \in \mathcal{N}(L^2, M^1_v)$ are sufficient for Theorem 7.1 to hold: and is a positive operator on $L^2(\mathbb{R}^d)$, does a version of Theorem 7.1 hold with $Q_{S^*S}$ replaced by $Q_T$?
As a first step in this direction, we note that the statement is true if $T \in \mathcal{S}$, i.e. if $k_T \in \mathcal{S}(\mathbb{R}^{2d})$, as [31,Prop. 3.15] states that if $T \in \mathcal{S}$ is positive, then $\sqrt{T} \in \mathcal{S}$. , which is of a different nature than the one we consider.
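The mechanism behind Example 7.3 can be illustrated in a finite-dimensional model, which is an assumption made for illustration only: signals are vectors in $\mathbb{C}^N$, time-frequency shifts are cyclic, and the unit vector at index 0 plays the role of the Dirac delta. In this sketch (with hypothetical helper names) $Q_T(\delta)$ vanishes identically for $T = \varphi_1 \otimes \varphi_2$ with disjointly supported $\varphi_1, \varphi_2$, whereas the positive operator $\varphi_1 \otimes \varphi_1$ produces a nonzero, non-negative distribution.

```python
import numpy as np

def tf_shift(x, omega, N):
    """Discrete time-frequency shift on C^N (illustrative model):
    (pi(x, omega) psi)[t] = exp(2*pi*i*omega*t/N) * psi[(t - x) mod N]."""
    t = np.arange(N)
    translation = np.roll(np.eye(N), x, axis=0)
    modulation = np.diag(np.exp(2j * np.pi * omega * t / N))
    return modulation @ translation

def cohen_class(T, psi):
    """Q_T(psi)(z) = <T pi(z)^* psi, pi(z)^* psi> for every z = (x, omega)."""
    N = len(psi)
    Q = np.empty((N, N), dtype=complex)
    for x in range(N):
        for omega in range(N):
            u = tf_shift(x, omega, N).conj().T @ psi  # pi(z)^* psi
            Q[x, omega] = np.vdot(u, T @ u)           # <T u, u>
    return Q

N = 32
phi1 = np.zeros(N)
phi1[2:8] = 1.0                       # supported on indices 2..7
phi2 = np.zeros(N)
phi2[20:26] = 1.0                     # supported on indices 20..25 (disjoint)
delta = np.zeros(N)
delta[0] = 1.0                        # discrete stand-in for the Dirac delta

T = np.outer(phi1, phi2.conj())       # T f = <f, phi2> phi1, not positive
T_pos = np.outer(phi1, phi1.conj())   # phi1 (x) phi1, a positive operator

Q_T = cohen_class(T, delta)
Q_pos = cohen_class(T_pos, delta)

print(np.abs(Q_T).max())              # no time-frequency mass is detected
print(Q_pos.real.max())               # the positive operator does detect delta
```

The point of the sketch is that $|Q_T(\delta)|$ carries no information about $\delta$ at all, so it cannot define a norm equivalent to a modulation space norm, while positivity of the operator rules out this kind of cancellation.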

Modulation Spaces as Time-Frequency Wiener Amalgam Spaces
A consequence of Theorem 8.1 is that we may interpret modulation spaces as a time-frequency version of the so-called Wiener amalgam spaces [17], a class of function spaces that have been closely tied to the development of modulation spaces since the inception of the latter in [17]. To explain this interpretation, we start by

In time-frequency analysis, when $\phi$ is well-localized in time and frequency, such as the Gaussian, the size of $|V_\phi\psi(x,\omega)|$ is interpreted as a measure of the contribution of the frequency $\omega$ at time $x$ of the signal $\psi$. By the reconstruction formula we can recover $\psi$ from $V_\phi\psi$, and (34) finds a natural interpretation as a multiplication operator in the time-frequency plane: we represent $\psi$ in the time-frequency plane by forming $V_\phi\psi$, but before we reconstruct $\psi$ from $V_\phi\psi$ we multiply it by $f(z)$. A particular choice of $f$ is to let $f$ be the characteristic function χ for some compact subset . Then .
As we discussed in Sect. 7.1, the existence of such S is not clear in general, but if A ϕ,ϕ f ∈ S we can use Theorem 7.4 to deduce the following result.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Proof of Proposition 3.9
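First recall from Sect. 6 that $\mathcal{B}$ consists precisely of those $T \in \mathcal{HS}$ such that the Weyl symbol $a_T$ belongs to $M^1(\mathbb{R}^{2d})$. Then recall that we assume

$$T = \int_{\mathbb{R}^{2d}} F(x,\omega)\, e^{-i\pi x\cdot\omega}\, \pi(x,\omega)\, dx\, d\omega,$$

where $F \in L^2(\mathbb{R}^{2d})$ has compact support, say $\operatorname{supp}(F) \subset K$ for $K \subset \mathbb{R}^{2d}$ compact. As in [33], we denote the function $F$ by $\mathcal{F}_W(T)$; it plays the role of a Fourier transform of the operator $T$ in quantum harmonic analysis. One can show that $\mathcal{F}_W(T) = \mathcal{F}_\sigma(a_T)$, where $\mathcal{F}_\sigma(f)$ is the symplectic Fourier transform of $f \in L^1(\mathbb{R}^{2d})$ given by

$$\mathcal{F}_\sigma(f)(x,\omega) = \int_{\mathbb{R}^{2d}} f(x',\omega')\, e^{-2\pi i (x'\cdot\omega - x\cdot\omega')}\, dx'\, d\omega' \quad \text{for } x, x', \omega, \omega' \in \mathbb{R}^d.$$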
Then fix some $R \in \mathcal{B}$ such that $\mathcal{F}_W(R)$ has no zeros; an explicit example is $R = \phi_0 \otimes \phi_0$ [33, Ex. 6.1]. As $R \in \mathcal{B}$, we have $a_R = \mathcal{F}_\sigma \mathcal{F}_W(R) \in M^1(\mathbb{R}^{2d})$.
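The non-vanishing in this example rests on the fact that the STFT of the Gaussian with itself is again a Gaussian. The sketch below checks this numerically for $d = 1$, under two assumptions that are not spelled out in this section: the STFT convention $V_\phi\psi(x,\omega) = \int \psi(t)\overline{\phi(t-x)}e^{-2\pi i \omega t}\,dt$, and that $|\mathcal{F}_W(\phi_0 \otimes \phi_0)(z)| = |V_{\phi_0}\phi_0(z)|$, with the precise phase factor depending on the conventions of [33]. The function names are hypothetical.

```python
import numpy as np

def gaussian(t):
    """Normalized Gaussian phi_0(t) = 2^(1/4) * exp(-pi * t^2) for d = 1."""
    return 2 ** 0.25 * np.exp(-np.pi * t ** 2)

def stft_gaussian(x, omega, half_width=8.0, step=0.01):
    """Riemann-sum approximation of V_{phi_0} phi_0 (x, omega).
    phi_0 is real, so no complex conjugate is needed on the window."""
    t = np.arange(-half_width, half_width + step, step)
    integrand = gaussian(t) * gaussian(t - x) * np.exp(-2j * np.pi * omega * t)
    return step * integrand.sum()

grid = np.linspace(-2.0, 2.0, 9)
numeric = np.array([[stft_gaussian(x, w) for w in grid] for x in grid])

# Closed form under the assumed convention:
# V_{phi_0} phi_0 (x, omega) = exp(-pi*i*x*omega) * exp(-pi*(x^2 + omega^2)/2).
X, W = np.meshgrid(grid, grid, indexing="ij")
closed_form = np.exp(-1j * np.pi * X * W) * np.exp(-np.pi * (X ** 2 + W ** 2) / 2)

max_error = np.abs(numeric - closed_form).max()
min_modulus = np.abs(numeric).min()
print(max_error, min_modulus)
```

Since the modulus $e^{-\pi(x^2+\omega^2)/2}$ is strictly positive everywhere, this is consistent with $\mathcal{F}_W(\phi_0 \otimes \phi_0)$ having no zeros, which is the property needed to apply the Wiener-Lévy theorem in the next step.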
Since $a_R \in L^1(\mathbb{R}^{2d})$ and $\mathcal{F}_\sigma(a_R) = \mathcal{F}_W(R)$ never vanishes, the Wiener-Lévy theorem [35,Thm. 3.1] implies the existence of some $h \in L^1(\mathbb{R}^{2d})$ such that Then define the operator where is the operation from (17). The "Fourier transform" $\mathcal{F}_W$ interacts with the convolutions in the expected way [33,Prop. 6.4]; more precisely, we have that