Mean-dispersion principles and the Wigner transform

Given a function $f\in L^2(\mathbb R)$, we consider means and variances associated to $f$ and its Fourier transform $\hat{f}$, and explore their relations with the Wigner transform $W(f)$, obtaining a simple new proof of Shapiro's mean-dispersion principle. Uncertainty principles for orthonormal sequences in $L^2(\mathbb R)$ involving linear partial differential operators with polynomial coefficients and the Wigner distribution, or different Cohen class representations, are obtained, and an extension to the case of Riesz bases is studied.


Introduction
This paper treats uncertainty principles for families of orthonormal functions in $L^2(\mathbb R)$ in connection with time-frequency analysis. In harmonic analysis, the term uncertainty principle refers to a class of theorems limiting how well a function and its Fourier transform can be localized at the same time. Different meanings of the word "localized" give rise to different uncertainty principles. For instance, among the most classical results (see [7] for a survey), in the Heisenberg uncertainty principle the localization of $f$ and its Fourier transform $\hat f$ is measured by their associated variances, in Benedicks [1] by the measure of their supports, in Donoho-Stark [6] by the concept of $\varepsilon$-concentration, in Hardy [9] by (exponential) decay at infinity, and so on. There are, moreover, uncertainty principles that limit not only the localization of a single function and its Fourier transform, but also describe how such limitations become stronger and stronger as more and more elements of an orthonormal system in $L^2$ are added. In this paper we focus in particular on results of this type involving means and variances.

For $f\in L^2(\mathbb R)$ we define the associated mean
$$\mu(f) := \frac{1}{\|f\|_2^2}\int_{\mathbb R} t\,|f(t)|^2\,dt \qquad (1.1)$$
and the associated variance
$$\Delta^2(f) := \frac{1}{\|f\|_2^2}\int_{\mathbb R} |t-\mu(f)|^2\,|f(t)|^2\,dt; \qquad (1.2)$$
observe that, for $\|f\|_2 = 1$, such quantities are the mean and the variance of $|f|^2$. The dispersion associated with $f$ is $\Delta(f) := \sqrt{\Delta^2(f)}$. An uncertainty principle for orthonormal sequences, which constitutes the starting point of the present paper, is due to Shapiro. We shall use throughout the paper the notation $\mathbb N_0 := \mathbb N\cup\{0\}$, and adopt the following normalization of the Fourier transform:
$$\hat f(\xi) := \frac{1}{\sqrt{2\pi}}\int_{\mathbb R} f(t)\,e^{-it\xi}\,dt. \qquad (1.3)$$

Theorem 1.1 (Shapiro's Mean-Dispersion Principle). There does not exist an infinite orthonormal sequence $\{f_k\}_{k\in\mathbb N_0}$ in $L^2(\mathbb R)$ such that all $\mu(f_k)$, $\mu(\hat f_k)$, $\Delta(f_k)$, $\Delta(\hat f_k)$ are uniformly bounded.
This theorem appeared in an unpublished manuscript of Shapiro from 1991; in [12] a stronger result has been proved, namely, there does not exist an orthonormal basis $\{f_k\}_{k\in\mathbb N_0}$ of $L^2(\mathbb R)$ such that $\Delta(f_k)$, $\Delta(\hat f_k)$, $\mu(f_k)$ are uniformly bounded, while there exists an orthonormal basis $\{f_k\}_{k\in\mathbb N_0}$ of $L^2(\mathbb R)$ such that $\mu(f_k)$, $\mu(\hat f_k)$, $\Delta(f_k)$ are uniformly bounded. Moreover, the following quantitative version of Shapiro's Mean-Dispersion Principle is proved in [10].
Theorem 1.2 ([10, Theorem 2.3]). Let $\{f_k\}_{k\in\mathbb N_0}$ be an orthonormal sequence in $L^2(\mathbb R)$. Then for every $n\ge 0$
$$\sum_{k=0}^{n}\left(\mu^2(f_k)+\mu^2(\hat f_k)+\Delta^2(f_k)+\Delta^2(\hat f_k)\right)\ \ge\ (n+1)^2. \qquad (1.4)$$
Equality holds for every $0\le n\le n_0$, $n_0\in\mathbb N_0$, if and only if there exist $c_k\in\mathbb C$ with $|c_k|=1$ such that $f_k = c_k h_k$ for $k=0,\dots,n_0$, where $h_k$ are the Hermite functions on $\mathbb R$ defined as follows:
$$h_k(t) := \frac{1}{(2^k k!\sqrt{\pi})^{1/2}}\,e^{-t^2/2}H_k(t), \qquad (1.5)$$
where $H_k$ is the Hermite polynomial of degree $k$ given by
$$H_k(t) = (-1)^k e^{t^2}\frac{d^k}{dt^k}e^{-t^2}.$$

Observe that (1.4) differs by a constant from the result in [10], due to a different normalization of the Fourier transform. Theorem 1.1 is an easy consequence of Theorem 1.2; moreover, Theorem 1.2 also says that the limitation on the concentration of $f_k$ and $\hat f_k$ becomes stronger and stronger as more and more elements of the orthonormal system are added, since the lower bound $(n+1)^2$ increases faster than the number $n+1$ of functions involved.
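The equality case of Theorem 1.2 can be checked numerically. The following is a minimal sketch, not part of the original argument, assuming the standard normalization of the Hermite functions and a unitary Fourier transform under which $|\hat h_k| = |h_k|$, so that the Fourier side contributes the same mean and variance. The helper names `hermite_function`, `mean_and_variance` and `dispersion_sum` are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval


def hermite_function(k, t):
    """Normalized Hermite function H_k(t) exp(-t^2/2) / sqrt(2^k k! sqrt(pi))."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # select the (physicists') Hermite polynomial H_k
    norm = math.sqrt(2.0**k * math.factorial(k) * math.sqrt(math.pi))
    return hermval(t, coeffs) * np.exp(-t**2 / 2) / norm


def mean_and_variance(values, t):
    """Mean and variance of |values|^2 on a uniform grid, normalized by the total mass."""
    dt = t[1] - t[0]
    density = np.abs(values) ** 2
    mass = density.sum() * dt
    mu = (t * density).sum() * dt / mass
    var = ((t - mu) ** 2 * density).sum() * dt / mass
    return mu, var


def dispersion_sum(n, t):
    """Sum over k = 0..n of mu^2 + Delta^2 for h_k and for its Fourier transform."""
    total = 0.0
    for k in range(n + 1):
        mu, var = mean_and_variance(hermite_function(k, t), t)
        # Assumption: |h_k-hat| = |h_k|, so both sides give the same mu and Delta^2.
        total += 2 * (mu**2 + var)
    return total


t = np.linspace(-20, 20, 20001)
sums = [dispersion_sum(n, t) for n in range(6)]
lower_bounds = [(n + 1) ** 2 for n in range(6)]
```

Under these assumptions each entry of `sums` agrees with the corresponding entry of `lower_bounds` up to quadrature error, which is the equality case for $f_k = h_k$.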
In this paper we study uncertainty principles of mean-dispersion type involving quadratic time-frequency representations applied to the elements of an orthonormal system in $L^2(\mathbb R)$. In order to state our main results we need some basic definitions. The classical cross-Wigner distribution is defined as
$$W(f,g)(x,\xi) := \frac{1}{\sqrt{2\pi}}\int_{\mathbb R} f\left(x+\frac{t}{2}\right)\overline{g\left(x-\frac{t}{2}\right)}\,e^{-it\xi}\,dt,\qquad f,g\in L^2(\mathbb R), \qquad (1.6)$$
and we set for convenience $W(f) := W(f,f)$. Let moreover $L$ be the linear partial differential operator in $\mathbb R^2$ defined as

The following result (that we prove in Theorem 4.3 and Corollary 4.5 below) constitutes a Mean-Dispersion uncertainty principle associated to the Wigner transform.
Theorem 1.3. Let $\{f_k\}_{k\in\mathbb N_0}$ be an orthonormal sequence in $L^2(\mathbb R)$. Then for every $n\ge 0$ where as usual $\langle\cdot,\cdot\rangle$ indicates the inner product in $L^2$ (see Section 3 for a discussion on the domain of $L$ and the corresponding meaning of where $h_k$ are the Hermite functions (1.5).
We show that Theorem 1.3 implies Theorem 1.2 (and then also Theorem 1.1), and in this sense it can be interpreted as a Mean-Dispersion principle associated to the Wigner transform. The advantage of Theorem 1.3 is twofold. First, the proof is simpler than the one of Theorem 1.2 in [10]; in particular, it does not need the Rayleigh-Ritz technique used there. Second, $L$ is not the only operator that can be used in (1.8) in order to have Mean-Dispersion principles of the kind of Theorem 1.3. In Sections 4 and 5 we give more details on this fact. Here, we just point out that we can use instead of $L$ the multiplication operator by $x^2+\xi^2$, obtaining that (see Theorem 5.1 below) if $\{f_k\}_{k\in\mathbb N_0}$ is an orthonormal sequence in $L^2(\mathbb R)$, then for every $n\ge 0$ and equality is characterized as in Theorem 1.3. We show that if is the trace of the covariance matrix of $|W(f_k)(x,\xi)|^2$; then, comparing (1.9) with (1.4) (in the case $\mu(f_k)=\mu(\hat f_k)=0$) we observe that we have replaced the two variances associated with $f_k$ and $\hat f_k$ in (1.4) with (a constant times) the trace of the covariance matrix associated with $W(f_k)$, which reflects the fact that $W(f_k)$ includes at the same time information on both $f_k$ and $\hat f_k$.

Other extensions of Theorem 1.3 are also studied. Since there are many different time-frequency representations besides the classical Wigner distribution, we consider the so-called Cohen class, given by all the representations $Q(f,g)$ of the form such class contains all the most used time-frequency representations. A natural question is whether in Theorem 1.3 one can substitute $W(f_k)$ with $Q(f_k) := Q(f_k,f_k)$, and which operators can be considered instead of $L$. We prove in Section 6 that, for a suitable class of kernels $\sigma$ in (1.10), a result of the kind of Theorem 1.3 can be formulated for representations $Q$ in the Cohen class. Finally, the Mean-Dispersion principle for the Wigner transform can be extended to Riesz bases instead of orthonormal bases.
The paper is organized as follows. In Sections 2 and 3 we give basic results on the Wigner transform and on the action of the Wigner transform on Hermite functions. In Section 4 we prove Theorem 1.3. Section 5 is devoted to the study of the case of the covariance matrix associated with $W(f_k)$ and to the proof of (1.9). In Sections 6 and 7 we extend the results to the Cohen class and Riesz bases.

The Wigner distribution
Besides the classical cross-Wigner distribution $W(f,g)$ for $f,g\in L^2(\mathbb R)$ defined in (1.6) we also consider the following Wigner-like transform introduced in [4] Wig with standard extensions to $f,g\in\mathcal S'(\mathbb R)$ and $u\in\mathcal S'(\mathbb R^2)$. Such operators are strictly related since However, the second one has the advantage, with respect to the classical Wigner transform, that it is a linear invertible operator, being the composition of a linear invertible change of variables and a partial Fourier transform. Indeed, denoting by $F(f)(\xi)=\hat f(\xi)$ the classical Fourier transform (1.3), by the partial Fourier transform with respect to the second variable, and by we have that The inverses of the operators above are Moreover, denoting by for $D_x=-i\partial_x$ and $D_y=-i\partial_y$, a straightforward computation (see also [4]) shows that for all $u\in\mathcal S(\mathbb R^2)$. We write $M$ and $D$ for the multiplication and differentiation operators when just one variable is involved, so for $u\in\mathcal S(\mathbb R)$

Moreover we also adopt, for convenience, the following notations. First, we write $\langle\cdot,\cdot\rangle$ to indicate the inner product in $L^2$, the duality $\mathcal S'$-$\mathcal S$ (we consider here distributions as conjugate-linear functionals), and in general the integral each time such integral is finite, even though $g,h$ are not $L^2$ functions. Second, we write when the last one makes sense and is finite. It coincides with We use the symbol $\langle\cdot,\cdot\rangle$ with analogous meaning in dimension greater than 1.
More generally, we have the following result (proved in [2] for $u\in\mathcal S(\mathbb R^2)$):

Proposition 2.1. Let $P(x,y,D_x,D_y)$ be a linear partial differential operator with polynomial coefficients. Then for all $u\in\mathcal S'(\mathbb R^2)$:

The above proposition will be useful to relate the classical Wigner distribution $W(f)$ to the mean (1.1) and the variance (1.2) associated with a function $f\in L^2(\mathbb R)$ and its Fourier transform $\hat f\in L^2(\mathbb R)$.

Proposition 2.2. Given $f\in L^2(\mathbb R)$ with finite associated means and variances of $f$ and $\hat f$, the following properties hold:

Let us first recall that (2.7) and (2.9) imply the following Moyal's formula for the cross-Wigner distribution (cf. [8, p. 66]) Note that the assumption that $f$ has finite associated mean and variance implies that $Mf\in L^2(\mathbb R)$: In the same way, the fact that $\hat f$ has finite associated mean and variance implies that $Df\in L^2(\mathbb R)$. This means that Moyal's formula (2.12) can be applied when, in its left-hand side, $Mf$ or $Df$ appear in the arguments of the Wigner transform.
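Moyal's formula can be illustrated numerically. The sketch below is an illustration only: it assumes the normalization $W(f,g)(x,\xi)=(2\pi)^{-1/2}\int f(x+t/2)\overline{g(x-t/2)}e^{-it\xi}\,dt$, under which Moyal's formula takes the form $\langle W(f_1,g_1),W(f_2,g_2)\rangle=\langle f_1,f_2\rangle\overline{\langle g_1,g_2\rangle}$, and it uses the first two Hermite functions as test functions. The names `cross_wigner` and `inner2d` are hypothetical helpers, and the integrals are replaced by Riemann sums on truncated grids.

```python
import math
import numpy as np


def h0(t):
    """Hermite function of order 0 (normalized Gaussian)."""
    return np.pi**-0.25 * np.exp(-t**2 / 2)


def h1(t):
    """Hermite function of order 1."""
    return math.sqrt(2) * np.pi**-0.25 * t * np.exp(-t**2 / 2)


def cross_wigner(f, g, x, xi, t):
    """Riemann-sum approximation of W(f, g) on the grid x (rows) by xi (columns)."""
    dt = t[1] - t[0]
    # products[a, m] = f(x_a + t_m / 2) * conj(g(x_a - t_m / 2))
    products = f(x[:, None] + t[None, :] / 2) * np.conj(g(x[:, None] - t[None, :] / 2))
    phases = np.exp(-1j * np.outer(t, xi))  # phases[m, b] = exp(-i t_m xi_b)
    # Assumed normalization: factor 1 / sqrt(2 pi) in front of the integral.
    return (products @ phases) * dt / math.sqrt(2 * math.pi)


def inner2d(F, G, x, xi):
    """Inner product in L^2(R^2), conjugate-linear in the second argument."""
    return np.sum(F * np.conj(G)) * (x[1] - x[0]) * (xi[1] - xi[0])


x = np.linspace(-6, 6, 121)
xi = np.linspace(-6, 6, 121)
t = np.linspace(-16, 16, 641)

W00 = cross_wigner(h0, h0, x, xi, t)
W11 = cross_wigner(h1, h1, x, xi, t)
W01 = cross_wigner(h0, h1, x, xi, t)

norm_W00 = inner2d(W00, W00, x, xi)    # Moyal predicts <h0,h0> conj(<h0,h0>) = 1
cross_term = inner2d(W00, W11, x, xi)  # Moyal predicts <h0,h1> conj(<h0,h1>) = 0
norm_W01 = inner2d(W01, W01, x, xi)    # Moyal predicts <h0,h0> conj(<h1,h1>) = 1
```

With these grids the three quantities match the values predicted by Moyal's formula up to quadrature error; a different normalization of $W$ would rescale them by a constant.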
Now we analyze the case when in the left-hand side of (2.12) the expression $W(f,M^2g)$ appears, for $f,g\in L^2(\mathbb R)$ with finite associated means and variances of $f,g,\hat f,\hat g$. Observe that, for $f,g\in\mathcal S(\mathbb R)$, Such an equality holds in fact for $f,g\in\mathcal S'(\mathbb R)$ and for tempered distributions it reads By the observations above, $Mf,Mg\in L^2(\mathbb R)$, and so from (2.14) we have that $W(f,M^2g)$ is a function, and we can consider Since $g$ and $Mg$ are $L^2$-functions, we can take, as is standard, a sequence $g_j\in\mathcal S(\mathbb R)$ such that $g_j\to g$ and $Mg_j\to Mg$ as $j\to\infty$. Since $M^2g_j\in L^2(\mathbb R)$ for every $j\in\mathbb N_0$, by (2.12) we have Then, we have as $j\to\infty$. On the other hand, by (2.14) and (2.3) we get Since $g_j\to g$, $Mg_j\to Mg$, and $f,g,Mf,Mg\in L^2(\mathbb R)$, by the $L^2$-continuity of the Wigner transform we have as $j\to\infty$; by the same calculations as above we get (2.17)

Recall now that for every $u,v\in\mathcal S'(\mathbb R)$ the following formula holds then, since $\hat f$ and $\hat g$ have finite associated means and variances, the same procedure can be applied when we have $W(f,D^2g)$ instead of $W(f,M^2g)$, obtaining that, with the notation (2.5)-(2.6), Similar considerations can be made for $MDf$, since is a function, being $Df,Mg\in L^2(\mathbb R)$ under the assumptions of finite associated means and variances. Arguing as for $M^2f$ we then have All the above considerations will be implicit from now on.

Let us now prove point (a): it follows from (2.13) since $\langle M^2f,f\rangle=\langle Mf,Mf\rangle$.
(b): With the notations (2.5)-(2.6), by point (a) applied to $\hat f$: 3) and Moyal's formula (2.12): ), Moyal's and Parseval's formulas (2.12) and (2.7):
(e): From (2.1), (2.12) and (2.7): 2) and (2.12):
(g): From (2.1), (2.12), (2.19), (2.7) and point (a):
(h): From (2.2), (2.12), (2.17) and point (a):
(i): From (2.1), (2.3), (2.12), (2.20) and (2.7): we finally have that Therefore
(j): From (2.2), (2.4), (2.12), (2.20), (2.7) and (2.21): It follows that
(k): From (2.3), (2.12) and point (a):
(l): From (2.4), (2.12), (2.7) and point (b):
The proof is complete.
Corollary 2.3. Given $f\in L^2(\mathbb R)$ with $\|f\|=1$ and finite associated mean and variance of $f$ and $\hat f$, the following properties hold:

The Hermite basis
For $k\in\mathbb N_0=\mathbb N\cup\{0\}$, let $h_k$ be the Hermite functions on $\mathbb R$ defined by (1.5). It is well known that the $h_k$ are eigenfunctions of the Fourier transform and form an orthonormal basis in $L^2(\mathbb R)$. Moreover they are an absolute basis in $\mathcal S(\mathbb R)$ (see [11]). Denoting by by [15, Thms. 3.2 and 3.4] we have that the functions $\{h_{j,k}\}_{j,k\in\mathbb N_0}$ form an orthonormal basis in $L^2(\mathbb R^2)$ and are eigenfunctions of the twisted Laplacian: By Fourier transform (see [3, Ex. 3.20]) are eigenfunctions of the operator $L$ defined in (1.7), with the same eigenvalues as before, in the sense that $L\hat h_{j,k}=(2k+1)\hat h_{j,k}$.
Note that also $\{\hat h_{j,k}\}_{j,k\in\mathbb N_0}$ are in $\mathcal S(\mathbb R^2)$ and form an orthonormal basis in $L^2(\mathbb R^2)$.
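The two one-dimensional facts recalled above, orthonormality of the Hermite functions and their being eigenfunctions of the Fourier transform, can be checked numerically. This is a minimal sketch assuming the standard normalization of $h_k$ and the unitary Fourier transform $\hat f(\xi)=(2\pi)^{-1/2}\int f(t)e^{-it\xi}\,dt$, for which the eigenvalue of $h_k$ is $(-i)^k$; `hermite_function` is a hypothetical helper and the integrals are Riemann sums on a truncated grid.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval


def hermite_function(k, t):
    """Normalized Hermite function H_k(t) exp(-t^2/2) / sqrt(2^k k! sqrt(pi))."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # select the (physicists') Hermite polynomial H_k
    norm = math.sqrt(2.0**k * math.factorial(k) * math.sqrt(math.pi))
    return hermval(t, coeffs) * np.exp(-t**2 / 2) / norm


t = np.linspace(-20, 20, 4001)
dt = t[1] - t[0]
H = np.array([hermite_function(k, t) for k in range(6)])

# Gram matrix of h_0, ..., h_5 in L^2(R); orthonormality means it is the identity.
gram = (H @ H.T) * dt

# Fourier transform evaluated at a few frequencies by direct quadrature.
xi = np.linspace(-3, 3, 13)
kernel = np.exp(-1j * np.outer(xi, t))  # kernel[b, m] = exp(-i xi_b t_m)
fourier_error = 0.0
for k in range(6):
    h_hat = (kernel @ H[k]) * dt / math.sqrt(2 * math.pi)
    expected = (-1j) ** k * hermite_function(k, xi)  # assumed eigenvalue (-i)^k
    fourier_error = max(fourier_error, float(np.max(np.abs(h_hat - expected))))
```

Here `gram` should be close to the identity matrix and `fourier_error` close to zero; with the opposite sign in the exponential of the Fourier transform the eigenvalue would be $i^k$ instead.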
More generally, following the same ideas as in [14, Thm. 21.2], we can prove: In order to prove that $\{W(f_j,f_k)\}_{j,k\in\mathbb N_0}$ is a basis for $L^2(\mathbb R^2)$, by [5, Thm. 3.4.2], it is enough to prove that if By [14, Thms. 4.4 and 7.5] the operator is a bounded linear operator satisfying where $\|\cdot\|_{HS}$ is the Hilbert-Schmidt norm defined by (see [14, formula (7.1)]): for an orthonormal basis $\{f_j\}_{j\in\mathbb N_0}$ of $L^2(\mathbb R)$. The operator $W_F$ is in fact the classical Weyl operator with symbol $F$. Then by assumption, which implies that From (3.4) and (3.5) we finally have that $F=0$ a.e. in $\mathbb R^2$.
The operator $L$ defined in (1.7) is unbounded on $L^2(\mathbb R^2)$ (see Remark 6.6 below) and defined (at least) in $\mathcal S(\mathbb R^2)\subset L^2(\mathbb R^2)$. Now, since the functions (3.1) are an orthonormal basis for $L^2(\mathbb R^2)$, every element $F\in L^2(\mathbb R^2)$ can be written as where $c_{j,k}=\langle F,\hat h_{j,k}\rangle$. Then, writing we have from (3.2) The operator $L$ is then the unbounded and densely defined operator with domain In general we shall write meaning that $\langle LF,F\rangle=+\infty$ if the series diverges. Note that, since $\{\hat h_{j,k}\}_{j,k\in\mathbb N_0}$ is an orthonormal basis for $L^2(\mathbb R^2)$, we have that $F\in D(L)$ if and only if $\{c_{j,k}(2k+1)\}_{j,k\in\mathbb N_0}\in\ell^2$. This implies that the series (3.7) converges (but not vice versa).

Mean-Dispersion Principle
From the results of the previous sections we now obtain an alternative formulation and a simple proof of Shapiro's Mean-Dispersion Principle (see [10] and the references therein). To this aim let us first prove some preliminary results.

Lemma 4.1. Let $\{h_k\}_{k\in\mathbb N_0}$ be the Hermite functions defined in (1.5) and $L$ as in (1.7). Then for every $j\in\mathbb N_0$ we have where the last equality is the formula for the sum of all odd numbers from 1 to $2n+1$.
We use (4.9) and (4.8) in (4.4) to get

Remark 4.4. As a consequence of Theorem 4.3 we have that if $\{f_i\}_{i\in I}$ is such that $\|f_i\|=1$ for every $i\in I$, $\{g_j\}_{j\in J}$ is an orthonormal system in $L^2(\mathbb R)$ and for some constant $A>0$, then $J$ must be finite (while $I$ may be infinite).
and the estimate is optimal, in the sense that if $f_k$ are the Hermite functions then equality holds in (4.11) and, conversely, given $n_0\in\mathbb N$, if equality holds in (4.11) for all $n\le n_0$, then there exist

Proof. The inequality (4.11) is a particular case of Theorem 4.3 for $g_k=f_k$.
In order to prove that the inequality is optimal we follow the same ideas as in [10, Thm.
and hence, by (4.3) and (2.12): We now proceed by induction on $n\in\mathbb N_0$. From (4.12) and (4.13) for $n=0$ we have and hence Let us assume now that and let us prove that Thus, by (4.13) and (4.12), we have again by inductive assumption. Therefore $\langle f_n,h_k\rangle=0$ for all $k>n$ (and for $0\le k\le n-1$ by inductive assumption), which implies that

From Corollary 4.5 we have, as in Remark 4.4, that if Formula (4.14) says that Corollary 4.5 is exactly a reformulation of Theorem 1.2, and in this sense Theorem 4.3 and Corollary 4.5 can be seen as Mean-Dispersion principles related with the Wigner transform. On the other hand we observe that working with the Wigner transform gives several advantages. First of all we have more generality, since in Theorem 4.3 we can consider different arguments $f_i$, $g_k$ in the cross-Wigner distribution; moreover the proofs with the Wigner transform are simpler and more self-contained than those in [10]. Another advantage is that we have information on the Wigner transform of an orthonormal sequence $\{f_k\}_{k\in\mathbb N_0}$ rather than on $f_k$ and $\hat f_k$ themselves, and this gives more possibilities on how such information can be treated and written. In Section 5 we give a Mean-Dispersion principle on the trace of the covariance matrix associated to the Wigner transform; here we start by noting that, from Corollary 2.3, the quantity $\mu^2(f_k)+\mu^2(\hat f_k)+\Delta^2(f_k)+\Delta^2(\hat f_k)$ in (4.14) can be written not only as $\langle LW(f_k),W(f_k)\rangle$, but also through many other operators, as we can see in the following examples.

Example 4.6. For all $f\in L^2(\mathbb R)$ with $\|f\|=1$ and finite associated mean and variance of $f$ and $\hat f$ by Corollary 2.3(a), (b). Therefore formula (1.4) for an orthonormal sequence $\{f_k\}_{k\in\mathbb N_0}$ in $L^2(\mathbb R)$ can be rewritten as

Example 4.7. For all $f\in L^2(\mathbb R)$ with $\|f\|=1$ and finite associated mean and variance of $f$ and $\hat f$ we have from Corollary 2.3(g), (h), (k), (l): and hence for an orthonormal sequence $\{f_k\}_{k\in\mathbb N_0}\subset L^2(\mathbb R)$ We can also combine, for example, the operators of Examples 4.6 and 4.7, or add combinations of by Corollary 2.3(e), (f), (i), (j).

Covariance
In this section we give an uncertainty principle involving the trace of the covariance matrix of the square of the Wigner distribution $|W(f)(x,\xi)|^2$, and explore its relations with Theorem 1.2.
To this aim, let us first recall some notions about mean and covariance for a function of two variables $\rho(x,y)\in L^1(\mathbb R^2)$. We set is symmetric and its trace is given by If $\rho(x,y)$ has null means $M(X)=M(Y)=0$, then (5.3) represents the trace of the covariance matrix of $\rho(x,y)$.
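As an illustration of these notions, the following sketch computes the means and the trace of the covariance matrix of a density sampled on a grid. It assumes the standard definitions, normalized by the total mass of $\rho$, which may differ by a normalization from (5.1)-(5.3). The test density is $\rho(x,y)=\frac{2}{\pi}e^{-2(x^2+y^2)}$, which is $|W(h_0)|^2$ when $W$ carries the factor $(2\pi)^{-1/2}$; this closed form is our own computation under that assumption. The function name `covariance_trace` is hypothetical.

```python
import numpy as np


def covariance_trace(rho, x, y):
    """Means and trace of the covariance matrix of rho on the grid x (rows) by y (columns)."""
    dx, dy = x[1] - x[0], y[1] - y[0]
    X, Y = np.meshgrid(x, y, indexing="ij")
    mass = rho.sum() * dx * dy
    mean_x = (X * rho).sum() * dx * dy / mass
    mean_y = (Y * rho).sum() * dx * dy / mass
    var_x = ((X - mean_x) ** 2 * rho).sum() * dx * dy / mass
    var_y = ((Y - mean_y) ** 2 * rho).sum() * dx * dy / mass
    # The trace of the covariance matrix is the sum of the two variances.
    return mean_x, mean_y, var_x + var_y


x = np.linspace(-6, 6, 241)
y = np.linspace(-6, 6, 241)
X, Y = np.meshgrid(x, y, indexing="ij")

# Assumed test density: |W(h_0)|^2 under the (2 pi)^(-1/2) normalization of W.
rho = (2 / np.pi) * np.exp(-2 * (X**2 + Y**2))

mean_x, mean_y, trace = covariance_trace(rho, x, y)
```

For this Gaussian density both means vanish and each variance equals $1/4$, so `trace` is $1/2$ up to quadrature error.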
It is then interesting to consider the quantity in (5.3) which is related to means and variances of $f$ and $\hat f$ and the equality in (5.6) holds if and only if $\mu(f)=\mu(\hat f)=0$. In particular, since the Hermite functions satisfy $\mu(h_k)=\mu(\hat h_k)=0$ by [10, Ex. 2.4], from Theorem 1.2 we have the following: Moreover, given $n_0\in\mathbb N$, the equality holds for all $n\le n_0$ if and only if there exist

Proof. The inequality (5.7) immediately follows from (5.6) and Theorem 1.2. If $f_k$ are multiples of the Hermite functions $c_kh_k$ with $|c_k|=1$, then the equality holds because of (5.5), the fact that $\mu(h_k)=\mu(\hat h_k)=0$, and Theorem 1.2.
In the other direction, if the equality holds in (5.7) for all $n\le n_0$, then from (5.5) we have, for $n\le n_0$, and hence, from Theorem 1.2: Then we conclude from Theorem 1.2.
Let us remark that from Theorem 5.1 we immediately get the following uncertainty principle for the covariance matrix:

Corollary 5.2. If $\{f_j\}_{j\in J}$ is an orthonormal sequence in $L^2(\mathbb R)$ with zero means $\mu(f_j)=\mu(\hat f_j)=0$, and if the trace of the covariance matrix of $|W(f_j)(x,\xi)|^2$ is uniformly bounded in $j$, we have for some $A>0$. In particular, $J$ is finite.

Proof. From Corollary 2.3(c), (d) we have that by assumption, and hence from (5.3): The thesis thus immediately follows from Theorem 5.1.
Note that Corollary 5.2 can be stated also in terms of the variances of $|W(f_j)(x,\xi)|^2$ since, in general, the variances for $\rho_X$, $\rho_Y$, $M(X)$, $M(Y)$ defined as in (5.1)-(5.2) satisfy:

Cohen classes
Infinitely many operators playing the same role as in the previous sections may be constructed by means of the Cohen class for some tempered distribution $\sigma\in\mathcal S'(\mathbb R^2)$. For $f,g\in\mathcal S(\mathbb R)$ we have $W(f,g)\in\mathcal S(\mathbb R^2)$, and then $Q(f,g)$ is well-defined for every $\sigma\in\mathcal S'(\mathbb R^2)$. As for the Wigner we define for Let us remark that if $\sigma=F^{-1}(e^{-iP(\xi,\eta)})$ then $|\hat\sigma|=1$ and hence, for all $f_1,f_2,g_1,g_2\in\mathcal S(\mathbb R)$, from (2.7) and (2.12):

Proof. Let us first remark that $Q(f_j,f_k)\in\mathcal S(\mathbb R^2)\subset L^2(\mathbb R^2)$. Moreover $\{Q(f_j,f_k)\}_{j,k\in\mathbb N_0}$ is an orthonormal sequence by (6.2).
Then Theorem 4.3 can be rephrased as follows, for any choice of $P\in\mathbb R[\xi,\eta]$: for any linear partial differential operator $L$ of the form Therefore, by Theorem 6.3, we obtain

Example 6.5. Similar results can be obtained considering the operator in (5.4) instead of $L$ and then Theorem 5.1 instead of Corollary 4.5. Indeed, for $f\in\mathcal S(\mathbb R)$ with $\|f\|=1$ we can write, by Proposition 2.1, (6.2) and Theorem 6.1: for any $P_1,P_2$ as in (6.1).

It follows that if {f
, $\forall n\in\mathbb N_0$, (6.5) for any linear partial differential operator $L^*$ of the form

Remark 6.6. Any linear operator $T:L^2(\mathbb R^2)\to L^2(\mathbb R^2)$ (not necessarily everywhere defined) satisfying, for some orthonormal sequence cannot be a bounded operator on $L^2(\mathbb R^2)$. Indeed, assuming by contradiction that $T$ is bounded, by Theorem 3.1 we would have, for all $n\in\mathbb N_0$: which gives a contradiction for large $n$. The above considerations can be applied to the partial differential operators with polynomial coefficients appearing in the various results where we have proved estimates of the kind of (6.6). This is not surprising since all non-constant differential operators with polynomial coefficients are in fact unbounded in $L^2(\mathbb R^n)$. Indeed, assume first that $P(x,D)$ has non-constant coefficients, i.e.
with $P_\beta(x)$ polynomials of degree less than or equal to $m\ge 1$. We choose $\beta_0\in\mathbb N_0^n$, $|\beta_0|\le\ell$ and $a\in\mathbb R^n\setminus\{0\}$ such that $P_{\beta_0}(ta)$ is a polynomial in $t$ of maximum degree $m$.

Riesz bases
In this section we consider a general Riesz basis of $L^2(\mathbb R)$ instead of an orthonormal basis. We recall that a Riesz basis in a Hilbert space $H$ is the image of an orthonormal basis for $H$ under an invertible bounded linear operator. In particular, if $\{u_k\}_{k\in\mathbb N_0}$ is a Riesz basis for $L^2(\mathbb R)$, we can find an invertible bounded linear operator $U_1:L^2(\mathbb R)\to L^2(\mathbb R)$ such that Note that the above inequality is trivial if n+1 B C1 < 1 so that from (7.3) we have, for all $n\in\mathbb N_0$, Let us now remark that, from the continuity of $U_j:L^2(\mathbb R)\to L^2(\mathbb R)$, $j=1,2$, for every $k\in\mathbb N_0$, and therefore

"Ricerca locale 2021 - linea A" (University of Torino). Jornet was partially supported by the projects PID2020-119457GB-100 funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe", and by the project GV AICO/2021/170.
2.3]. If $f_k=h_k$ then (4.11) is an equality by Lemma 4.1. Now, if the equality holds in (4.11) for all $0\le n\le n_0$, then for all 0 14) by Lemma 4.2, we have obtained a simple proof of Theorem 1.2 (the sharp Mean-Dispersion Principle [10, Thm. 2.3]), and then also of Theorem 1.1 (the original Shapiro's Mean-Dispersion Principle).