Salem-Zygmund Inequality for locally sub-Gaussian random variables, random trigonometric polynomials, and random circulant matrices

In this manuscript we give an extension of the classic Salem--Zygmund inequality to locally sub-Gaussian random variables. As an application, we study the concentration of the roots of a Kac polynomial, which is the main contribution of this manuscript. More precisely, we assume the existence of the moment generating function of the iid random coefficients of the Kac polynomial and prove that there exists an annulus of width \[O(n^{-2}(\log n)^{-1/2-\gamma}), \quad \gamma>1/2,\] around the unit circle that does not contain roots with high probability. As another application, we show that the smallest singular value of a random circulant matrix is at least $n^{-\rho}$, $\rho\in(0,1/4)$, with probability $1-O(n^{-2\rho})$.


Introduction
A classical problem in harmonic analysis is the quantification of the magnitude of the modulus of a trigonometric polynomial on the unit circle. Erdős [10] studied the trigonometric polynomial $T_n(x) = \sum_{j=0}^{n-1} \alpha_j e^{ijx}$, $x \in [0, 2\pi]$, for choices of signs $\pm 1$ for the coefficients $\alpha_j$, and estimated how large $|T_n(x)|$ can be for $x \in [0, 2\pi)$. Salem and Zygmund [30] proved that almost all choices of signs satisfy

(1) $c_1(n\log n)^{1/2} \le \max_{x\in[0,2\pi]} |T_n(x)| \le c_2(n\log n)^{1/2}$

for some positive constants $c_1$ and $c_2$.
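As a quick numerical illustration of (1) (a simulation sketch, not part of the original argument; the degree, grid size, and seed are arbitrary choices), one can sample random signs and compare $\max_x |T_n(x)|$ with $(n\log n)^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
m = 8 * n  # evaluation grid x_k = 2*pi*k/m
alpha = rng.choice([-1.0, 1.0], size=n)  # random signs
# An FFT of the zero-padded coefficients evaluates sum_j alpha_j e^{-i j x_k},
# whose maximum modulus equals that of T_n(x_k) = sum_j alpha_j e^{i j x_k}.
values = np.fft.fft(alpha, n=m)
ratio = np.abs(values).max() / np.sqrt(n * np.log(n))
# the ratio stays of constant order, consistent with (1)
assert 0.3 < ratio < 3.0
```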
Inequalities of type (1) are known as Salem-Zygmund inequalities. Different versions of the Salem-Zygmund inequality appear in many areas of modern analysis; see [8]. In a probabilistic context, the common version of the Salem-Zygmund inequality is usually established when the coefficients $\alpha_0, \dots, \alpha_{n-1}$ of $T_n$ are iid sub-Gaussian random variables; see Chapter 6 in [14]. In the present manuscript, we give an extension of the Salem-Zygmund inequality to locally sub-Gaussian random coefficients. This extension allows us to study the localization of the roots of a random Kac polynomial and the probability of singularity of a random circulant matrix.
1.1. Roots of random trigonometric polynomials. The study of the roots of a polynomial is an old topic in mathematics. There are formulas to compute the roots of polynomials of degree 2, degree 3 (Tartaglia-Cardano's formula), and degree 4 (Ferrari's formula), but due to Galois' work, for a generic polynomial of degree 5 or more it is not possible to find explicit formulas for its roots in terms of radicals. In the random setting, Bloch and Pólya [3] considered a random polynomial with iid Rademacher random coefficients (uniform distribution on $\{-1, 1\}$) and proved that the expected number of real zeros is $O(n^{1/2})$. In a series of papers between 1938 and 1939, Littlewood and Offord gave a better bound for the number of real roots of a random polynomial with iid random coefficients in the cases of the Rademacher, Uniform$[-1, 1]$, and standard Gaussian distributions [21]. Kac [13] established his famous integral formula for the expected number of real roots of a random polynomial with iid standard Gaussian coefficients. Those were the first steps in the study of roots of random functions, which nowadays is a relevant part of modern probability and analysis. For further details, see [9] and the references therein.
The localization of the roots of a polynomial is in general a hard problem. However, there are relevant results in the theory of random polynomials [2]. For instance, for iid non-degenerate random coefficients with finite logarithmic moment, the roots cluster asymptotically near the unit circle and the arguments of the roots are asymptotically uniformly distributed. More precisely, Ibragimov and Zaporozhets [12] showed that for a Kac polynomial

(2) $G_n(z) = \sum_{j=0}^{n-1} \xi_j z^j, \quad z \in \mathbb{C},$

with (real or complex) iid non-degenerate coefficients satisfying $\mathbb{E}(\log(1 + |\xi_0|)) < \infty$, the roots concentrate around the unit circle as $n \to \infty$, almost surely. Moreover, they proved that the condition $\mathbb{E}(\log(1 + |\xi_0|)) < \infty$ is necessary and sufficient for the roots of $G_n$ to be asymptotically near the unit circle.
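The clustering phenomenon is easy to observe numerically (an illustrative sketch; the degree $n = 200$, the threshold $0.1$, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
xi = rng.choice([-1.0, 1.0], size=n)  # Rademacher coefficients xi_0, ..., xi_{n-1}
roots = np.roots(xi[::-1])            # np.roots expects the leading coefficient first
# fraction of roots lying in a thin annulus around the unit circle
frac_near = np.mean(np.abs(np.abs(roots) - 1.0) < 0.1)
assert len(roots) == n - 1
assert frac_near > 0.5  # most roots cluster near |z| = 1
```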
For iid standard Gaussian random coefficients of $G_n$, most of the roots are concentrated in an annulus of width $1/n$ centered on the unit circle. However, the nearest root to the unit circle is at distance at least $O(n^{-2})$; for further details see [23]. Shepp and Vanderbei [20] conjectured that the last statement holds not only for standard Gaussian coefficients but also for Rademacher coefficients. This conjecture was proved by Konyagin and Schlag [19]. Our Theorem 2.3 establishes that, with probability $1 - O((\log n)^{-\gamma+1/2})$, the roots of $G_n$ stay at distance at least $O(n^{-2}(\log n)^{-1/2-\gamma})$ from the unit circle, for $\gamma > 1/2$. Konyagin and Schlag [19] showed that if $G_n$ has iid Rademacher or standard Gaussian random coefficients, then for all $\varepsilon > 0$ and large $n$ the inequality

$\min_{z\in\mathbb{C}\,:\,||z|-1|<\varepsilon n^{-2}} |G_n(z)| \ge \varepsilon n^{-1/2}$

holds with probability at least $1 - C\varepsilon$, for some positive constant $C$. Karapetyan [15,16] studied the sub-Gaussian case but, to our knowledge, his proof is not complete. Even so, using our extension of the Salem-Zygmund inequality and the notion of least common denominator, which was developed to study the singularity of random matrices [28], we show that for fixed $t \ge 1$ the annulus $\{z \in \mathbb{C} : ||z|-1| \le tn^{-2}(\log n)^{-1/2-\gamma}\}$ contains no roots of $G_n$ with probability at least $1 - O((\log n)^{-\gamma+1/2})$. The techniques used in the present paper are not the same as those used by Konyagin and Schlag [19]. The main result of Konyagin and Schlag only holds for Rademacher and Gaussian iid random coefficients: they carried out a refined analysis of the characteristic function and applied the so-called circle method. This approach is not straightforward to extend to more general random coefficients, even sub-Gaussian ones or coefficients with a finite moment generating function (mgf for short).
The novelty of this manuscript is the use of the notion of least common denominator to cover more general random coefficients. This approach works for quite general random coefficients. However, the authors are still working on relaxing the assumption of the existence of a mgf. The main obstacle to relaxing this assumption arises in the control of the maximum modulus of the random polynomial over the unit circle under the assumption of the existence of some $p$-th moment. We emphasize that the proof is not a direct consequence of [29], since good estimates of the least common denominator are typically difficult to obtain. We remark that this result and the main result in [19] are, up to our knowledge, not direct consequences of the so-called concentration inequalities.

1.2. Random circulant matrices.
Recall that an $n \times n$ complex circulant matrix, denoted by $\mathrm{circ}(c_0, \ldots, c_{n-1})$, has the form

$\mathrm{circ}(c_0, \ldots, c_{n-1}) := \begin{pmatrix} c_0 & c_{n-1} & \cdots & c_1 \\ c_1 & c_0 & \cdots & c_2 \\ \vdots & \vdots & \ddots & \vdots \\ c_{n-1} & c_{n-2} & \cdots & c_0 \end{pmatrix},$

where $c_0, \ldots, c_{n-1} \in \mathbb{C}$. For $\xi_0, \ldots, \xi_{n-1}$ random variables, we say that $\mathrm{circ}(\xi_0, \ldots, \xi_{n-1})$ is an $n \times n$ random circulant matrix. Circulant matrices are a very common object in different areas of mathematics [11,17,26]. In particular, circulant matrices play a crucial role in the study of large-dimensional Toeplitz matrices [5,31]. In the theory of random matrices, singularity is one aspect that has been intensively studied in recent years [4,27,28]. In the case where a random circulant matrix has Rademacher entries, Meckes [22] proved that the probability that it is singular tends to zero as its dimension grows. As a consequence of our concentration result for the roots of Kac polynomials, for a random circulant matrix $C_n$ with iid zero-mean entries and finite mgf, it follows that for all fixed $t \ge 1$ and $\gamma > 1/2$, the smallest singular value $s_n(C_n)$ of $C_n$ satisfies $s_n(C_n) \ge tn^{-1/2}(\log n)^{-\gamma}$ with probability $1 - O((\log n)^{-\gamma+1/2})$. However, under weaker assumptions (see condition (H) below), for $\rho \in (0, 1/4)$ we also show that $s_n(C_n) \ge n^{-\rho}$ with probability $1 - O(n^{-2\rho})$.

The manuscript is organized as follows. In Section 2 we state the main results and their consequences. In Section 3 we give the proof of a Salem-Zygmund inequality for random variables with mgf. In Section 4, with the help of the Salem-Zygmund inequality and the notion of least common denominator, we prove Theorem 2.3 about the location of the roots of a Kac polynomial. Finally, in Section 5 we prove Theorem 2.6, which states that the smallest singular value of a random circulant matrix is relatively large with high probability.

Salem-Zygmund inequality.
Recall that a real-valued random variable $\xi$ is said to be sub-Gaussian if its mgf is bounded by the mgf of a Gaussian random variable, i.e., there is $b > 0$ such that $\mathbb{E}\, e^{t\xi} \le e^{b^2 t^2/2}$ for all $t \in \mathbb{R}$. When this condition is satisfied for a particular value of $b > 0$, we say that $\xi$ is $b$-sub-Gaussian, or sub-Gaussian with parameter $b$. In particular, it is straightforward to show that the mean of a sub-Gaussian random variable is necessarily equal to zero. For more details see [6] and the references therein. According to [6], a random variable $\xi$ is called locally sub-Gaussian when its mgf $M_\xi$ exists in an open interval around zero. In this case, it is possible to find constants $\alpha \ge 0$, $\delta \in (0, \infty]$ and $\nu \in \mathbb{R}$ such that $M_\xi(t) \le e^{\nu t + \frac{1}{2}\alpha^2 t^2}$ for any $t \in (-\delta, \delta)$.
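Two classical examples illustrate the distinction (standard facts, recorded here for concreteness; they are not part of the original text):

```latex
% Example 1: a Rademacher variable is 1-sub-Gaussian, since (2k)! >= 2^k k!:
\[
  M_\xi(t) = \cosh(t) = \sum_{k \ge 0} \frac{t^{2k}}{(2k)!}
           \le \sum_{k \ge 0} \frac{t^{2k}}{2^k\, k!} = e^{t^2/2},
  \qquad t \in \mathbb{R}.
\]
% Example 2: \xi = \eta - 1 with \eta \sim \mathrm{Exp}(1) is locally
% sub-Gaussian but not sub-Gaussian: its mgf
\[
  M_\xi(t) = \frac{e^{-t}}{1 - t}, \qquad t \in (-\infty, 1),
\]
% exists only for t < 1, yet near the origin it admits a bound of the form
% $M_\xi(t) \le e^{\frac{1}{2}\alpha^2 t^2}$ for any $\alpha^2 > \sigma^2 = 1$.
```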
If the mean of $\xi$ is zero and its variance $\sigma^2$ is finite and positive, then we can take $\nu = 0$, as the next lemma states.

Lemma 2.1. Let $\xi$ be a zero-mean random variable with finite positive variance $\sigma^2$ whose mgf exists in an open interval around zero. Then for any $\alpha^2 > \sigma^2$ there exists $\delta > 0$ such that $M_\xi(t) \le e^{\frac{1}{2}\alpha^2 t^2}$ for any $t \in (-\delta, \delta)$.
Before presenting Theorem 2.2, we introduce some useful notation. For simplicity, we keep the same notation for the Euclidean norm and the modulus of a complex number. Denote by $\mathbb{T}$ the unit circle $\mathbb{R}/(2\pi\mathbb{Z})$. For any bounded function $f : \mathbb{T} \to \mathbb{C}$, the infinite norm of $f$ is defined as $\|f\|_\infty := \sup_{x\in\mathbb{T}} |f(x)|$. Theorem 2.2 states that, for all large $n$, $\mathbb{P}\left(\|W_n\|_\infty \ge C_0(n\log n)^{1/2}\right) \le C_1/n^2$, where $C_0$ and $C_1$ are positive constants that only depend on the mgf of $\xi$ and the function $\varphi$.
Actually, under the assumption of a finite second moment, a version of a Salem-Zygmund type inequality can be obtained in terms of the expected value of the infinite norm of a random trigonometric polynomial; for more details see [33]. Theorem 2.2 provides an upper bound, in probability, on how large the infinite norm of a random trigonometric polynomial can be. Moreover, Theorem 2.2 gives a better bound than Corollary 2 in [33], as we see below.
Let $\{\xi_k : k \ge 0\}$ be a sequence of iid random variables such that $\mathbb{E}(\xi_0) = 0$ and $\mathbb{E}(\xi_0^2) = \sigma^2 > 0$. By Corollary 2 in [33] we have $\mathbb{E}\,\|W_n\|_\infty \le C\sigma(n\log n)^{1/2}$, where $C$ is a universal positive constant. By the Markov inequality we obtain $\mathbb{P}\left(\|W_n\|_\infty \ge C_0(n\log n)^{1/2}\right) \le C\sigma/C_0$. Note that this upper bound is asymptotically a positive constant. On the other hand, under the assumptions of Theorem 2.2 we deduce $\mathbb{P}\left(\|W_n\|_\infty \ge C_0(n\log n)^{1/2}\right) \le C_1/n^2$ for all large $n$, where $C_0$ and $C_1$ are positive constants that only depend on the mgf of $\xi_0$.

Kac polynomials.
To use the concept of least common denominator we introduce the following condition. We say that a random variable $\xi_0$ satisfies condition (H) if its Lévy concentration function is suitably small at some scale, i.e., there exist $M > 0$ and $q \in (0, 1)$ such that $\sup_{u\in\mathbb{R}}\mathbb{P}(|\xi_0 - u| \le M) \le q$. The notion of concentration function was introduced by P. Lévy in the context of the study of distributions of sums of random variables. For $\xi_0$ non-degenerate with zero mean and mgf, one can deduce that condition (H) is valid for some $M > 0$ and $q \in (0, 1)$. We refer to [32].
The main result of this manuscript is the following theorem.
Theorem 2.3. Let $\xi$ be a non-degenerate zero-mean random variable whose mgf exists in a neighborhood of zero, and let $\{\xi_k : k \ge 0\}$ be a sequence of iid random variables with $\xi_k \overset{D}{=} \xi$ for every $k \ge 0$. Then for any fixed $t \ge 1$,

$\mathbb{P}\left(\min_{z\in\mathbb{C}\,:\,||z|-1|\le tn^{-2}(\log n)^{-1/2-\gamma}} |G_n(z)| \le tn^{-1/2}(\log n)^{-\gamma}\right) = O\left((\log n)^{-\gamma+1/2}\right),$

where $\gamma > 1/2$ and the implicit constant in the $O$-notation depends on $t$ and the mgf of $\xi$.

Random circulant matrices.
It is well-known that any circulant matrix can be diagonalized in $\mathbb{C}$ using a Fourier basis. Indeed, let $\omega_n := \exp\left(i\frac{2\pi}{n}\right)$, $i^2 = -1$, and $F_n = \frac{1}{\sqrt{n}}(\omega_n^{jk})_{0\le j,k\le n-1}$. The matrix $F_n$ is called the Fourier matrix of order $n$. Note that $F_n$ is a unitary matrix. By a straightforward computation it follows that

$\mathrm{circ}(c_0, \ldots, c_{n-1}) = F_n^*\, \mathrm{diag}\left(G_n(1), G_n(\omega_n), \ldots, G_n(\omega_n^{n-1})\right) F_n,$

where $G_n$ is the polynomial given by $G_n(z) := \sum_{k=0}^{n-1} c_k z^k$. Hence, the eigenvalues of $\mathrm{circ}(c_0, \ldots, c_{n-1})$ are $G_n(1), G_n(\omega_n), \ldots, G_n(\omega_n^{n-1})$, or equivalently,

(3) $\lambda_k = \sum_{j=0}^{n-1} c_j e^{i2\pi jk/n}, \quad k = 0, 1, \ldots, n-1.$

Expressions like (3) appear naturally in the study of the Fourier transform of periodic functions. For a complete understanding of circulant matrices, we recommend the monograph [7].
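The diagonalization above, the eigenvalue formula, and the fact that for a circulant (hence normal) matrix the smallest singular value is the smallest eigenvalue modulus, can all be checked numerically (an illustrative sketch; the dimension and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
c = rng.standard_normal(n)
# circ(c): entry (j, k) equals c_{(j-k) mod n}; each column is a cyclic shift
C = np.array([[c[(j - k) % n] for k in range(n)] for j in range(n)])
omega = np.exp(2j * np.pi / n)

def G(z):
    """Evaluate G_n(z) = sum_k c_k z^k."""
    return sum(c[k] * z ** k for k in range(n))

pred = np.array([G(omega ** k) for k in range(n)])  # values G_n(omega_n^k)
eig = np.linalg.eigvals(C)
# the eigenvalues of circ(c) coincide with the values G_n(omega_n^k) as multisets
assert all(np.abs(eig - p).min() < 1e-8 for p in pred)
# C is normal, so its singular values are the moduli of its eigenvalues
s_min = np.linalg.svd(C, compute_uv=False).min()
assert np.isclose(s_min, np.abs(pred).min())
```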
In the sequel, we consider an $n \times n$ random circulant matrix $C_n$, i.e., $C_n := \mathrm{circ}(\xi_0, \ldots, \xi_{n-1})$, where $\xi_0, \ldots, \xi_{n-1}$ are independent random variables. The smallest singular value of the random circulant matrix $C_n$ is given by

(4) $s_n(C_n) = \min_{0\le k\le n-1}\left|G_n(\omega_n^k)\right|.$

We remark that in general the smallest singular value is not equal to the smallest eigenvalue modulus. Since $C_n$ is a normal matrix, its singular values are the moduli of its eigenvalues. Thus, the following corollary is a direct consequence of Theorem 2.3.

Corollary 2.5. Let $\xi$ be a non-degenerate zero-mean random variable whose mgf exists in a neighborhood of zero, and let $\{\xi_k : k \ge 0\}$ be a sequence of iid random variables with $\xi_k \overset{D}{=} \xi$ for every $k \ge 0$. Let $C_n := \mathrm{circ}(\xi_0, \ldots, \xi_{n-1})$ be an $n \times n$ random circulant matrix and let $s_n(C_n)$ be the smallest singular value of $C_n$. Then, for all fixed $t \ge 1$ and $\gamma > 1/2$ we have $s_n(C_n) \ge tn^{-1/2}(\log n)^{-\gamma}$ with probability $1 - O((\log n)^{-\gamma+1/2})$.

It is possible to weaken the assumptions of Corollary 2.5. Using similar reasoning as in the proof of Theorem 2.3 we obtain the following theorem.

Theorem 2.6. Let $\xi$ be a non-degenerate random variable which satisfies (H). Let $\{\xi_k : k \ge 0\}$ be a sequence of iid random variables with $\xi_k \overset{D}{=} \xi$ for every $k \ge 0$. Let $C_n := \mathrm{circ}(\xi_0, \ldots, \xi_{n-1})$ be an $n \times n$ random circulant matrix. Then, for each $\rho \in (0, 1/4)$ we have $\mathbb{P}\left(s_n(C_n) \le n^{-\rho}\right) = O\left(n^{-2\rho}\right)$.

3. Proof of Theorem 2.2. Salem-Zygmund inequality for locally sub-Gaussian random variables

Firstly, we provide the proof of the following claim, which is an important fact used in the proof of Theorem 2.2.

Claim 1: There exists a random interval $I \subset \mathbb{T}$ of length $1/\rho_n$ with $\rho_n = \frac{8n}{3}$ such that $\frac{1}{4}\|g_n\|_\infty \le |g_n(x)|$ for any $x \in I$.

Recall the Bernstein inequality $\|p_n'\|_\infty \le n\|p_n\|_\infty$ (see for instance Theorem 14.1.1, Chapter 14, page 508 in [25]). Since $g_n$ is continuous, there exists $x_0 \in \mathbb{T}$ such that $|g_n(x_0)| = \|g_n\|_\infty$. Let $I \subset \mathbb{T}$ be an interval containing $x_0$; notice that the length of $I$ is $\frac{3}{8n}$. By the Mean Value Theorem and relation (8), for any $x \in I$ we obtain $|g_n(x) - g_n(x_0)| \le \|g_n'\|_\infty\,|x - x_0|$. Since $|g_n(x_0)| = \|g_n\|_\infty$, the preceding inequality and the triangle inequality yield $\frac{1}{4}\|g_n\|_\infty \le |g_n(x)|$ for any $x \in I$.
Combining the preceding inequality with relation (6) and relation (7) finishes the proof of Claim 1. ✷ Now, we are ready to provide the proof of Theorem 2.2.
By Claim 1, there exists a random interval $I \subset \mathbb{T}$ of length $1/\rho_n$ with $\rho_n = \frac{8n}{3}$ such that $W_n(x) \ge \frac{S_n}{2}$ or $-W_n(x) \ge \frac{S_n}{2}$ on $I$. Denote by $\mu$ the normalized Lebesgue measure on $\mathbb{T}$. Then, bounding the exponential moment of $S_n$ over the interval $I$, for every $t \in (-\frac{\delta}{K}, \frac{\delta}{K})$ and any $l > 0$ we obtain

$\mathbb{P}\left(S_n \ge \alpha^2 t r_n + \frac{2}{t}\log(2\rho_n l)\right) \le \frac{1}{l}.$

Choosing $t = \left(\frac{2\log(2\rho_n l_n)}{\alpha^2 r_n}\right)^{1/2}$, which belongs to $(-\frac{\delta}{K}, \frac{\delta}{K})$ for all large $n$, we obtain

$\mathbb{P}\left(S_n \ge 3\left(\alpha^2 r_n \log(2\rho_n l_n)\right)^{1/2}\right) \le \frac{1}{l_n}$

for all large $n$.

Note that $2\rho_n l_n = n^3$ for the choice $l_n = \frac{3n^2}{16}$, so that $\log(2\rho_n l_n) = 3\log n$. Since $f_j = \mathrm{Re}(f_j) + i\,\mathrm{Im}(f_j)$, we get for all large $n$

$\mathbb{P}\left(\|W_n\|_\infty \ge 6\left(\alpha^2 r_n \log(2\rho_n l_n)\right)^{1/2}\right) \le \frac{2}{l_n}.$

Finally, since $\rho_n = \frac{8n}{3}$, the choice of $l_n = \frac{3n^2}{16}$ yields

$\mathbb{P}\left(\|W_n\|_\infty \ge 6\alpha\sqrt{3}\,(r_n\log n)^{1/2}\right) \le \frac{32}{3n^2}$

for all large $n$.

Proof of Theorem 2.3. Localization of the roots for Kac polynomials
The proof is based on the small ball probability for linear combinations of iid random variables introduced by Rudelson and Vershynin in [29]. Throughout the proof, $\|\cdot\|_2$ denotes the Euclidean norm, $|\cdot|$ denotes the modulus of a complex number, and $\det(\cdot)$ the determinant function acting on square matrices. We consider the residue modulo $\pi$ of a real number $y$, written $y \bmod \pi$, which is defined as the set of numbers $x$ such that $x - y = k\pi$ for some $k \in \mathbb{Z}$.
Definition 4.1 (Least common denominator (lcd for short)). Let $L$ be any positive number and let $V$ be any deterministic matrix of dimension $2 \times n$. The least common denominator (lcd) of $V$ is defined as

$D(V) := \inf\left\{\|\theta\|_2 : \theta \in \mathbb{R}^2,\ \mathrm{dist}\left(V^T\theta, \mathbb{Z}^n\right) < L\sqrt{\log_+\left(\|V^T\theta\|_2/L\right)}\right\},$

where $\mathrm{dist}(v, \mathbb{Z}^n)$ denotes the distance between the vector $v \in \mathbb{R}^n$ and the set $\mathbb{Z}^n$, and $\log_+ := \max\{\log, 0\}$.
For more details on the concept of lcd see Section 7 of [29]. Observe that $D(aV) = (1/|a|)D(V)$ for any $a \neq 0$. Indeed, if $\theta \in \mathbb{R}^2$ satisfies the defining condition of $D(V)$, i.e., $\mathrm{dist}(V^T\theta, \mathbb{Z}^n) < L\sqrt{\log_+(\|V^T\theta\|_2/L)}$, then, since $(aV)^T(\theta/a) = V^T\theta$, the vector $\theta/a$ satisfies the defining condition of $D(aV)$, and hence $D(aV) \le \|\theta/a\|_2 = \|\theta\|_2/|a|$. Taking the infimum over such $\theta$ gives $|a|D(aV) \le D(V)$. Exchanging the roles of $V$ and $aV$ (with the scalar $1/a$) gives the reverse inequality $D(V) \le |a|D(aV)$. Putting all these pieces together we obtain the next useful lemma.

Lemma 4.2. For any $a \neq 0$, $D(aV) = (1/|a|)D(V)$.

Let $X$ be a random vector of dimension $n \times 1$ whose entries are iid satisfying (H). Assume $\det(VV^T) > 0$. For any $a > 0$ and $t \ge 1$, by Theorem 7.5 (Section 7 in [29]) we have

(9) $\mathbb{P}\left(\|aVX\|_2 \le t\right) \le \frac{(CL)^2}{\sqrt{\det\left(aV(aV)^T\right)}}\left(t + \frac{\sqrt{2}}{D(aV)}\right)^2,$

where $L \ge 2/q$ with $q$ given in (H), $D(aV)$ is the least common denominator of $aV$, and the constant $C$ only depends on $M$ and $q$. Recall the well-known inequality $(x + y)^2 \le 2x^2 + 2y^2$ for any $x, y \in \mathbb{R}$. By Lemma 4.2, it follows that $D(aV) = (1/a)D(V)$ for all $a > 0$. Therefore,

$\mathbb{P}\left(\|aVX\|_2 \le t\right) \le \frac{2(CL)^2}{a^2\sqrt{\det(VV^T)}}\left(t^2 + \frac{2a^2}{D(V)^2}\right).$

In order to obtain a meaningful upper bound for the left-hand side of the preceding inequality, it is necessary to carry out a refined analysis of the following quantities: a lower bound for $\det(VV^T)$ and a lower bound for $D(V)$. Implicitly, the definition of $D(V)$ also requires estimating $\|V^T\theta\|_2$ for suitable $\theta \in \mathbb{R}^2$.
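The scaling identity $D(aV) = D(V)/|a|$ rests on the observation that $(aV)^T(\theta/a) = V^T\theta$, so admissible witnesses for $V$ and $aV$ are in bijection. A small numerical sketch of this equivalence (the matrix, the value of $L$, and the scalar $a$ are arbitrary choices, and the helper `admissible` is a hypothetical name for the defining condition of the lcd):

```python
import numpy as np

def admissible(V, theta, L):
    """Defining condition of the lcd: dist(V^T theta, Z^n) < L * sqrt(log_+(||V^T theta|| / L))."""
    w = V.T @ theta
    norm = np.linalg.norm(w)
    log_plus = max(np.log(norm / L), 0.0) if norm > 0 else 0.0
    return np.linalg.norm(w - np.round(w)) < L * np.sqrt(log_plus)

rng = np.random.default_rng(3)
V = rng.standard_normal((2, 10))
L, a = 2.0, 3.7
for _ in range(100):
    theta = rng.standard_normal(2) * rng.uniform(0.0, 20.0)
    # (aV)^T (theta/a) = V^T theta, so the two conditions coincide,
    # which is exactly why D(aV) = D(V) / |a|.
    assert admissible(V, theta, L) == admissible(a * V, theta / a, L)
```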

4.1.
Small ball probability analysis. The following analysis explains the reason for introducing the concept of the least common denominator, which is a crucial ingredient throughout the proof of Theorem 2.3. Recall $G_n(z) = \sum_{j=0}^{n-1} \xi_j z^j$ for $z \in \mathbb{C}$.
For $G_n$, we associate the random trigonometric polynomial $W_n(x) := G_n(e^{ix}) = \sum_{j=0}^{n-1}\xi_j e^{ijx}$, $x \in \mathbb{T}$, where $\mathbb{T}$ denotes the unit circle $\mathbb{R}/(2\pi\mathbb{Z})$. Assume $n \ge 2$ and $\gamma > 1/2$. Let $N = \lfloor n^2(\log n)^{1/2+\gamma}\rfloor$ and $x_\alpha = \frac{\alpha}{N}$ for $\alpha \in \{0, 1, 2, \ldots, N-1\}$. Let $t \ge 1$ be fixed and let $C_0 > 0$ be the suitable positive constant given in Theorem 2.2. Define the event

$G_n := \left\{\|W_n'\|_\infty \le C_0 n^{3/2}(\log n)^{1/2}\right\},$

where $W_n'$ denotes the derivative of $W_n$ on $\mathbb{T}$. For short, we also denote by $\mathbb{P}(A, B)$ the probability $\mathbb{P}(A \cap B)$ for any two events $A$ and $B$. By the Boole-Bonferroni inequality, the probability of interest is bounded by a finite sum of probabilities, and our goal is to show that every probability on the right-hand side of the resulting expression tends to zero as $n$ tends to infinity.
Using the Bernstein inequality (Theorem 14.1.1 in [25]) and Theorem 2.2 for $\varphi \equiv 1$, for all large $n$ we have

$\mathbb{P}\left(\|W_n'\|_\infty \ge C_0 n^{3/2}(\log n)^{1/2}\right) \le \mathbb{P}\left(\|W_n\|_\infty \ge C_0(n\log n)^{1/2}\right) \le \frac{C_1}{n^2}.$

On the other hand, using the Markov inequality, we obtain an analogous bound for the maximum over the grid points, which follows from an elementary estimate valid for any $j \in \{0, \ldots, n^2\}$. Therefore, the corresponding probability tends to zero, where the implicit constant depends on the distribution of $\xi_0$ and $t$. We stress that the rate of convergence in (11) can be improved; however, the dominant term on the right-hand side of (10) is $\mathbb{P}(M_n, G_n)$.
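The Bernstein inequality used above is easy to sanity-check on a random trigonometric polynomial (a grid-based illustration, not part of the argument; the degree, grid size, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64
xi = rng.standard_normal(n)
x = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
E = np.exp(1j * np.outer(x, np.arange(n)))   # matrix of values e^{ijx}
W = E @ xi                                   # W_n(x), a trigonometric polynomial of degree n-1
Wp = E @ (1j * np.arange(n) * xi)            # its derivative W_n'(x)
# Bernstein: ||W_n'||_inf <= (n-1) ||W_n||_inf for degree n-1
assert np.abs(Wp).max() <= (n - 1) * np.abs(W).max()
```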
In the sequel, we analyze the strategy to prove that P(M n , G n ) is small. First, we construct a set of closed balls that covers {z ∈ C : | |z| − 1| ≤ tn −2 }. For each closed ball, we reduce the event {M n , G n } to a "simple event" using Taylor's Theorem. Finally, we use the concept of lcd to show that the probability of each "simple event" is sufficiently small.
The strategy is to consider a set of balls centered at points on the unit circle with suitable radii. We distinguish two kinds of balls: the special balls centered at $1 + 0i$ and $-1 + 0i$, where the radius is large, $r = 2tn^{-11/10}$, and the balls centered at points $z$ whose argument satisfies $n^{-11/10} < |\arg(z) \bmod \pi| < \pi - n^{-11/10}$, with small radius $r = 2tn^{-2}(\log n)^{-1/2-\gamma}$.
Recall that for any $x \in \mathbb{R}$, $\lfloor x\rfloor$ denotes the greatest integer less than or equal to $x$. Let $N := \lfloor n^2(\log n)^{1/2+\gamma}\rfloor$ and $x_\alpha := \frac{\alpha}{N}$ for $\alpha = 0, 1, \ldots, N-1$. For $a \in \mathbb{C}$ and $s > 0$, denote by $B(a, s)$ the closed ball with center $a$ and radius $s$, i.e., $B(a, s) = \{z \in \mathbb{C} : |z - a| \le s\}$. Denote by $S^1$ the unit circle. Let $B_{\pm} := B(\pm 1, 2tn^{-11/10})$ and, for each index $\alpha$ such that $n^{-11/10} < |2\pi x_\alpha \bmod \pi| < \pi - n^{-11/10}$, let $B_\alpha := B(e^{i2\pi x_\alpha}, 2tn^{-2}(\log n)^{-1/2-\gamma})$.

Consequently,

$\left\{z \in \mathbb{C} : ||z| - 1| \le tn^{-2}(\log n)^{-1/2-\gamma}\right\} \subset B_+ \cup B_- \cup \bigcup_\alpha B_\alpha,$

where the union runs over the indices $\alpha$ described above.

4.1.1.
Small ball analysis at the points $1 + 0i$ and $-1 + 0i$. At the two points $\pm 1 + 0i$ we have the two largest closed balls in our collection. This is remarkable since the number of real roots of a Kac polynomial, for some common random variables, is at least of order $\frac{\log n}{\log\log\log n}$ with high probability [24]. This means that the real roots of a Kac polynomial approach the unit circle slowly.

4.1.2.
Small ball analysis at $e^{i2\pi x_\alpha}$. In this part, we focus mainly on the complex roots of a Kac polynomial. We remark that the complex roots are more dispersed than the real roots, but they approach the unit circle faster than the real roots do. However, the complex roots do not approach extremely fast. Let $z \in B(e^{i2\pi x_\alpha}, 2tn^{-2}(\log n)^{-1/2-\gamma})$ and assume that $G_n$ holds. By Taylor's Theorem we obtain

$G_n(z) = G_n(e^{i2\pi x_\alpha}) + G_n'(e^{i2\pi x_\alpha})\left(z - e^{i2\pi x_\alpha}\right) + R_2(z),$

where $R_2(z)$ is the error of the Taylor approximation of order 2. Combining this expansion with the bounds available on the event $G_n$, we deduce that $|G_n(e^{i2\pi x_\alpha})| \le C_4 n^{-1/2}(\log n)^{-\gamma}$, where $2C_4 = 2C_0 + 4t + 1$. To prove that $\mathbb{P}(G_n, B_\alpha)$ tends to zero as $n \to \infty$, we rewrite $G_n(e^{i2\pi x_\alpha})$ as the product of a matrix and a vector. This simple rewriting allows us to apply lcd techniques for matrices. To be precise, we define the $2 \times n$ matrix $V_\alpha$ as

$V_\alpha := \begin{bmatrix} 1 & \cos(2\pi x_\alpha) & \cdots & \cos(2\pi(n-1)x_\alpha) \\ 0 & \sin(2\pi x_\alpha) & \cdots & \sin(2\pi(n-1)x_\alpha) \end{bmatrix}$

and $X := [\xi_0, \ldots, \xi_{n-1}]^T \in \mathbb{R}^n$. Notice that

$V_\alpha X = \left[\mathrm{Re}\,G_n(e^{i2\pi x_\alpha}),\ \mathrm{Im}\,G_n(e^{i2\pi x_\alpha})\right]^T,$

which implies $\|V_\alpha X\|_2 = |G_n(e^{i2\pi x_\alpha})|$. Let $\Theta = r[\cos(\theta), \sin(\theta)]^T \in \mathbb{R}^2$, where $r > 0$ and $\theta \in [0, 2\pi]$. For fixed $r$ and $\theta$, we have

$V_\alpha^T\Theta = r\left[\cos(\theta), \cos(2\pi x_\alpha - \theta), \ldots, \cos(2\pi(n-1)x_\alpha - \theta)\right]^T.$

On the other hand, we have $\|V_\alpha^T\Theta\|_2^2 = r^2\sum_{j=0}^{n-1}\cos^2(2\pi j x_\alpha - \theta)$. Now, we are in the setting of inequality (9). Recall that $x_\alpha$ satisfies $n^{-11/10} < |2\pi x_\alpha \bmod \pi| < \pi - n^{-11/10}$.
Since the vector $r\left[\cos(\theta), \cos(2\pi x_\alpha - \theta), \ldots, \cos(2(n-1)\pi x_\alpha - \theta)\right]^T$ needs to be analyzed in the definition of the least common denominator, we can assume without loss of generality that $r$ is a positive integer. In fact, by Proposition 7.4 in [29], we can take $r \ge 1/2$. For the case $2 > r \ge 1/2$, we can replicate the ideas in the proof of Lemma 4.4 to obtain that $\mathrm{dist}\left(V_\alpha^T\Theta, \mathbb{Z}^n\right) \ge Cn^{1-1/10}$ for some positive constant $C$. If $r \ge 2$, we can use $\lfloor r\rfloor$ instead of $r$ in Lemma 4.4.
In the sequel, we prove that the right-hand side of the preceding inequality is $O(n^{-2\rho})$. We consider the following three cases.

Case 1. The same reasoning used in Section 4.1.1 yields the desired bound.

Case 2. $\gcd(k, n) > n^{1/2}$. By similar reasoning to that used in the first case of the proof of Theorem 2.3, Section 4.1.3, we deduce the desired bound.

The combination of all the preceding cases yields $\mathbb{P}(s_n(C_n) \le n^{-\rho}) = O(n^{-2\rho})$ for any $\rho \in (0, 1/4)$.

Let $\mathcal{P} := \{\exp(i(j2\pi x - \theta)) : j = 0, \ldots, m-1\}$, where $i$ is the imaginary unit. Note that $\mathcal{P}$ is a set of points on the unit circle which can be seen as vertices of a regular polygon with $m$ sides inscribed in the unit circle. Since the arguments of the points $\exp(i(j2\pi x - \theta))$ are separated exactly by a distance $2\pi x$, the number of points $\exp(i(j2\pi x - \theta))$ lying in any arc of the unit circle of length $l$ is at least $\frac{l}{2\pi x} - 2$. Let $[y, y + 3(2\pi x)]$ be a subinterval of $[-1, 1]$ and consider the arc $A$ on the unit circle whose projection on the horizontal axis is $[y, y + 3(2\pi x)]$. If the length of the arc $A$ is $l$, then the number of values $\cos(j2\pi x - \theta)$ which lie in $(y, y + 3(2\pi x))$ is at least $\frac{l}{2\pi x} - 2 \ge \frac{3(2\pi x)}{2\pi x} - 2 = 1$, since $l \ge 3(2\pi x)$.
By the previous analysis, the distance between the vector $V \in \mathbb{R}^m$ whose entries are $V_j = r\cos(j2\pi x - \theta)$ for $j = 0, \ldots, m-1$, with $x = 1/m$, and the lattice $\mathbb{Z}^m$ is bounded from below, provided that the condition $\frac{1}{2}r(2\pi x) \ge 6$ is fulfilled.
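The counting argument above (every window of length $3(2\pi x)$ inside $[-1, 1]$ captures at least one value $\cos(j2\pi x - \theta)$) can be checked numerically for concrete parameters (illustrative only; the values $m = 50$ and $\theta = 0.7$ are arbitrary choices):

```python
import numpy as np

m = 50
x = 1.0 / m
theta = 0.7
vals = np.cos(2.0 * np.pi * x * np.arange(m) - theta)
window = 3.0 * (2.0 * np.pi * x)  # the interval length 3*(2*pi*x) from the argument
# every subinterval [y, y + window] of [-1, 1] contains at least one value
ys = np.linspace(-1.0, 1.0 - window, 200)
assert all(((vals >= y) & (vals <= y + window)).any() for y in ys)
```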