Finite free convolutions of polynomials

We study three convolutions of polynomials in the context of free probability theory. We prove that these convolutions can be written as the expected characteristic polynomials of sums and products of unitarily invariant random matrices. The symmetric additive and multiplicative convolutions were introduced by Walsh and Szegö in different contexts, and have been studied for a century. The asymmetric additive convolution is new, as is the connection of all three convolutions with random matrices. By developing the analogy with free probability, we prove that these convolutions produce real-rooted polynomials, and we provide strong bounds on the locations of the roots of these polynomials.


Introduction
We study three convolutions on polynomials that are inspired by free probability theory. Instead of capturing the limiting spectral distributions of ensembles of random matrices, we show that these convolutions capture the expected characteristic polynomials of random matrices in a fixed dimension. We develop the analogy with free probability by proving that Voiculescu's R- and S-transforms can be used to obtain upper bounds on the extreme roots of these polynomials. Two of the convolutions have been classically studied. The third is new, as is the connection of all of them with random matrices. We begin by defining the three convolutions and stating the algebraic identities that establish this connection, as well as basic results regarding their real-rootedness properties.

Algebraic identities and real rootedness
Symmetric additive convolution

Definition 1.1 For complex univariate polynomials p(x) = Σ_{i=0}^d x^{d−i} (−1)^i a_i and q(x) = Σ_{i=0}^d x^{d−i} (−1)^i b_i of degree at most d, the dth symmetric additive convolution of p and q is defined as

p(x) +_d q(x) := Σ_{i=0}^d ((d−i)!/d!) (−1)^i a_i D^i q(x) = Σ_{k=0}^d x^{d−k} (−1)^k Σ_{i+j=k} ((d−i)! (d−j)!) / (d! (d−k)!) a_i b_j,   (1)

where D denotes differentiation with respect to x. Note that we have defined a sequence of convolutions parameterized by d, and in general p +_c q ≠ p +_d q when c ≠ d, even if both p and q have degree at most min(c, d); we will discuss this point more in Sect. 1.4. Observe that the definition above has the compact form

p(x) +_d q(x) = p̂(D) q̂(D) x^d,   (2)

where p̂ and q̂ are the unique polynomials satisfying p̂(D) x^d = p(x) and q̂(D) x^d = q(x). This reveals that +_d is symmetric, bilinear in its arguments, and commutes with differentiation and translation; i.e., (Dp(x)) +_d q(x) = D(p(x) +_d q(x)) and p(x − t) +_d q(x) = (p +_d q)(x − t).
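As an illustration (ours, not part of the paper), the coefficient form of the symmetric additive convolution, (p +_d q)_k = Σ_{i+j=k} ((d−i)!(d−j)!)/(d!(d−k)!) p_i q_j on raw coefficients with the signs (−1)^k absorbed into them, can be implemented directly:

```python
import math

def additive_conv(p, q, d):
    """Symmetric additive convolution on raw coefficient lists.

    p[i] is the coefficient of x^(d-i) (highest power first); the weights
    (d-i)!(d-j)!/(d!(d-k)!) come from the coefficient form of the definition,
    with all signs absorbed into the raw coefficients.
    """
    out = []
    for k in range(d + 1):
        s = sum(math.factorial(d - i) * math.factorial(d - (k - i))
                / (math.factorial(d) * math.factorial(d - k)) * p[i] * q[k - i]
                for i in range(k + 1))
        out.append(s)
    return out

# x^d acts as the identity element, and convolving (x-1)^2 with (x-3)^2
# simply adds the (repeated) roots, giving (x-4)^2.
print(additive_conv([1, 0, 0], [1, -6, 9], 2))
print(additive_conv([1, -2, 1], [1, -6, 9], 2))
```

The second call illustrates the pairwise addition of roots in the degenerate case where each polynomial has a single repeated root.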
Note that the identity element for +_d is the polynomial x^d. For a square d × d complex matrix M, we define its characteristic polynomial in the variable x by

χ_x(M) := det(xI − M).

We show that for monic polynomials, the operation +_d can be realized as an expected characteristic polynomial of a sum of random matrices.
Theorem 1.2 If p(x) = χ_x(A) and q(x) = χ_x(B) for d × d normal matrices A and B, then

p(x) +_d q(x) = E χ_x(A + Q B Q*),   (3)

where the expectation is over a random unitary matrix Q drawn from the Haar measure on U(d).
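The random-matrix identity of Theorem 1.2 can be sanity-checked by Monte Carlo; the sketch below is our illustration (the matrices and sample size are arbitrary choices), sampling Haar unitaries by the standard QR recipe and averaging characteristic polynomials of A + QBQ* for p = q = x(x − 2), where the coefficient formula predicts x² − 4x + 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(d):
    # Haar unitary via QR of a complex Ginibre matrix, with the diagonal
    # phase fix (Mezzadri) so the distribution is exactly Haar.
    Z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

d = 2
A = np.diag([0.0, 2.0])   # p(x) = x(x - 2)
B = np.diag([0.0, 2.0])   # q(x) = x(x - 2)
n = 20000
coeffs = np.zeros(d + 1)
for _ in range(n):
    Q = haar_unitary(d)
    coeffs += np.real(np.poly(A + Q @ B @ Q.conj().T))
coeffs /= n
print(coeffs)   # should be close to [1, -4, 2]
```

The x coefficient is exact in every sample (the trace is unitarily invariant); only the constant term fluctuates.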
In fact, by unitary invariance the right hand side depends only on χ_x(A) and χ_x(B) and not on any further details of A and B, so we may take them to be any normal matrices with the prescribed characteristic polynomials. The convolution (1) was studied by Walsh [1], who proved results including the following theorem (see also [2] and [3, Theorem 5.3.1]).

Theorem 1.3 If p and q are real rooted, then so is p +_d q. Moreover,

maxroot(p +_d q) ≤ maxroot(p) + maxroot(q),

where maxroot denotes the largest real root. In Theorem 1.12 we strengthen this bound on the maximum root. Our result is much tighter in the case that most of the roots of p and q are far from their maximum roots.

Symmetric multiplicative convolution

Definition 1.4 For complex univariate polynomials p(x) = Σ_{k=0}^d x^{d−k} (−1)^k a_k and q(x) = Σ_{k=0}^d x^{d−k} (−1)^k b_k of degree at most d, the dth symmetric multiplicative convolution of p and q is defined as

p(x) ×_d q(x) := Σ_{k=0}^d x^{d−k} (−1)^k a_k b_k / C(d, k),   (4)

where C(d, k) denotes the binomial coefficient. It is clear that ×_d is also bilinear, though it does not commute with differentiation or translation. The following compact differential form of (4), analogous to (2), was discovered by B. Mirabelli [4], who has kindly allowed us to include it here: if p(x) = P(xD)(x − 1)^d and q(x) = Q(xD)(x − 1)^d, then

p(x) ×_d q(x) = P(xD) Q(xD) (x − 1)^d.   (5)

Note that every polynomial of degree at most d can be written as P(xD)(x − 1)^d, though this is not as obvious as in the additive case. The appearance of the polynomial (x − 1)^d is explained by the fact that it is the identity element for ×_d, i.e., p(x) ×_d (x − 1)^d = p(x) for every p of degree at most d.
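The multiplicative convolution is a binomial-weighted Hadamard product of coefficient sequences; a small sketch of ours, with the raw coefficients carrying the signs:

```python
import math

def multiplicative_conv(p, q, d):
    """Symmetric multiplicative convolution on raw coefficient lists.

    With raw coefficients p[k] = (-1)^k a_k, the k-th output coefficient
    (-1)^k a_k b_k / C(d, k) becomes (-1)^k p[k] q[k] / C(d, k).
    """
    return [(-1) ** k * p[k] * q[k] / math.comb(d, k) for k in range(d + 1)]

# (x - 1)^2 is the identity element, and the roots multiply pairwise in the
# degenerate repeated-root case: (x - 2)^2 convolved with (x - 3)^2 is (x - 6)^2.
print(multiplicative_conv([1, -4, 4], [1, -2, 1], 2))
print(multiplicative_conv([1, -4, 4], [1, -6, 9], 2))
```
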
We show that for monic polynomials the operation × d can be realized as an expected characteristic polynomial of a product of random matrices.

Theorem 1.5 If p(x) = χ_x(A) and q(x) = χ_x(B) for d × d complex normal matrices A, B, then

p(x) ×_d q(x) = E χ_x(A Q B Q*),   (6)

where the expectation is over a Haar unitary Q.
The identity element (x − 1)^d thereby corresponds to taking B = I in the above formula. This convolution was studied by Szegö [5], who proved the following theorem.

Theorem 1.6 If p and q have only nonnegative real roots, then the same is true of p ×_d q. Moreover,

maxroot(p ×_d q) ≤ maxroot(p) · maxroot(q).

We give a quantitatively stronger bound in Theorem 1.13.
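Szegö's theorem can be probed numerically; the following sketch (ours, with ad hoc tolerances) draws random polynomials with nonnegative roots, convolves them coefficientwise, and checks real-rootedness together with the bound on the largest root:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
d = 3
for _ in range(100):
    rp = rng.uniform(0.5, 3.0, d)            # nonnegative roots of p
    rq = rng.uniform(0.5, 3.0, d)
    p, q = np.poly(rp), np.poly(rq)          # monic, highest power first
    conv = np.array([(-1) ** k * p[k] * q[k] / math.comb(d, k)
                     for k in range(d + 1)])
    roots = np.roots(conv)
    assert np.max(np.abs(roots.imag)) < 1e-6                # real rooted
    assert roots.real.min() > -1e-8                         # nonnegative roots
    assert roots.real.max() <= rp.max() * rq.max() + 1e-8   # Szegö's bound
print("checked 100 random instances")
```
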
Asymmetric additive convolution

Definition 1.7 For complex univariate polynomials p(x) = Σ_{i=0}^d x^{d−i} (−1)^i a_i and q(x) = Σ_{j=0}^d x^{d−j} (−1)^j b_j of degree at most d, we define the dth asymmetric additive convolution of p and q to be

p(x) ++_d q(x) := Σ_{k=0}^d x^{d−k} (−1)^k Σ_{i+j=k} (((d−i)! (d−j)!) / (d! (d−k)!))^2 a_i b_j.

This may also be written, for p(x) = P(DxD) x^d and q(x) = Q(DxD) x^d, as

p(x) ++_d q(x) = P(DxD) Q(DxD) x^d,

the latter being an observation of [4].
We are not aware of previous studies of this convolution. We show that if p(x) and q(x) are monic with all roots real and nonnegative, then p ++ d q can be realized as an expected characteristic polynomial.
Theorem 1.8 If p(x) = χ_x(A A*) and q(x) = χ_x(B B*) for d × d complex matrices A and B, then

p(x) ++_d q(x) = E χ_x((A + R B Q)(A + R B Q)*),   (7)

where the expectation is taken over independent Haar unitaries R and Q. We obtain bounds on the roots of p ++_d q in Theorem 1.14.
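The asymmetric convolution can likewise be checked against its random-matrix realization. Below is our sketch for d = 2 (matrices and sample size are arbitrary choices), comparing the squared-weight coefficient formula of Definition 1.7 with a Monte Carlo average of χ_x((A + QBR)(A + QBR)*) over independent Haar unitaries:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def haar_unitary(d):
    Z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def asym_additive_conv(p, q, d):
    # Same weights as the symmetric additive convolution, but squared.
    return [sum((math.factorial(d - i) * math.factorial(d - (k - i))
                 / (math.factorial(d) * math.factorial(d - k))) ** 2
                * p[i] * q[k - i] for i in range(k + 1))
            for k in range(d + 1)]

d = 2
A = np.diag([1.0, 2.0])                 # AA* = diag(1, 4), p = (x-1)(x-4)
B = np.diag([1.0, 2.0])
p = [1.0, -5.0, 4.0]
exact = asym_additive_conv(p, p, d)     # x^2 - 10x + 14.25
n = 30000
mc = np.zeros(d + 1)
for _ in range(n):
    M = A + haar_unitary(d) @ B @ haar_unitary(d)
    mc += np.real(np.poly(M @ M.conj().T))
mc /= n
print(exact, mc)
```

Note how the middle coefficient agrees with the symmetric case while the constant term picks up the squared weight 1/4 on the cross term.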

Remark 1.10
In an earlier version of this paper [6], we defined the operations + d , × d , and ++ d in terms of random matrices using the formulas (3), (6), (7), and stated the formulas appearing in Definitions 1.1, 1.4, and 1.7 as theorems by showing that they were equivalent. We have chosen to reverse this presentation since + d , × d were in fact already defined by Walsh and Szegö, albeit in a different context, and their basic properties such as bilinearity are more immediate from the purely algebraic definitions.

Motivation and related results
Before describing our analytic results on the locations of roots of the convolutions in the next section, we explain the motivations for studying them in the context of several other areas of mathematics. Free probability Free probability theory (see e.g. [7][8][9]) studies, among other things, the large d limits of random matrices such as those in (1.1), (1.4), and (1.7). In particular, it allows one to calculate the limiting spectral distribution of a sum or product of two unitarily invariant random matrix ensembles in terms of the limiting spectral distributions of the individual ensembles. For both the sum and the product, free probability provides a transform of the moments of the spectra of the individual matrices from which one can easily derive the transform of the moments of the limiting spectra of the resulting matrices. These are known as the R- and S-transforms, which may be viewed as generating functions for certain polynomials in the moments (known as free cumulants) that behave linearly under the convolutions. We show in Theorems 1.12, 1.13, and 1.14 (discussed in Sect. 1.3) that for our "finite free convolutions" the same transforms provide upper bounds on the roots of the corresponding expected polynomials in finite dimensions, when evaluated at specific points on the real line.
Following the definitions in this paper, [10,11] have shown that by taking appropriate limits our finite free convolutions yield the standard free convolutions in free probability theory. Thus, expected characteristic polynomials provide an alternative "discretization" of free convolutions from the typical one involving random matrices.
The paper [12] shows that the real zeros of p +_d q and p ×_d q may be interpreted as the β → ∞ limit of certain generalizations of β-ensembles in random matrix theory.
Combinatorics The original motivation for this work is the method of interlacing families of polynomials [13][14][15], which reduces various combinatorial problems concerning eigenvalues to problems of bounding the roots of the expected characteristic polynomials of certain random matrices. In particular, the paper [15] studies bipartite random regular graphs, whose expected characteristic polynomials turn out to be of the type appearing in (1.7). The bound of Theorem 1.14 on the roots of these polynomials then implies the existence of bipartite Ramanujan graphs of every size and every degree (a result that was later turned into a polynomial time algorithm by Cohen [16]). Hall, Puder, and Sawin [17] used some of the techniques in this paper to prove the related result that every bipartite graph has a k-cover which is Ramanujan, for every k ≥ 2, generalizing [13].

Remark 1.11 (Renumbered Theorems)
The papers [15][16][17] cite theorems in the original arxiv version of this work [6], which are numbered differently from the ones in this paper. In particular, Theorems 1.7 and 1.8 of [6] correspond to Theorems 1.12 and 1.14 of this paper.
Representation theory As shown in Sect. 2, the unitary group may be replaced by the orthogonal group or the group of signed permutations in Theorems 1.2, 1.5, and 1.8 without changing the expected characteristic polynomial and therefore any of their statements. The ability to compute the average of a matrix function over the group of unitary matrices by instead computing an average over some smaller group of matrices is a phenomenon we refer to as quadrature (see [15] for more details).
The proofs given in Sect. 2 are different from those that appeared in the first version [6] of this paper. The original proofs treat each of the convolutions as (specifically) being an average over the unitary group, and then, by explicit calculations, show that one can (in some cases) replace the unitary group with the signed permutation matrices and get the same result. After posting that preprint, we were informed [17] that our quadrature results were actually a manifestation of well-studied concepts in representation theory (the Peter-Weyl theorem); the work [17] gave a more general sufficient condition for a subgroup of the unitary group to have this property.
One subgroup that has been used specifically in an application is the n × n matrices corresponding to the standard representation of S n+1 (the symmetric group on n + 1 elements). The fact that this group is a valid quadrature group plays a pivotal role in the results on Ramanujan graphs [15] and [17] mentioned above.
Geometry of polynomials Theorem 1.3 implies that if p(x) is real-rooted of degree d, then the linear transformation p +_d (·) preserves real-rootedness of all real polynomials of degree at most d. Leake and Ryder [18] observe that a partial converse is also true: every differential operator T : R_{≤d}[x] → R_{≤d}[x] preserving real-rootedness can be written as T(q) = p +_d q for some p of degree at most d. Thus, our bounds on the extreme roots of the additive convolution imply bounds on how much any such operator can perturb the roots of its real-rooted inputs. They also generalize our analytic bounds on the roots of symmetric additive convolutions (Theorem 1.12) by showing that they are a special case of a submodularity relation.
Random matrix theory The expected characteristic polynomials of symmetric Gaussian random matrices are Hermite polynomials (see, e.g., [19, Theorem 4.1]). If R is a d-by-d matrix of independent Gaussian random variables of variance 1, then the expected characteristic polynomial of the symmetric matrix (R + R^T)/√2 is the dth Hermite polynomial H_d. By applying Theorem 1.2, but taking the expectation over orthogonal matrices, or by applying the formula (2), we may conclude that for positive a and b and c = √(a² + b²), the rescaled Hermite polynomials H_d^a(x) := a^d H_d(x/a) satisfy H_d^a +_d H_d^b = H_d^c. Similarly, the expected characteristic polynomial of R R^T is the dth Laguerre polynomial [20, Sect. 9]. Thus, both Theorem 1.8 and formula (7) can be used to show that for positive a and b and c = a + b, the analogously rescaled Laguerre polynomials satisfy the corresponding identity under ++_d.

Transforms and root bounds
In free probability, each of the three convolutions comes equipped with a natural transform of probability measures. We define analogous transforms on polynomials and use them to bound the extreme roots of the convolutions of polynomials.
We will identify a vector (λ_1, ..., λ_d) with the discrete distribution that takes each value λ_i with probability 1/d. The Cauchy/Hilbert/Stieltjes transform of such a distribution is the function

G_λ(x) = (1/d) Σ_{i=1}^d 1/(x − λ_i).

Given a polynomial p with roots λ_1, ..., λ_d, we similarly define

G_p(x) = Dp(x) / (d · p(x)).

We will prove theorems about the inverse of the Cauchy transform, which we define for real w > 0 by

K_p(w) := max { x : G_p(x) = w }.

For a real-rooted polynomial p, and thus for real λ_1, ..., λ_d, this is the value of x that is larger than all the λ_i for which G_p(x) = w. Since G_p(x) is decreasing above the largest root of p, the maximum is uniquely attained for each w > 0.
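The Cauchy transform and its inverse are easy to compute numerically from the roots; this sketch (ours) finds K_p(w) by bisection above the largest root, where G_p is strictly decreasing:

```python
import numpy as np

def G(roots, x):
    # Cauchy transform of the uniform distribution on the roots
    return np.mean(1.0 / (x - roots))

def K(roots, w, tol=1e-12):
    """Inverse Cauchy transform: the unique x above the max root with G(x) = w."""
    lo = roots.max() + 1e-9       # G blows up just above the largest root
    hi = roots.max() + 1.0
    while G(roots, hi) > w:       # G decreases to 0, so walk hi outward
        hi += hi - roots.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(roots, mid) > w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For p = (x - 2)^3, G(x) = 1/(x - 2), so K(w) = 2 + 1/w
print(K(np.full(3, 2.0), 0.5))
```
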
This tells us that

K_p(w) = maxroot ( p(x) − (1/(wd)) Dp(x) ),

where we define maxroot(p) to be the largest real root of p. Voiculescu's R-transform of the probability distribution that is uniform on λ is given by R_λ(w) = K_λ(w) − 1/w (though in free probability the inversion is typically done at the level of power series). We use the same notation to define a transform on polynomials:

R_p(w) := K_p(w) − 1/w.

If λ and μ are compactly supported probability distributions on the reals, then their free convolution λ + μ satisfies [7]:

R_{λ+μ}(w) = R_λ(w) + R_μ(w).   (8)

For our finite additive convolution, we obtain an analogous inequality for w > 0.

Theorem 1.12 For w > 0 and real-rooted polynomials p and q of degree d,

R_{p +_d q}(w) ≤ R_p(w) + R_q(w),   (9)

with equality if and only if p(x) or q(x) has the form (x − λ)^d.
We will often write (9) as:

maxroot ( U_α (p +_d q) ) ≤ maxroot ( U_α p ) + maxroot ( U_α q ) − αd,   (10)

where α = 1/(wd) and U_α denotes the operator 1 − αD. To bound the roots of the finite multiplicative convolution, we employ a variant of Voiculescu's S-transform. We first define a variant of the moment transform, which we write as a power series in 1/z instead of in z. For a polynomial p having only nonnegative real roots λ_1, ..., λ_d and a z > 0,

M_p(z) := z G_p(z) − 1 = (1/d) Σ_{i=1}^d λ_i / (z − λ_i).

We define the inverse of this transform, M_p^{−1}(w) := max { z : M_p(z) = w }, and set

S_p(w) := (w/(1 + w)) M_p^{−1}(w).

This is the reciprocal of the usual S-transform. We prove the following bound on this transformation in Sect. 4.2.
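Theorem 1.12 can be checked directly on small examples. The sketch below (ours) computes maxroot(U_α p) via numpy root-finding for p = q = x(x − 2) at w = 1 (so α = 1/2); the left side comes out to 4, against an upper bound of 2 + √5:

```python
import math
import numpy as np

def additive_conv(p, q, d):
    # Coefficient form of the symmetric additive convolution, raw coefficients
    return np.array([sum(math.factorial(d - i) * math.factorial(d - (k - i))
                         / (math.factorial(d) * math.factorial(d - k))
                         * p[i] * q[k - i] for i in range(k + 1))
                     for k in range(d + 1)])

def K(p, w):
    # K_p(w) = maxroot(U_a p) with a = 1/(wd) and U_a = 1 - a D
    d = len(p) - 1
    alpha = 1.0 / (w * d)
    Up = np.array(p, dtype=float).copy()
    Up[1:] -= alpha * np.polyder(np.array(p, dtype=float))
    return np.roots(Up).real.max()

d, w = 2, 1.0
p = np.poly([0.0, 2.0])               # x(x - 2)
q = np.poly([0.0, 2.0])
lhs = K(additive_conv(p, q, d), w)
rhs = K(p, w) + K(q, w) - 1.0 / w     # the bound of Theorem 1.12
print(lhs, rhs)
```

The inequality is strict here, as neither polynomial has the form (x − λ)^d.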

Theorem 1.13
For degree d polynomials p and q having only nonnegative real roots and w > 0,

S_{p ×_d q}(w) ≤ S_p(w) S_q(w),

with equality only when p or q has only one distinct root.
One can ask whether an inequality similar to Theorem 1.13 could hold in more generality (i.e., for a larger collection of polynomials than just those with nonnegative roots). While this seems possible, in this paper we will restrict to the case that both polynomials have nonnegative roots (see Remark 4.10 in Sect. 4.2). To define the relevant transform for the asymmetric additive convolution, we define S to be the linear map taking a polynomial p(x) to p(x²). If p has only positive real roots λ_i, then Sp has roots ±√λ_i. If λ is a probability distribution supported on the nonnegative reals, then we use Sλ to denote the corresponding symmetric probability distribution on the values ±√λ_i. If λ and μ are compactly supported probability distributions on the positive reals, then Benaych-Georges [21] showed that their appropriately defined rectangular convolution λ ++ μ satisfies:

R_{S(λ ++ μ)}(w) = R_{Sλ}(w) + R_{Sμ}(w).

In Sect. 4.3 we derive an analog of this result in the form of the following inequality.

Theorem 1.14 For degree d polynomials p and q having only nonnegative real roots and w > 0,

R_{S(p ++_d q)}(w) ≤ R_{Sp}(w) + R_{Sq}(w).

Remark 1.15 The formulas above are stated only for polynomials of degree exactly d, but they may be applied to polynomials of degree at most d by first applying the degree-reduction formulas outlined in the next section.

Polynomials of different degrees
The operation p +_d q is defined above for pairs of polynomials p and q of degree at most d, but if one or both of the polynomials has degree c < d, it may be written in terms of the lower degree operation +_c.

Lemma 1.16 If p has degree at most d and q has degree c ≤ d, then

p(x) +_d q(x) = (c!/d!) ( D^{d−c} p(x) ) +_c q(x).
Proof By (2), p +_d q = p̂(D) q̂(D) x^d, where q̂(D) x^d = q(x). If q̂_c denotes the polynomial for which q̂_c(D) x^c = q(x), then q̂(D) = (c!/d!) q̂_c(D) D^{d−c}, since D^{d−c} x^d = (d!/c!) x^c. Hence p +_d q = (c!/d!) q̂_c(D) D^{d−c} p(x) = (c!/d!) (D^{d−c} p) +_c q. This relationship between convolutions of different degrees turns out to be a key tool in the analytic proofs in Sect. 4, which induct on the degrees of the polynomials. We prove similar degree reduction formulas for ×_d and ++_d in Lemmas 4.9 and 4.16, but have chosen to present them in context along with the inductive proofs in Sect. 4. It turns out that the operation +_d can be naturally defined for polynomials of degree strictly greater than d via the random matrix identity (3). The key observation is that the expected characteristic polynomial of a random d-dimensional restriction of a matrix is proportional to a derivative of its characteristic polynomial.

Lemma 1.17 If a > d, A is an a × a matrix, and Q is a random d × a complex matrix with orthonormal rows (i.e., sampled from the Haar measure on the complex d × a Stiefel manifold C_a^d), then

E χ_x(Q A Q*) = (d!/a!) D^{a−d} χ_x(A).
We prove this lemma in Sect. 2. Lemma 1.17 yields the following corollary, which may be viewed as a generalization of the definition of + d to polynomials of degree greater than d.
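Lemma 1.17 is also easy to test numerically. In this sketch (ours), assuming the identity E χ_x(QAQ*) = (d!/a!) D^{a−d} χ_x(A) as we read the lemma, we take A = diag(1, 2, 3), sample the Stiefel point as the first d rows of a Haar unitary, and compare with (1/3) D χ_A = x² − 4x + 11/3:

```python
import math
import numpy as np

rng = np.random.default_rng(5)

def haar_unitary(n):
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

a, d = 3, 2
A = np.diag([1.0, 2.0, 3.0])
n = 20000
avg = np.zeros(d + 1)
for _ in range(n):
    Q = haar_unitary(a)[:d, :]    # rows of a Haar unitary: Haar on the Stiefel manifold
    avg += np.real(np.poly(Q @ A @ Q.conj().T))
avg /= n
target = (math.factorial(d) / math.factorial(a)) * np.polyder(np.poly([1.0, 2.0, 3.0]), a - d)
print(avg, target)
```
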

Corollary 1.18 If A and B are a × a and b × b Hermitian matrices with a, b ≥ d and characteristic polynomials p(x) and q(x) respectively, and R, Q are uniformly random from C_a^d and C_b^d respectively, then

E χ_x(R A R* + Q B Q*) = (d!/a!)(d!/b!) ( D^{a−d} p(x) ) +_d ( D^{b−d} q(x) ).   (11)
Proof Since the formula (1) is bilinear in the characteristic polynomials χ_x(A) and χ_x(B), we have for fixed R:

E_Q χ_x(R A R* + Q B Q*) = χ_x(R A R*) +_d E_Q χ_x(Q B Q*) = (d!/b!) χ_x(R A R*) +_d D^{b−d} q(x),

by Theorem 1.2 and Lemma 1.17. Averaging over R and invoking bilinearity and Lemma 1.17 once more finishes the proof.
Note that the definition (11) is consistent with Lemma 1.16; e.g., if b = d then the right hand side of (11) is equal to p(x) +_a q(x). As differentiation preserves real-rootedness, the generalized definition of +_d preserves real-rootedness of polynomials of all degrees, and bounds on their roots may be obtained from our results by reducing to the equal degree case by differentiating sufficiently many times. While Lemma 1.17 can be used in conjunction with ×_d and ++_d just as easily due to their bilinearity, this does not correspond to the appropriate degree-reduction operators (see Lemmas 4.9 and 4.16) for those cases, so it does not lead to a satisfying generalization of the definitions to higher degrees.

Notation and organization
Let P(d) be the family of real rooted polynomials of degree exactly d with positive leading coefficient, and let P be the union over d of P(d). Let P + (d) be the subset of these polynomials having only nonnegative roots. We let P + be the union of P + (d) over all d ≥ 1.
For a function f(x), we write its derivative as Df(x). For a number α, we let U_α be the operator that maps f to f − αDf. That is, U_α is the operator 1 − αD.

Equivalence of convolutions and expected characteristic polynomials
The goal of this section is to prove Theorems 1.2, 1.5, and 1.8 relating the three convolutions to random matrices. While we have so far only considered averages over unitary matrices, it turns out that one can average over various other collections of matrices and get the same formula. In Sect. 2.1, we will define a property that we call minor-orthogonality, and then in Sect. 2.2 we will show that the coefficients we are interested in can be computed using an average over any collection of minor-orthogonal matrices. Also in Sect. 2.1, we will show that the collection of n × n signed permutation matrices (under a uniform distribution) is minor-orthogonal, and then we will use this to show that the collection of orthogonal matrices (under the Haar measure) is minor-orthogonal.
There are some advantages to being able to express the convolutions as averages over different collections of matrices; in particular, a formula that is easily derived by replacing a unitary average by one over signed permutation matrices will be used in the proof of Lemma 4.25.

Minor-orthogonality
We will write [n] to denote the set {1, ..., n} and, for a set S, we write (S choose k) to denote the collection of subsets of S that have exactly k elements. When our sets contain integers (which they always will), we will consider the set to be ordered from smallest to largest. Hence, for example, if S contains the elements {2, 5, 3}, then we will write S = {s_1, s_2, s_3} with s_1 = 2, s_2 = 3, s_3 = 5. Lastly, for a set of integers S, we will write σ(S) := Σ_{s∈S} s. Let m, n be positive integers. Given an m × n matrix A and sets S ⊆ [m], T ⊆ [n] with |S| = |T|, we will write the (S, T)-minor of A as

[A]_{S,T} := det ( (A_{s_i, t_j})_{i,j} ),

the determinant of the submatrix of A formed by the rows in S and the columns in T. By definition, we will set [A]_{∅,∅} = 1. A well-known theorem of Cauchy and Binet relates the minors of a product to the products of minors ([22]):

Theorem 2.2 For integers m, n, p, k, an m × n matrix A, and an n × p matrix B, we have

[AB]_{S,T} = Σ_{U ∈ ([n] choose k)} [A]_{S,U} [B]_{U,T}

for any sets S ∈ ([m] choose k) and T ∈ ([p] choose k).

Definition 2.3 We will say that an m × n random matrix R is minor-orthogonal if for all integers k, ℓ ≤ max{m, n} and all sets S, T, U, V with |S| = |T| = k and |U| = |V| = ℓ, we have

E [ [R]_{S,T} \overline{[R]_{U,V}} ] = 1_{S=U} 1_{T=V} / C(max{m, n}, k),

where C(n, k) denotes the binomial coefficient. Given a minor-orthogonal matrix R, it is easy to see from the definition that:
1. R* is minor-orthogonal;
2. any submatrix of R that preserves the largest dimension of R is minor-orthogonal.
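The Cauchy-Binet identity is easy to verify by brute force; a sketch of ours with small random matrices:

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

def minor(M, S, T):
    # [M]_{S,T}: determinant of the rows S and columns T of M
    return np.linalg.det(M[np.ix_(list(S), list(T))])

m, n, p, k = 3, 4, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
S, T = (0, 2), (1, 2)
lhs = minor(A @ B, S, T)
rhs = sum(minor(A, S, U) * minor(B, U, T)
          for U in itertools.combinations(range(n), k))
print(lhs, rhs)
```
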

Lemma 2.4 If R is minor-orthogonal and Q is a fixed matrix for which Q Q* = Q* Q = I, then both Q R and R Q are minor-orthogonal.

Proof Consider the case Q R. By Theorem 2.2,

E [ [QR]_{S,T} \overline{[QR]_{U,V}} ] = Σ_{W,W'} [Q]_{S,W} \overline{[Q]_{U,W'}} E [ [R]_{W,T} \overline{[R]_{W',V}} ] = (1_{T=V} / C(n, k)) Σ_W [Q]_{S,W} \overline{[Q]_{U,W}} = 1_{S=U} 1_{T=V} / C(n, k),

where the last line comes from the fact that Σ_W [Q]_{S,W} \overline{[Q]_{U,W}} = [Q Q*]_{S,U} = 1_{S=U}, by Theorem 2.2 applied to Q Q* = I. The other case R Q follows by repeating the argument with R* and noting that R Q = (Q* R*)*.

Definition 2.5
A signed permutation matrix is a matrix that can be written E P where E is a diagonal matrix with ±1 diagonal entries and P is a permutation matrix.

Lemma 2.6 A uniformly random n×n signed permutation matrix is minor-orthogonal.
Proof We can write a uniformly random signed permutation matrix Q as Q = E_χ P_π where P_π is a uniformly random permutation matrix and E_χ is a uniformly random diagonal matrix with χ ∈ {±1}^n on the diagonal (and the two are independent). Hence for |S| = |T| = k and |U| = |V| = ℓ, we have

E [ [Q]_{S,T} \overline{[Q]_{U,V}} ] = E_χ E_π [ [E_χ]_{S,S} [P_π]_{S,T} [E_χ]_{U,U} [P_π]_{U,V} ],

where we use the fact that a diagonal matrix X satisfies [X]_{S,T} = 0 unless S = T, so that [E_χ P_π]_{S,T} = [E_χ]_{S,S} [P_π]_{S,T}. Since [E_χ]_{S,S} [E_χ]_{U,U} = Π_{i∈S} χ_i Π_{i∈U} χ_i, averaging over χ gives 0 unless S = U, in which case the product is 1. Furthermore, [P_π]_{S,T} = 0 except when T = π(S), so for [P_π]_{S,T} and [P_π]_{S,V} to be nonzero simultaneously requires V = T. In the case that V = T = π(S), [P_π]_{S,T} = ±1, and so we have

E [ [Q]_{S,T} \overline{[Q]_{U,V}} ] = 1_{S=U} 1_{T=V} · Pr[π(S) = T].

The probability that a uniformly random permutation of [n] maps a given set S to a given set T with |S| = |T| = k is

k!(n − k)!/n! = 1/C(n, k),

as required. The case m > n follows by considering R* instead of R.
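Minor-orthogonality of signed permutations can be verified exactly for small n by enumerating all 2^n n! matrices; a brute-force sketch of ours for n = 3, k = 2:

```python
import itertools
import numpy as np

n = 3

def signed_perms():
    # All 2^n * n! signed permutation matrices
    for pi in itertools.permutations(range(n)):
        for eps in itertools.product([1, -1], repeat=n):
            R = np.zeros((n, n))
            for i in range(n):
                R[i, pi[i]] = eps[i]
            yield R

def minor(M, S, T):
    return np.linalg.det(M[np.ix_(list(S), list(T))])

def avg(S, T, U, V):
    vals = [minor(R, S, T) * minor(R, U, V) for R in signed_perms()]
    return sum(vals) / len(vals)

print(avg((0, 1), (0, 2), (0, 1), (0, 2)))   # matching index pairs: 1/C(3,2) = 1/3
print(avg((0, 1), (0, 2), (0, 2), (0, 2)))   # S != U: the signs average to 0
```
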

Corollary 2.7 An m × n Haar random matrix from the Stiefel manifold C_n^m is minor-orthogonal.
Proof Let R be such a random matrix, and assume m ≤ n. As a signed permutation matrix is unitary, R Q is also Haar distributed for any fixed n × n signed permutation matrix Q. Hence

E [ [R]_{S,T} \overline{[R]_{U,V}} ] = E [ [R Q]_{S,T} \overline{[R Q]_{U,V}} ],

where Q is a uniform signed permutation independent from R. By Lemma 2.6, Q is minor-orthogonal and so, for fixed R, Lemma 2.4 implies that R Q is minor-orthogonal. Averaging over R then shows that the right hand side equals 1_{S=U} 1_{T=V} / C(n, k), so R is minor-orthogonal as required.

Formulas
We begin this section by mentioning a well-known formula for the determinants of a sum of matrices (see [23]):

Theorem 2.8 For integers k ≤ n, n × n matrices A, B, and sets S, T ∈ ([n] choose k), we have

[A + B]_{S,T} = Σ_{U ⊆ S, V ⊆ T, |U| = |V|} (−1)^{σ_S(U) + σ_T(V)} [A]_{U,V} [B]_{S∖U, T∖V},

where σ_S(U) denotes the sum of the positions within S of the elements of U (so that σ_{[n]}(U) = σ(U)). We denote the coefficient of (−1)^k x^{d−k} in the characteristic polynomial of a d-dimensional matrix A by σ_k(A), which we recall is the kth elementary symmetric function of the eigenvalues of A. We will repeatedly use the fact that

σ_k(A) = Σ_{|S|=k} [A]_{S,S}.

Lemma 2.9 For integers m ≤ n, let B be an n × n matrix and let R be an m × n minor-orthogonal matrix independent from B. Then for all |S| = |T| = k, we have

E [ [R B R*]_{S,T} ] = 1_{S=T} σ_k(B) / C(n, k).

The above lemma yields a quick proof of Lemma 1.17.

Proof of Lemma 1.17
Let A be an a × a matrix and let Q be Haar distributed on C_a^d. The kth coefficient of E χ_x(Q A Q*) is equal to

(−1)^k E σ_k(Q A Q*) = (−1)^k Σ_{|S|=k} E [ [Q A Q*]_{S,S} ] = (−1)^k C(d, k) σ_k(A) / C(a, k),

by Lemma 2.9 and minor-orthogonality of Q (Corollary 2.7). Since C(d, k)/C(a, k) = (d!(a−k)!)/(a!(d−k)!), this matches the kth coefficient of (d!/a!) D^{a−d} χ_x(A).

Symmetric additive and multiplicative convolutions
Using Lemma 2.9, we can easily prove Theorems 1.2 and 1.5 by showing equality of each coefficient as per (14).
Proof By (14), Theorem 2.8, and then Lemma 2.9, we have equality of the coefficients, where the last equality uses the fact that C(d−i, k−i) is the number of ways a fixed set V of size i can be completed to a set of size k; that is, the number of ways we can add k − i elements to a set of size i to obtain a set of size k.
Using Theorem 2.10 and Lemma 2.6 we can derive another useful formula for the symmetric additive convolution, this time as a function of the roots. It states that p(x) +_d q(x) is the average of all polynomials that can be formed by adding the roots of p and q pairwise: if p(x) = Π_{i=1}^d (x − a_i) and q(x) = Π_{i=1}^d (x − b_i), then

p(x) +_d q(x) = (1/d!) Σ_σ Π_{i=1}^d (x − a_i − b_{σ(i)}),

where the sum is over permutations σ of [d].
Proof Let A be the diagonal matrix with diagonal entries {a_i} and let B be the diagonal matrix with diagonal entries {b_i}. By Theorem 2.10, we have

p(x) +_d q(x) = E χ_x(A + R B R*),

where the expectation can be taken over any minor-orthogonal random matrix R. By Lemma 2.6, we can take R to be a (uniformly random) signed permutation matrix.
Since A and B are diagonal, it is easy to compute, for each fixed value of R,

χ_x(A + R B R*) = Π_{i=1}^d (x − a_i − b_{σ(i)}),

where σ is the permutation part of R (all of the signs cancel because B is diagonal). Averaging over the uniformly random σ gives the result.
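The root-pairing formula can be cross-checked against the coefficient form of the convolution; a sketch of ours with random roots at d = 3:

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(4)
d = 3
a = rng.standard_normal(d)
b = rng.standard_normal(d)
p, q = np.poly(a), np.poly(b)

# Coefficient form of the convolution on raw coefficients (highest power first)
conv = np.array([sum(math.factorial(d - i) * math.factorial(d - (k - i))
                     / (math.factorial(d) * math.factorial(d - k))
                     * p[i] * q[k - i] for i in range(k + 1))
                 for k in range(d + 1)])

# Average of prod_i (x - a_i - b_sigma(i)) over all pairings sigma
pairing = np.zeros(d + 1)
for sigma in itertools.permutations(range(d)):
    pairing += np.poly([a[i] + b[sigma[i]] for i in range(d)])
pairing /= math.factorial(d)

print(np.max(np.abs(conv - pairing)))   # the two computations agree
```
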
Proof By Theorem 2.2 and then Lemma 2.9, we have

Asymmetric additive convolution
The proof for the asymmetric additive convolution is a bit more involved, due to the appearance of a second random matrix.

Lemma 2.13 Let B be a d × d matrix and let Q and R be independent d × d minor-orthogonal random matrices. Then

E [ [Q B R]_{S,T} \overline{[Q B R]_{U,V}} ] = 1_{S=U} 1_{T=V} C(d, k)^{−2} Σ_{|W|=|Z|=k} |[B]_{W,Z}|²

for any |S| = |T| = k and |U| = |V| = ℓ.
Proof By Theorem 2.2,

[Q B R]_{S,T} = Σ_{|W|=|Z|=k} [Q]_{S,W} [B]_{W,Z} [R]_{Z,T};

expanding both minors this way, taking expectations, and applying the minor-orthogonality of Q and R separately collapses the sum to the claimed formula. Proof (of the realization of ++_d) Let |U| = k. Then by Theorem 2.2 and Theorem 2.8, applying Lemma 2.13 gives equality of each coefficient, where again we use the fact that A(X) = B(X) for all X if and only if A = B. Similar to Theorem 2.10, the resulting sums range over sets with |U| = |V| = k and |W| = |Z| = i, where θ_{i,k} is the number of ways to complete S to a set of size k and complete T to a set of size k. Since S ⊆

Real rootedness of the asymmetric additive convolution
We will use the theory of stable polynomials to prove Theorem 1.9 (see e.g. [24] for an introduction). Stability is a multivariate generalization of real-rootedness which is preserved under a rich and well-understood class of linear transformations, and our approach is to realize p ++_d q(x) as a univariate restriction of a multivariate stable polynomial. For this theorem, we will require Hurwitz stable polynomials. We recall that a multivariate polynomial p(z_1, ..., z_m) ∈ R[z_1, ..., z_m] is Hurwitz stable if it is identically zero or if p(z_1, ..., z_m) ≠ 0 whenever the real part of z_i is positive for all i.
Instead of proving Theorem 1.9 directly, we prove the following theorem from which it follows by substituting −x for x. We define P − (d) to be the subset of polynomials in P(d) having only nonpositive roots and P − to be the union over P − (d) for d ≥ 1.

Theorem 3.1 Let p, q ∈ P−(d). Then p ++_d q ∈ P−(d).
We will use the following result to prove that a polynomial is in P − .

Lemma 3.2 Let r (x) be a polynomial such that h(x, y) = r (x y) is Hurwitz stable.
Then, r ∈ P−. Proof Let ζ be any root of r. If ζ is neither zero nor negative, then it has a square root with positive real part. Setting both x and y to this square root would contradict the Hurwitz stability of h.
We will prove that r (x) is in P − by constructing a Hurwitz stable polynomial and applying Lemma 3.2. To this end, we need a few tools for constructing Hurwitz stable polynomials.
The first is elementary.

Claim 3.3 If p(x) ∈ P − , then the polynomial f (x, y) = p(x y) is Hurwitz stable.
Proof If both x and y have positive real part, then x y cannot be a nonpositive real, and thus cannot be a root of p.
The second tool is the following result of Borcea and Brändén, which is a consequence of Corollary 5.9 of [25]: if p(x, y) is Hurwitz stable of degree at most d in each of x and y, then there is a Hurwitz stable polynomial P(x_1, ..., x_d, y_1, ..., y_d), of degree at most 1 in each variable and symmetric in x_1, ..., x_d and in y_1, ..., y_d separately, such that P(x, ..., x, y, ..., y) = p(x, y). The polynomial P is called the polarization of p.
The last result we need is due to Lieb and Sokal [26] (see also [27, Theorem 8.4]); it provides conditions under which substituting a differentiation operator for a variable of a Hurwitz stable polynomial yields a polynomial that is again Hurwitz stable.
It is immediate that h is Hurwitz stable too. We will prove that h(x, y) = r(xy), which by Lemma 3.2 implies that r is in P−. After recording a convenient identity, we may compute directly that r(xy) = h(x, y); therefore r must have only nonpositive real roots.

Transform bounds
In this section we prove Theorems 1.12, 1.13, and 1.14. All of our transform bounds are proved using the following lemma. It allows us to pinch together two of the roots of a polynomial without changing the value of the Cauchy transform at a particular point. Through judicious use of this lemma, we are able to reduce statements about arbitrary polynomials to statements about polynomials with just one root.

In particular, if d ≥ 3 then p has at least two distinct roots.
Proof Let t = maxroot (U α p) and set We have chosen μ so that and thus maxroot (U α p) = t. Our choice of μ also guarantees that t − μ is the harmonic average of t − λ 1 and t − λ k . Thus, μ must lie strictly between λ 1 and λ k , which implies part b. As the harmonic mean of distinct numbers is less than their average, t − μ < (1/2)(2t − (λ 1 + λ k )), which implies that We have This and inequality (15) imply that p(x) ∈ P(d − 1). As U α is linear, we also have (U α p)(t) = 0. To finish the proof of part a, we need to show that the maximum root of U α p is less than t. The one root of p that is not a root of p is To see that t > ρ, compute which we know is greater than 0 because of (15) and the fact that μ is between λ 1 and λ k . This completes the proof of part a.
To prove part c, note that 2μ − (λ 1 + λ k ) > 0, and The following lemma provides one of the facts we exploit about the decomposition

Lemma 4.2 Let f, g, h be real rooted polynomials with positive leading coefficients such that f = g + h. Then

maxroot(f) ≤ max { maxroot(g), maxroot(h) },   (16)

with equality if and only if

maxroot(g) = maxroot(h).   (17)

Proof Note that equality in (17) clearly implies equality in (16). Now, assume by way of contradiction that (16) is false, and let x = maxroot(f). Then x exceeds both maxroot(g) and maxroot(h); since g and h have positive leading terms, g(x), h(x) > 0. Thus, f(x) = g(x) + h(x) > 0, a contradiction.

Symmetric additive convolution
We now prove the upper bound on the R-transform of p + d q.
with equality if and only if p = (x − λ) d for some real number λ. 1)α, giving equality in (18). We now prove the rest of the lemma by induction on d, with d = 2 being the base case. To establish the lemma in the case that d = 2 and p has roots λ 1 > λ 2 , note that As this polynomial has a positive leading term, the fact that maxroot (U α p) > r follows from For a real rooted polynomial p, define We will prove by induction on d that φ( p) > 0 for all polynomials p ∈ P(d) that have more than one root.
Assume by way of contradiction that there exists a monic (without loss of generality, since φ is independent of scaling) polynomial p ∈ P(d) with at least two distinct roots for which φ( p) ≤ 0. Let [−R, R] be an interval containing all of the roots of p, and define P(d)[−R, R] to be all monic polynomials in P(d) with all roots in this interval. Since [−R, R] d is a compact set and φ is a continuous function of the roots of p, there is a monic polynomial p 0 ∈ P(d)[−R, R] at which φ obtains its minimum. Let p 0 be such a polynomial, so φ( p 0 ) ≤ φ( p) ≤ 0. We may assume that p 0 has at least two distinct roots, because it is true if φ( p 0 ) < 0 whereas if φ( p 0 ) = 0 we may assume p 0 = p.
by the inductive hypothesis, as p has degree d−1. As this would contradict our assumption that φ( p) ≤ 0, we may assume maxroot (U α Dp 0 ) > maxroot (U α D p) and apply Lemma 4.2 to conclude maxroot (U α Dp 0 ) < maxroot (U α D p). This implies contradicting the minimality of p 0 . Thus, we may conclude that φ( p) > 0 for all p ∈ P(d) with at least two roots.

Lemma 4.5
For α ≥ 0, q = (x − λ)^d for some real λ, and p ∈ P(d),

maxroot(U_α(p +_d q)) = maxroot(U_α p) + maxroot(U_α q) − αd.

Proof We can prove this either by manipulating the identity in (1) and those following it, or by going through the definition (1.1). To pursue the latter route, let A be a Hermitian matrix whose characteristic polynomial is p and let B = λI. We then have

p(x) +_d q(x) = E χ_x(A + Q λI Q*) = χ_x(A + λI) = p(x − λ).

Thus, maxroot(U_α(p +_d q)) = maxroot(U_α p) + λ. On the other hand, maxroot(U_α q) = λ + αd. Proof of Theorem 4.3 Lemma 4.5 proves the theorem in the case that either p or q can be written in the form (x − λ)^d. So, we will prove that if neither p nor q is of this form, then

maxroot(U_α(p +_d q)) < maxroot(U_α p) + maxroot(U_α q) − αd.   (19)

We will prove this by induction on d, the maximum degree of p and q. The base case d = 1 is handled by Lemma 4.5. Assume d ≥ 2 and fix a polynomial q ∈ P(d) with at least two distinct roots. For any polynomial p in P(d), define

φ(p) := maxroot(U_α p) + maxroot(U_α q) − αd − maxroot(U_α(p +_d q)).

As before, assume for contradiction that there exists a monic polynomial p with at least two distinct roots for which φ(p) ≤ 0. Let [−R, R] be an interval containing the roots of p and let p_0 minimize φ over all monic polynomials whose roots are contained in this interval. We may assume that p_0 has at least two distinct roots because Lemma 4.5 says it must if φ(p_0) < 0, and otherwise we may take p_0 = p. Thus, we can apply Lemma 4.1 to p_0 to obtain polynomials p̃ ∈ P(d − 1) and p̂ ∈ P(d) such that a. p_0 = p̂ + p̃, which by the linearity of +_d and U_α implies U_α(p_0 +_d q) = U_α(p̂ +_d q) + U_α(p̃ +_d q); b. the roots of p̂ lie inside [−R, R], and so φ(p̂) ≥ φ(p_0); and c. maxroot(U_α p̂) = maxroot(U_α p̃) = maxroot(U_α p_0).

By Lemma 1.16
As the degree of Dq is less than d and Dq has at least two distinct roots by Lemma 4.6, we may apply our inductive hypothesis to conclude that as φ(p_0) ≤ 0. Thus, property (a) above and Lemma 4.2 imply that As maxroot(U_α p̃) = maxroot(U_α p_0), this implies φ(p̃) < φ(p_0), a contradiction. Thus, (19) holds when both polynomials have at least two distinct roots.
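As a concrete aid, the symmetric additive convolution can be computed directly from the coefficient expansion of Definition 1.1 (whose display is omitted in this excerpt, so the formula in the sketch below is an assumption). The Python sketch checks the translation identity p +_d (x − λ)^d = p(x − λ), the fact that drives the one-root case in Lemma 4.5:

```python
import math
import numpy as np

def additive_conv(a, b, d):
    """Symmetric additive convolution p +_d q via the coefficient formula
    c_k = sum_{i+j=k} (d-i)! (d-j)! / (d! (d-k)!) * a_i * b_j,
    where p(x) = sum_i a_i x^(d-i) (coefficients listed highest power first)."""
    c = []
    for k in range(d + 1):
        s = 0.0
        for i in range(k + 1):
            j = k - i
            s += (math.factorial(d - i) * math.factorial(d - j) * a[i] * b[j]
                  / (math.factorial(d) * math.factorial(d - k)))
        c.append(s)
    return c

# Convolving with (x - 2)^3 should translate the roots of p by 2.
p = np.poly([1, 2, 4])                 # (x-1)(x-2)(x-4)
q = np.poly([2, 2, 2])                 # (x-2)^3
conv = additive_conv(list(p), list(q), 3)
print(np.allclose(sorted(np.roots(conv)), [3, 4, 6]))  # True
```

The same formula reproduces (x − a)^d +_d (x − b)^d = (x − a − b)^d, the polynomial analogue of summing two multiples of the identity matrix.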

Symmetric multiplicative convolution
In this section we prove the following upper bound on the variant of the S-transform of p ×_d q defined in Sect. 1.3. Theorem 4.7 (Restatement of Theorem 1.13) For p, q ∈ P^+(d) having only nonnegative real roots and w > 0, with equality only when p or q has only one distinct root.
We begin by considering the case in which p = (x − λ) d . We then have that Thus, and S p (w) = λ.
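This root-scaling behavior is easy to check numerically. The Python sketch below uses the coefficient formula for ×_d from Sect. 1.3 (not displayed in this excerpt, so it is an assumption of the sketch): writing p = ∑_i (−1)^i a_i x^{d−i}, the i-th signed coefficient of p ×_d q is a_i b_i / binom(d, i).

```python
import math
import numpy as np

def mult_conv(a, b, d):
    """Symmetric multiplicative convolution p x_d q on raw coefficient
    lists (highest power first); equivalent to dividing the product of
    the signed coefficients a_i, b_i by binom(d, i)."""
    return [(-1) ** i * a[i] * b[i] / math.comb(d, i) for i in range(d + 1)]

# Convolving q = (x-1)(x-3) with (x-2)^2 should scale q's roots by 2.
p = list(np.poly([2, 2]))              # (x-2)^2
q = list(np.poly([1, 3]))              # (x-1)(x-3)
conv = mult_conv(p, q, 2)
print(np.allclose(sorted(np.roots(conv)), [2, 6]))  # True
```

Convolving with (x − 1)^d acts as the identity, matching the role of the identity matrix in the random-matrix interpretation.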
The finite multiplicative convolution of polynomials of different degrees may be computed by taking the polar derivative with respect to 0 of the polynomial of higher degree. We recall that the polar derivative at 0 of a polynomial p of degree d is given by dp − x Dp (see [2, p. 44]). Lemma 4.9 (Degree Reduction for ×_d) For p(x) ∈ P(d) and q(x) ∈ P(k) with k < d, Proof Follows from equation (5) by an elementary computation.

Remark 4.10
Let R be the operation on polynomials in P^+(d) that maps p(x) to x^d p(1/x). The polar derivative of a degree d polynomial may then be expressed in terms of R by dp − x Dp = R D R p.
In particular, the polar derivative has a discontinuity at ∞ that occurs as a root of D Rp passes 0, causing technical issues when considering polynomials p(x) with both positive and negative roots (especially since the root we are concerned with is the largest one). This prevents our proof method (which we chose to highlight the parallel between the additive and multiplicative cases) from generalizing to a larger collection of polynomials. We therefore leave the possibility of a generalization as an open problem.
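The identity dp − x Dp = R D R p from the remark above is mechanical to verify; a small Python sketch:

```python
import numpy as np

def polar_derivative(a):
    """Polar derivative at 0: d*p - x*Dp, for p = sum_j a_j x^(d-j)."""
    d = len(a) - 1
    # d*p - x*p' has coefficient j * a_j on x^(d-j); the x^d term vanishes.
    return [j * a[j] for j in range(1, d + 1)]

def reverse(a):
    """The map R: p(x) -> x^deg(p) * p(1/x), i.e. coefficient reversal."""
    return list(a)[::-1]

# Check  d*p - x*Dp = R D R p  on (x-1)(x-2)(x-4).
p = list(np.poly([1, 2, 4]))
lhs = polar_derivative(p)
rhs = reverse(np.polyder(reverse(p)))
print(np.allclose(lhs, rhs))  # True
```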
In the special case of p ∈ P^+(2) with distinct roots, strict inequality holds: Proof The first part is a simple calculation. Inequality (20) follows from the fact that p ∈ P^+(d) implies Rp ∈ P^+(d) and the fact that the roots of D Rp interlace those of Rp. To see that (x D − d)p ∈ P^+(d − 1), observe that its lead coefficient is positive. The last claim follows by noting that Rp is a quadratic polynomial with distinct roots, and so D Rp strictly interlaces Rp.
As we did with the symmetric additive convolution, we relate the M-transform to the maximum root of a polynomial. We have We therefore define the operator V w by which gives Note that the polar derivative is dV 0 . Our proof of Theorem 4.7 will also employ the following consequence of Lemma 4.1.

Corollary 4.12
Let w > 0, d ≥ 2, and let p(x) ∈ P^+(d) have at least two distinct roots. Then there exist p̃ ∈ P^+(d) and p̂ ∈ P^+(d − 1) so that p(x) = p̃(x) + p̂(x), the largest root of p̃ is at most the largest root of p, and if d ≥ 3 then p̂ has two distinct roots.
Proof To derive this from Lemma 4.1, let t = maxroot(V_w p) and set The polynomials p̃ and p̂ constructed in Lemma 4.1 now satisfy as desired.
Proof of Theorem 4.7 We proceed by induction on d, the maximum degree of p and q. The theorem is true for d = 1 by Lemma 4.8. As we have already shown that equality holds when one of p or q has just one distinct root, we need to show that when both p and q have at least two distinct roots: Fix q ∈ P^+(d) with at least two distinct roots, and for p ∈ P^+ define: As before, we assume (for contradiction) that there exists a monic p with two distinct roots and φ(p) ≤ 0. Choose an interval [0, R] containing the roots of p, and let p_0 minimize φ over all degree d monic polynomials with roots in this interval. Observe that we can choose p_0 having two distinct roots: if φ(p_0) < 0 this is implied by Lemma 4.8, and if φ(p_0) = 0 we can take p_0 = p. Thus we may apply Corollary 4.12 to obtain polynomials p̃ ∈ P^+(d) and p̂ ∈ P^+(d − 1) with p_0 = p̃ + p̂ and maxroot(p̃) ≤ maxroot(p_0). By Lemma 4.2, we have with equality only if all three are equal. However, noting that p̂ and (x D − d)q = −R D Rq have two distinct roots whenever d ≥ 3: since φ(p_0) ≤ 0. Since at least one of the inequalities above is strict for all d ≥ 2, we must have maxroot(V_w(p_0 ×_d q)) < maxroot(V_w(p̃ ×_d q)), which implies φ(p̃) < φ(p_0), contradicting the minimality of p_0.

Asymmetric additive convolution
In this section we prove the rectangular analogue of Theorem 1.12.
Theorem 4.13 (Restatement of Theorem 1.14) Let p(x) and q(x) be in P^+(d) for d ≥ 1. Then for all α ≥ 0, with equality if and only if p or q equals x^d.
We remark that if q(x) = x d , then p ++ d q = p, and This is why Theorem 4.13 holds with equality when q(x) = x d .
The following lemma tells us that it suffices to prove Theorem 4.13 in the case that α = 1.

Lemma 4.14 For a real-rooted polynomial p(x),
Then, Our proof of Theorem 4.13 will use the following lemma to pinch together roots of p to reduce the analysis to a few special cases.

Corollary 4.15
Let α > 0, d ≥ 2, and let p(x) ∈ P^+(d) have at least two distinct roots. Then there exist p̃ ∈ P^+(d) and p̂ ∈ P^+(d − 1) so that p(x) = p̃(x) + p̂(x), the largest root of p̃ is at most the largest root of p, p̃ has a root larger than 0, and Proof Apply Lemma 4.1 with 2αt in the place of α to construct the polynomials p̃ and p̂. They satisfy which implies (23).
We will build up to the proof of Theorem 4.13 by first handling three special cases: As with the other convolutions, we may compute the asymmetric additive convolution of two polynomials of different degrees by first applying an operation to the polynomial of higher degree. In this case it is D x D, also known as the "Laguerre derivative". Lemma 4.16 (Degree Reduction for ++_d) Let p ∈ P^+(d) and let q ∈ P^+(k) for k < d. Then, Proof Follows from Theorem 1.8.
The following characterization of the Laguerre derivative also follows from Theorem 1.8.
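Operationally, the Laguerre derivative is just differentiate, multiply by x, differentiate. A short Python sketch (the nonnegative-root check at the end is a numerical illustration, not a proof):

```python
import numpy as np

def laguerre_derivative(a):
    """The Laguerre derivative D x D applied to p (coefficients highest
    first): differentiate, multiply by x, differentiate again."""
    return np.polyder(np.polymul([1, 0], np.polyder(a)))

# D x D maps x^d to d^2 x^(d-1): here x^3 -> 9 x^2.
print(laguerre_derivative([1, 0, 0, 0]))

# Consistent with its use on P+(d), D x D sends a polynomial with
# nonnegative roots to one of degree d-1 with nonnegative roots.
p = np.poly([0, 1, 4])                       # x(x-1)(x-4)
roots = np.roots(laguerre_derivative(p))
print(np.all(roots.real >= -1e-9))           # True
```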
Proof By Lemma 4.14, it suffices to consider the case α = 1. As S p(x) = (x² − λ)^d, So, the largest root of this polynomial is the largest root of We may also compute We now prove that maxroot(q_λ) ≤ maxroot(r_λ) − 2, with equality only if λ = 0.
We first argue that q_λ(x) is real rooted. This follows from the fact that it is a factor of U_1 S D x D (x − λ)^d. For λ ≥ 0 all of the roots of D x D (x − λ)^d are nonnegative, and so applying S to it yields a polynomial with all real roots.
We now compute Define Elementary algebra gives So, q_λ(μ_λ) ≥ 0, with equality only when λ = 0. With just a little more work, we will show that μ_λ is an upper bound on the roots of q_λ for all λ.
For q_λ to have a root larger than μ_λ, it would have to have two roots larger than μ_λ. When λ = 0, the polynomial q_λ has one root at μ_0 and a root at 0 with multiplicity 3. As q_λ is real rooted for all λ ≥ 0 and the roots of q_λ are continuous functions of its coefficients, and thus of λ, we can conclude that for small λ all but one of the roots of q_λ must be near 0. Thus, for sufficiently small λ, q_λ can have at most one root greater than μ_λ, and so it must have none. As the largest root of q_λ and μ_λ are continuous functions of λ, maxroot(q_λ) ≤ μ_λ for all sufficiently small λ. As q_λ(μ_λ) > 0 for all λ > 0, we can conclude that maxroot(q_λ) ≤ μ_λ for all λ ≥ 0.
To see that Lemma 4.18 is equivalent to Theorem 4.13 in the case of q = x^{d−1}, note that for q(x) = x^{d−1}, The equivalence now follows from Claim 4.17 and the fact that the largest root of this polynomial is 2(d − 1)α.

Lemma 4.19
For α ≥ 0, d ≥ 2 and p ∈ P^+(d), Proof For every p ∈ P^+, define We will show that φ(p) ≥ 0 for every polynomial p ∈ P^+ of degree at least 2, with equality only when p = x^d. Our proof will be by induction on the degree of p. Assume that there exists a polynomial p ∈ P^+(d) with φ(p) < 0, let [0, R] be an interval containing the roots of p, and let p_0 minimize φ over polynomials with roots in that interval. By Lemma 4.18, p_0 must have at least 2 distinct roots, and so we can apply Corollary 4.15 to obtain polynomials p̃ and p̂.
Let x = maxroot(U_α S D x D p). If d = 2, then p̂ has degree 1 and so U_α S D x D p̂ equals the lead coefficient of p̂, which implies For d ≥ 3, we can then assume by induction that φ(p̂) > 0, which then implies (24). Then for all α ≥ 0, We now use Lemma 4.20 to prove Theorem 4.13 through a variation of the pinching argument employed in the proof of Lemma 4.18.
Proof of Theorem 4.13 We will prove this by induction on the maximum degree of p and q, which we call d. Our base case of d = 1 is handled by Lemma 4.20.
Lemma 4.18 also tells us that equality is only achieved when p = x^d. We now consider the case in which both p and q have degree d. For polynomials p and q in P^+(d), define φ(p, q) = maxroot(U_α S p) + maxroot(U_α S q) − 2αd − maxroot(U_α S(p ++_d q)).
We will prove that φ( p, q) ≥ 0 for all such polynomials.
Assume (for contradiction) that there exist polynomials p, q with φ(p, q) < 0, and let [0, R] be an interval containing the roots of p and q. Again, φ is a continuous function (this time on the compact set [0, R]^{2d}), so let p_0, q_0 be a minimizer. If both p_0 and q_0 have at most 1 distinct root, then Lemma 4.20 implies φ(p_0, q_0) ≥ 0, with equality only if one of them equals x^d (a contradiction). Hence we can assume without loss of generality that p_0 has at least 2 distinct roots, and so Corollary 4.15 provides polynomials p̃ and p̂ with maxroot(U_α S p_0) = maxroot(U_α S p̃) = maxroot(U_α S p̂), and by Lemma 4.2, maxroot(U_α S(p_0 ++_d q_0)) ≤ max{maxroot(U_α S(p̃ ++_d q_0)), maxroot(U_α S(p̂ ++_d q_0))}, which in turn implies φ(p_0, q_0) ≥ min{φ(p̃, q_0), φ(p̂, q_0)}, with equality if and only if all are equal. Equality cannot occur, since φ(p_0, q_0) < 0 by assumption while φ(p̂, q_0) ≥ 0 by the inductive hypothesis. But then φ(p_0, q_0) > φ(p̃, q_0), which contradicts the minimality of the pair (p_0, q_0).

Ultraspherical polynomials
This section is devoted to the proof of Lemma 4.20. It is a consequence of the following lemma.

Lemma 4.21
For d ≥ 0 and positive λ and μ, Lemma 4.20 then follows from Theorem 4.3. We will prove Lemma 4.21 by showing that the polynomial on the left is a scaled Chebyshev polynomial of the second kind, and that the polynomial on the right is a Legendre polynomial with the same scaling. We then appeal to known relations between the roots of these polynomials. The Chebyshev and Legendre polynomials are both Ultraspherical (also called Gegenbauer) polynomials. These are special cases of Jacobi polynomials. It is known that their roots all lie between −1 and 1 and that they are symmetric about zero (see Theorem 3.3.1 of [28]).
The Ultraspherical polynomials with parameter α are defined by the generating function ∑_{n≥0} C_n^{(α)}(x) t^n = (1 − 2xt + t²)^{−α} (25). Two special instances of these polynomials are the Chebyshev polynomials of the second kind, U_n = C_n^{(1)}, and the Legendre polynomials, P_n = C_n^{(1/2)}. Stieltjes [29] (see Theorem 6.21.1 of [28]) established the following relation between the zeros of the Chebyshev and Legendre polynomials.
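The precise statement of Stieltjes' relation is omitted in this excerpt. As a numerical illustration of how these zero sets compare (a Python sketch; the monotone inward shift of ultraspherical zeros as α grows is taken as a known classical fact, not proved here), the largest zero of the Legendre polynomial P_n (α = 1/2) lies between the largest zeros of the Chebyshev polynomials U_n (α = 1) and T_n (the α → 0 limit):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

n = 6
u_max = np.cos(np.pi / (n + 1))              # largest zero of U_n
t_max = np.cos(np.pi / (2 * n))              # largest zero of T_n
p_max = max(Legendre.basis(n).roots().real)  # largest zero of P_n
print(u_max < p_max < t_max)                 # True
```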
The relationship between the asymmetric additive convolution and the Chebyshev polynomials will make use of the generating function (25). To aid in this endeavor, we recall the following well-known generalization of the binomial theorem (see, for example, [30]).

Theorem 4.23
The function (1 + z)^{−k} has the formal power series expansion (1 + z)^{−k} = ∑_{n≥0} binom(−k, n) z^n = ∑_{n≥0} (−1)^n binom(n + k − 1, n) z^n.
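The generalized binomial theorem is quick to check symbolically for small k; a Python (sympy) sketch:

```python
import sympy as sp

z = sp.symbols('z')
# (1+z)^(-k) = sum_n binomial(-k, n) z^n, and binomial(-k, n) equals
# (-1)^n * binomial(n + k - 1, n); checked for k = 1..4, n = 0..4.
ok = True
for k in range(1, 5):
    s = sp.series((1 + z) ** (-k), z, 0, 5).removeO()
    for n in range(5):
        ok &= s.coeff(z, n) == (-1) ** n * sp.binomial(n + k - 1, n)
print(ok)  # True
```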

Lemma 4.24
For d ≥ 0 and positive λ and μ, Proof By (25), we have where the last equality follows by reversing the previous reasoning. This implies that where w = 1/αd.
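The generating-function manipulations above can be sanity-checked symbolically. The Python (sympy) sketch below expands (1 − 2xt + t²)^{−α} for α = 1 and α = 1/2 and matches the Taylor coefficients against the Chebyshev polynomials of the second kind and the Legendre polynomials, the two instances of (25) used in this section:

```python
import sympy as sp

x, t = sp.symbols('x t')

def gen_coeffs(alpha, nmax):
    """Taylor coefficients in t of the ultraspherical generating function."""
    s = sp.series((1 - 2 * x * t + t ** 2) ** (-alpha), t, 0, nmax + 1).removeO()
    return [sp.expand(s.coeff(t, n)) for n in range(nmax + 1)]

# alpha = 1 generates the Chebyshev polynomials of the second kind;
# alpha = 1/2 generates the Legendre polynomials.
cheb_ok = all(sp.expand(c - sp.chebyshevu(n, x)) == 0
              for n, c in enumerate(gen_coeffs(1, 4)))
leg_ok = all(sp.expand(c - sp.legendre(n, x)) == 0
             for n, c in enumerate(gen_coeffs(sp.Rational(1, 2), 4)))
print(cheb_ok and leg_ok)  # True
```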

Declaration
The original version of this paper was posted to arXiv in 2015 as paper number 1504.00350. This version is a substantial revision of the original.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.