On positivity of orthogonal series and its applications in probability

We give necessary and sufficient conditions for an orthogonal series to converge in mean-square to a nonnegative function. We present many examples and applications, in analysis and in probability. In particular, we give necessary and sufficient conditions for a Lancaster-type expansion $\sum_{n\ge 0}c_{n}\alpha_{n}(x)\beta_{n}(y)$, with two sets of orthogonal polynomials $\{\alpha_{n}\}$ and $\{\beta_{n}\}$, to converge in mean-square to a nonnegative bivariate function. In particular, we study the properties of the set $C(\alpha,\beta)$ of sequences $\{c_{n}\}$ for which the above-mentioned series converges to a nonnegative function, and give conditions for membership in it.
Further, we show that the class of bivariate distributions for which a Lancaster-type expansion can be found is the same as the class of distributions having all conditional moments in the form of polynomials in the conditioning random variable.


Notation, terminology and basic settings
First, let us fix notation, most of which comes from measure theory. All signed measures considered in the paper will be σ-finite; consequently, the Radon-Nikodym theorem can be applied. If χ is a signed measure and χ = χ⁺ − χ⁻ is its Hahn-Jordan decomposition, then |χ| = χ⁺ + χ⁻ is a measure. Obviously, a signed measure χ is a measure if χ⁻ = 0. We will use the notation $\int f(x)\,d\mu(x)$ interchangeably with $\int f(x)\,\mu(dx)$, or even $\int f\,d\mu$ if the set of integration is evident, to denote the integral with respect to the (possibly signed) measure μ. Sometimes dμ(.) will denote the measure μ itself.
In the sequel, we will be interested only in signed measures whose one-dimensional marginal measures are identified by their moments (for the definition and basic properties, see the Appendix below). Following [5] or [24], this is assured for those one-dimensional measures μ that satisfy the so-called Cramér condition, that is, for which there exists δ > 0 such that
$$\int \exp (\delta |x|)\,d|\mu |(x)<\infty. \quad (1.1)$$
In fact, condition (1.1) can have a weaker form (i.e. the so-called Hardy condition) if the measure μ has support contained in {x : x ≥ 0}. But we will not go into these details. Let us denote by Cra the set of all signed measures χ on R that satisfy condition (1.1) for some positive number δ. Notice that Cra contains all measures with bounded supports.
Further, let us introduce the following set of signed measures, generated by a measure μ:
$$AC_{2}(\mu )=\left\{ \nu :\nu \ll \mu ,\ \frac{d\nu }{d\mu }\in L^{2}(\mu )\right\}. $$
In other words, the set AC2(μ) contains all signed measures ν that are absolutely continuous with respect to μ, with their Radon-Nikodym derivative dν/dμ (i.e. the function f) being square-integrable with respect to the measure μ. Note that in the definition of the set AC2(μ), μ can be a multidimensional σ-finite measure.
We have the following simple observation: Proposition 1 If a one-dimensional measure χ belongs to the set Cra, then AC2 (χ ) ⊂ Cra.
Proof Let f ∈ L²(χ). Then, by the Cauchy-Schwarz inequality, we have
$$\int e^{\delta |x|/2}\,|f(x)|\,d|\chi |(x)\le \left( \int e^{\delta |x|}\,d|\chi |(x)\right)^{1/2}\left( \int f^{2}(x)\,d|\chi |(x)\right)^{1/2}<\infty .$$
Consequently, f dχ satisfies condition (1.1) (with δ/2 in place of δ), that is, it belongs to the set Cra.
In other words, if a signed measure μ is identifiable by moments, then every element of AC2 (μ) is identifiable by moments.
Let μ be a measure from the set Cra; by AC2+(μ) let us denote the subset of AC2(μ) consisting of its nonnegative elements. Now, if μ is a measure, {p_n}_{n≥0} denotes the set of polynomials that are orthogonal with respect to the measure μ. Let us also define the numbers $\hat{p}_n$ by the following orthogonality relationship:
$$\int p_{n}(x)p_{m}(x)\,d\mu (x)=\hat{p}_{n}\delta _{nm},\quad (1.2)$$
with δ_nm denoting, traditionally, the Kronecker delta. Let us agree that in the sequel the "hat" over the symbol of a polynomial will denote the positive number defined by (1.2). Further, let {c_n}_{n≥0} be a sequence of reals. An infinite series of the form
$$\sum _{n\ge 0}c_{n}p_{n}(x)\quad (1.3)$$
is called an orthogonal series. It is known (see, e.g. [2]) that if the condition
$$\sum _{n\ge 0}c_{n}^{2}\hat{p}_{n}<\infty \quad (1.4)$$
is satisfied, then the series (1.3) converges in L²(μ).
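Condition (1.4) is easy to check numerically. The following sketch is an illustration only; the choice of probabilists' Hermite polynomials (for which $\hat{p}_n = n!$) and of the coefficients $c_n = \rho^n/n!$ is our assumption, not an example taken from the paper.

```python
import math

# Condition (1.4): sum_n c_n^2 * p_hat_n < infinity guarantees that the
# orthogonal series sum_n c_n p_n(x) converges in L^2(mu).
# Illustrative assumption: probabilists' Hermite polynomials (p_hat_n = n!)
# and c_n = rho^n / n!.  Then c_n^2 * p_hat_n = rho^(2n)/n!, whose sum
# equals exp(rho^2), so condition (1.4) is satisfied.
rho = 0.5
total = sum((rho**n / math.factorial(n))**2 * math.factorial(n)
            for n in range(50))
print(total)  # close to exp(0.25)
```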

Remark 1
Let us recall that, basically, all series considered will converge in the mean-square sense with respect to some specified measure. However, due to the Rademacher-Men'shov theorem (see, e.g. [2]), assuming only slightly stronger conditions, one can obtain convergence almost everywhere with respect to the specified measure. More precisely, the Rademacher-Men'shov theorem states that if the condition
$$\sum _{n\ge 0}c_{n}^{2}\hat{p}_{n}\log ^{2}(n+1)<\infty \quad (1.5)$$
is satisfied, then the series (1.3) converges in mean-square and almost everywhere mod μ.

The problem
The main idea of the paper is to present necessary and sufficient conditions for the positivity of the sum of the orthogonal series (1.3) for almost all x belonging to a closed subset M of the support of the positive measure μ with respect to which the polynomials {p_n} are orthogonal. The sufficiency part of the theorem was presented in 2011 in [17]. Later, over the years, slight generalizations of the original formulation and many examples were presented in [15,16]. However, only recently did I realize that the sufficient conditions on the coefficients c_n assuring positivity of (1.3) are also necessary.
The paper is organized as follows. In the next Sect. 2, we present our main result together with its simple proof. We also quote papers where many examples illustrating the assertions of the theorem are presented. The last Sect. 3 presents applications of our result to probability theory, in particular to the so-called Lancaster expansions. There is also an Appendix, in which we recall basic facts about moments, the moment problem and moment sequences.

General results
Our main result is the following: Consequently, the assertion of Theorem 1 can be rephrased in the following way. An orthogonal series (1.3) with coefficients satisfying condition (1.4) is nonnegative for almost all (mod μ) x ∈ M if and only if another sequence {r_n} of orthogonal polynomials can be found such that, considering the connection-coefficient expansion of p_n(x) in terms of {r_n} given by (2.2), we have: Proof of Theorem 1 b) ⇒ a). First, let us assume that the coefficients c_n are given by (2.1), and let us denote by f(x) the sum of (1.3). By assumption, we know that it exists and is square-integrable with respect to μ. We have Knowing the numbers $c_{n}\hat{p}_{n}$, n = 0, 1, 2, ..., and the form of the polynomials {p_n(x)}, we can find the numbers $\left\{\int_{M}x^{n}\,d\nu (x)\right\}_{n\ge 0}$ and $\left\{\int_{M}x^{n}f(x)\,d\mu (x)\right\}_{n\ge 0}$. We see that they are identical, and the two measures are, by assumption, identifiable by moments, so the two measures must be identical, i.e.
But ν was chosen to be nonnegative, so f(x) ≥ 0 on M mod μ. Besides, we see that a) ⇒ b). Now let us assume that we want to find an expansion of the Radon-Nikodym derivative of two nonnegative measures with ν << μ, which is additionally square-integrable (mod μ), in an infinite orthogonal series. That is, we are looking for the coefficients of an expansion of the form (1.3). Then, following our assumptions, we have where {r_n(x)} are polynomials orthogonal with respect to ν. These polynomials exist, since such polynomials can be defined for every positive measure satisfying condition (1.1). Naturally, having two sets of orthogonal polynomials, one has a set of connection coefficients between them. Since f is square-integrable with respect to dμ(x), we know that the coefficients c_n are defined uniquely. Besides, we have There are numerous examples of expansions of the type (1.3). They appeared over the years in [17] (Sect. 5, concerning mostly polynomials from the so-called Askey-Wilson scheme), recently in [20], as well as in [15,18,19] (concerning, among others, the Charlier (3.7) or Jacobi (3.6) polynomials).

Remark 4
Notice also that the coefficient γ_{n,0} can be computed from the coefficients π_{n,j}, defined by the expansion $p_{n}(x)=\sum _{j=0}^{n}\pi _{n,j}x^{j}$, and the numbers m_j, which form a moment sequence of some distribution absolutely continuous with respect to the measure μ. This observation can be derived directly from (2.1) or from the formula given by Lemma 1 of [16]. It also leads to the following method of checking whether a given sequence {c_n} applied in the series (1.3) can result in a positive sum of the series. Namely, considering formulae (2.4) and (2.3), we can find a sequence {m_n} by recursively solving the sequence of equations $\sum _{j=0}^{n}\pi _{n,j}m_{j}=c_{n}\hat{p}_{n}$ for n ≥ 0. Since {m_n} has to be a moment sequence, we can apply one of the known criteria, some of which are presented in the Appendix.
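The method of Remark 4 can be sketched in code. The fragment below rests on our reading of the (partly lost) displays: we assume that π[n][j] are the monomial coefficients of p_n and that the triangular system $\sum_{j\le n}\pi_{n,j}m_j = c_n\hat{p}_n$ is solved recursively; the candidate moments are then tested through nonnegativity of the Hankel determinants (see the Appendix). The toy data are hypothetical, chosen so that the answer is the moment sequence of N(0,1).

```python
import numpy as np

# Solve the triangular system sum_{j<=n} pi[n][j] * m_j = rhs[n],
# rhs[n] = c_n * p_hat_n, for the candidate moments m_n (Remark 4).
def candidate_moments(pi, rhs):
    m = []
    for n, r in enumerate(rhs):
        s = sum(pi[n][j] * m[j] for j in range(n))
        m.append((r - s) / pi[n][n])
    return m

# Necessary check: every Hankel determinant det(m_{i+j})_{i,j<=n}
# of a moment sequence of a nonnegative measure is nonnegative.
def hankel_positive(m):
    N = (len(m) + 1) // 2
    return all(np.linalg.det([[m[i + j] for j in range(n + 1)]
                              for i in range(n + 1)]) >= -1e-10
               for n in range(N))

# Toy data: Hermite polynomials H_0 = 1, H_1 = x, H_2 = x^2 - 1,
# and rhs = (1, 0, 0), i.e. c_0 = 1, c_1 = c_2 = 0.
pi = [[1], [0, 1], [-1, 0, 1]]
m = candidate_moments(pi, [1, 0, 0])
print(m)                 # [1.0, 0.0, 1.0] -- first moments of N(0,1)
print(hankel_positive(m))
```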

Remark 5
Continuing the previous remark, the assertion of the theorem (in the case when M = supp μ) can be expressed in the following way. There exists a linear map K : L²(μ) → L²(μ), symbolically expressed by the following formula, that maps every function f ∈ L²(μ) onto itself, since, as is easily seen, we have:

Probabilistic aspects
In this section, to avoid confusion, we will assume that all considered measures are probabilistic, that is, they integrate to 1. Further, we will consider bivariate distributions dF(x, y) with marginal distributions dμ(x) and dν(y) (i.e. $d\mu (x)=\int F(dx,dy)$, with integration over y, and similarly for dν). Naturally, we will assume that both marginal measures belong to the set Cra, so that they are identified by their moments. Moreover, we will consider only those bivariate distributions F that satisfy the following condition:
$$\frac{dF}{d(\mu \times \nu )}\in L^{2}(d\mu \times d\nu ),\quad (3.1)$$
where dμ × dν denotes the product measure of dμ and dν.
Let us denote by {α_n(x)} and {β_n(y)} two sets of polynomials orthogonal with respect to the measures dμ(x) and dν(y), respectively. Now, for all distributions satisfying (3.1), the expansion (3.2) is valid. The conditional distributions ζ(dx|y) and ξ(dy|x) are defined, respectively, for almost all y (mod ν) and almost all x (mod μ), by the following relationships: One shows that both these distributions exist and are defined uniquely, respectively mod ν and mod μ. Notice that, making use of the definition of the marginal distributions and of the orthogonal polynomials, and changing, if necessary, the order of integration, we have $\iint \beta _{n}(y)\,dF(x,y)=\int \beta _{n}(y)\,d\nu (y)=0$ for all n ≥ 1, and likewise for the polynomials {α_n}. Now, applying the above-mentioned definitions and properties to the expansion (3.2), we deduce that
$$\int_{\mathrm{supp}\,\mu \times \mathrm{supp}\,\nu }dF(x,y)=\lambda _{0,0}=1,\qquad \forall n\ge 1:\lambda _{0,n}=\lambda _{n,0}=0,\quad (3.4)$$
and also that (3.5) holds. We can now rephrase the above-mentioned Theorem 1 in a form that is important for probabilists.
Theorem 2 Let μ ∈ Cra and let {α_n} be a set of polynomials orthogonal with respect to μ. Then the orthogonal series (3.6), where, as above, $\hat{\alpha}_n$ is defined by (1.2), and such that condition (3.7) holds for all y belonging to some closed set supp ν, converges in mean-square (mod μ) to a nonnegative function iff there exists a family of probability measures ζ(.|y), indexed by y, such that for all y ∈ supp ν we have ζ(.|y) << μ and ∀n ≥ 0: If, additionally, there exists a probability measure ν such that ∀n ≥ 0: then one can define a bivariate measure F << μ × ν by the formula (3.8), for which, for all Borel subsets A of supp(μ), almost everywhere.
Proof Suppose that F satisfies (3.1). Let μ and ν denote its marginal measures and let {α_n} and {β_n} denote the sets of polynomials orthogonal with respect to the measures μ and ν, respectively. Let the conditional distributions be defined by (3.5). Notice that by (3.4) and (3.5) we have $\int_{\mathrm{supp}\,\mu }\zeta (dx|y)=1$ a.e. (mod ν), and similarly for ξ(dy|x). Now, changing the order of summation and denoting by Further, utilizing (3.3) and (3.4), we have: with δ_{n,m} denoting, traditionally, the Kronecker delta. By the assumptions concerning the polynomials {α_n} and by Theorem 1, we see that for all y ∈ supp ν we have: a.e. mod ν. Now let us assume the converse statement, i.e. that the series (3.6), with polynomials {α_n} and the measure μ as described in the assumptions, converges in mean-square to a nonnegative function, and that condition (3.7) is satisfied for almost every y belonging to some closed set that we will denote by supp ν. By Theorem 1, we deduce that if the series (3.6) converges to a nonnegative function, then there exists a family dζ(x|y) of positive measures absolutely continuous with respect to μ such that ∀n ≥ 0: The rest of this section will be dedicated to the so-called Lancaster expansions. In particular, we will now be able to give necessary and sufficient conditions for these types of expansions to exist. Let us recall that Lancaster, in a series of papers [8-11], considered and developed the following question: given a bivariate distribution, say dF(x, y), its two marginal distributions, say dμ(x) and dν(y), and the two sets of polynomials, when is it possible to find a set of numbers {c_n} such that
$$dF(x,y)=\left( \sum _{n\ge 0}c_{n}\alpha _{n}(x)\beta _{n}(y)\right) d\mu (x)\,d\nu (y)\quad (3.9)$$
almost everywhere in supp(μ) × supp(ν) with respect to the product measure? In fact, Lancaster in his papers, and also his followers in theirs, confined the problem to bivariate distributions dF satisfying condition (3.1).
Definition 1 The class of bivariate distributions with margins identifiable by moments, satisfying (3.1) and having the expansion (3.9), will be called the Lancaster class (of bivariate distributions), briefly LC distributions.

Remark 6
Notice that if F is an LC distribution, then we have: In other words, in terms used in probability, we can easily deduce that ∀n ≥ 1 the conditional moments satisfy $E(X^{n}|Y=y)=p_{n}(y)$ and $E(Y^{n}|X=x)=q_{n}(x)$, respectively mod ν and mod μ, where p_n and q_n are some polynomials of full order n.

Definition 2
The class of bivariate distributions with margins identifiable by moments, having the property that all conditional moments of order, say, n are polynomials of full order n, will be called the polynomial class (of distributions), briefly PC distributions.
As a corollary we have the following characterization of the Lancaster class of distributions.

Theorem 3 Let us consider a bivariate distribution F satisfying (3.1) with margins identifiable by moments. Then F is an LC distribution iff it is a PC distribution.
Proof The fact that every distribution of the Lancaster class belongs also to the PC class was noted in Remark 6. So now let us assume that F belongs to the PC class. By Theorem 2, we know that it can be expanded in the series (3.6). Now we see that ∀n ≥ 1: $h_{n}(y)=\int \alpha _{n}(x)\,\zeta (dx|y)=E(\alpha _{n}(X)|Y=y)$. But, by our assumption, h_n(y) has to be a polynomial of full order n, i.e.
$$h_{n}(y)=\sum _{j\ge 0}\gamma _{n,j}\beta _{j}(y),$$
where {β_n} are the polynomials orthogonal with respect to the marginal measure ν. Hence we must have γ_{n,j} = 0 for j > n. Now, changing the order of summation in (3.6), we get: But, by our assumption, $E(\beta _{j}(Y)|X=x)$ is a polynomial of full order j. So we have: Now, by the uniqueness of the expansion, we deduce that ∀n > j: γ_{n,j} = 0.
Szabłowski, in a series of papers [21-23], considered Markov stochastic processes whose two-dimensional finite distributions belong to the PC class. Hence, in light of the above-mentioned theorem, there is now a possibility of expanding the transition functions of such Markov processes in Lancaster-like series.
Let us apply Theorem 2 to the analysis of LC distributions or, more precisely, to the analysis of when the series
$$\sum _{n\ge 0}c_{n}\alpha _{n}(x)\beta _{n}(y)\quad (3.10)$$
converges to a nonnegative function of (x, y), where the polynomials {α_n} and {β_n} are defined as in the introduction to Sect. 3. To simplify the formulation of the theorem and the applications following it, let us assume additionally that both families of polynomials {α_n} and {β_n} are orthonormal with respect to the measures μ and ν, respectively. Let us also denote by C(α, β) the set of all sequences {c_n} for which the sum (3.10) exists and is positive. Koudu in [12] showed that this set is convex (which is trivial; see, e.g., the Appendix) and, moreover, compact with respect to the weak topology. Hence Choquet's theorem about extreme points can be applied. As a corollary of Theorem 2, we have the following result:

Theorem 4 Let the numbers a_{n,j} and b_{n,j} be defined by the polynomials α_n and β_n in the following way:
$$\alpha _{n}(x)=\sum _{j=0}^{n}a_{n,j}x^{j},\qquad \beta _{n}(y)=\sum _{j=0}^{n}b_{n,j}y^{j}.\quad (3.11)$$
The series (3.10)

Proof Firstly, under our assumptions we have $\hat{\alpha }_{n}=\hat{\beta }_{n}=1$; hence, following the previous theorem, we deduce that the series (3.10) converges in mean-square to some positive function iff $E\alpha _{n}(X)=c_{n}\beta _{n}(y)$, where the expectation is taken with respect to some absolutely continuous measure that is additionally parametrized by the parameter y belonging to supp ν. From this remark, the first of the above-mentioned equations follows directly. By a similar argument, we deduce that the second equation holds.

Remark 8
In fact, the equations (3.12) and (3.13) should be written in the following, less legible but more precise, recursive way:
$$m_{n}^{(a)}(y)=\frac{c_{n}b_{n,n}}{a_{n,n}}y^{n}+\sum _{j=0}^{n-1}\frac{c_{n}b_{n,j}y^{j}-a_{n,j}m_{j}^{(a)}(y)}{a_{n,n}},\quad (3.14)$$
$$m_{n}^{(b)}(y)=\frac{c_{n}a_{n,n}}{b_{n,n}}y^{n}+\sum _{j=0}^{n-1}\frac{c_{n}a_{n,j}y^{j}-b_{n,j}m_{j}^{(b)}(y)}{b_{n,n}},\quad (3.15)$$
for n ≥ 0, with $m_{0}^{(a)}(y)=m_{0}^{(b)}(y)=1$.
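Recursion (3.14) can be sketched in code, with the conditional-moment polynomials $m_n^{(a)}(y)$ represented as coefficient lists. The toy data are an assumption: orthonormal Hermite polynomials $H_n(x)/\sqrt{n!}$ and $c_n=\rho^n$, for which the bivariate normal case gives the known answer $E(X^2|Y=y)=\rho^2 y^2 + 1-\rho^2$.

```python
# Compute m_n^{(a)}(y) as coefficient lists (index = power of y) from
# sum_{j<=n} a[n][j] * m_j^{(a)}(y) = c_n * beta_n(y), which is what
# recursion (3.14) solves; a[n][j], b[n][j] are the monomial
# coefficients of alpha_n and beta_n as in (3.11).
def conditional_moments(a, b, c):
    ms = []
    for n in range(len(c)):
        rhs = [c[n] * b[n][j] for j in range(n + 1)]   # c_n * beta_n(y)
        for j in range(n):                             # subtract known lower moments
            for k, coef in enumerate(ms[j]):
                rhs[k] -= a[n][j] * coef
        ms.append([v / a[n][n] for v in rhs])
    return ms

# Orthonormal Hermite data: 1, x, (x^2 - 1)/sqrt(2); c_n = rho^n.
a = b = [[1], [0, 1], [-2**-0.5, 0, 2**-0.5]]
rho = 0.5
ms = conditional_moments(a, b, [rho**n for n in range(3)])
print(ms)   # expect m_2(y) = 0.75 + 0.25 y^2
```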

Corollary 1
Let the coefficients a_{n,j} and b_{n,j} be defined by (3.11). If the series (3.10) converges to a positive sum, then: a) $\sum _{n\ge 0}c_{n}^{2}<\infty$; further, additionally: b) if 0 ∈ supp μ and 0 ∈ supp ν, then $\infty >\sum _{n\ge 0}c_{n}a_{n,0}b_{n,0}\ge 0$; c) if supp μ is unbounded, then the sequence $\{c_{n}a_{n,n}/b_{n,n}\}$ is a moment sequence; if additionally supp ν is also unbounded, then $\{c_{n}^{2}\}$ must be a moment sequence; d) if the measures μ and ν are the same and have unbounded supports, then {c_n} must be a moment sequence.
Proof Part a) follows from the fact that the series (1.3) converges in mean-square and that $\hat{p}_{n}=1$, since we consider only orthonormal polynomials. Part b) is obvious. c) Firstly, notice that in all cases it follows from the system of equations (3.14) and (3.15) that the leading coefficients in $m_{n}^{(a)}(y)$ and $m_{n}^{(b)}(y)$ must be, respectively, $c_{n}b_{n,n}/a_{n,n}$ and $c_{n}a_{n,n}/b_{n,n}$. Now, if, say, supp μ is unbounded, then from the fact that $\{m_{n}^{(b)}(y)\}$ must be a moment sequence, so must be the sequence $\{y^{n}c_{n}a_{n,n}/b_{n,n}\}$. From this fact, the first assertion follows immediately. Now, if both sequences $\{c_{n}a_{n,n}/b_{n,n}\}$ and $\{c_{n}b_{n,n}/a_{n,n}\}$ are moment sequences, then so is their product (see the Appendix below). Part d) follows directly from c).

Remark 9
The assertion d) of the above corollary in fact recovers the result of Tyan et al. presented in [25].

Remark 10
Theorem 4, at least theoretically, closes the problem of finding conditions for the convergence of the infinite series (3.10) to a positive bivariate function. Namely, having two sequences of moments (which, importantly, are given by recursive formulae), one can find the two Laplace transforms of the distributions identified by these sequences and invert them, obtaining two conditional measures χ(.|x) and ζ(.|y) that are defined by the conditions $\int \beta _{n}(y)\,d\chi (y|x)=c_{n}\alpha _{n}(x)$ and $\int \alpha _{n}(x)\,d\zeta (x|y)=c_{n}\beta _{n}(y)$.
The procedure for obtaining these inverses is long and difficult. Along the way, it utilizes Nevanlinna's theory, as described, say, in [1]. The question of summing the series (3.10) is then solved by formula (3.8).
Example 1 We will now present an example showing how, given a family of orthogonal polynomials and a moment sequence {c_n}, one finds a sequence of moments {m_n(y)}. Then, having this sequence, one finds a sequence of orthogonal polynomials parametrized by y. Then, by various means, including the analysis of the three-term recurrence of this sequence, one finds the properties of the measure having moment sequence {m_n(y)} and thus concludes, based on Theorem 4, that the series (3.10) with α_n(x) = β_n(x) converges to a positive sum. The path is long, and it seems that each case would provide enough material for an article. To shorten the conclusions and the description, the example will concern the Hermite polynomials, which are well known, and the sequence c_n = ρ^n for some |ρ| < 1, just to illustrate the process of calculation. Note that such a sequence {c_n} is a moment sequence. Along the way, we will make use of the well-known properties of these polynomials. Besides, in this case it is easy just to guess the conditioning measure dχ(.|y).
Let us recall that the so-called (probabilistic) Hermite polynomials are defined by the following three-term recurrence: $H_{n+1}(x)=xH_{n}(x)-nH_{n-1}(x)$, n ≥ 0, with $H_{-1}(x)=0$ and $H_{0}(x)=1$. Moreover, for all complex x, y and a the following expansions are true: Notice also that the orthonormal version of the Hermite polynomials is equal to $H_{n}(x)/\sqrt{n!}$, so for even n we can estimate its value using the well-known approximation $\binom{2k}{k}\sim 4^{k}/\sqrt{\pi k}$. Thus, applying Corollary 1, we see that every applicable sequence {c_n} must satisfy certain conditions, first obtained in [14]. Later this result was generalized by Griffith in [7] and Koudu in [13], with the Hermite polynomials replaced by the polynomials orthogonalizing the gamma distribution (Griffith) and the Poisson and negative binomial distributions (Koudu). Koudu's results were later applied to parameter testing of chosen Lancaster bivariate distributions by Chen in [6].
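The Hermite example can be checked numerically against Mehler's classical formula $\sum_{n\ge 0}\frac{\rho^{n}}{n!}H_{n}(x)H_{n}(y)=\frac{1}{\sqrt{1-\rho^{2}}}\exp\left(\frac{2\rho xy-\rho^{2}(x^{2}+y^{2})}{2(1-\rho^{2})}\right)$, whose right-hand side is manifestly positive. The formula is quoted from the classical literature, not from the paper's (unreproduced) display; the sketch below sums the series directly.

```python
import math

# Partial sum of sum_n rho^n H_n(x) H_n(y) / n! with probabilists'
# Hermite polynomials built by the recurrence H_{n+1} = x H_n - n H_{n-1}.
def hermite_series(rho, x, y, n_terms=60):
    h_x, h_y = [1.0, x], [1.0, y]
    for n in range(2, n_terms):
        h_x.append(x * h_x[-1] - (n - 1) * h_x[-2])
        h_y.append(y * h_y[-1] - (n - 1) * h_y[-2])
    return sum(rho**n * h_x[n] * h_y[n] / math.factorial(n)
               for n in range(n_terms))

# Closed form (Mehler's formula), positive for |rho| < 1.
def mehler_kernel(rho, x, y):
    return math.exp((2 * rho * x * y - rho**2 * (x**2 + y**2))
                    / (2 * (1 - rho**2))) / math.sqrt(1 - rho**2)

print(hermite_series(0.4, 0.3, -0.7), mehler_kernel(0.4, 0.3, -0.7))
```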

Appendix: A few facts about the moment problem and orthogonal polynomials
Let α be a signed, finite measure on the real line. Then the sequence of reals {m_n}_{n≥0} defined by $m_{n}=\int x^{n}\,d\alpha (x)$ is called the sequence of moments (briefly, the moment sequence, sm) of the measure α. Below we have the surprisingly general result of Boas [3]: in other words, any sequence of numbers is a moment sequence of some signed measure.
We will be interested in sequences of moments of positive measures. It turns out (see, e.g. [24]) that the sequence {m_n} is a moment sequence of some nonnegative measure α iff for all n ≥ 0
$$d_{n}\overset{df}{=}\det \left[ m_{i+j}\right] _{i,j=0}^{n}\ge 0.\quad (3.19)$$
The sequence {d_n}, related to the sequence {m_n} and defined by (3.19), is called the Hankel transform of the sequence {m_n}. It is also known (see, e.g. [24]) that if the sharp inequality in (3.19) holds for all n ≥ 1, then the support of α is of infinite cardinality. Sequences {m_n} that are the moment sequences of some nonnegative measures will be called p(ositive) m(oment), briefly pm, sequences.
Let us mention some simple necessary conditions for a sequence of reals to be a pm sequence.
is non-decreasing.
Proof Assertion a) is obvious, b) follows directly from the Cauchy-Schwarz inequality, while c) follows from Jensen's inequality.
In the sequel, we will assume that the measure α is a probability measure, i.e. $\int d\alpha (x)=1$, which results in the fact that all pm sequences will have m_0 = 1. The generating function ϕ of a pm sequence is defined by the formula $\varphi (t)=\sum _{n\ge 0}m_{n}t^{n}/n!=\int e^{tx}\,d\alpha (x)$, i.e. it is the Laplace transform of α. Hence, if the moment problem is determinate (i.e. the measure α is identified by its sequence of moments), this Laplace transform exists in some, possibly small, neighborhood of zero.
For the purposes of this paper, this criterion of determinacy is the more important one. However, for the sake of completeness, let us remark that there exists another criterion, given by Carleman (see, e.g. [1] or [5]), where determinacy follows from the properties of the moment sequence itself. Namely, Carleman's criterion reads: if only $\sum _{n\ge 1}m_{2n}^{-1/(2n)}=\infty$, then the sequence of moments {m_n} uniquely determines the measure that created this sequence.
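Carleman's criterion is easy to check numerically. As an illustration (the Gaussian example is our assumption): for the standard normal moments $m_{2n}=(2n)!/(2^{n}n!)$, the terms $m_{2n}^{-1/(2n)}$ behave like $1/\sqrt{n}$, so the partial sums grow without bound and the measure is determinate.

```python
import math

# Partial sums of Carleman's series sum_n m_{2n}^{-1/(2n)} for the
# standard Gaussian, m_{2n} = (2n)!/(2^n n!); log-gamma avoids overflow.
def carleman_partial_sum(N):
    s = 0.0
    for n in range(1, N + 1):
        log_m2n = math.lgamma(2 * n + 1) - n * math.log(2) - math.lgamma(n + 1)
        s += math.exp(-log_m2n / (2 * n))   # m_{2n}^{-1/(2n)}
    return s

# The partial sums keep growing (terms ~ 1/sqrt(n)): divergence, hence
# the Gaussian moment problem is determinate by Carleman's criterion.
print(carleman_partial_sum(100), carleman_partial_sum(1000))
```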
It is known (see, e.g. [1] or [5]) that the sequence of polynomials orthogonal with respect to the measure that produced a given moment sequence {m_n} is given by the following sequence: Further, it is also known (see, e.g. [1] or [5]) that for every orthogonal polynomial sequence {p_n} one can define three sequences of numbers {A_n}, {B_n}, {C_n} such that for every n ≥ 0: $p_{n+1}(x)=(A_{n}x+B_{n})p_{n}(x)-C_{n}p_{n-1}(x)$, and for n ≥ 1: $C_{n}A_{n}A_{n-1}>0$, provided the support of the measure making these polynomials orthogonal is of infinite cardinality. These real sequences are defined by the numbers {d_n} given by (3.19). For details, see again [1] or [5].
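The passage from a moment sequence to its orthogonal polynomials can be sketched by Gram-Schmidt on the monomials with the inner product $\langle x^{i},x^{j}\rangle =m_{i+j}$. This is a generic construction, not the paper's formula; exact rational arithmetic is used for stability.

```python
from fractions import Fraction

# Build monic orthogonal polynomials (as coefficient lists, index = power)
# from a moment sequence m, using <x^i, x^j> = m_{i+j}.
def orthogonal_polys(m, N):
    def inner(p, q):
        return sum(pi * qj * m[i + j]
                   for i, pi in enumerate(p) for j, qj in enumerate(q))
    polys = []
    for n in range(N):
        p = [Fraction(0)] * n + [Fraction(1)]          # start from x^n
        for q in polys:                                # subtract projections
            c = inner(p, q) / inner(q, q)
            p = [pi - c * (q[i] if i < len(q) else 0) for i, pi in enumerate(p)]
        polys.append(p)
    return polys

# Gaussian moments 1, 0, 1, 0, 3 should yield monic Hermite 1, x, x^2 - 1.
m = [Fraction(v) for v in (1, 0, 1, 0, 3)]
polys = orthogonal_polys(m, 3)
print(polys)
```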
We also have the following simple observations concerning the properties of pm sequences. Proof 1. The arguments are probabilistic. Let X and Y be two independent random variables having moments {a_n} and {b_n}, respectively. Then {a_n b_n} is the moment sequence of XY, and $\left\{ \sum _{i=0}^{n}(\pm 1)^{i}\binom{n}{i}\alpha ^{i}\beta ^{n-i}a_{i}b_{n-i}\right\} _{n\ge 0}$ is the moment sequence of βY ± αX. Let Z have the so-called mixture distribution of the distributions of the random variables X and Y, having moment sequences {a_n} and {b_n}, respectively. Then $\{pa_{n}+(1-p)b_{n}\}_{n\ge 0}$ is the moment sequence of Z.
2. $\{a_{kn}\}_{n\ge 0}$ is the moment sequence of $X^{k}$. To get the remaining statements, we consider special mixtures of independent copies of X and −X to get the first statement, and of √X and −√X to get the second. Further, $\{n!\}_{n\ge 0}$ is the moment sequence of the distribution with density exp(−x), x ≥ 0, while $\{1/(n+1)^{k+1}\}_{n\ge 0}$ is the moment sequence of the distribution with density $(-\log (x))^{k}/\Gamma (k+1)$, x ∈ (0, 1), k > −1. For sequences composed of Fibonacci numbers, see [4].
Hence, in particular, the following families of polynomials are pm sequences for every x ∈ R: $\left\{ \sum _{k=0}^{n}\binom{n}{k}a_{k}(\pm 1)^{n-k}x^{n-k}\right\} _{n\ge 0}$, where {a_n}_{n≥0} is a pm sequence; hence some of the Appell polynomial sequences are pm.