Abstract
In this work, some new functionals of Jensen-type inequalities are constructed using Shannon entropy, f-divergence, and Rényi divergence, and some estimates are obtained for these new functionals. Using the Zipf–Mandelbrot law and the hybrid Zipf–Mandelbrot law, we also investigate some bounds for these new functionals. Furthermore, we generalize these new functionals for m-convex functions using the Lidstone polynomial.
1 Introduction and preliminary results
The most commonly used words, the population ranks of cities in various countries, corporation sizes, and income rankings can all be described in terms of Zipf's law. An f-divergence measures the difference between two probability distributions by taking an average, weighted by a specified function, of the ratio of the two distributions. A prominent example is the Csiszár f-divergence [10, 11], a special case of which is the Kullback–Leibler divergence, used to measure the distance between probability distributions (see [18, 19]). The notion of distance is stronger than that of divergence, because it requires symmetry and the triangle inequality. Probability theory has applications in many fields, and the divergence between probability distributions has many applications in these fields.
Many natural phenomena, such as the distributions of wealth and income in a society, Facebook likes, football goals, and city sizes, follow power-law distributions (Zipf's law). Auerbach [2] was the first to explore the idea that the distribution of city sizes can be well approximated by a Pareto (power-law) distribution. This idea was refined by many researchers, but Zipf [27] contributed most significantly to this field. The distribution of city sizes has been investigated by many scholars of urban economics, such as Rosen and Resnick [24], Black and Henderson [3], Ioannides and Overman [17], Soo [25], Anderson and Ge [1], and Bosker et al. [4]. Zipf's law states that "the rank of cities with a certain number of inhabitants varies proportionally to the city sizes with some negative exponent that is close to unity". In other words, Zipf's law states that the product of city sizes and their ranks is roughly constant. This indicates that the population of the second largest city is one half of the population of the largest city, the population of the third largest city is one-third of the population of the largest city, and the population of the n-th city is \(\frac{1}{n}\) of the largest city population. This rule is called the rank-size rule and is also known as Zipf's law. Hence, Zipf's law shows that the city size distribution follows the Pareto distribution and, in addition, that the estimated value of the shape parameter is equal to unity.
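For illustration, the following minimal sketch (ours, with synthetic populations; not from the paper) shows the rank-size rule numerically: under an exact Zipf law with exponent 1, the product of rank and size stays constant.

```python
# Illustrative sketch (synthetic data): under Zipf's law with exponent 1,
# rank * size is constant across ranks.
largest = 10_000_000  # hypothetical population of the largest city
sizes = [largest / rank for rank in range(1, 6)]

for rank, size in enumerate(sizes, start=1):
    # rank * size recovers the population of the largest city every time
    print(rank, round(size), round(rank * size))
```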
Horváth et al. [16] introduced some new functionals based on the f-divergence functional and obtained estimates for them. They obtained bounds for the f-divergence and the Rényi divergence by applying a cyclic refinement of Jensen's inequality. They also constructed new inequalities for the Rényi and Shannon entropies and used the Zipf–Mandelbrot law to illustrate the results.
Inequalities involving higher order convexity have been used by many physicists in higher dimensional problems since the introduction of higher order convexity by Popoviciu (see [22, p. 15]). It is quite interesting that some results which are true for convex functions are no longer valid for higher order convex functions.
In [22, p. 16], the following criterion is given to check the m-convexity of a function.
If \(f^{(m)}\) exists, then f is m-convex if and only if \(f^{(m)} \ge 0\).
In recent years, many researchers have generalized inequalities for m-convex functions. For instance, Butt et al. generalized Popoviciu's inequality for m-convex functions using Taylor's formula, the Lidstone polynomial, the Montgomery identity, Fink's identity, Abel–Goncharov interpolation, and the Hermite interpolating polynomial (see [5,6,7,8,9]).
For many years, Jensen’s inequality has been of great interest. It was refined by defining some new functions (see [14, 15]). Horváth and Pečarić ([12, 15], see also [13, p. 26]) gave a refinement of Jensen’s inequality for convex function. They defined some essential notions to prove the refinement given as follows:
Let X be a set, \(P(X)\) the power set of X, |X| the number of elements of X, and \({\mathbb {N}}\) the set of natural numbers including 0. Let \(q \ge 1\) and \(r \ge 2\) be fixed integers. Define the functions:
and
by
and
Next, the function
is defined by:
For each \(I \in P(\{1, \ldots , q\}^r)\), let
\(\left( H_1\right) \) Let n, m be fixed positive integers, such that \(n\ge 1\), \(m\ge 2\), and let \(I_m\) be a subset of \(\{1, \ldots , n \}^m\), such that:
Introduce the sets \(I_{l}\subset \{1, \ldots , n\}^{l} (m-1 \ge l \ge 1)\) inductively by:
Obviously, \(I_1= \{1, \ldots , n\}\) by \((H_1)\), and this ensures that \(\alpha _{I_1, i}=1\) \((1 \le i \le n)\). From \((H_1)\), we have \(\alpha _{I_l, i} \ge 1\) \((m-1 \ge l \ge 1,\ 1 \le i \le n)\).
For \(m \ge l \ge 2\), and for any \((j_1, \ldots , j_{l-1})\in I_{l-1}\), let:
With the help of these sets, they define the functions \(\eta _{I_m, l}: I_l \rightarrow {\mathbb {N}}(m \ge l \ge 1)\) inductively by:
They define some special expressions for \(1 \le l \le m\), as follows:
and prove the following theorem.
Theorem 1.1
Assume \((\mathrm{H}_1)\), and let \(f: I \rightarrow {\mathbb {R}}\) be a convex function where \(I \subset {\mathbb {R}}\) is an interval. If \(x_1, \ldots , x_n \in I\), and \(p_1, \ldots , p_n\) are positive real numbers, such that \(\sum \nolimits _{i=1}^{n}p_i=1\), then
We define the following functionals by taking the differences of the refinement of Jensen's inequality given in (1):
Under the assumptions of Theorem 1.1, we have:
Inequalities (4) are reversed if f is concave on I.
1.1 Lidstone polynomial
We generalize the refinement of Jensen's inequality for higher order convex functions using the Lidstone interpolating polynomial. In [26], Widder gives the following result.
Lemma A
If \(g \in C^{\infty }([0, 1])\), then:
where \({\mathfrak {F}}_{l}\) is a polynomial of degree \(2l+1\) defined by the relation:
and
is a homogeneous Green’s function of the differential operator \(\frac{\mathrm{d}^2}{\mathrm{d}^2s}\) on [0, 1], and with the successive iterates of G(u, s):
The Lidstone polynomial can be expressed in terms of \(G_{m}(u, s)\) as:
Lidstone series representation of \(g \in C^{2m}[\alpha _1, \alpha _2]\) is given by:
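As a computational aside (ours, not from the paper), the polynomials \({\mathfrak {F}}_{l}\) can be generated by the standard Lidstone recursion \({\mathfrak {F}}_0(t) = t\), \({\mathfrak {F}}_l'' = {\mathfrak {F}}_{l-1}\), \({\mathfrak {F}}_l(0) = {\mathfrak {F}}_l(1) = 0\); since the defining relation (5) is not reproduced in this text, this identification is an assumption consistent with the stated degree \(2l+1\). A sketch in Python:

```python
# Sketch of the standard Lidstone recursion (assumed to match the F_l above):
# F_0(t) = t, F_l'' = F_{l-1}, with F_l(0) = F_l(1) = 0.
import sympy as sp

t = sp.symbols('t')

def lidstone(l):
    if l == 0:
        return t
    prev = lidstone(l - 1)
    c0, c1 = sp.symbols('c0 c1')
    # Integrate twice; the two constants are fixed by the boundary conditions.
    p = sp.integrate(sp.integrate(prev, t), t) + c0 * t + c1
    sol = sp.solve([p.subs(t, 0), p.subs(t, 1)], [c0, c1])
    return sp.expand(p.subs(sol))

for l in range(3):
    print(l, sp.degree(lidstone(l), t), lidstone(l))
# degrees 1, 3, 5, ..., i.e., 2l + 1, as stated above
```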
2 Inequalities for Csiszár divergence
In [10, 11], Csiszár introduced the following notion.
Definition 2.1
Let \(f : {\mathbb {R}}^{+} \rightarrow {\mathbb {R}}^{+}\) be a convex function, and let \({\mathbf {r}}=\left( r_1, \ldots , r_n\right) \) and \({\mathbf {q}}=\left( q_1, \ldots , q_n\right) \) be positive probability distributions. Then, the f-divergence functional is defined by:
He stated that, by defining:
non-negative probability distributions can be used as well.
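For orientation, here is a minimal numeric sketch (ours; the displayed formula above is not reproduced in this text, so the standard Csiszár form \(I_f(\mathbf{r}, \mathbf{q}) = \sum _{s} q_s f(r_s/q_s)\) is assumed):

```python
# Minimal sketch of the Csiszar f-divergence, assuming the standard form
# I_f(r, q) = sum_s q_s * f(r_s / q_s).
import math

def f_divergence(r, q, f):
    return sum(qs * f(rs / qs) for rs, qs in zip(r, q))

r = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]

# f(t) = t*log(t) recovers the Kullback-Leibler divergence D(r, q).
print(f_divergence(r, q, lambda t: t * math.log(t)))  # >= 0
```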
Horváth et al. [16] gave the following functional based on the previous definition.
Definition 2.2
Let \(I \subset {\mathbb {R}}\) be an interval and let \(f: I \rightarrow {\mathbb {R}}\) be a function. Let \({\mathbf {r}}=(r_1, \ldots , r_n)\in {\mathbb {R}}^n\) and \({\mathbf {q}}=(q_1, \ldots , q_n)\in (0, \infty )^{n}\), such that:
Then, they define the sum \({\hat{I}}_{f}({\mathbf {r}}, {\mathbf {q}})\) as:
We apply Theorem 1.1 to \({\hat{I}}_{f}({\mathbf {r}}, {\mathbf {q}})\).
Theorem 2.3
Assume \((H_1)\), let \(I \subset {\mathbb {R}}\) be an interval and let \({\mathbf {r}}=\left( r_1, \ldots , r_n\right) \) and \({\mathbf {q}}=\left( q_1, \ldots , q_n\right) \) be in \((0, \infty )^{n}\), such that
(i) If \(f: I \rightarrow {\mathbb {R}}\) is a convex function, then:
where
If f is a concave function, then inequality signs in (10) are reversed.
(ii) If \(f: I \rightarrow {\mathbb {R}}\) is a function, such that \(x \mapsto xf(x)\) \((x \in I)\) is convex, then:
where
Proof
(i) Taking \(p_{s} = \frac{q_{s}}{\sum _{s=1}^{n}q_s}\) and \(x_{s} = \frac{r_s}{q_s}\) in Theorem 1.1, we have:
Multiplying by \(\sum _{s=1}^{n}q_{s}\), we obtain (10).
(ii) Using \(f:= \mathrm{id}\cdot f\) (where "id" is the identity function) in Theorem 1.1, we have:
Now, using \(p_s = \frac{q_s}{\sum _{s=1}^{n}q_s}\) and \(x_s = \frac{r_s}{q_s}, \,\ s = 1, \ldots , n\), we get:
Multiplying both sides by \(\sum _{s=1}^{n}q_s\), we obtain (12). \(\square \)
3 Inequalities for Shannon entropy
Definition 3.1
(See [16]) The Shannon entropy of a positive probability distribution \({\mathbf {r}}=(r_1, \ldots , r_n)\) is defined by:
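A minimal numeric sketch (ours; the displayed formula is not reproduced in this text, so the standard convention \(S = -\sum _{s} r_s \log r_s\) is assumed):

```python
# Sketch of Definition 3.1, assuming the standard form S = -sum_s r_s*log(r_s).
import math

def shannon_entropy(r, base=math.e):
    return -sum(rs * math.log(rs, base) for rs in r)

r = [0.2, 0.5, 0.3]
print(shannon_entropy(r))          # natural logarithm
print(shannon_entropy(r, base=2))  # in bits; S <= log(n), cf. (19)
```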
Corollary 3.2
Assume \((\mathrm{H}_1)\).
(i) If \({\mathbf {q}}=(q_1, \ldots , q_n) \in (0, \infty )^{n}\), and the base of \(\log \) is greater than 1, then:
$$\begin{aligned} S \le A_{m,m}^{[3]} \le A_{m,m-1}^{[3]} \le \cdots \le A_{m,2}^{[3]} \le A_{m,1}^{[3]} = \log \left( \frac{n}{\sum _{s=1}^{n}q_s}\right) \sum _{s=1}^{n}q_s, \end{aligned}$$ (17)
where
$$\begin{aligned} A_{m,l}^{[3]} = - \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l)\left( \sum \limits _{j=1}^{l}\frac{q_{i_j}}{\alpha _{I_m, i_j}}\right) \log \left( \sum \limits _{j=1}^{l}\frac{q_{i_j}}{\alpha _{I_m, i_j}}\right) . \end{aligned}$$ (18)
If the base of \(\log \) is between 0 and 1, then the inequality signs in (17) are reversed.
(ii) If \({\mathbf {q}}= (q_1, \ldots , q_n)\) is a positive probability distribution and the base of \(\log \) is greater than 1, then we have the estimates for the Shannon entropy of \({\mathbf {q}}\):
$$\begin{aligned} S \le A_{m,m}^{[4]} \le A_{m,m-1}^{[4]} \le \cdots \le A_{m,2}^{[4]} \le A_{m,1}^{[4]} = \log (n), \end{aligned}$$ (19)
where
$$\begin{aligned} A_{m,l}^{[4]} = - \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{q_{i_j}}{\alpha _{I_m, i_j}}\right) \log \left( \sum \limits _{j=1}^{l} \frac{q_{i_j}}{\alpha _{I_m, i_j}}\right) . \end{aligned}$$
Proof
(i) Using \(f:= \log \) and \({\mathbf {r}} = (1, \ldots , 1)\) in Theorem 2.3 (i), we get (17).
(ii) It is a special case of (i). \(\square \)
Definition 3.3
(See [16]) The Kullback–Leibler divergence between the positive probability distributions \({\mathbf {r}}=(r_1, \ldots , r_n)\) and \({\mathbf {q}}= (q_1, \ldots , q_n)\) is defined by:
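A numeric sketch (ours; the displayed formula is not reproduced in this text, so the standard form \(D(\mathbf{r}, \mathbf{q}) = \sum _{s} r_s \log (r_s/q_s)\) is assumed):

```python
# Sketch of Definition 3.3, assuming D(r, q) = sum_s r_s * log(r_s / q_s).
import math

def kl_divergence(r, q):
    return sum(rs * math.log(rs / qs) for rs, qs in zip(r, q))

r = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
print(kl_divergence(r, q))  # non-negative, cf. (22)
print(kl_divergence(r, r))  # 0.0 when the distributions coincide
```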
Corollary 3.4
Assume \((\mathrm{H}_1)\).
(i) Let \({\mathbf {r}} = (r_1 , \ldots , r_n) \in (0, \infty )^{n}\) and \({\mathbf {q}} := (q_1, \ldots , q_n) \in (0, \infty )^{n}\). If the base of \(\log \) is greater than 1, then:
$$\begin{aligned} \sum _{s=1}^{n}r_s \log \left( \frac{\sum _{s=1}^{n}r_s}{\sum _{s=1}^{n}q_s}\right) \le A_{m, m}^{[5]} \le A_{m, m-1}^{[5]} \le \cdots \le A_{m, 2}^{[5]} \le A_{m, 1}^{[5]} = \sum _{s=1}^{n}r_s \log \left( \frac{r_s}{q_s}\right) = D({\mathbf {r}}, {\mathbf {q}}), \end{aligned}$$ (21)
where
$$\begin{aligned} A_{m, l}^{[5]} = \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l)\in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{{q_{i_j}}}{\alpha _{I_m, i_j}}\right) \left( \frac{\sum _{j=1}^{l} \frac{r_{i_j}}{\alpha _{I_{m}, i_j}}}{\sum _{j=1}^{l}\frac{q_{i_j}}{\alpha _{I_{m}, i_j}}} \right) \log \left( \frac{\sum _{j=1}^{l} \frac{r_{i_j}}{\alpha _{I_{m}, i_j}}}{\sum _{j=1}^{l}\frac{q_{i_j}}{\alpha _{I_{m}, i_j}}} \right) . \end{aligned}$$
If the base of \(\log \) is between 0 and 1, then the inequality in (21) is reversed.
(ii) If \(\mathbf{r }\) and \(\mathbf{q }\) are positive probability distributions, and the base of \(\log \) is greater than 1, then we have:
$$\begin{aligned} D(\mathbf{r }, \mathbf{q }) = A_{m, 1}^{[6]} \ge A_{m, 2}^{[6]} \ge \cdots \ge A_{m, m-1}^{[6]} \ge A_{m, m}^{[6]} \ge 0, \end{aligned}$$ (22)
where
$$\begin{aligned} A_{m, l}^{[6]} = \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l)\in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{{q_{i_j}}}{\alpha _{I_m, i_j}}\right) \left( \frac{\sum _{j=1}^{l} \frac{r_{i_j}}{\alpha _{I_{m}, i_j}}}{\sum _{j=1}^{l}\frac{q_{i_j}}{\alpha _{I_{m}, i_j}}} \right) \log \left( \frac{\sum _{j=1}^{l} \frac{r_{i_j}}{\alpha _{I_{m}, i_j}}}{\sum _{j=1}^{l}\frac{q_{i_j}}{\alpha _{I_{m}, i_j}}} \right) . \end{aligned}$$
If the base of \(\log \) is between 0 and 1, then the inequality signs in (22) are reversed.
Proof
(i) On taking \(f: = \log \) in Theorem 2.3 (ii), we get (21).
(ii) It is a special case of (i). \(\square \)
4 Inequalities for Rényi divergence and entropy
The Rényi divergence and the Rényi entropy were introduced in [23].
Definition 4.1
Let \(\mathbf{r } := (r_1, \ldots , r_n)\) and \(\mathbf{q } : = (q_1, \ldots , q_n)\) be positive probability distributions, and let \(\lambda \ge 0\), \(\lambda \ne 1\).
(a) The Rényi divergence of order \(\lambda \) is defined by:
$$\begin{aligned} D_{\lambda }(\mathbf{r }, \mathbf{q }) : = \frac{1}{\lambda - 1} \log \left( \sum _{i=1}^{n}q_{i}\left( \frac{r_i}{q_i}\right) ^{\lambda } \right) . \end{aligned}$$ (23)
(b) The Rényi entropy of order \(\lambda \) of \(\mathbf{r }\) is defined by:
$$\begin{aligned} H_{\lambda }(\mathbf{r }) : = \frac{1}{1 - \lambda } \log \left( \sum _{i=1}^{n} r_{i}^{\lambda }\right) . \end{aligned}$$ (24)
The Rényi divergence and the Rényi entropy can also be extended to non-negative probability distributions. If \(\lambda \rightarrow 1\) in (23), we have the Kullback–Leibler divergence, and if \(\lambda \rightarrow 1\) in (24), then we have the Shannon entropy. In the next two results, inequalities can be found for the Rényi divergence.
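The following numeric sketch (ours) illustrates (23) and (24) and the limits just mentioned:

```python
# Sketch of (23) and (24): the order-lambda quantities approach the
# Kullback-Leibler divergence and the Shannon entropy as lambda -> 1.
import math

def renyi_divergence(r, q, lam):
    return math.log(sum(qs * (rs / qs) ** lam for rs, qs in zip(r, q))) / (lam - 1)

def renyi_entropy(r, lam):
    return math.log(sum(rs ** lam for rs in r)) / (1 - lam)

r = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
kl = sum(rs * math.log(rs / qs) for rs, qs in zip(r, q))
shannon = -sum(rs * math.log(rs) for rs in r)
print(renyi_divergence(r, q, 1.0001), kl)   # nearly equal
print(renyi_entropy(r, 1.0001), shannon)    # nearly equal
```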
Theorem 4.2
Assume \((\mathrm{H}_{1})\), and let \(\mathbf{r } = (r_1, \ldots , r_n)\) and \(\mathbf{q } = (q_1, \ldots , q_n)\) be probability distributions.
(i) If \(0 \le \lambda \le \mu \) with \(\lambda , \mu \ne 1\), and the base of \(\log \) is greater than 1, then:
$$\begin{aligned} D_{\lambda } (\mathbf{r }, \mathbf{q }) \le A_{m, m}^{[7]} \le A_{m, m-1}^{[7]} \le \cdots \le A_{m, 2}^{[7]} \le A_{m, 1}^{[7]} = D_{\mu } (\mathbf{r }, \mathbf{q }), \end{aligned}$$ (25)
where
$$\begin{aligned} A_{m, l}^{[7]} = \frac{1}{\mu -1}\log \left( \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\right) \left( \frac{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\left( \frac{r_{i_j}}{q_{i_j}}\right) ^{\lambda - 1}}{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}}\right) ^{\frac{\mu - 1}{\lambda - 1}}\right) . \end{aligned}$$
The reverse inequalities hold in (25) if the base of \(\log \) is between 0 and 1.
(ii) If \(1 < \mu \) and the base of \(\log \) is greater than 1, then:
$$\begin{aligned} D_{1} (\mathbf{r }, \mathbf{q }) = D (\mathbf{r }, \mathbf{q }) = \sum _{s=1}^{n}r_s\log \left( \frac{r_s}{q_s}\right) \le A_{m, m}^{[8]} \le A_{m, m-1}^{[8]} \le \cdots \le A_{m, 2}^{[8]} \le A_{m, 1}^{[8]} = D_{\mu } (\mathbf{r }, \mathbf{q }), \end{aligned}$$ (26)
where
$$\begin{aligned} A_{m, l}^{[8]} = \frac{1}{\mu -1}\log \left( \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\right) \exp \left( \frac{(\mu -1)\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}} \log \left( \frac{r_{i_j}}{q_{i_j}}\right) }{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}} \right) \right) ; \end{aligned}$$
here, the base of \(\exp \) is the same as the base of \(\log \), and the reverse inequalities hold if the base of \(\log \) is between 0 and 1.
(iii) If \(0 \le \lambda < 1\), and the base of \(\log \) is greater than 1, then:
$$\begin{aligned} D_{\lambda } (\mathbf{r }, \mathbf{q }) \le A_{m, m}^{[9]} \le A_{m, m-1}^{[9]} \le \cdots \le A_{m, 2}^{[9]} \le A_{m, 1}^{[9]} = D_{1} (\mathbf{r }, \mathbf{q }), \end{aligned}$$ (27)
where
$$\begin{aligned} A_{m, l}^{[9]} = \frac{1}{\lambda -1}\frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\right) \log \left( \frac{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\left( \frac{r_{i_j}}{q_{i_j}}\right) ^{\lambda - 1}}{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}}\right) . \end{aligned}$$ (28)
Proof
By applying Theorem 1.1 with \(I=(0, \infty )\), \(f: (0, \infty ) \rightarrow {\mathbb {R}}\), \(f(t):= t^{\frac{\mu - 1}{\lambda -1}}\):
we have:
if either \(0 \le \lambda< 1 < \mu \) or \(1 < \lambda \le \mu \), and the reverse inequality in (29) holds if \(0 \le \lambda \le \mu < 1\). Raising to the power \(\frac{1}{\mu - 1}\), in all cases we have:
Since \(\log \) is increasing when its base is greater than 1, (25) now follows. If the base of \(\log \) is between 0 and 1, then \(\log \) is decreasing and, therefore, the inequality in (25) is reversed. Letting \(\mu \rightarrow 1\) and \(\lambda \rightarrow 1\), we obtain (ii) and (iii), respectively, by taking limits. \(\square \)
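As a sanity check of the outermost inequality in (25) (the intermediate functionals \(A_{m,l}^{[7]}\) are omitted in this sketch of ours), the Rényi divergence is non-decreasing in its order:

```python
# Check D_lambda(r, q) <= D_mu(r, q) for lambda <= mu (outer terms of (25)).
import math

def renyi_divergence(r, q, lam):
    return math.log(sum(qs * (rs / qs) ** lam for rs, qs in zip(r, q))) / (lam - 1)

r = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
values = [renyi_divergence(r, q, lam) for lam in (0.25, 0.5, 2.0, 3.0)]
print(values)  # non-decreasing in the order
assert all(a <= b + 1e-12 for a, b in zip(values, values[1:]))
```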
Theorem 4.3
Assume \((\mathrm{H}_{1})\); let \(\mathbf{r } = (r_1, \ldots , r_n)\) and \(\mathbf{q } = (q_1, \ldots , q_n)\) be probability distributions. If either \(0 \le \lambda < 1\) and the base of \(\log \) is greater than 1, or \(1 < \lambda \) and the base of \(\log \) is between 0 and 1, then:
where
and
The inequalities in (31) are reversed if either \(0 \le \lambda < 1\) and the base of \(\log \) is between 0 and 1, or \(1 < \lambda \) and the base of \(\log \) is greater than 1.
Proof
We prove only the case when \(0 \le \lambda < 1\) and the base of \(\log \) is greater than 1; the other cases can be proved similarly. Since \(\frac{1}{\lambda - 1} < 0\) and the function \(\log \) is concave, choosing \(I = (0, \infty )\), \(f : = \log \), \(p_{s} = r_{s}\), \(x_{s}: = \left( \frac{r_s}{q_s}\right) ^{\lambda - 1}\) in Theorem 1.1, we have:
and this gives the upper bound for \(D_{\lambda } (\mathbf{r }, \mathbf{q })\).
Since the base of \(\log \) is greater than 1, the function \(x \mapsto xf(x)\) \((x > 0)\) is convex for \(f = \log \); moreover, \(\frac{1}{1 - \lambda } > 0\), and Theorem 1.1 gives:
which gives the lower bound of \(D_{\lambda } (\mathbf{r }, \mathbf{q })\). \(\square \)
Using the previous results, some inequalities for the Rényi entropy are obtained. Let \(\frac{\mathbf{1 }}{\mathbf{n }} = (\frac{1}{n}, \ldots , \frac{1}{n})\) denote the discrete uniform probability distribution.
Corollary 4.4
Assume \((\mathrm{H}_1)\); let \(\mathbf{r }= (r_1, \ldots , r_n)\) and \(\mathbf{q }= (q_1, \ldots , q_n)\) be positive probability distributions.
(i) If \(0 \le \lambda \le \mu \), \(\lambda , \mu \ne 1\), and the base of \(\log \) is greater than 1, then:
$$\begin{aligned} H_{\lambda }(\mathbf{r }) = \log (n) - D_{\lambda }\left( \mathbf{r }, \frac{\mathbf{1 }}{\mathbf{n }}\right) \ge A_{m, m}^{[12]} \ge A_{m, m-1}^{[12]} \ge \cdots \ge A_{m, 2}^{[12]} \ge A_{m, 1}^{[12]} = H_{\mu }(\mathbf{r }), \end{aligned}$$ (34)
where
$$\begin{aligned} A_{m, l}^{[12]} = \frac{1}{1 - \mu }\log \left( \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\right) \left( \frac{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}^{\lambda }}{\alpha _{I_m, i_j}}}{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}}\right) ^{\frac{\mu - 1}{\lambda - 1}} \right) . \end{aligned}$$
The reverse inequalities hold in (34) if the base of \(\log \) is between 0 and 1.
(ii) If \(1 < \mu \) and the base of \(\log \) is greater than 1, then:
$$\begin{aligned} S = -\sum _{s=1}^{n}r_s\log (r_s) \ge A_{m, m}^{[13]} \ge A_{m, m-1}^{[13]} \ge \cdots \ge A_{m, 2}^{[13]} \ge A_{m, 1}^{[13]} = H_{\mu }(\mathbf{r }), \end{aligned}$$ (35)
where
$$\begin{aligned} A_{m, l}^{[13]} = \log (n) + \frac{1}{1 -\mu }\log \left( \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\right) \exp \left( \frac{(\mu -1)\sum \nolimits _{j=1}^{l} \frac{r_{i_j}}{\alpha _{I_m, i_j}}\log \left( nr_{i_j}\right) }{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}}\right) \right) ; \end{aligned}$$
the base of \(\exp \) is the same as the base of \(\log \). The inequalities in (35) are reversed if the base of \(\log \) is between 0 and 1.
(iii) If \(0 \le \lambda < 1\), and the base of \(\log \) is greater than 1, then:
$$\begin{aligned} H_{\lambda }(\mathbf{r }) \ge A_{m, m}^{[14]} \ge A_{m, m-1}^{[14]} \ge \cdots \ge A_{m, 2}^{[14]} \ge A_{m, 1}^{[14]} = S, \end{aligned}$$ (36)
where
$$\begin{aligned} A_{m, l}^{[14]} = \frac{1}{1 - \lambda } \frac{(m-1)!}{(l-1)!}\sum \limits _{(i_1, \ldots , i_l) \in I_l}\eta _{I_m, l}(i_1, \ldots , i_l) \left( \sum \limits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}\right) \log \left( \frac{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}^{\lambda }}{\alpha _{I_m, i_j}}}{\sum \nolimits _{j=1}^{l}\frac{r_{i_j}}{\alpha _{I_m, i_j}}}\right) . \end{aligned}$$ (37)
The inequalities in (36) are reversed if the base of \(\log \) is between 0 and 1.
Proof
(i) Suppose \(\mathbf{q }= \frac{\mathbf{1 }}{\mathbf{n }}\); then from (23), we have:
therefore, we have:
Now, using Theorem 4.2 (i) and (39), we get:
(ii) and (iii) can be proved similarly. \(\square \)
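A quick numeric check (ours) of the identity \(H_{\lambda }(\mathbf{r }) = \log (n) - D_{\lambda }(\mathbf{r }, \frac{\mathbf{1 }}{\mathbf{n }})\) used in this proof:

```python
# Check H_lambda(r) = log(n) - D_lambda(r, 1/n), cf. (39).
import math

def renyi_divergence(r, q, lam):
    return math.log(sum(qs * (rs / qs) ** lam for rs, qs in zip(r, q))) / (lam - 1)

def renyi_entropy(r, lam):
    return math.log(sum(rs ** lam for rs in r)) / (1 - lam)

r = [0.2, 0.5, 0.3]
n = len(r)
uniform = [1 / n] * n
lam = 0.5
print(renyi_entropy(r, lam))
print(math.log(n) - renyi_divergence(r, uniform, lam))  # same value
```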
Corollary 4.5
Assume \((H_1)\), and let \(\mathbf{r }= (r_1, \ldots , r_n)\) and \(\mathbf{q }= (q_1, \ldots , q_n)\) be positive probability distributions.
If either \(0 \le \lambda < 1\) and the base of \(\log \) is greater than 1, or \(1 < \lambda \) and the base of \(\log \) is between 0 and 1, then:
where
The inequalities in (41) are reversed if either \(0 \le \lambda < 1\) and the base of \(\log \) is between 0 and 1, or \(1 < \lambda \) and the base of \(\log \) is greater than 1.
Proof
The proof is similar to Corollary 4.4 using Theorem 4.3. \(\square \)
5 Inequalities using Zipf–Mandelbrot law
The Zipf–Mandelbrot law is defined as follows (see [20]).
Definition 5.1
The Zipf–Mandelbrot law is a discrete probability distribution depending on three parameters \(N \in \{1, 2, \ldots \}\), \(q \in [0, \infty )\), and \(t > 0\), and is defined by:
where
If the total mass of the law is taken over all of \({\mathbb {N}}\), then for \(q \ge 0\), \(t > 1\), \(s \in {\mathbb {N}}\), the density function of the Zipf–Mandelbrot law becomes:
where
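A minimal sketch (ours) of the finite Zipf–Mandelbrot law; the displayed normalizing constant is not reproduced in this text, so the standard form with \(H_{N, q, t} = \sum _{j=1}^{N}(j+q)^{-t}\) is assumed:

```python
# Zipf-Mandelbrot pmf on {1, ..., N}, assuming
# f(s; N, q, t) = 1 / ((s + q)^t * H_{N,q,t}) with H_{N,q,t} = sum_j (j+q)^(-t).
def zipf_mandelbrot(N, q, t):
    H = sum((j + q) ** (-t) for j in range(1, N + 1))
    return [1.0 / ((s + q) ** t * H) for s in range(1, N + 1)]

pmf = zipf_mandelbrot(N=10, q=2.0, t=1.2)
print(sum(pmf))  # 1.0: a genuine probability distribution
print(pmf[:3])   # slowly decaying probabilities across ranks
```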
For \(q = 0\), the Zipf–Mandelbrot law reduces to Zipf's law.
Conclusion 5.2
Assume \((H_1)\), and let \(\mathbf{r }\) be the Zipf–Mandelbrot law. By Corollary 4.4 (iii), if \(0 \le \lambda < 1\) and the base of \(\log \) is greater than 1, then:
The inequalities in (46) are reversed if the base of \(\log \) is between 0 and 1.
Conclusion 5.3
Assume \((H_1)\); let \(\mathbf{r }_{1}\) and \(\mathbf{r }_2\) be Zipf–Mandelbrot laws with parameters \(N \in \{1, 2, \ldots \}\), \(q_1, q_2 \in [0, \infty )\), and \(s_1, s_2 > 0\), respectively. Then, from Corollary 3.4 (ii), if the base of \(\log \) is greater than 1, we have:
The inequalities in (47) are reversed if the base of \(\log \) is between 0 and 1.
6 Shannon entropy, Zipf–Mandelbrot law, and hybrid Zipf–Mandelbrot law
Here, we maximize the Shannon entropy using the method of Lagrange multipliers under certain equality constraints and obtain the Zipf–Mandelbrot law.
Theorem 6.1
Let \(J = \{1, 2, \ldots , N \}\). For a given \(q \ge 0\), the probability distribution that maximizes the Shannon entropy under the constraints:
is the Zipf–Mandelbrot law.
Proof
Let \(J = \{1, 2, \ldots , N \}\). We set up the Lagrange multipliers \(\lambda \) and t and consider the expression:
For the sake of convenience, replace \(\lambda \) by \(\ln \lambda -1\); the last expression then gives:
From \({\widetilde{S}}_{r_s} = 0\), for \(s =1, 2, \ldots , N\), we get:
and on using the constraint \(\sum _{s = 1}^Nr_s = 1 \), we have:
where \(t > 0\), concluding that:
\(\square \)
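A numerical sketch (ours) of Theorem 6.1, assuming the constraints are \(\sum _{s}r_s = 1\) and a fixed value of \(\sum _{s}r_s\ln (s+q)\) (the displayed constraint set is not reproduced in this text); maximizing the entropy should then reproduce the Zipf–Mandelbrot weights up to solver tolerance:

```python
# Maximize Shannon entropy subject to sum(r) = 1 and a fixed value of
# sum(r * ln(s + q)); the maximizer should match the Zipf-Mandelbrot pmf.
import numpy as np
from scipy.optimize import minimize

N, q, t = 10, 2.0, 1.2
s = np.arange(1, N + 1)
zm = (s + q) ** (-t)
zm /= zm.sum()                        # Zipf-Mandelbrot pmf
c = float(np.dot(zm, np.log(s + q)))  # fix the constraint at the ZM value

def neg_entropy(r):
    return float(np.sum(r * np.log(r)))

cons = [{'type': 'eq', 'fun': lambda r: r.sum() - 1.0},
        {'type': 'eq', 'fun': lambda r: np.dot(r, np.log(s + q)) - c}]
res = minimize(neg_entropy, x0=np.full(N, 1.0 / N), method='SLSQP',
               bounds=[(1e-9, 1.0)] * N, constraints=cons)
print(np.max(np.abs(res.x - zm)))  # close to 0: the maximizer is the ZM law
```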
Remark 6.2
Observe that for the Zipf–Mandelbrot law, the Shannon entropy can be bounded from above (see [21]):
where \(\left( q_1, \ldots , q_N\right) \) is a positive N-tuple, such that \(\sum _{s=1}^{N}q_s = 1\).
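The displayed bound is not reproduced in this text; a bound of this type follows from Gibbs' inequality, \(S = -\sum _{s}r_s\log (r_s) \le -\sum _{s}r_s\log (q_s)\), illustrated by the following sketch of ours:

```python
# Gibbs' inequality: -sum r*log(r) <= -sum r*log(q) for any positive q
# summing to 1, with equality iff q == r.
import math

r = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
S = -sum(rs * math.log(rs) for rs in r)
bound = -sum(rs * math.log(qs) for rs, qs in zip(r, q))
print(S, bound)  # S <= bound
```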
Theorem 6.3
If \(J = \{1, \ldots , N\}\), then the probability distribution that maximizes the Shannon entropy under the constraints
is the hybrid Zipf–Mandelbrot law given as:
where
Proof
First, consider \(J = \{1, \ldots , N\}\); we set up the Lagrange multipliers and consider the expression:
On setting \({\tilde{S}}_{r_s} = 0\), for \(s= 1, \ldots , N\), we get:
after solving for \(r_s\), we get:
and we recognize this as a partial sum of Lerch's transcendent, which we denote by:
with \(w \ge 0, k > 0\). \(\square \)
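A sketch (ours) of the hybrid Zipf–Mandelbrot weights suggested by this proof, with the partial Lerch sum \(\Phi ^{*}(k, q, w) = \sum _{s=1}^{N}w^{s}/(s+q)^{k}\) as normalizing constant (the displayed formulas are not reproduced in this text, so this form is an assumption):

```python
# Hybrid Zipf-Mandelbrot pmf: r_s proportional to w^s / (s + q)^k,
# normalized by the partial Lerch sum Phi*(k, q, w).
def hybrid_zipf_mandelbrot(N, q, k, w):
    phi = sum(w ** s / (s + q) ** k for s in range(1, N + 1))
    return [w ** s / ((s + q) ** k * phi) for s in range(1, N + 1)]

pmf = hybrid_zipf_mandelbrot(N=10, q=2.0, k=1.2, w=0.9)
print(sum(pmf))  # 1.0
# w = 1 recovers the ordinary Zipf-Mandelbrot law.
```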
Remark 6.4
Observe that for the Zipf–Mandelbrot law, the Shannon entropy can be bounded from above (see [21]):
where \(\left( q_1, \ldots , q_N\right) \) is any positive N-tuple, such that \(\sum _{s=1}^{N}q_s = 1\).
Under the assumption of Theorem 2.3 (i), define the non-negative functionals as follows:
Under the assumption of Theorem 2.3 (ii), define the non-negative functionals as follows:
Under the assumption of Corollary 3.2 (i), define the following non-negative functionals:
Under the assumption of Corollary 3.2 (ii), define the following non-negative functionals:
Under the assumption of Corollary 3.4 (i), let us define the non-negative functionals as follows:
Under the assumption of Corollary 3.4 (ii), define the non-negative functionals as follows:
Under the assumption of Theorem 4.2 (i), consider the following functionals:
Under the assumption of Theorem 4.2 (ii), consider the following functionals:
Under the assumption of Theorem 4.2 (iii), consider the following functionals:
Under the assumption of Theorem 4.3, consider the following non-negative functionals:
Under the assumption of Corollary 4.4 (i), consider the following non-negative functionals:
Under the assumption of Corollary 4.4 (ii), consider the following functionals:
Under the assumption of Corollary 4.4 (iii), consider the following functionals:
Under the assumption of Corollary 4.5, define the following functionals:
7 Generalization of refinement of Jensen-, Rényi-, and Shannon-type inequalities via Lidstone polynomial
We construct some new identities with the help of the Lidstone series representation (6).
Theorem 7.1
Assume \((H_1)\); let \(f: [\alpha _1, \alpha _2] \rightarrow {\mathbb {R}}\) be a function, where \([\alpha _1, \alpha _2] \subset {\mathbb {R}}\) is an interval, such that \(f\in C^{2m}[\alpha _1, \alpha _2]\) for \(m \ge 1\). Also, let \(x_1, \ldots , x_n \in [\alpha _1, \alpha _2]\) and \(p_1, \ldots , p_n\) be positive real numbers, such that \(\sum \nolimits _{i=1}^{n}p_i=1\), and let \({\mathfrak {F}}_m(t)\) be as defined in (5). Then:
Proof
Using (6) in place of f in \(\Theta _{i}(f),\) \(i = 1, 2, \ldots , 35\), we get (81). \(\square \)
Theorem 7.2
Assume \((H_1)\); let \(f: [\alpha _1, \alpha _2] \rightarrow {\mathbb {R}}\) be a function, where \([\alpha _1, \alpha _2] \subset {\mathbb {R}}\) is an interval, such that \(f \in C^{2m}[\alpha _1, \alpha _2]\) for \( m \ge 1\). Also, let \(x_1, \ldots , x_n \in [\alpha _1, \alpha _2]\) and \(p_1, \ldots , p_n\) be positive real numbers, such that \(\sum \nolimits _{i=1}^{n}p_i=1\), and let \({\mathfrak {F}}_m(t)\) be as defined in (5); let for \(m \ge 1\):
If f is a 2m-convex function, then we have:
Proof
Since f is 2m-convex, we have \(f^{(2m)}(x) \ge 0\) for all \(x \in [\alpha _1, \alpha _2]\); using (82) in (81), we get the required result. \(\square \)
Theorem 7.3
Assume \((H_1)\); let \(f: [\alpha _1, \alpha _2] \rightarrow {\mathbb {R}}\) be a 2m-convex function, where \([\alpha _1, \alpha _2] \subset {\mathbb {R}}\) is an interval, and let \(x_1, \ldots , x_n \in [\alpha _1, \alpha _2]\) and \(p_1, \ldots , p_n\) be positive real numbers, such that \(\sum \nolimits _{i=1}^{n}p_i=1\). Then, the following results are valid.
(i) If m is an odd integer, then for every 2m-convex function, (83) holds.
(ii) Suppose that (83) holds. If the function
$$\begin{aligned} \lambda (u) = \sum _{l=0}^{m-1}(\alpha _2 - \alpha _1)^{2l}g^{(2l)}(\alpha _1){\mathfrak {F}}_{l}\left( \frac{\alpha _2 - u}{\alpha _2 - \alpha _1}\right) + \sum _{l=0}^{m-1}(\alpha _2 - \alpha _1)^{2l}g^{(2l)}(\alpha _2){\mathfrak {F}}_{l}\left( \frac{u - \alpha _1}{\alpha _2 - \alpha _1}\right) \end{aligned}$$
is convex, then the right-hand side of (83) is non-negative, and we have:
$$\begin{aligned} \Theta _{i}(f) \ge 0, \quad i = 1, 2, \ldots , 35. \end{aligned}$$ (84)
Proof
(i) Note that \(G_{1}(u, s) \le 0\) for \(0 \le u, s \le 1\), and, more generally, \(G_{m}(u, s) \le 0\) for odd integers m and \(G_{m}(u, s) \ge 0\) for even integers m. As \(G_1\) is a convex function and \(G_{m-1}\) is positive for odd integers m, therefore:
This shows that \(G_m\) is convex in the first variable u if m is odd. Similarly, \(G_m\) is concave in the first variable if m is even. Hence, if m is odd, then:
therefore, (84) is valid.
(ii) Using the linearity of \(\Theta _{i}(f)\), we can write the right-hand side of (83) in the form \(\Theta _{i}(\lambda )\). As \(\lambda \) is assumed to be convex, the right-hand side of (83) is non-negative, and so \(\Theta _{i}(f) \ge 0\). \(\square \)
Remark A
We can investigate bounds for the identities related to the generalization of the refinement of Jensen's inequality using inequalities for the Čebyšev functional, and some results relating to Grüss- and Ostrowski-type inequalities can be constructed as given in Section 3 of [5]. We can also construct non-negative functionals from inequality (83), give related mean value theorems, and construct new families of m-exponentially convex functions and Cauchy means related to these functionals, as given in Section 4 of [5].
References
Anderson, G.; Ge, Y.: The size distribution of Chinese cities. Reg. Sci. Urban Econ. 35(6), 756–776 (2005)
Auerbach, F.: Das Gesetz der Bevölkerungskonzentration. Petermanns Geogr. Mitt. 59, 74–76 (1913)
Black, D.; Henderson, V.: Urban evolution in the USA. J. Econ. Geogr. 3(4), 343–372 (2003)
Bosker, M.; Brakman, S.; Garrestsen, H.; Schramm, M.: A century of shocks: the evolution of the German city size distribution 1925–1999. Reg. Sci. Urban Econ. 38(4), 330–347 (2008)
Butt, S.I.; Khan, K.A.; Pečarić, J.: Generalization of Popoviciu inequality for higher order convex function via Taylor's polynomial. Acta Univ. Apulensis Math. Inform. 42, 181–200 (2015)
Butt, S.I.; Mehmood, N.; Pečarić, J.: New generalizations of Popoviciu type inequalities via new green functions and Fink’s identity. Trans. A Razmadze Math. Inst. 171(3), 293–303 (2017)
Butt, S.I.; Pečarić, J.: Popoviciu’s Inequality For \(N\)-Convex Functions. Lap Lambert Academic Publishing, Saarbrücken (2016)
Butt, S.I.; Pečarić, J.: Weighted Popoviciu type inequalities via generalized Montgomery identities. Hrvatske akademije znanosti i umjetnosti: Matematicke znanosti 69–89 (2015)
Butt, S.I.; Khan, K.A.; Pečarić, J.: Popoviciu type inequalities via Hermite’s polynomial. Math. Inequal. Appl. 19(4), 1309–1318 (2016)
Csiszár, I.: Information measures: a critical survey. In: Trans. 7th Prague Conf. on Info. Th., Statist. Decis. Funct., Random Process and 8th European Meeting of Statist., vol. B, pp. 73–86. Academia, Prague (1978)
Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 2, 299–318 (1967)
Horváth, L.: A method to refine the discrete Jensen’s inequality for convex and mid-convex functions. Math. Comput. Model. 54(9–10), 2451–2459 (2011)
Horváth, L.; Khan, K.A.; Pečarić, J.: Combinatorial Improvements of Jensen's Inequality / Classical and New Refinements of Jensen's Inequality with Applications. Monographs in Inequalities 8, Element, Zagreb (2014)
Horváth, L.; Khan, K.A.; Pečarić, J.: Refinement of Jensen’s inequality for operator convex functions. Adv. Inequal. Appl. (2014)
Horváth, L.; Pečarić, J.: A refinement of discrete Jensen’s inequality. Math. Inequal. Appl. 14, 777–791 (2011)
Horváth, L.; Pečarić, Đ.; Pečarić, J.: Estimations of f- and Rényi divergences by using a cyclic refinement of Jensen's inequality. Bull. Malays. Math. Sci. Soc. 1–14 (2017)
Ioannides, Y.M.; Overman, H.G.: Zipf’s law for cities: an empirical examination. Reg. Sci. Urban Econ. 33(2), 127–137 (2003)
Kullback, S.: Information Theory and Statistics. Courier Corporation, Chelmsford (1997)
Kullback, S.; Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Lovričević, N.; Pečarić, Đ.; Pečarić, J.: Zipf–Mandelbrot law, f-divergences and the Jensen-type interpolating inequalities. J. Inequal. Appl. 2018(1), 36 (2018)
Matic, M.; Pearce, C.E.; Pečarić, J.: Shannon’s and related inequalities in information theory. In: Survey on Classical Inequalities, pp. 127–164. Springer, Dordrecht (2000)
Pečarić, J.; Proschan, F.; Tong, Y.L.: Convex Functions, Partial Orderings and Statistical Applications. Academic Press, New York (1992)
Rényi, A.: On measures of information and entropy. In: Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561 (1960)
Rosen, K.T.; Resnick, M.: The size distribution of cities: an examination of the Pareto law and primacy. J. Urban Econ. 8(2), 165–186 (1980)
Soo, K.T.: Zipf’s Law for cities: a cross-country investigation. Reg. Sci. Urban Econ. 35(3), 239–263 (2005)
Widder, D.V.: Completely convex functions and Lidstone series. Trans. Am. Math. Soc. 51, 387–398 (1942)
Zipf, G.K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading, MA (1949)
Acknowledgements
The authors wish to thank the anonymous referees for their very careful reading of the manuscript and fruitful comments and suggestions.
Ethics declarations
Author contributions
All authors jointly worked on the results and they read and approved the final manuscript.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
The research of the fourth author was supported by the Ministry of Education and Science of the Russian Federation (Agreement No. 02.a03.21.0008).
Mathematics Subject Classification
- 94Axx
- 39-XX
- 41A58