1 Introduction

In the last few decades, the classical mathematical inequalities and their generalizations for convex functions have developed rapidly and have had a significant impact on modern analysis [9, 16, 21, 22, 26–28, 30, 31, 33]. They have many applications in numerical quadrature, transform theory, probability, and statistical problems. In particular, they help to establish the uniqueness of solutions of boundary value problems [8]. In the applied literature on mathematical inequalities, the Jensen inequality is a well-known, fundamental and extensively used inequality [2–4, 7, 13, 32]. This inequality is of pivotal importance because other classical inequalities, such as the Hermite–Hadamard, Ky Fan, Beckenbach–Dresher, Levinson, Minkowski, arithmetic–geometric mean, Young and Hölder inequalities, can be deduced from it. The Jensen inequality and its generalizations, refinements, extensions and converses have many applications in different fields of science, for example electrical engineering [11], mathematical statistics [23], financial economics [24], and information theory, guessing and coding [1, 5, 6, 10, 12–15, 17–19, 25]. The discrete Jensen inequality [20] states the following.

If \(T:[\gamma _{1},\gamma _{2}]\rightarrow \mathbb{R}\) is a convex function and \(s_{i}\in [\gamma _{1},\gamma _{2}]\), \(u_{i}\geq 0\) for \(i=1,\ldots ,n\) with \(\sum_{i=1}^{n}u_{i}=U_{n}>0\), then

$$ T \Biggl(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i} \Biggr)\leq \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i}). $$

If T is concave, the inequality is reversed.
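As a small numerical illustration (not taken from the paper), the following Python sketch evaluates the Jensen gap \(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i})-T(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i})\). The helper name `jensen_gap` and the sample data are hypothetical choices. For a convex T the gap is nonnegative, and for a concave T it is nonpositive.

```python
import math

def jensen_gap(T, s, u):
    """Jensen gap (1/U_n) * sum u_i T(s_i) - T((1/U_n) * sum u_i s_i), where U_n = sum u_i."""
    U = sum(u)
    if U <= 0:
        raise ValueError("the weights must have a positive sum U_n")
    mean_of_T = sum(ui * T(si) for ui, si in zip(u, s)) / U
    s_bar = sum(ui * si for ui, si in zip(u, s)) / U
    return mean_of_T - T(s_bar)

# Hypothetical sample points and nonnegative weights.
s = [1.0, 2.0, 4.0]
u = [0.5, 0.3, 0.2]
print(jensen_gap(math.exp, s, u))  # convex T: the gap is >= 0
print(jensen_gap(math.log, s, u))  # concave T: the gap is <= 0
```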

To derive the main result, we need the following Green function defined on \([\gamma _{1},\gamma _{2}]\times [\gamma _{1},\gamma _{2}]\) [29]:

$$ G(z,x)= \textstyle\begin{cases} \frac{(z-\gamma _{2})(x-\gamma _{1})}{\gamma _{2}-\gamma _{1}},& \gamma _{1}\leq x\leq z, \\ \frac{(x-\gamma _{2})(z-\gamma _{1})}{\gamma _{2}-\gamma _{1}},& z\leq x\leq \gamma _{2}. \end{cases} $$
(1.1)

The function G is continuous, and it is convex with respect to each of the variables z and x when the other one is fixed. Moreover, every function \(T\in C^{2}[\gamma _{1},\gamma _{2}]\) admits the following representation in terms of the Green function (1.1) [29]:

$$ T(z)=\frac{\gamma _{2}-z}{\gamma _{2}-\gamma _{1}}T(\gamma _{1})+ \frac{z-\gamma _{1}}{\gamma _{2}-\gamma _{1}}T(\gamma _{2})+ \int _{ \gamma _{1}}^{\gamma _{2}}G(z,x)T''(x) \,dx. $$
(1.2)
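Identity (1.2) can be checked numerically. The sketch below is a minimal illustration that assumes a composite Simpson rule is accurate enough. The names `green`, `simpson` and `green_representation` are hypothetical. Because \(x\mapsto G(z,x)\) has a kink at \(x=z\), the integral is split at that point.

```python
import math

def green(z, x, g1, g2):
    """Green function G(z, x) of (1.1) on [g1, g2] x [g1, g2]."""
    if x <= z:
        return (z - g2) * (x - g1) / (g2 - g1)
    return (x - g2) * (z - g1) / (g2 - g1)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def green_representation(T, d2T, z, g1, g2):
    """Right-hand side of (1.2): linear interpolation plus the Green-function integral."""
    linear = ((g2 - z) * T(g1) + (z - g1) * T(g2)) / (g2 - g1)

    def integrand(x):
        return green(z, x, g1, g2) * d2T(x)

    # Split at the kink x = z so that Simpson's rule sees a smooth integrand on each piece.
    return linear + simpson(integrand, g1, z) + simpson(integrand, z, g2)

print(green_representation(math.exp, math.exp, 1.3, 1.0, 2.0), math.exp(1.3))
```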

The rest of the paper is organized as follows. In Sect. 2, we present a new bound for the Jensen gap for functions whose second derivatives are convex in absolute value, followed by a remark and a proposition giving a converse of the Hölder inequality. In Sect. 3, we apply the main result to the Csiszár f-divergence functional, the Kullback–Leibler divergence, the Bhattacharyya coefficient, the Hellinger distance, the Rényi divergence, the \(\chi ^{2}\)-divergence, the Shannon entropy and the triangular discrimination. Section 4 concludes the paper.

2 Main result

We begin by presenting our main result.

Theorem 2.1

Let \(T\in C^{2}[\gamma _{1},\gamma _{2}]\) be a function such that \(|T''|\) is convex and \(s_{i}\in [\gamma _{1},\gamma _{2}]\), \(u_{i}\geq 0\) for \(i=1,\ldots ,n\) with \(\sum_{i=1}^{n}u_{i}=U_{n}>0\), then

$$\begin{aligned}& \Biggl\vert \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i})-T \Biggl( \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i} \Biggr) \Biggr\vert \\& \quad \leq \frac{ \vert T''(\gamma _{2}) \vert - \vert T''(\gamma _{1}) \vert }{6(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{U_{n}} \sum _{i=1}^{n}u_{i}s^{3}_{i}- \Biggl( \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i} \Biggr)^{3} \Biggr) \\& \qquad {}+ \frac{\gamma _{2} \vert T''(\gamma _{1}) \vert -\gamma _{1} \vert T''(\gamma _{2}) \vert }{2(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{U_{n}}\sum _{i=1}^{n}u_{i}s^{2}_{i}- \Biggl( \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i} \Biggr)^{2} \Biggr). \end{aligned}$$
(2.1)

Proof

Using (1.2) in \(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i})\) and \(T (\frac{1}{U_{n}} \sum_{i=1}^{n}u_{i}s_{i} )\), we get

$$ \begin{aligned}[b] &\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i}) \\ &\quad = \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i} \biggl(\frac{\gamma _{2}-s_{i}}{\gamma _{2}-\gamma _{1}}T(\gamma _{1})+ \frac{s_{i}-\gamma _{1}}{\gamma _{2}-\gamma _{1}} T(\gamma _{2})+ \int _{\gamma _{1}}^{\gamma _{2}}G(s_{i},x)T''(x) \,dx \biggr) \end{aligned} $$
(2.2)

and

$$\begin{aligned} T \Biggl(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i} \Biggr) &= \frac{\gamma _{2}-\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i}}{\gamma _{2}-\gamma _{1}} T(\gamma _{1})+ \frac{\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i}-\gamma _{1}}{\gamma _{2}-\gamma _{1}}T( \gamma _{2}) \\ &\quad {}+ \int _{\gamma _{1}}^{\gamma _{2}}G \Biggl(\frac{1}{U_{n}}\sum _{i=1}^{n}u_{i}s_{i},x \Biggr)T''(x)\,dx. \end{aligned}$$
(2.3)

Subtracting (2.3) from (2.2), we obtain

$$\begin{aligned}& \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i})-T \Biggl(\frac{1}{U_{n}} \sum_{i=1}^{n}u_{i}s_{i} \Biggr) \\& \quad = \int _{\gamma _{1}}^{\gamma _{2}} \Biggl(\frac{1}{U_{n}}\sum _{i=1}^{n}u_{i}G(s_{i},x) -G \Biggl(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i},x \Biggr) \Biggr)T''(x)\,dx. \end{aligned}$$
(2.4)

Taking the absolute value of (2.4), we get

$$\begin{aligned}& \Biggl\vert \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i})-T \Biggl( \frac{1}{U_{n}} \sum_{i=1}^{n}u_{i}s_{i} \Biggr) \Biggr\vert \\& \quad = \Biggl\vert \int _{\gamma _{1}}^{\gamma _{2}} \Biggl(\frac{1}{U_{n}}\sum _{i=1}^{n}u_{i}G(s_{i},x)-G \Biggl(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i},x \Biggr) \Biggr)T''(x)\,dx \Biggr\vert \\& \quad \leq \int _{\gamma _{1}}^{\gamma _{2}} \Biggl\vert \frac{1}{U_{n}}\sum _{i=1}^{n}u_{i}G(s_{i},x)-G \Biggl(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i},x \Biggr) \Biggr\vert \bigl\vert T''(x) \bigr\vert \,dx. \end{aligned}$$
(2.5)

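Since \(G(\cdot ,x)\) is convex for every fixed x, the discrete Jensen inequality gives \(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}G(s_{i},x)\geq G (\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i},x )\). Therefore, the absolute value of the first factor in the integrand of (2.5) can be dropped. Applying the change of variable \(x=t\gamma _{1}+(1-t)\gamma _{2}\), \(t\in [0,1]\), for which \(dx=-(\gamma _{2}-\gamma _{1})\,dt\), to (2.5), we obtain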

$$\begin{aligned}& \Biggl\vert \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i})-T (\bar{s} ) \Biggr\vert \\& \quad \leq (\gamma _{2}-\gamma _{1}) \int _{0}^{1} \Biggl(\frac{1}{U_{n}} \sum _{i=1}^{n}u_{i}G\bigl(s_{i},t \gamma _{1}+(1-t)\gamma _{2}\bigr)-G\bigl(\bar{s},t \gamma _{1}+(1-t)\gamma _{2}\bigr) \Biggr) \\& \qquad {}\times \bigl\vert T''\bigl(t \gamma _{1}+(1-t)\gamma _{2}\bigr) \bigr\vert \,dt, \end{aligned}$$
(2.6)

where \(\bar{s}=\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i}\).

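Since \(|T''|\) is convex, \(|T''(t\gamma _{1}+(1-t)\gamma _{2})|\leq t|T''(\gamma _{1})|+(1-t)|T''(\gamma _{2})|\) for \(t\in [0,1]\). Because the first factor in the integrand of (2.6) is nonnegative, (2.6) gives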

$$\begin{aligned}& \Biggl\vert \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}T(s_{i})-T (\bar{s} ) \Biggr\vert \\& \quad \leq (\gamma _{2}-\gamma _{1}) \int _{0}^{1} \Biggl(\frac{1}{U_{n}} \sum _{i=1}^{n}u_{i}G\bigl(s_{i},t \gamma _{1}+(1-t)\gamma _{2}\bigr) -G\bigl(\bar{s},t \gamma _{1}+(1-t)\gamma _{2}\bigr) \Biggr) \\& \qquad {}\times \bigl(t \bigl\vert T''(\gamma _{1}) \bigr\vert +(1-t) \bigl\vert T''( \gamma _{2}) \bigr\vert \bigr)\,dt \\& \quad =(\gamma _{2}-\gamma _{1}) \int _{0}^{1} \Biggl(\frac{1}{U_{n}}\sum _{i=1}^{n}u_{i}G\bigl(s_{i},t \gamma _{1}+(1-t)\gamma _{2}\bigr)t \bigl\vert T''(\gamma _{1}) \bigr\vert \\& \qquad {}+\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}G \bigl(s_{i},t\gamma _{1}+(1-t)\gamma _{2}\bigr) (1-t) \bigl\vert T''( \gamma _{2}) \bigr\vert -G\bigl(\bar{s},t\gamma _{1}+(1-t)\gamma _{2}\bigr) \\& \qquad {}\times t \bigl\vert T''(\gamma _{1}) \bigr\vert -G\bigl(\bar{s},t\gamma _{1}+(1-t)\gamma _{2}\bigr) (1-t) \bigl\vert T''( \gamma _{2}) \bigr\vert \Biggr)\,dt \\& \quad =(\gamma _{2}-\gamma _{1}) \Biggl( \bigl\vert T''(\gamma _{1}) \bigr\vert \frac{1}{U_{n}} \sum_{i=1}^{n}u_{i} \int _{0}^{1}tG\bigl(s_{i},t\gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt \\& \qquad {}+ \bigl\vert T''(\gamma _{2}) \bigr\vert \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i} \int _{0}^{1}(1-t)G\bigl(s_{i},t \gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt \\& \qquad {}- \bigl\vert T''(\gamma _{1}) \bigr\vert \int _{0}^{1}tG \bigl(\bar{s},t\gamma _{1}+(1-t) \gamma _{2} \bigr)\,dt \\& \qquad {}- \bigl\vert T''(\gamma _{2}) \bigr\vert \int _{0}^{1}(1-t)G \bigl(\bar{s},t\gamma _{1}+(1-t) \gamma _{2} \bigr)\,dt \Biggr) \\& \quad=(\gamma _{2}-\gamma _{1}) \Biggl( \bigl\vert T''(\gamma _{1}) \bigr\vert \frac{1}{U_{n}} \sum_{i=1}^{n}u_{i} \int _{0}^{1}tG\bigl(s_{i},t\gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt \\& \qquad {}+ \bigl\vert T''(\gamma _{2}) \bigr\vert \frac{1}{U_{n}}\sum_{i=1}^{n}u_{i} \int _{0}^{1}G\bigl(s_{i},t \gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt \\& \qquad {}- \bigl\vert T''(\gamma _{2}) \bigr\vert 
\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i} \int _{0}^{1}tG\bigl(s_{i},t \gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt \\& \qquad {}- \bigl\vert T''(\gamma _{1}) \bigr\vert \int _{0}^{1}tG \bigl(\bar{s},t\gamma _{1}+(1-t) \gamma _{2} \bigr)\,dt \\& \qquad {}- \bigl\vert T''(\gamma _{2}) \bigr\vert \int _{0}^{1}G \bigl(\bar{s},t\gamma _{1}+(1-t) \gamma _{2} \bigr)\,dt \\& \qquad {}+ \bigl\vert T''(\gamma _{2}) \bigr\vert \int _{0}^{1}tG \bigl(\bar{s},t\gamma _{1}+(1-t) \gamma _{2} \bigr)\,dt \Biggr). \end{aligned}$$
(2.7)

Now we evaluate this integral in the variable t. The two cases in (1.1) switch at \(t=\frac{\gamma _{2}-s_{i}}{\gamma _{2}-\gamma _{1}}\), where \(t\gamma _{1}+(1-t)\gamma _{2}=s_{i}\). This gives

$$\begin{aligned}& \int _{0}^{1}tG\bigl(s_{i},t\gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt \\& \quad =\frac{1}{(\gamma _{1}-\gamma _{2})^{3}} \biggl( \frac{\gamma ^{3}_{1}s_{i}}{6}-\frac{\gamma _{1}s^{3}_{i}}{6} - \frac{5\gamma _{2}s^{3}_{i}}{6}- \frac{\gamma _{2}\gamma ^{2}_{1}s_{i}}{2}- \frac{\gamma _{2}\gamma ^{3}_{1}}{6} + \frac{\gamma _{1}\gamma _{2}s^{2}_{i}}{2}- \frac{\gamma ^{2}_{2}s^{2}_{i}}{2} \\& \qquad {}+\frac{\gamma ^{2}_{2}\gamma ^{2}_{1}}{2}+ \frac{\gamma ^{3}_{2}s_{i}}{3}+\gamma _{2}s^{3}_{i}- \frac{\gamma _{1}\gamma ^{3}_{2}}{3} \biggr). \end{aligned}$$
(2.8)

Replacing \(s_{i}\) by \(\bar{s}\) in (2.8), we get

$$\begin{aligned}& \int _{0}^{1}tG\bigl(\bar{s},t\gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt \\& \quad =\frac{1}{(\gamma _{1}-\gamma _{2})^{3}} \biggl( \frac{\gamma ^{3}_{1}\bar{s}}{6}-\frac{\gamma _{1}(\bar{s})^{3}}{6} - \frac{5\gamma _{2}(\bar{s})^{3}}{6}- \frac{\gamma _{2}\gamma ^{2}_{1}\bar{s}}{2}- \frac{\gamma _{2}\gamma ^{3}_{1}}{6}+ \frac{\gamma _{1}\gamma _{2}(\bar{s})^{2}}{2} \\& \qquad {}-\frac{\gamma ^{2}_{2}(\bar{s})^{2}}{2}+ \frac{\gamma ^{2}_{2}\gamma ^{2}_{1}}{2} + \frac{\gamma ^{3}_{2}\bar{s}}{3}+\gamma _{2}(\bar{s})^{3}- \frac{\gamma _{1}\gamma ^{3}_{2}}{3} \biggr). \end{aligned}$$
(2.9)

Also,

$$ \int _{0}^{1}G\bigl(s_{i},t\gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt= \frac{(s^{2}_{i}-\gamma _{2}s_{i}-\gamma _{1}s_{i}+\gamma _{1}\gamma _{2})}{2(\gamma _{2}-\gamma _{1})}. $$
(2.10)

Replacing \(s_{i}\) by \(\bar{s}\) in (2.10), we get

$$ \int _{0}^{1}G\bigl(\bar{s},t\gamma _{1}+(1-t)\gamma _{2}\bigr)\,dt= \frac{((\bar{s})^{2}-\gamma _{2}\bar{s}-\gamma _{1}\bar{s}+\gamma _{1}\gamma _{2})}{2(\gamma _{2}-\gamma _{1})}. $$
(2.11)

Substituting the values from (2.8)–(2.11) in (2.7) and simplifying, we get the required result (2.1). □
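The closed forms (2.8) and (2.10) and the final bound (2.1) can be cross-checked numerically. In the hedged Python sketch below, the helper names are hypothetical and composite Simpson quadrature is assumed to be accurate enough. Formula (2.8) is transcribed term by term and compared with quadrature in t, and both sides of (2.1) are evaluated. If \(|T''|\) is affine and T'' does not change sign on \([\gamma _{1},\gamma _{2}]\) (for example, \(T(x)=x^{3}\) on a positive interval), every estimate in the proof is an equality, so (2.1) then holds with equality.

```python
import math

def green(z, x, g1, g2):
    """Green function G(z, x) of (1.1)."""
    if x <= z:
        return (z - g2) * (x - g1) / (g2 - g1)
    return (x - g2) * (z - g1) / (g2 - g1)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def int_tG(s, g1, g2):
    """Closed form (2.8) for int_0^1 t * G(s, t*g1 + (1-t)*g2) dt, copied term by term."""
    bracket = (g1**3 * s / 6 - g1 * s**3 / 6 - 5 * g2 * s**3 / 6
               - g2 * g1**2 * s / 2 - g2 * g1**3 / 6 + g1 * g2 * s**2 / 2
               - g2**2 * s**2 / 2 + g2**2 * g1**2 / 2 + g2**3 * s / 3
               + g2 * s**3 - g1 * g2**3 / 3)
    return bracket / (g1 - g2) ** 3

def int_G(s, g1, g2):
    """Closed form (2.10) for int_0^1 G(s, t*g1 + (1-t)*g2) dt."""
    return (s**2 - g2 * s - g1 * s + g1 * g2) / (2 * (g2 - g1))

def quad_in_t(weight, s, g1, g2):
    """Quadrature of int_0^1 weight(t) * G(s, t*g1 + (1-t)*g2) dt, split at the kink."""
    def f(t):
        return weight(t) * green(s, t * g1 + (1 - t) * g2, g1, g2)
    t_star = (g2 - s) / (g2 - g1)  # here t*g1 + (1-t)*g2 = s
    return simpson(f, 0.0, t_star) + simpson(f, t_star, 1.0)

def gap_and_bound(T, abs_d2T, s, u, g1, g2):
    """Left- and right-hand sides of (2.1); abs_d2T is |T''|."""
    U = sum(u)
    m1 = sum(ui * si for ui, si in zip(u, s)) / U
    m2 = sum(ui * si**2 for ui, si in zip(u, s)) / U
    m3 = sum(ui * si**3 for ui, si in zip(u, s)) / U
    gap = abs(sum(ui * T(si) for ui, si in zip(u, s)) / U - T(m1))
    A, B = abs_d2T(g1), abs_d2T(g2)
    bound = ((B - A) / (6 * (g2 - g1)) * (m3 - m1**3)
             + (g2 * A - g1 * B) / (2 * (g2 - g1)) * (m2 - m1**2))
    return gap, bound

g1, g2 = 0.5, 2.0
gap, bound = gap_and_bound(math.exp, math.exp, [0.6, 1.0, 1.5, 1.9], [1.0, 2.0, 3.0, 4.0], g1, g2)
gap3, bound3 = gap_and_bound(lambda x: x**3, lambda x: 6 * x, [0.5, 1.0, 2.5], [1.0, 1.0, 2.0], 0.5, 2.5)
print(gap, bound, gap3, bound3)
```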

Remark 2.2

If we use the Green functions \(G_{1}\)–\(G_{4}\) as given in [29] instead of G in Theorem 2.1, we obtain the same result (2.1).

As an application of the above result, we derive a converse of the Hölder inequality in the following proposition.

Proposition 2.3

Let \(q>1\) and \(p\in \mathbb{R}^{+}\setminus ((0,1]\cup (2,3))\) be such that \(\frac{1}{q}+\frac{1}{p}=1\). Also, let \([\gamma _{1},\gamma _{2}]\) be a positive interval and \((a_{1},\ldots ,a_{n})\), \((b_{1},\ldots ,b_{n})\) be two positive n-tuples such that \(\frac{\sum_{i=1}^{n}a_{i}b_{i}}{\sum_{i=1}^{n}b^{q}_{i}}\), \(a_{i}b_{i}^{- \frac{q}{p}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\). Then

$$\begin{aligned}& \Biggl(\sum_{i=1}^{n}a_{i}^{p} \Biggr)^{\frac{1}{p}} \Biggl(\sum_{i=1}^{n}b_{i}^{q} \Biggr)^{\frac{1}{q}}-\sum_{i=1}^{n}a_{i}b_{i} \\& \quad \leq \Biggl( \frac{p(p-1)(\gamma _{2}^{p-2}-\gamma _{1}^{p-2})}{6(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum _{i=1}^{n}a_{i}^{3}b_{i}^{1-2 \frac{q}{p}} - \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum_{i=1}^{n}a_{i}b_{i} \Biggr)^{3} \Biggr) \\& \qquad {}+ \frac{p(p-1)(\gamma _{2}\gamma _{1}^{p-2}-\gamma _{1}\gamma _{2}^{p-2})}{2(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum _{i=1}^{n}a_{i}^{2}b_{i}^{1- \frac{q}{p}} \\& \qquad {}- \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum _{i=1}^{n}a_{i}b_{i} \Biggr)^{2} \Biggr) \Biggr)^{\frac{1}{p}}\sum _{i=1}^{n}b_{i}^{q}. \end{aligned}$$
(2.12)

Proof

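Let \(T(x)=x^{p}\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(T''(x)=p(p-1)x^{p-2}>0\) and \(|T''|''(x)=p(p-1)(p-2)(p-3)x^{p-4}\geq 0\), because \((p-2)(p-3)\geq 0\) for the admissible values of p. This shows that T and \(|T''|\) are convex functions. We use (2.1) for \(T(x)=x^{p}\), \(u_{i}=b_{i}^{q}\) and \(s_{i}=a_{i}b_{i}^{-\frac{q}{p}}\). Note that \(\frac{1}{U_{n}}\sum_{i=1}^{n}u_{i}s_{i}=\frac{\sum_{i=1}^{n}a_{i}b_{i}}{\sum_{i=1}^{n}b_{i}^{q}}\), since \(q-\frac{q}{p}=1\). Multiplying both sides by \((\sum_{i=1}^{n}b_{i}^{q})^{p}\) and taking pth roots, we derive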

$$\begin{aligned}& \Biggl( \Biggl(\sum_{i=1}^{n}a_{i}^{p} \Biggr) \Biggl(\sum_{i=1}^{n}b_{i}^{q} \Biggr)^{p-1}- \Biggl(\sum_{i=1}^{n}a_{i}b_{i} \Biggr)^{p} \Biggr)^{ \frac{1}{p}} \\& \quad \leq \Biggl( \frac{p(p-1)(\gamma _{2}^{p-2}-\gamma _{1}^{p-2})}{6(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum _{i=1}^{n}a_{i}^{3}b_{i}^{1-2 \frac{q}{p}} - \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum_{i=1}^{n}a_{i}b_{i} \Biggr)^{3} \Biggr) \\& \qquad {}+ \frac{p(p-1)(\gamma _{2}\gamma _{1}^{p-2}-\gamma _{1}\gamma _{2}^{p-2})}{2(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum _{i=1}^{n}a_{i}^{2}b_{i}^{1- \frac{q}{p}} \\& \qquad {}- \Biggl(\frac{1}{\sum_{i=1}^{n}b_{i}^{q}}\sum _{i=1}^{n}a_{i}b_{i} \Biggr)^{2} \Biggr) \Biggr)^{\frac{1}{p}}\sum _{i=1}^{n}b_{i}^{q}. \end{aligned}$$
(2.13)

We apply the inequality \(x^{\alpha }-y^{\alpha }\leq (x-y)^{\alpha }\), valid for \(0\leq y\leq x\) and \(\alpha \in [0,1]\), with \(x= (\sum_{i=1}^{n}a_{i}^{p} ) (\sum_{i=1}^{n}b_{i}^{q} )^{p-1}\), \(y= (\sum_{i=1}^{n}a_{i}b_{i} )^{p}\) and \(\alpha =\frac{1}{p}\). Here \(y\leq x\) by the Hölder inequality. We obtain

$$\begin{aligned}& \Biggl(\sum_{i=1}^{n}a_{i}^{p} \Biggr)^{\frac{1}{p}} \Biggl(\sum_{i=1}^{n}b_{i}^{q} \Biggr)^{\frac{1}{q}}-\sum_{i=1}^{n}a_{i}b_{i} \\& \quad \leq \Biggl( \Biggl(\sum_{i=1}^{n}a_{i}^{p} \Biggr) \Biggl(\sum_{i=1}^{n}b_{i}^{q} \Biggr)^{p-1}- \Biggl(\sum_{i=1}^{n}a_{i}b_{i} \Biggr)^{p} \Biggr)^{ \frac{1}{p}}. \end{aligned}$$
(2.14)

Now using (2.14) in (2.13), we get (2.12). □
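The following sketch is a hedged numerical check of (2.12); the function name and data are hypothetical. It takes \([\gamma _{1},\gamma _{2}]\) to be the smallest interval containing all \(s_{i}=a_{i}b_{i}^{-q/p}\). This interval then also contains \(\frac{\sum a_{i}b_{i}}{\sum b_{i}^{q}}\), which is a weighted mean of the \(s_{i}\). The function returns both sides of (2.12).

```python
def holder_converse_sides(a, b, p):
    """Return (LHS, RHS) of (2.12) for positive tuples a, b and exponent p."""
    if not (1 < p <= 2 or p >= 3):
        raise ValueError("Proposition 2.3 needs p in (1, 2] or [3, infinity)")
    q = p / (p - 1)  # conjugate exponent, 1/p + 1/q = 1
    Sb = sum(bi**q for bi in b)
    Sab = sum(ai * bi for ai, bi in zip(a, b))
    lhs = sum(ai**p for ai in a) ** (1 / p) * Sb ** (1 / q) - Sab
    # Smallest interval containing s_i = a_i * b_i**(-q/p); assumes the s_i are not all equal.
    s = [ai * bi ** (-q / p) for ai, bi in zip(a, b)]
    g1, g2 = min(s), max(s)
    m1 = Sab / Sb
    m2 = sum(ai**2 * bi ** (1 - q / p) for ai, bi in zip(a, b)) / Sb
    m3 = sum(ai**3 * bi ** (1 - 2 * q / p) for ai, bi in zip(a, b)) / Sb
    c = p * (p - 1)
    inner = (c * (g2 ** (p - 2) - g1 ** (p - 2)) / (6 * (g2 - g1)) * (m3 - m1**3)
             + c * (g2 * g1 ** (p - 2) - g1 * g2 ** (p - 2)) / (2 * (g2 - g1)) * (m2 - m1**2))
    # inner >= 0 in exact arithmetic (it bounds a Jensen gap); clip possible rounding noise.
    rhs = max(inner, 0.0) ** (1 / p) * Sb
    return lhs, rhs

a, b = [1.0, 2.0, 3.0], [2.0, 1.0, 0.5]
results = {p: holder_converse_sides(a, b, p) for p in (1.5, 2.0, 3.0, 4.0)}
print(results)
```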

3 Applications in information theory

Definition 3.1

(Csiszár f-divergence)

Let \([\gamma _{1},\gamma _{2}]\subset \mathbb{R}\) and let \(f:[\gamma _{1},\gamma _{2}]\rightarrow \mathbb{R}\) be a function. For \(\mathbf{r}=(r_{1},\ldots ,r_{n})\in \mathbb{R}^{n}\) and \(\mathbf{w}=(w_{1},\ldots ,w_{n})\in \mathbb{R}^{n}_{+}\) such that \(\frac{r_{i}}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) (\(i=1,\ldots ,n\)), the Csiszár f-divergence functional is defined as [17, 25]

$$ \bar{D}_{c}(\mathbf{r},\mathbf{w})=\sum _{i=1}^{n}w_{i}f \biggl( \frac{r_{i}}{w_{i}} \biggr). $$

Theorem 3.2

Let \(f\in C^{2}[\gamma _{1},\gamma _{2}]\) be a function such that \(|f''|\) is convex and \(\mathbf{r}=(r_{1},\ldots ,r_{n})\in \mathbb{R}^{n}\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\in \mathbb{R}^{n}_{+}\) such that \(\frac{\sum_{i=1}^{n}r_{i}}{\sum_{i=1}^{n}w_{i}}, \frac{r_{i}}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\), then

$$\begin{aligned}& \biggl\vert \frac{1}{\sum_{i=1}^{n}w_{i}}\bar{D}_{c}(\mathbf{r},\mathbf{w})-f \biggl(\frac{\sum_{i=1}^{n}r_{i}}{\sum_{i=1}^{n}w_{i}} \biggr) \biggr\vert \\& \quad \leq \frac{ \vert f''(\gamma _{2}) \vert - \vert f''(\gamma _{1}) \vert }{6(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{\sum_{i=1}^{n}w_{i}}\sum _{i=1}^{n} \frac{r_{i}^{3}}{w^{2}_{i}}- \biggl( \frac{\sum_{i=1}^{n}r_{i}}{\sum_{i=1}^{n}w_{i}} \biggr)^{3} \Biggr) \\& \qquad {}+ \frac{\gamma _{2} \vert f''(\gamma _{1}) \vert -\gamma _{1} \vert f''(\gamma _{2}) \vert }{2(\gamma _{2}-\gamma _{1})} \Biggl(\frac{1}{\sum_{i=1}^{n}w_{i}}\sum _{i=1}^{n} \frac{r_{i}^{2}}{w_{i}}- \biggl( \frac{\sum_{i=1}^{n}r_{i}}{\sum_{i=1}^{n}w_{i}} \biggr)^{2} \Biggr). \end{aligned}$$
(3.1)

Proof

The result (3.1) follows directly from (2.1) by choosing \(T=f\), \(s_{i}=\frac{r_{i}}{w_{i}}\) and \(u_{i}= \frac{w_{i}}{\sum_{j=1}^{n}w_{j}}\). □
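The following Python sketch evaluates both sides of (3.1) for a given f, \(|f''|\), \(\mathbf{r}\), \(\mathbf{w}\) and interval. The function name and data are hypothetical. For \(f(x)=x^{3}\) on a positive interval, \(|f''|=6x\) is affine, so (3.1) holds with equality.

```python
import math

def csiszar_sides(f, abs_d2f, r, w, g1, g2):
    """Return (LHS, RHS) of (3.1); abs_d2f is |f''| and [g1, g2] must contain every r_i/w_i."""
    W, R = sum(w), sum(r)
    D_c = sum(wi * f(ri / wi) for ri, wi in zip(r, w))  # Csiszar f-divergence functional
    lhs = abs(D_c / W - f(R / W))
    A, B = abs_d2f(g1), abs_d2f(g2)
    cubic = sum(ri**3 / wi**2 for ri, wi in zip(r, w)) / W - (R / W) ** 3
    square = sum(ri**2 / wi for ri, wi in zip(r, w)) / W - (R / W) ** 2
    rhs = (B - A) / (6 * (g2 - g1)) * cubic + (g2 * A - g1 * B) / (2 * (g2 - g1)) * square
    return lhs, rhs

# Hypothetical data: r is not normalized; r_i/w_i = 0.5, 1.5, 2.0 and sum(r)/sum(w) = 1.2.
r, w = [1.0, 3.0, 2.0], [2.0, 2.0, 1.0]
exp_sides = csiszar_sides(math.exp, math.exp, r, w, 0.5, 2.0)
cube_sides = csiszar_sides(lambda x: x**3, lambda x: 6 * x, r, w, 0.5, 2.0)
print(exp_sides, cube_sides)
```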

Definition 3.3

(Rényi divergence)

For two positive probability distributions \(\mathbf{r}=(r_{1},\ldots , r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) and a nonnegative real number μ such that \(\mu \neq 1\), the Rényi divergence is defined as [17, 25]

$$ D_{re}(\mathbf{r},\mathbf{w})=\frac{1}{\mu -1}\log \Biggl(\sum _{i=1}^{n}r_{i}^{ \mu }w_{i}^{1-\mu } \Biggr). $$

Corollary 3.4

Let \([\gamma _{1},\gamma _{2}]\subseteq \mathbb{R}^{+}\). Also let \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) be positive probability distributions and \(\mu >1\) such that \(\sum_{i=1}^{n}w_{i} (\frac{r_{i}}{w_{i}} )^{\mu }, ( \frac{r_{i}}{w_{i}} )^{\mu -1}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\). Then

$$\begin{aligned}& D_{re}(\mathbf{r},\mathbf{w})-\frac{1}{\mu -1}\sum _{i=1}^{n}r_{i} \log \biggl( \frac{r_{i}}{w_{i}} \biggr)^{\mu -1} \\& \quad \leq \frac{\gamma _{1}+\gamma _{2}}{6(1-\mu )\gamma ^{2}_{1}\gamma _{2}^{2}} \Biggl(\sum_{i=1}^{n}r_{i} \biggl(\frac{r_{i}}{w_{i}} \biggr)^{3(\mu -1)}- \Biggl(\sum _{i=1}^{n}r_{i}^{\mu }w_{i}^{1-\mu } \Biggr)^{3} \Biggr) \\& \qquad {}+ \frac{\gamma ^{2}_{1}+\gamma _{1}\gamma _{2}+\gamma ^{2}_{2}}{2(\mu -1)\gamma ^{2}_{1}\gamma _{2}^{2}} \Biggl(\sum _{i=1}^{n}r_{i} \biggl(\frac{r_{i}}{w_{i}} \biggr)^{2(\mu -1)}- \Biggl(\sum_{i=1}^{n}r_{i}^{\mu }w_{i}^{1-\mu } \Biggr)^{2} \Biggr). \end{aligned}$$
(3.2)

Proof

Let \(T(x)=-\frac{1}{\mu -1}\log x\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(T''(x)=\frac{1}{(\mu -1)x^{2}}>0\) and \(|T''|''(x)=\frac{6}{(\mu -1)x^{4}}>0\). This shows that T and \(|T''|\) are convex functions. Therefore, using (2.1) for \(T(x)=-\frac{1}{\mu -1}\log x\), \(u_{i}=r_{i}\) and \(s_{i}= (\frac{r_{i}}{w_{i}} )^{\mu -1}\), we derive (3.2). □
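Here is a minimal numerical sketch of Corollary 3.4; the function name and data are hypothetical. It takes \([\gamma _{1},\gamma _{2}]\) to be the smallest interval containing all \((\frac{r_{i}}{w_{i}})^{\mu -1}\). The first condition of the corollary then holds automatically, since \(\sum_{i=1}^{n}w_{i}(\frac{r_{i}}{w_{i}})^{\mu }=\sum_{i=1}^{n}r_{i}(\frac{r_{i}}{w_{i}})^{\mu -1}\) is a weighted mean of these values.

```python
import math

def renyi_sides(r, w, mu):
    """Return (LHS, RHS) of (3.2) with [g1, g2] = [min, max] of (r_i/w_i)**(mu - 1)."""
    if mu <= 1:
        raise ValueError("Corollary 3.4 assumes mu > 1")
    s = [(ri / wi) ** (mu - 1) for ri, wi in zip(r, w)]
    g1, g2 = min(s), max(s)  # assumes the ratios are not all equal
    S = sum(ri**mu * wi ** (1 - mu) for ri, wi in zip(r, w))
    D_re = math.log(S) / (mu - 1)
    lhs = D_re - sum(ri * math.log(si) for ri, si in zip(r, s)) / (mu - 1)
    c3 = (g1 + g2) / (6 * (1 - mu) * g1**2 * g2**2)
    c2 = (g1**2 + g1 * g2 + g2**2) / (2 * (mu - 1) * g1**2 * g2**2)
    rhs = (c3 * (sum(ri * si**3 for ri, si in zip(r, s)) - S**3)
           + c2 * (sum(ri * si**2 for ri, si in zip(r, s)) - S**2))
    return lhs, rhs

r = [0.1, 0.2, 0.3, 0.4]
w = [0.25, 0.25, 0.25, 0.25]
lhs2, rhs2 = renyi_sides(r, w, 2.0)
lhs3, rhs3 = renyi_sides(r, w, 3.0)
print(lhs2, rhs2, lhs3, rhs3)
```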

Definition 3.5

(Shannon entropy)

For a positive probability distribution \(\mathbf{w}=(w_{1},\ldots ,w_{n})\), the Shannon entropy is defined as [17, 25]

$$ E_{s}(\mathbf{w})=-\sum_{i=1}^{n}w_{i} \log w_{i}. $$

Corollary 3.6

Let \([\gamma _{1},\gamma _{2}]\subseteq \mathbb{R}^{+}\) and \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) be a positive probability distribution such that \(\frac{1}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\). Then

$$ \log n-E_{s}(\mathbf{w})\leq \frac{\gamma ^{2}_{1}+\gamma _{1}\gamma _{2}+\gamma ^{2}_{2}}{2\gamma ^{2}_{1}\gamma _{2}^{2}} \Biggl(\sum_{i=1}^{n}\frac{1}{w_{i}}-n^{2} \Biggr)- \frac{\gamma _{1}+\gamma _{2}}{6\gamma ^{2}_{1}\gamma _{2}^{2}} \Biggl( \sum_{i=1}^{n} \frac{1}{w^{2}_{i}}-n^{3} \Biggr). $$
(3.3)

Proof

Let \(f(x)=-\log x\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(f''(x)=\frac{1}{x^{2}}>0\) and \(|f''|''(x)=\frac{6}{x^{4}}>0\). This shows that f and \(|f''|\) are convex functions. We use (3.1) for \(f(x)=-\log x\) and \((r_{1},\ldots ,r_{n})=(1,\ldots ,1)\), so that \(\frac{r_{i}}{w_{i}}=\frac{1}{w_{i}}\) and \(\frac{\sum_{i=1}^{n}r_{i}}{\sum_{i=1}^{n}w_{i}}=n\). This gives (3.3). □
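Here is a minimal numerical sketch of Corollary 3.6; the function name and data are hypothetical. It takes \([\gamma _{1},\gamma _{2}]\) to be the smallest interval containing all \(\frac{1}{w_{i}}\).

```python
import math

def shannon_sides(w):
    """Return (log n - E_s(w), RHS of (3.3)) with [g1, g2] = [min, max] of 1/w_i."""
    n = len(w)
    inv = [1 / wi for wi in w]
    g1, g2 = min(inv), max(inv)
    entropy = -sum(wi * math.log(wi) for wi in w)
    lhs = math.log(n) - entropy
    rhs = ((g1**2 + g1 * g2 + g2**2) / (2 * g1**2 * g2**2) * (sum(inv) - n**2)
           - (g1 + g2) / (6 * g1**2 * g2**2) * (sum(x**2 for x in inv) - n**3))
    return lhs, rhs

lhs_s, rhs_s = shannon_sides([0.1, 0.2, 0.3, 0.4])
print(lhs_s, rhs_s)
```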

Definition 3.7

(Kullback–Leibler divergence)

For two positive probability distributions \(\mathbf{r}=(r_{1},\ldots ,r_{n})\) and \(\mathbf{w}=(w_{1},\ldots ,w_{n})\), the Kullback–Leibler divergence is defined as [17, 25]

$$ D_{kl}(\mathbf{r},\mathbf{w})=\sum_{i=1}^{n}r_{i} \log \frac{r_{i}}{w_{i}}. $$

Corollary 3.8

Let \([\gamma _{1},\gamma _{2}]\subseteq \mathbb{R}^{+}\) and \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) be positive probability distributions such that \(\frac{r_{i}}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\). Then

$$ D_{kl}(\mathbf{r},\mathbf{w})\leq \frac{\gamma _{1}+\gamma _{2}}{2\gamma _{1}\gamma _{2}} \Biggl(\sum_{i=1}^{n} \frac{r_{i}^{2}}{w_{i}}-1 \Biggr)-\frac{1}{6\gamma _{1}\gamma _{2}} \Biggl(\sum_{i=1}^{n} \frac{r_{i}^{3}}{w^{2}_{i}}-1 \Biggr). $$
(3.4)

Proof

Let \(f(x)=x\log x\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(f''(x)=\frac{1}{x}>0\) and \(|f''|''(x)=\frac{2}{x^{3}}>0\). This shows that f and \(|f''|\) are convex functions. Therefore, using (3.1) for \(f(x)=x\log x\) and noting that \(\sum_{i=1}^{n}r_{i}=\sum_{i=1}^{n}w_{i}=1\), we get (3.4). □
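Here is a minimal numerical sketch of Corollary 3.8; the function name and data are hypothetical. It takes \([\gamma _{1},\gamma _{2}]\) to be the smallest interval containing all \(\frac{r_{i}}{w_{i}}\).

```python
import math

def kl_sides(r, w):
    """Return (D_kl(r, w), RHS of (3.4)) with [g1, g2] = [min, max] of r_i/w_i."""
    ratios = [ri / wi for ri, wi in zip(r, w)]
    g1, g2 = min(ratios), max(ratios)
    d_kl = sum(ri * math.log(x) for ri, x in zip(r, ratios))
    m2 = sum(ri**2 / wi for ri, wi in zip(r, w))
    m3 = sum(ri**3 / wi**2 for ri, wi in zip(r, w))
    rhs = (g1 + g2) / (2 * g1 * g2) * (m2 - 1) - (m3 - 1) / (6 * g1 * g2)
    return d_kl, rhs

kl, kl_bound = kl_sides([0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25])
print(kl, kl_bound)
```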

Definition 3.9

(\(\chi ^{2}\)-divergence)

Let \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) be positive probability distributions. Then the \(\chi ^{2}\)-divergence is defined as [25]:

$$ D_{\chi ^{2}}(\mathbf{r},\mathbf{w})=\sum_{i=1}^{n} \frac{(r_{i}-w_{i})^{2}}{w_{i}}. $$

Corollary 3.10

If \([\gamma _{1},\gamma _{2}]\subseteq \mathbb{R}^{+}\) and \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) are two positive probability distributions such that \(\frac{r_{i}}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\), then

$$ D_{\chi ^{2}}(\mathbf{r},\mathbf{w})\leq \Biggl(\sum _{i=1}^{n} \frac{r_{i}^{2}}{w_{i}}-1 \Biggr). $$
(3.5)

Proof

Let \(f(x)=(x-1)^{2}\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(f''(x)=2>0\) and \(|f''|''(x)=0\). This shows that f and \(|f''|\) are convex functions. Therefore, using (3.1) for \(f(x)=(x-1)^{2}\), we obtain (3.5). □
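Since \(f''\equiv 2\), the first coefficient in (3.1) vanishes and the second equals 1. Moreover, \(D_{\chi ^{2}}(\mathbf{r},\mathbf{w})=\sum_{i=1}^{n}\frac{r_{i}^{2}}{w_{i}}-2\sum_{i=1}^{n}r_{i}+\sum_{i=1}^{n}w_{i}=\sum_{i=1}^{n}\frac{r_{i}^{2}}{w_{i}}-1\), so (3.5) in fact holds with equality. A quick Python check follows; the function name and data are hypothetical.

```python
def chi2_and_bound(r, w):
    """Return (D_chi2(r, w), RHS of (3.5)) for probability vectors r and w."""
    d_chi2 = sum((ri - wi) ** 2 / wi for ri, wi in zip(r, w))
    bound = sum(ri**2 / wi for ri, wi in zip(r, w)) - 1
    return d_chi2, bound

d_chi2, chi2_bound = chi2_and_bound([0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25])
print(d_chi2, chi2_bound)  # equal up to rounding, since sum(r) = sum(w) = 1
```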

Definition 3.11

(Bhattacharyya coefficient)

The Bhattacharyya coefficient of two positive probability distributions \(\mathbf{r}=(r_{1},\ldots ,r_{n})\) and \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) is defined by [25]

$$ C_{b}(\mathbf{r},\mathbf{w})=\sum_{i=1}^{n} \sqrt{r_{i}w_{i}}. $$

The Bhattacharyya distance is given by \(D_{b}(\mathbf{r},\mathbf{w})=-\log C_{b}(\mathbf{r},\mathbf{w})\).

Corollary 3.12

Let \([\gamma _{1},\gamma _{2}]\subseteq \mathbb{R}^{+}\) and \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) be two positive probability distributions such that \(\frac{r_{i}}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\). Then

$$\begin{aligned}& 1-C_{b}(\mathbf{r},\mathbf{w}) \\& \quad \leq \frac{\gamma ^{\frac{3}{2}}_{1}-\gamma ^{\frac{3}{2}}_{2}}{24\gamma ^{\frac{3}{2}}_{1}\gamma _{2}^{\frac{3}{2}}(\gamma _{2}-\gamma _{1})} \Biggl(\sum _{i=1}^{n}\frac{r_{i}^{3}}{w^{2}_{i}}-1 \Biggr) + \frac{\gamma ^{\frac{5}{2}}_{2}-\gamma ^{\frac{5}{2}}_{1}}{8\gamma ^{\frac{3}{2}}_{1}\gamma _{2}^{\frac{3}{2}}(\gamma _{2}-\gamma _{1})} \Biggl(\sum_{i=1}^{n} \frac{r_{i}^{2}}{w_{i}}-1 \Biggr). \end{aligned}$$
(3.6)

Proof

Let \(f(x)=-\sqrt{x}\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(f''(x)=\frac{1}{4x^{\frac{3}{2}}}>0\) and \(|f''|''(x)=\frac{15}{16x^{\frac{7}{2}}}>0\). This shows that f and \(|f''|\) are convex functions. Therefore, using (3.1) for \(f(x)=-\sqrt{x}\), we obtain (3.6). □
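Here is a minimal numerical sketch of Corollary 3.12; the function name and data are hypothetical. It takes \([\gamma _{1},\gamma _{2}]\) to be the smallest interval containing all \(\frac{r_{i}}{w_{i}}\) and uses \(\gamma _{1}^{3/2}\gamma _{2}^{3/2}=(\gamma _{1}\gamma _{2})^{3/2}\).

```python
import math

def bhattacharyya_sides(r, w):
    """Return (1 - C_b(r, w), RHS of (3.6)) with [g1, g2] = [min, max] of r_i/w_i."""
    ratios = [ri / wi for ri, wi in zip(r, w)]
    g1, g2 = min(ratios), max(ratios)  # assumes g1 < g2
    c_b = sum(math.sqrt(ri * wi) for ri, wi in zip(r, w))
    m2 = sum(ri**2 / wi for ri, wi in zip(r, w))
    m3 = sum(ri**3 / wi**2 for ri, wi in zip(r, w))
    denom = (g1 * g2) ** 1.5 * (g2 - g1)
    rhs = ((g1**1.5 - g2**1.5) / (24 * denom) * (m3 - 1)
           + (g2**2.5 - g1**2.5) / (8 * denom) * (m2 - 1))
    return 1 - c_b, rhs

lhs_b, rhs_b = bhattacharyya_sides([0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25])
print(lhs_b, rhs_b)
```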

Definition 3.13

(Hellinger distance)

For two positive probability distributions \(\mathbf{r}=(r_{1},\ldots , r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) the Hellinger distance is defined as [25]

$$ D^{2}_{h}(\mathbf{r},\mathbf{w})=\frac{1}{2}\sum _{i=1}^{n}(\sqrt{r_{i}}- \sqrt{w_{i}})^{2}. $$

Corollary 3.14

Let \([\gamma _{1},\gamma _{2}]\subseteq \mathbb{R}^{+}\) and let \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) be positive probability distributions such that \(\frac{r_{i}}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\). Then

$$\begin{aligned}& D^{2}_{h}(\mathbf{r},\mathbf{w}) \\& \quad \leq \frac{\gamma ^{\frac{3}{2}}_{1}-\gamma ^{\frac{3}{2}}_{2}}{24\gamma ^{\frac{3}{2}}_{1}\gamma _{2}^{\frac{3}{2}}(\gamma _{2}-\gamma _{1})} \Biggl(\sum _{i=1}^{n}\frac{r_{i}^{3}}{w^{2}_{i}}-1 \Biggr) + \frac{\gamma ^{\frac{5}{2}}_{2}-\gamma ^{\frac{5}{2}}_{1}}{8\gamma ^{\frac{3}{2}}_{1}\gamma _{2}^{\frac{3}{2}}(\gamma _{2}-\gamma _{1})} \Biggl(\sum_{i=1}^{n} \frac{r_{i}^{2}}{w_{i}}-1 \Biggr). \end{aligned}$$
(3.7)

Proof

Let \(f(x)=\frac{1}{2}(1-\sqrt{x})^{2}\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(f''(x)=\frac{1}{4x^{\frac{3}{2}}}>0\) and \(|f''|''(x)=\frac{15}{16x^{\frac{7}{2}}}>0\). This shows that f and \(|f''|\) are convex functions. Therefore, using (3.1) for \(f(x)=\frac{1}{2}(1-\sqrt{x})^{2}\), we deduce (3.7). □
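Since \(\sum_{i=1}^{n}r_{i}=\sum_{i=1}^{n}w_{i}=1\), expanding the square gives \(D^{2}_{h}(\mathbf{r},\mathbf{w})=1-C_{b}(\mathbf{r},\mathbf{w})\). This explains why (3.6) and (3.7) have the same right-hand side. A quick Python check of this identity follows; the helper names and data are hypothetical.

```python
import math

def hellinger_sq(r, w):
    """Squared Hellinger distance D_h^2(r, w) = (1/2) * sum (sqrt(r_i) - sqrt(w_i))**2."""
    return 0.5 * sum((math.sqrt(ri) - math.sqrt(wi)) ** 2 for ri, wi in zip(r, w))

def bhattacharyya(r, w):
    """Bhattacharyya coefficient C_b(r, w) = sum sqrt(r_i * w_i)."""
    return sum(math.sqrt(ri * wi) for ri, wi in zip(r, w))

r, w = [0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25]
print(hellinger_sq(r, w), 1 - bhattacharyya(r, w))  # agree up to rounding
```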

Definition 3.15

(Triangular discrimination)

For two positive probability distributions \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\), the triangular discrimination is defined as [25]

$$ D_{\triangle }(\mathbf{r},\mathbf{w})=\sum_{i=1}^{n} \frac{(r_{i}-w_{i})^{2}}{r_{i}+w_{i}}. $$

Corollary 3.16

Let \([\gamma _{1},\gamma _{2}]\subseteq \mathbb{R}^{+}\) and \(\mathbf{r}=(r_{1},\ldots ,r_{n})\), \(\mathbf{w}=(w_{1},\ldots ,w_{n})\) be positive probability distributions such that \(\frac{r_{i}}{w_{i}}\in [\gamma _{1},\gamma _{2}]\) for \(i=1,\ldots ,n\). Then

$$\begin{aligned} D_{\triangle }(\mathbf{r},\mathbf{w})&\leq \frac{4 ((\gamma _{1}+1)^{3}-(\gamma _{2}+1)^{3} )}{3(\gamma _{1}+1)^{3}(\gamma _{2}+1)^{3} (\gamma _{2}-\gamma _{1})} \Biggl(\sum_{i=1}^{n} \frac{r_{i}^{3}}{w^{2}_{i}}-1 \Biggr) \\ &\quad {}+ \frac{4 (\gamma _{2}(\gamma _{2}+1)^{3}-\gamma _{1}(\gamma _{1}+1)^{3} )}{(\gamma _{1}+1)^{3}(\gamma _{2}+1)^{3}(\gamma _{2}-\gamma _{1})} \Biggl(\sum _{i=1}^{n}\frac{r_{i}^{2}}{w_{i}}-1 \Biggr). \end{aligned}$$
(3.8)

Proof

Let \(f(x)=\frac{(x-1)^{2}}{(x+1)}\), \(x\in [\gamma _{1},\gamma _{2}]\). Then \(f''(x)=\frac{8}{(x+1)^{3}}>0\) and \(|f''|''(x)=\frac{96}{(x+1)^{5}}>0\). This shows that f and \(|f''|\) are convex functions. Therefore, using (3.1) for \(f(x)=\frac{(x-1)^{2}}{(x+1)}\), we get (3.8). □
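Here is a minimal numerical sketch of Corollary 3.16; the function name and data are hypothetical. It takes \([\gamma _{1},\gamma _{2}]\) to be the smallest interval containing all \(\frac{r_{i}}{w_{i}}\).

```python
def triangular_sides(r, w):
    """Return (D_triangle(r, w), RHS of (3.8)) with [g1, g2] = [min, max] of r_i/w_i."""
    ratios = [ri / wi for ri, wi in zip(r, w)]
    g1, g2 = min(ratios), max(ratios)  # assumes g1 < g2
    d_tri = sum((ri - wi) ** 2 / (ri + wi) for ri, wi in zip(r, w))
    m2 = sum(ri**2 / wi for ri, wi in zip(r, w))
    m3 = sum(ri**3 / wi**2 for ri, wi in zip(r, w))
    P = (g1 + 1) ** 3 * (g2 + 1) ** 3 * (g2 - g1)
    rhs = (4 * ((g1 + 1) ** 3 - (g2 + 1) ** 3) / (3 * P) * (m3 - 1)
           + 4 * (g2 * (g2 + 1) ** 3 - g1 * (g1 + 1) ** 3) / P * (m2 - 1))
    return d_tri, rhs

d_tri, tri_bound = triangular_sides([0.1, 0.2, 0.3, 0.4], [0.25, 0.25, 0.25, 0.25])
print(d_tri, tri_bound)
```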

4 Conclusion

In the last few decades, there has been growing interest in applying the notion of convexity in various fields of science. Convex functions have convenient properties, such as continuity on open intervals and the existence of one-sided derivatives, which facilitate their applications. The Jensen inequality extends the defining inequality of convex functions from two points to finitely many points. Together with its extensions, improvements, refinements and converses, and with bounds for its gap, this inequality helps to resolve some difficulties in modeling physical phenomena. For this purpose, in this paper we have derived a new bound for the Jensen gap for functions whose second derivatives are convex in absolute value. Based on this bound, we have also deduced a new converse of the Hölder inequality. Finally, as applications of the main result, we have obtained new bounds in information theory for the Csiszár, Rényi, \(\chi ^{2}\) and Kullback–Leibler divergences, the Shannon entropy, the Bhattacharyya coefficient, the Hellinger distance and the triangular discrimination. The idea and technique used in this paper may be extended to obtain bounds of this type for other inequalities.