Abstract
Direct integration of the Riemann–Stieltjes integral has been used to compute convolution integrals. This approach has been shown to be simple and accurate, with good convergence properties. In this paper, we use some numerical methods to estimate the entropy of a continuous random variable and introduce several estimators. Bounds on the error terms are derived for some direct Riemann–Stieltjes integration methods. Consistency of the estimators is proved, and by simulation the proposed estimators are compared with some prominent estimators, namely those of Correa (Commun Stat Theory Methods 24:2439–2449, 1995), Ebrahimi et al. (Stat Probab Lett 20:225–234, 1994), van Es (Scand J Stat 19:61–72, 1992) and Vasicek (J R Stat Soc B 38:54–59, 1976). The results indicate that the proposed estimators have smaller mean squared error than the other estimators.
1 Introduction
The entropy H(f) of a random variable X with distribution function F(x) and continuous density function f(x) is defined by [23] to be
Estimation of entropy from a random sample has been considered by many authors. For discrete random variables, [8, 15, 29] have proposed estimators of entropy, while [5, 9, 11, 14, 16, 27, 28] have addressed the problem of estimating the entropy of continuous random variables.
Vasicek [28] expressed (1.1) as
and then, by replacing the distribution function F with the empirical distribution function \(F_n\) and using a difference operator instead of the differential operator, proposed an entropy estimator. He also estimated the derivative of \(F^{-1}(p)\) by a function of the order statistics. Assuming that \(X_1,\ldots ,X_n\) is a random sample, Vasicek's estimator is
where the window size m is a positive integer smaller than \(n/2,\, X_{(i)} =X_{(1)}\) if \(i<1,\,X_{(i)} =X_{(n)}\) if \(i>n\) and \(X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}\) are order statistics based on a random sample of size n. Vasicek proved that his estimator is consistent, i.e., \(\textit{HV}_{mn} \xrightarrow {\Pr \!.}H(f)\) as \(n\rightarrow \infty , \, m\rightarrow \infty ,\,\frac{m}{n}\rightarrow 0.\)
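As an illustration, Vasicek's estimator can be computed in a few lines. The sketch below (function names are ours, not from the paper) uses the boundary convention above, clipping \(X_{(i-m)}\) and \(X_{(i+m)}\) at \(X_{(1)}\) and \(X_{(n)}\):

```python
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek's estimator HV_mn: mean of log(n/(2m) * (X_(i+m) - X_(i-m))),
    with X_(i) = X_(1) for i < 1 and X_(i) = X_(n) for i > n."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    idx = np.arange(n)                       # 0-based index of the order statistics
    upper = x[np.minimum(idx + m, n - 1)]    # X_(i+m), clipped at X_(n)
    lower = x[np.maximum(idx - m, 0)]        # X_(i-m), clipped at X_(1)
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))
```

For a standard normal sample the estimate should approach \(\tfrac{1}{2}\log (2\pi e)\approx 1.42\) as \(n\rightarrow \infty\) with \(m/n\rightarrow 0\). Note also that scaling the data by \(c>0\) shifts the estimate by exactly \(\log c\), since every spacing scales by c.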
Van Es [27], using spacings, introduced an estimator of entropy and proved its consistency and asymptotic normality under some conditions. Van Es's estimator is given by
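A sketch of van Es's spacing estimator, in the form commonly quoted in the literature (the function name and the exact form are our assumptions, since the display is not reproduced above): a mean of logarithms of m-spacings plus an additive correction, \(\frac{1}{n-m}\sum_{i=1}^{n-m}\log \bigl (\frac{n+1}{m}(X_{(i+m)}-X_{(i)})\bigr )+\sum_{k=m}^{n}\frac{1}{k}+\log \frac{m}{n+1}.\)

```python
import numpy as np

def van_es_entropy(x, m):
    """Spacing-based entropy estimator of van Es, as commonly stated."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    spacings = x[m:] - x[:n - m]                       # X_(i+m) - X_(i), i = 1..n-m
    term = np.mean(np.log((n + 1) / m * spacings))
    correction = np.sum(1.0 / np.arange(m, n + 1)) + np.log(m / (n + 1))
    return term + correction
```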
Ebrahimi et al. [9] modified Vasicek's estimator and proposed the following estimator:
where
They showed that \(\textit{HE}_{mn} \xrightarrow {{\Pr \!.}}H(f)\) as \(n\rightarrow \infty ,\,m\rightarrow \infty ,\,m/n\rightarrow 0.\) They also showed, by simulation, that their estimator has smaller bias and mean squared error than Vasicek's estimator.
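Since the display for \(\textit{HE}_{mn}\) is not reproduced above, the sketch below follows the form usually quoted for Ebrahimi et al.'s estimator, with boundary weights \(c_i\) that correct the denominator near the smallest and largest order statistics (the function name and boundary handling are our assumptions):

```python
import numpy as np

def ebrahimi_entropy(x, m):
    """Ebrahimi et al.'s modification of Vasicek's estimator (commonly quoted form):
    HE_mn = (1/n) * sum_i log( n/(c_i m) * (X_(i+m) - X_(i-m)) ),
    with c_i = 1+(i-1)/m for i <= m, 2 for m < i <= n-m, 1+(n-i)/m otherwise."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)                            # 1-based index
    c = np.where(i <= m, 1 + (i - 1) / m,
                 np.where(i <= n - m, 2.0, 1 + (n - i) / m))
    upper = x[np.minimum(i - 1 + m, n - 1)]            # X_(i+m), clipped at X_(n)
    lower = x[np.maximum(i - 1 - m, 0)]                # X_(i-m), clipped at X_(1)
    return np.mean(np.log(n / (c * m) * (upper - lower)))
```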
Correa [5] proposed a modification of Vasicek's estimator as follows:
He considered the sample information as
and rewrote Eq. (1.2) as
Then he noted that the argument of the logarithm is the slope of the straight line joining the points \((F_n (X_{(i+m)} ),\,X_{(i+m)} )\) and \((F_n (X_{(i-m)} ),\,X_{(i-m)}),\) and therefore applied a local linear model based on \(2m+1\) points to estimate the density in the interval \((X_{(i-m)},\,X_{(i+m)}).\)
Via the least squares method, he proposed an estimator of entropy as
where
He compared his estimator with Vasicek's and van Es's estimators and concluded that the mean square error (MSE) of his estimator is smaller than that of Vasicek's estimator. Also, for some values of m, his estimator behaves better than van Es's estimator.
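Correa's local-linear idea can be sketched directly: for each i, regress the empirical distribution values j/n on the order statistics over the window \(j=i-m,\ldots ,i+m\) and average the negative log-slopes. The function name and the clipping of the window at the sample boundaries are our assumptions:

```python
import numpy as np

def correa_entropy(x, m):
    """Correa's estimator: -(1/n) * sum_i log(b_i), where b_i is the
    least-squares slope of j/n against X_(j) over j = i-m, ..., i+m."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):                 # 1-based centre of the window
        j = np.arange(i - m, i + m + 1)
        xj = x[np.clip(j - 1, 0, n - 1)]      # X_(j), clipped at the boundaries
        xbar = xj.mean()
        # Slope of the local linear fit of F_n(X_(j)) = j/n on X_(j).
        slope = np.sum((xj - xbar) * (j - i)) / (n * np.sum((xj - xbar) ** 2))
        total += np.log(slope)
    return -total / n
```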
Entropy estimators are used to develop entropy-based statistical procedures; see, for example, [1, 2, 10, 13, 20, 21]. Therefore, new entropy estimators can be useful in practice.
For many computational problems in applied probability and statistics, we have to compute the Riemann–Stieltjes integral of the following form
where the function g(x) is usually a distribution function.
Direct integration of the Riemann–Stieltjes integral can be used to compute convolution integrals. This approach has been shown to be simple and accurate, with good convergence properties.
The above integral can be approximated directly using the definition of the Riemann–Stieltjes integral. The function f(x) may be approximated by a piecewise constant function \(\tilde{f}(x)\), and by transforming the integral into a summation we have
Therefore, a numerical algorithm can be obtained based on the above formula.
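A minimal sketch of such an algorithm, assuming (as in the paper) an integral of the form \(\int _a^b f(x)\,dg(x)\) with g a distribution function; here the piecewise constant \(\tilde f\) freezes f at the left endpoint of each subinterval (function names are ours):

```python
import math

def rs_integral(f, g, a, b, n):
    """Direct Riemann-Stieltjes integration: replace f by a piecewise
    constant function and sum f(x_{i-1}) * (g(x_i) - g(x_{i-1}))."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x0, x1 = a + (i - 1) * h, a + i * h
        total += f(x0) * (g(x1) - g(x0))
    return total

# Example: E[X^2] = integral of x^2 dPhi(x) for the standard normal CDF Phi
# (true value 1).
phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
approx = rs_integral(lambda t: t * t, phi, -8.0, 8.0, 4000)
```

Because g is a distribution function, the increments \(g(x_i)-g(x_{i-1})\) are nonnegative weights, so no smoothness of g is needed beyond monotonicity.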
This procedure was used by [30] in solving the renewal equation. Independently, a similar idea was used by [26]. Numerical results in these papers show that direct RS-integration is very simple and accurate compared with existing algorithms (e.g., [6, 19]) that are more complicated. Some applications of this approach can be found in [4, 17, 25]. Xie et al. [31] developed further methodologies of Riemann–Stieltjes integration under general conditions.
In Sect. 2, some direct integration methods and bounds on their errors are given. In Sect. 3, these numerical methods are used to estimate the entropy of continuous random variables. We show that the proposed estimators are consistent. Scale invariance of the variance and mean squared error of the estimators is established. In the last section we present the results of a comparison of the proposed estimators with competing estimators in a simulation study.
2 Some Direct Integration Methods
Let the interval \(I=[a,\,b]\) be partitioned into n intervals of equal length, denoted by \(I_i =[x_{i-1},\,x_i ],\,i=1,\ldots ,n,\) where \(x_0 =a\) and \(x_i =x_{i-1} +h,\) with \(h=(b-a)/n\) the length of each interval. By the direct Riemann–Stieltjes integration method ([30]), we approximate the function f(x) with a function \(\tilde{f}(x)\) and then use the definition of the Riemann–Stieltjes integral. Similarly to the rectangle and trapezoidal rules for the Riemann integral, Xie et al. [31] distinguished between the midpoint and mean-value RS-integration methods, which they described as follows.
2.1 The Midpoint Rectangle RS-Integration
By the rectangle RS-integration method, the function f(x) is approximated by its value at the midpoint
of each interval. Then
Numerical accuracy and convergence are demonstrated in [30].
2.2 The Mean-Value Rectangle RS-Integration
Another way to approximate f(x) is to use its mean value on each interval. We approximate the function f(x) on the ith interval by the mean value of the endpoints, \((f(x_i )+f(x_{i-1} ))/2.\) In this case, Eq. (1.4) is approximated by
Since the mean value of f(x) at the points \(x_i\) and \(x_{i-1}\) is used in approximating f(x), we call this the mean-value rectangle RS-integration method, or the generalized trapezoidal rule. For the Riemann integral this method is usually called the trapezoidal rule, since it is equivalent to approximating f(x) by a piecewise linear function. This is not the case for the Riemann–Stieltjes integral unless g(x) is a linear function.
This simple method gives satisfactory results and has been used by [3, 26].
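The midpoint and mean-value rules differ only in how f is frozen on each subinterval; both can be sketched as below (function names are ours):

```python
import math

def rs_midpoint(f, g, a, b, n):
    """Midpoint rectangle RS-integration: f evaluated at the midpoint
    (x_{i-1} + x_i)/2 of each subinterval."""
    h = (b - a) / n
    return sum(f(a + (i - 0.5) * h) * (g(a + i * h) - g(a + (i - 1) * h))
               for i in range(1, n + 1))

def rs_meanvalue(f, g, a, b, n):
    """Mean-value rectangle RS-integration (generalized trapezoidal rule):
    f replaced by the average of its endpoint values."""
    h = (b - a) / n
    return sum(0.5 * (f(a + (i - 1) * h) + f(a + i * h))
               * (g(a + i * h) - g(a + (i - 1) * h))
               for i in range(1, n + 1))

# Both rules recover E[X^2] = 1 for the standard normal CDF Phi.
phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
```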
2.3 The Generalized Simpson Rule
Xie et al. [31] generalized another integration method to the Riemann–Stieltjes equation (1.4); it is called the generalized Simpson rule.
Let the interval \(I=[a,\,b]\) be divided into 2n equal parts and let \(h=(b-a)/2n.\) Now Eq. (1.4) is approximated by
where \(Q_n\) is given by the mean-value rectangle equation (2.1) with respect to n subintervals of length \((b-a)/n\) and \(Q_{2n}\) is the rule (2.1) for 2n subintervals of length \((b-a)/2n.\) Then
\(S=\frac{4Q_{2n} }{3}-\frac{Q_n }{3}\)
is considered as the generalized Simpson rule.
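The generalized Simpson rule is thus a Richardson-type combination of two mean-value rectangle sums, which cancels the leading error term; a sketch (our function names):

```python
import math

def rs_meanvalue(f, g, a, b, n):
    """Mean-value rectangle rule Q_n on n subintervals of [a, b]."""
    h = (b - a) / n
    return sum(0.5 * (f(a + (i - 1) * h) + f(a + i * h))
               * (g(a + i * h) - g(a + (i - 1) * h))
               for i in range(1, n + 1))

def rs_simpson(f, g, a, b, n):
    """Generalized Simpson rule: S = 4*Q_{2n}/3 - Q_n/3."""
    return (4.0 * rs_meanvalue(f, g, a, b, 2 * n)
            - rs_meanvalue(f, g, a, b, n)) / 3.0

phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
```

When g is smooth the combination behaves like classical Richardson extrapolation, and the gain over the plain mean-value rule is substantial for the same number of function evaluations.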
2.4 General Bounds on the Truncation Errors
Numerical results show that all the mentioned methods can be used successfully; see, e.g., [4, 7, 26, 30]. In [3] some convergence results are presented, and [31] derived further bounds on the truncation errors of the methods under very general assumptions. They only assumed that g(x) is an increasing function, which is the usual assumption for defining the Riemann–Stieltjes integral ([22], p. 122). The following theorems were proved by [31].
Theorem 1
For the midpoint method, the global truncation error \( \upvarepsilon \) may be bounded by
assuming f(x) is continuously differentiable and g(x) is an increasing function.
Remark
The convergence of this method is thus of order 1: as the number of intervals n increases, the error decreases at least as fast as 1/n.
Theorem 2
Under the same conditions as in Theorem 1, the global truncation error may be bounded by
for the mean value method.
Theorem 3
Under the same conditions as in Theorem 1, the global truncation error \( \upvarepsilon \) for the generalized Simpson rule may be bounded by
3 Estimation of Entropy
Suppose \(X_1,\ldots ,X_n\) are the order statistics of a random sample of size n from an unknown continuous distribution F with probability density function f(x). We use the methods discussed in the previous section to estimate the entropy H(f) of the unknown continuous density f. Using these methods, the following approximations for the entropy can be derived.

(1)
\(H(f)=-\int {\log f(x)\,dF(x)} \approx -\sum \limits _{i=1}^n {\log f(x_i)\{F(x_i )-F(x_{i-1} )\}} =H_1,\)

(2)
\(H(f)\approx -\sum \limits _{i=1}^n {\log f\left( \frac{x_i +x_{i-1}}{2}\right) \{F(x_i )-F(x_{i-1} )\}} =H_2,\)

(3)
\(H(f)\approx -\sum \limits _{i=1}^n {\frac{\log f(x_{i-1} )+\log f(x_i )}{2}\{F(x_i )-F(x_{i-1} )\}} =H_3,\)

(4)
\(H(f)\approx \frac{4Q_{2n} }{3}-\frac{Q_n }{3}=H_4.\)
Now we replace \(F(x_i)\) and \(f(x_i)\) with i/n (the empirical distribution function) and \(\hat{{f}}(x_i )\) (the kernel density estimate), respectively. We thus obtain
The kernel density estimator is well-known and is defined by
where h is a bandwidth and k is a kernel function which satisfies
Usually, k is a symmetric probability density function (see [24]).
Here, the kernel function is chosen to be the standard normal density function and the bandwidth h is chosen by the normal optimal smoothing formula \(h=1.06\,s\,n^{-\frac{1}{5}},\) where s is the sample standard deviation.
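Putting the pieces together, an \(H_1\)-type estimate can be sketched as follows, with the Gaussian kernel and the normal optimal bandwidth above. The sign convention \(H(f)=-E[\log f(X)]\) and the function name are our assumptions:

```python
import numpy as np

def entropy_h1(x):
    """H_1-type estimate: empirical weights 1/n combined with a Gaussian
    kernel density estimate evaluated at the order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    h = 1.06 * x.std(ddof=1) * n ** (-0.2)          # normal optimal bandwidth
    z = (x[:, None] - x[None, :]) / h               # pairwise standardized gaps
    f_hat = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return -np.mean(np.log(f_hat))
```

Note that, unlike the spacing estimators, no window size m has to be chosen; only the bandwidth h enters, and it is fixed by the formula above.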
Using Theorems 1–3, the global truncation error \( \upvarepsilon \) can be obtained as follows.
We have
therefore,
So, for the midpoint, mean-value and generalized Simpson methods, the global truncation error \( \upvarepsilon \) may be bounded by
respectively.
The scale of the random variable X has no effect on the accuracy of \(H_1,\,H_2,\,H_3\) or \(H_4\) in estimating H(f). The following theorem establishes this fact. Similar results have been obtained for \(\textit{HV}_{mn}\) and \(\textit{HE}_{mn}\) by [18] and [9], respectively.
Theorem 3.1
Let \(X_1,\ldots ,X_n\) be a sequence of i.i.d. random variables with entropy \(H^{X}(f)\) and let \(Y_i =cX_i,\,i=1,\ldots ,n,\) where \(c>0.\) Let \(H_1^X\) and \(H_1^Y\) be the entropy estimators of \(H^{X}(f)\) and \(H^{Y}(g),\) respectively (here g is the pdf of \(Y=cX\)). Then the following properties hold.

(i)
\(E( {H_1^Y })=E( {H_1^X } )+\log c,\)

(ii)
\(Var( {H_1^Y })=Var( {H_1^X }),\)

(iii)
\(MSE( {H_1^Y })=MSE( {H_1^X }).\)
Proof
We have
where \(h_y =1.06s_y n^{-1/5}=1.06cs_x n^{-1/5}=ch_x.\)
Therefore, \(H_1^Y =H_1^X +\log c,\) and the theorem is established.
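The proof can be verified numerically: because \(s_y = c\,s_x\) gives \(h_y = c\,h_x\), the kernel estimates satisfy \(\hat{g}(cx)=\hat{f}(x)/c\), so the two estimates differ by exactly \(\log c\) (up to floating-point error). A quick check, assuming the \(H_1\)-type estimator sketched with a Gaussian kernel and \(h=1.06\,s\,n^{-1/5}\) (the function name is ours):

```python
import numpy as np

def entropy_h1(x):
    """H_1-type estimate with Gaussian kernel and h = 1.06 * s * n^(-1/5)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    h = 1.06 * x.std(ddof=1) * n ** (-0.2)
    z = (x[:, None] - x[None, :]) / h
    f_hat = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return -np.mean(np.log(f_hat))

rng = np.random.default_rng(1)
x = rng.normal(size=500)
c = 3.0
diff = entropy_h1(c * x) - entropy_h1(x)    # should equal log(c) exactly
```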
The above theorem holds for the other proposed estimators as well.
The following theorem establishes the consistency of the estimators.
Theorem 3.2
Let C be the class of continuous densities with finite entropies and let \(X_1,\ldots ,X_n\) be a random sample from \(f\in C.\) If \(n\rightarrow \infty ,\) then
Proof
It follows immediately from the consistency and continuity of \(\hat{{f}}(x).\)
4 Simulation Study
A simulation study is carried out to analyze the behavior of the proposed entropy estimators, \(H_i,\,i=1,\,2,\,3,\,4.\) The proposed estimators are compared with some prominent estimators, namely Vasicek's estimator [28], van Es's estimator [27], Ebrahimi et al.'s estimator [9] and Correa's estimator [5]. For the comparisons, we consider the normal, exponential and uniform distributions, the same three distributions considered in [5]. For each sample size, 10,000 samples were generated and the RMSEs of the estimators were computed.
For the competitor estimators, we chose the value of m using the following heuristic formula (see [12]):
Tables 1, 2 and 3 present the RMSE values (and standard deviations) of the eight estimators at different sample sizes.
We can see from Tables 1, 2 and 3 that the proposed estimators compare favorably with their competitors. Note also that the proposed estimators do not depend on m. In general, the proposed estimators perform well for small sample sizes.
5 Conclusion
In this paper, we first described some direct integration methods and gave bounds on their errors. We next introduced some estimators of the entropy of a continuous random variable using these numerical methods, and obtained bounds on the errors of the estimators. We then compared the proposed estimators with some prominent existing estimators and observed that for small sample sizes the new estimators behave better than their competitors. An advantage of the proposed estimators is that, unlike the competitor estimators, they do not depend on the window size m.
References
Alizadeh Noughabi H, Arghami NR (2013) General treatment of goodness-of-fit tests based on Kullback–Leibler information. J Stat Comput Simul 83:1556–1569
Balakrishnan N, Habibi Rad A, Arghami NR (2007) Testing exponentiality based on Kullback–Leibler information with progressively type-II censored data. IEEE Trans Reliab 56:301–307
Boehme TK, Preuss W, van der Wall V (1991) On a simple numerical method for computing Stieltjes integrals in reliability theory. Probab Eng Inf Sci 5:113–128
Chaudhry ML (1995) On computations of the mean and variance of the number of renewals—a unified approach. J Oper Res Soc 46:1352–1364
Correa JC (1995) A new estimator of entropy. Commun Stat Theory Methods 24:2439–2449
Deligonul ZS, Bilgen S (1984) Solution of the Volterra equation of renewal theory with the Galerkin technique using cubic splines. J Stat Comput Simul 20:37–45
Den Iseger PW, Smith MAJ, Dekker R (1997) Computing compound distributions faster! Insur Math Econ 20:23–34
Dobrushin RL (1958) Simplified method of experimental estimate of entropy of stationary sequence. Theory Probab Appl 3:462–464
Ebrahimi N, Pflughoeft K, Soofi E (1994) Two measures of sample entropy. Stat Probab Lett 20:225–234
Esteban MD, Castellanos ME, Morales D, Vajda I (2001) Monte Carlo comparison of four normality tests using different entropy estimates. Commun Stat Simul Comput 30:761–785
Goria MN, Leonenko NN, Mergel VV, Novi Inverardi PL (2005) A new class of random vector entropy estimators and its applications in testing statistical hypotheses. Nonparametr Stat 17:277–297
Grzegorzewski P, Wieczorkowski R (1999) Entropy-based goodness-of-fit test for exponentiality. Commun Stat Theory Methods 28:1183–1202
Habibi Rad A, Yousefzadeh F, Balakrishnan N (2011) Goodness-of-fit test based on Kullback–Leibler information for progressively type-II censored data. IEEE Trans Reliab 60:570–579
Hall P, Morton SC (1993) On the estimation of entropy. Ann Inst Stat Math 45:69–88
Hutcheson K, Shenton LR (1974) Some moments of an estimate of Shannon’s measure of information. Commun Stat 3:89–94
Joe H (1989) Estimation of entropy and other functionals of a multivariate density. Ann Inst Stat Math 41:683–697
Kao EPC (1997) An introduction to stochastic processes. Duxbury Press, Belmont
Mack SP (1988) A comparative study of entropy estimators and entropy-based goodness-of-fit tests. PhD Dissertation, University of California, Riverside
McConalogue DJ (1981) Numerical treatment of convolution integrals involving distributions with densities having singularities at the origin. Commun Stat Simul Comput 10:265–280
Pakyari R, Balakrishnan N (2012) A general purpose approximate goodness-of-fit test for progressively type-II censored data. IEEE Trans Reliab 61:238–244
Pakyari R, Balakrishnan N (2013) Goodness-of-fit tests for progressively type-II censored data from location-scale distributions. J Stat Comput Simul 83:167–178
Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill, New York
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423; 623–656
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Tijms HC (1994) Stochastic models: an algorithmic approach. Wiley, New York
van der Wall V, Preuss W (1989) A simple numerical procedure for solving Volterra–Stieltjes integral equations. In: Proceedings of the conference on complex analysis, Sofia, pp 424–430
van Es B (1992) Estimating functionals related to a density by a class of statistics based on spacings. Scand J Stat 19:61–72
Vasicek O (1976) A test for normality based on sample entropy. J R Stat Soc B 38:54–59
Vatutin VA, Michailov VG (1995) Statistical estimation of entropy of discrete random variables with large numbers of results. Russ Math Surv 50:963–976
Xie M (1989) On the solution of renewal-type integral equations. Commun Stat Simul Comput 18:281–293
Xie M, Preuss W, Cui L (2003) Error analysis of some integration procedures for renewal equation and convolution integrals. J Stat Comput Simul 73:59–70
Acknowledgments
The author thanks the Associate Editor and anonymous referees for making some valuable suggestions which led to a considerable improvement in the presentation of this manuscript.
Alizadeh Noughabi, H. Entropy Estimation Using Numerical Methods. Ann. Data. Sci. 2, 231–241 (2015). https://doi.org/10.1007/s40745-015-0045-9