Abstract
Calculating the distribution of a sum of independent non-identically distributed random variables arises frequently in scientific applications. Computing the probability at a given significance point is important when the sum comprises finitely many random variables; however, this probability becomes difficult to evaluate as the number of random variables increases. Under these circumstances, a more accurate approximation of the distribution function is extremely important. We apply a saddlepoint approximation to the upper probabilities of the distribution of the sum of independent non-identically distributed gamma random variables under finite sample sizes. In this study, we compared the results of the saddlepoint approximation with those of normal and moment-based approximations to identify the most appropriate method for approximating the distribution function.
Introduction
The distribution of the sum of independent identically distributed gamma random variables is well known. However, in many scientific applications the distribution of the sum of independent non-identically distributed (i.n.i.d.) gamma random variables is needed. For example, this distribution is required for calculating total waiting times in which the component times are assumed to be independent exponential or gamma random variables. In addition, engineers calculate the total excess water flow into a dam as the sum of i.n.i.d. gamma random variables. To calculate the exact probability distribution of the sum of i.n.i.d. gamma random variables, the probability of all possible elements consistent with the sum must be computed. Mathai [12] derived the distribution of the sum of i.n.i.d. gamma random variables by inverting the moment-generating function. Additionally, Moschopoulos [13] calculated the distribution of the sum of i.n.i.d. gamma random variables using a simple recursive relation approach. For details of the gamma distribution family, we refer the reader to Khodabin and Ahmadabadi [9]. However, Mathai [12] and Moschopoulos [13] expressed the density of the sum of i.n.i.d. gamma random variables as an infinite summation. This method of computation is intractable in practice, especially when the number of random variables increases. An exact calculation is feasible by applying the standard inversion formula to the characteristic function in computer algebra systems, such as Mathematica. In practice, however, the probability is typically estimated with an approximation method. Approximation methods are widely used and have been studied extensively. From a practical standpoint, approximations are typically precise and straightforward to implement in various statistical software programs. Hence, obtaining a more accurate approximation for evaluating the density or the distribution function of i.n.i.d. 
random variables remains an important problem in statistics. In this study, we describe the use of approximation methods to calculate the distribution of the sum of i.n.i.d. gamma random variables in Sect. “A Saddlepoint approximation to the distribution of sum of i.n.i.d. gamma random variables”. Furthermore, we discuss the derivation of the order of the errors of the suggested approximation for the given distribution. For the approximation presented in this paper, we used the saddlepoint formula introduced by Daniels [2, 3] and developed by Lugannani and Rice [11]. The saddlepoint approximation can be obtained for any statistic or random variable that has a cumulant generating function, and it yields accurate probabilities in the tails of the distribution. Saddlepoint approximations have been used with great success by several researchers; excellent discussions of their applications to a range of distributional problems are found in Jensen [8], Huzurbazar [7], Kolassa [10], and Butler [1]. Recently, Eisinga et al. [4] discussed the saddlepoint approximation for the sum of i.n.i.d. binomial random variables. Additionally, Murakami [14] and Nadarajah et al. [15] considered saddlepoint approximations for the sums of i.n.i.d. uniform and beta random variables, respectively. In Sect. “Numerical results”, we discuss the results obtained using the saddlepoint approximation, and in Sect. “Concluding remarks”, we summarize our conclusions.
A Saddlepoint approximation to the distribution of sum of i.n.i.d. gamma random variables
In this section, we discuss the saddlepoint approximation to the distribution of the sum of independent non-identically distributed gamma random variables. Let \(X_1,\ldots ,X_n\) be independent random variables, where \(X_i\) follows a gamma distribution with shape parameter \(\alpha _i>0\) and scale parameter \(\beta _i>0\), for \(i=1,\ldots ,n\), and let \(S_n=X_1+X_2+\cdots +X_n\). The moment-generating function of \(S_n\) is
$$M_{S_n}(s)=\prod _{i=1}^{n}(1-\beta _i s)^{-\alpha _i},\quad s<\min _{1\le i\le n}\beta _i^{-1}.$$
It is important to note that Mathai [12] derived the density function of the sum of i.n.i.d. gamma random variables by inverting its moment-generating function as follows:
where \(\rho =\alpha _1+\cdots +\alpha _n\) and \((y)_z\) denotes the Pochhammer symbol. In addition, Moschopoulos [13] obtained the density function of the sum of i.n.i.d. gamma random variables using the following simple recursive relation approach:
$$f_{S_n}(v)=\prod _{i=1}^{n}\left(\frac{\beta _1}{\beta _i}\right)^{\alpha _i}\sum _{k=0}^{\infty }\frac{\delta _k v^{\rho +k-1}e^{-v/\beta _1}}{\Gamma (\rho +k)\beta _1^{\rho +k}},\quad v>0,$$
where \(\beta _1=\min _{1\le i\le n}\beta _i\) and
$$\delta _{k+1}=\frac{1}{k+1}\sum _{i=1}^{k+1}\left[\sum _{j=1}^{n}\alpha _j\left(1-\frac{\beta _1}{\beta _j}\right)^{i}\right]\delta _{k+1-i},\quad k=0,1,2,\ldots ,$$
with \(\delta _0=1\). However, evaluating this exact density becomes difficult as n increases.
Herein, we consider an approximation to the distribution of \(S_n\). The cumulant generating function of \(S_n\) is
$$\kappa _n(s)=\log M_{S_n}(s)=-\sum _{i=1}^{n}\alpha _i\log (1-\beta _i s),\quad s<\min _{1\le i\le n}\beta _i^{-1}.$$
From the cumulant generating function, the mean, \(\mu\), and variance, \(\sigma ^2\), of \(S_n\) are
$$\mu =\kappa _n'(0)=\sum _{i=1}^{n}\alpha _i\beta _i,\qquad \sigma ^2=\kappa _n''(0)=\sum _{i=1}^{n}\alpha _i\beta _i^{2}.$$
According to Daniels [3], the saddlepoint approximation to the density function of \(S_n\) is as follows:
$$\hat{f}(v)=\frac{\exp \left\{\kappa _n(\hat{s})-\hat{s}v\right\}}{\sqrt{2\pi \kappa _n''(\hat{s})}},$$
where
$$\kappa _n'(s)=\sum _{i=1}^{n}\frac{\alpha _i\beta _i}{1-\beta _i s},\qquad \kappa _n''(s)=\sum _{i=1}^{n}\frac{\alpha _i\beta _i^{2}}{(1-\beta _i s)^{2}},$$
and \(\hat{s}\) is the root of \(\kappa _n'(s)=v\), which is readily solved numerically by the Newton–Raphson algorithm.
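As an illustration, the following minimal, self-contained Python sketch carries out the Newton–Raphson saddlepoint solve and evaluates the resulting density approximation. The parameter values, the starting point \(s=0\), and the boundary safeguard are our own illustrative choices, not part of the paper.

```python
import math

# Illustrative parameters (not one of the paper's cases).
alphas = [1.0, 2.5, 0.8]   # shape parameters alpha_i
betas = [2.0, 1.0, 3.0]    # scale parameters beta_i

def kappa(s):
    # Cumulant generating function of S_n, valid for s < min_i 1/beta_i.
    return -sum(a * math.log(1.0 - b * s) for a, b in zip(alphas, betas))

def kappa1(s):
    # First derivative kappa_n'(s).
    return sum(a * b / (1.0 - b * s) for a, b in zip(alphas, betas))

def kappa2(s):
    # Second derivative kappa_n''(s).
    return sum(a * b * b / (1.0 - b * s) ** 2 for a, b in zip(alphas, betas))

def saddlepoint(v, tol=1e-12, max_iter=100):
    # Newton-Raphson for kappa_n'(s) = v, started at s = 0 and kept inside
    # the convergence strip s < min_i 1/beta_i.
    s_max = min(1.0 / b for b in betas)
    s = 0.0
    for _ in range(max_iter):
        s_new = s - (kappa1(s) - v) / kappa2(s)
        if s_new >= s_max:              # safeguard: backtrack toward boundary
            s_new = 0.5 * (s + s_max)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s

def daniels_density(v):
    # Daniels' saddlepoint approximation to the density of S_n at v.
    s_hat = saddlepoint(v)
    return math.exp(kappa(s_hat) - s_hat * v) / math.sqrt(2.0 * math.pi * kappa2(s_hat))
```

For a single gamma variable the saddlepoint equation has the closed-form root \(\hat{s}=1/\beta -\alpha /v\), which gives a convenient check on the Newton iteration.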
Several approaches have been used to further reduce the error of the saddlepoint approximation [5]. For example, one method obtains a higher-order approximation by including adjustments for the third and fourth cumulants [3]. The higher-order saddlepoint approximation uses the following correction term:
$$\hat{f}_c(v)=\hat{f}(v)\left\{1+\frac{\hat{\lambda }_4}{8}-\frac{5\hat{\lambda }_3^{2}}{24}\right\},$$
where
$$\hat{\lambda }_j=\frac{\kappa _n^{(j)}(\hat{s})}{\{\kappa _n''(\hat{s})\}^{j/2}},\quad j=3,4.$$
The approximate tail probabilities of \(S_n\) are then determined by numerically integrating the saddlepoint density approximation.
An alternative approach is to use the Lugannani–Rice formula [11] for approximating the continuous tail probability:
$$P(S_n>v)\approx 1-\Phi (\hat{w})+\phi (\hat{w})\left(\frac{1}{\hat{u}}-\frac{1}{\hat{w}}\right),$$
where \(\phi (\cdot )\) is the standard normal density function, \(\Phi (\cdot )\) is the corresponding cumulative distribution function, and
$$\hat{w}={\text{sgn}}(\hat{s})\sqrt{2\{\hat{s}v-\kappa _n(\hat{s})\}},\qquad \hat{u}=\hat{s}\sqrt{\kappa _n''(\hat{s})},$$
where \({\text{sgn}}(\hat{s})\) equals \(+1\), \(-1\), or 0 according as \(\hat{s}\) is positive, negative, or zero.
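The Lugannani–Rice tail approximation can likewise be sketched in a few lines of self-contained Python. The parameters are again illustrative, and the singularity at the mean (\(\hat{s}\approx 0\)) is handled crudely here by returning 1/2 rather than by the usual limiting expression.

```python
import math

# Illustrative parameters (not one of the paper's cases).
alphas = [1.0, 2.5, 0.8]
betas = [2.0, 1.0, 3.0]

def kappa(s):
    # Cumulant generating function of S_n.
    return -sum(a * math.log(1.0 - b * s) for a, b in zip(alphas, betas))

def kappa1(s):
    return sum(a * b / (1.0 - b * s) for a, b in zip(alphas, betas))

def kappa2(s):
    return sum(a * b * b / (1.0 - b * s) ** 2 for a, b in zip(alphas, betas))

def saddlepoint(v, tol=1e-12, max_iter=100):
    # Safeguarded Newton-Raphson for kappa_n'(s) = v.
    s_max = min(1.0 / b for b in betas)
    s = 0.0
    for _ in range(max_iter):
        s_new = s - (kappa1(s) - v) / kappa2(s)
        if s_new >= s_max:
            s_new = 0.5 * (s + s_max)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s

def lugannani_rice_upper(v):
    # Lugannani-Rice approximation to P(S_n > v).
    s_hat = saddlepoint(v)
    if abs(s_hat) < 1e-8:
        return 0.5          # crude handling of the v = mean singularity
    w = math.copysign(math.sqrt(2.0 * (s_hat * v - kappa(s_hat))), s_hat)
    u = s_hat * math.sqrt(kappa2(s_hat))
    phi_w = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    Phi_w = 0.5 * math.erfc(-w / math.sqrt(2.0))
    return 1.0 - Phi_w + phi_w * (1.0 / u - 1.0 / w)
```

For a single Gamma(2, 1) variable the exact tail \(P(S>5)=6e^{-5}\approx 0.0404\) is reproduced to about three decimal places, illustrating the tail accuracy of the formula.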
Numerical results
In this section, we investigate the upper tail probabilities of \(S_n\) computed with the saddlepoint approximation, focusing on the Lugannani–Rice formula. Note that Mathai [12] obtained a normal approximation as \(n \rightarrow \infty\), while Moschopoulos [13] expressed the density of \(S_n\) as an infinite summation. In practice, the infinite series is truncated after a finite number of terms chosen to meet an acceptable precision, as described by Moschopoulos [13], where \((\rho )_k=\rho (\rho +1)\cdots (\rho +k-1)\), \((\rho )_0=1\), and \(b=\max _{2\le {}i\le {}n}(1-\beta _{1}/\beta _{i})\). The truncation error incurred by retaining only the first \(\ell +1\) terms is bounded accordingly.
In addition, we used another approximation to the distribution of \(S_n\): the moment-based approximation proposed by Ha and Provost [6]. The distribution of \(S_n\) is approximated by the polynomially adjusted density \(\tilde{f}_k(v)\) such that
where
and m(k) and \(\text {E}(M^k)\) denote the kth moment of the adjusted distribution \(\psi (v)\) and the kth moment of \(S_n\), respectively.
Herein, we consider the approximation adjusted with the skew-normal distribution as follows:
where
Then,
Note that \(\xi _0=1\) and \(\xi _1=\xi _2=0\) for \(k=2\). The moment-based approximation with a skew-normal polynomial adjustment is then as follows:
An important step in the proposed method is determining the optimal degree of the polynomial. We followed the selection rule of Ha and Provost [6], which is based on the integrated squared differences between successive density approximations.
For this study, the following notation is used: exact probability of \(S_n\), \(E_\mathrm{P}\) (computed as proposed by Moschopoulos [13]); normal approximation, \(A_\mathrm{N}\); saddlepoint approximation with the Lugannani–Rice formula, \(A_\mathrm{L}\); moment-based approximation with a skew-normal polynomial, \(A_\mathrm{M}\); and relative error of the approximations, r.e. (Tables 1, 2, 3). We used different values of \(\vec {\alpha }=(\alpha _1,\alpha _2,\ldots ,\alpha _n)\) and \(\vec {\beta }=(\beta _1,\beta _2,\ldots ,\beta _n)\), grouped into Cases 1–3 corresponding to \(n=5, 10\), and 15. The \(\alpha _i\) and \(\beta _i\) were generated as follows:
Case 1 (\(n=5\)): \(\alpha _i\) and \(\beta _i\) were simulated independently from the following distributions.

Case A: Uniform distribution on the interval [0, 3]:
$$\alpha _i = (1.04022, 1.52149, 2.96165, 0.77156, 1.93264)$$
$$\beta _i = (2.93353, 2.60821, 2.49735, 1.57684, 1.05720)$$

Case B: Poisson distribution with parameter \(\lambda =10\):
$$\alpha _i = (9, 10, 18, 8, 11)$$
$$\beta _i = (17, 14, 13, 10, 9)$$

Case C: Lognormal distribution with location \(\mu =0\) and scale \(\sigma =2\):
$$\alpha _i = (0.05459, 0.87723, 0.98562, 1.37783, 6.40726)$$
$$\beta _i = (1.68872, 0.39881, 0.25645, 6.14009, 0.18285)$$

Case D: Gamma distribution with shape \(\gamma =2.0\) and scale \(\xi =1.5\):
$$\alpha _i = (1.96560, 1.89408, 3.00261, 4.28812, 3.01364)$$
$$\beta _i = (6.99957, 5.68468, 3.15081, 3.49359, 3.32123)$$

Case E: Exponential distribution with parameter \(\lambda =2.0\):
$$\alpha _i = (0.52959, 0.33946, 0.00643, 0.67897, 0.21986)$$
$$\beta _i = (0.01120, 0.06997, 0.09169, 0.32160, 0.52149)$$

Case F: Binomial distribution with parameters \(N=20\), \(p=0.3\):
$$\alpha _i = (5, 6, 11, 5, 7)$$
$$\beta _i = (10, 8, 8, 6, 5)$$
Case 2 (\(n=10\)): \(\alpha _i\) and \(\beta _i\) were simulated independently from the following distributions.

Case A: Uniform distribution on the interval [0, 3]:
$$\alpha _i = (0.22417, 1.14752, 0.50906, 1.98942, 2.72316, 2.50722, 1.28708, 0.52985, 2.61593, 2.06543)$$
$$\beta _i = (2.85308, 0.91297, 2.36745, 0.57299, 2.95146, 2.01277, 2.77988, 0.36263, 0.10206, 1.98484)$$

Case B: Poisson distribution with parameter \(\lambda =10\):
$$\alpha _i = (6, 9, 7, 11, 14, 13, 9, 7, 14, 11)$$
$$\beta _i = (15, 8, 12, 7, 17, 11, 15, 6, 5, 11)$$

Case C: Lognormal distribution with location \(\mu =0\) and scale \(\sigma =2\):
$$\alpha _i = (0.11560, 10.5703, 0.18732, 0.05774, 9.51800, 0.32369, 0.01573, 0.38308, 0.15837, 0.50578)$$
$$\beta _i = (0.57765, 0.00716, 0.04465, 0.01232, 0.13829, 0.69592, 0.62148, 7.70751, 0.11445, 0.04548)$$

Case D: Gamma distribution with shape \(\gamma =2.0\) and scale \(\xi =1.5\):
$$\alpha _i = (2.17916, 1.10074, 1.40375, 0.55393, 3.18918, 3.27868, 5.79357, 0.74198, 2.54858, 1.52722)$$
$$\beta _i = (4.51414, 3.89484, 2.66041, 3.88386, 1.83366, 2.45540, 2.29499, 2.67110, 1.79620, 6.17920)$$

Case E: Exponential distribution with parameter \(\lambda =2.0\):
$$\alpha _i = (0.63848, 0.49992, 0.68366, 0.20362, 0.39683, 0.03837, 1.13507, 0.31168, 0.24129, 0.29315)$$
$$\beta _i = (1.54349, 0.15572, 1.81201, 0.21383, 0.25284, 0.31853, 0.27480, 0.62515, 0.89060, 0.21336)$$

Case F: Binomial distribution with parameters \(N=20\), \(p=0.3\):
$$\alpha _i = (5, 5, 5, 7, 6, 9, 3, 6, 7, 6)$$
$$\beta _i = (3, 7, 2, 7, 6, 6, 6, 5, 4, 7)$$
Case 3 (\(n=15\)): \(\alpha _i\) and \(\beta _i\) were simulated independently from the following distributions.

Case A: Uniform distribution on the interval [0, 3]:
$$\alpha _i = (0.58008, 2.22637, 2.51611, 1.12297, 2.29383, 1.18906, 1.57483, 0.31849, 2.53235, 2.27937, 2.91811, 2.57865, 0.79358, 1.86520, 2.46924)$$
$$\beta _i = (0.28821, 1.91593, 2.49916, 0.86430, 0.84279, 0.11141, 1.67239, 2.36912, 2.71671, 0.77938, 1.87762, 0.17339, 0.96113, 0.79465, 1.37481)$$

Case B: Poisson distribution with parameter \(\lambda =10\):
$$\alpha _i = (14, 8, 7, 6, 11, 8, 9, 10, 14, 9, 10, 7, 13, 12, 15)$$
$$\beta _i = (11, 8, 12, 10, 16, 11, 4, 7, 5, 10, 8, 10, 18, 6, 14)$$

Case C: Lognormal distribution with location \(\mu =0\) and scale \(\sigma =2\):
$$\alpha _i = (0.65787, 2.10546, 27.7337, 5.30692, 1.32748, 6.60756, 0.06394, 0.44348, 0.62371, 2.24283, 2.82459, 9.36521, 0.82459, 0.27752, 0.78127)$$
$$\beta _i = (0.35243, 0.03793, 9.39041, 0.64510, 5.61762, 1.26038, 33.0086, 2.74834, 0.82037, 9.86410, 9.98734, 0.85253, 18.6350, 0.06063, 0.15637)$$

Case D: Gamma distribution with shape \(\gamma =2.0\) and scale \(\xi =1.5\):
$$\alpha _i = (6.89214, 8.05464, 1.88477, 7.43300, 3.20878, 3.90603, 0.939177, 1.5469, 12.2503, 5.63549, 1.49472, 3.16031, 1.32145, 1.69085, 0.815398)$$
$$\beta _i = (2.62198, 1.1429, 4.84865, 1.37041, 7.08359, 5.63902, 3.86697, 4.96188, 3.69634, 7.9203, 6.06043, 4.08442, 0.501449, 3.76872, 9.79386)$$

Case E: Exponential distribution with parameter \(\lambda =2.0\):
$$\alpha _i = (0.32981, 0.46440, 0.08766, 0.29351, 0.16135, 1.34496, 0.17082, 0.05857, 0.34622, 0.11345, 0.31936, 0.23438, 0.21816, 0.22887, 0.06095)$$
$$\beta _i = (0.05570, 0.88621, 0.78224, 0.42277, 1.07697, 0.23754, 0.43072, 0.20499, 0.24145, 0.38523, 0.13308, 0.19507, 1.63801, 0.15344, 0.11302)$$

Case F: Binomial distribution with parameters \(N=20\), \(p=0.3\):
$$\alpha _i = (6, 5, 8, 6, 7, 3, 7, 9, 6, 8, 6, 7, 7, 7, 8)$$
$$\beta _i = (9, 4, 4, 6, 4, 7, 6, 7, 7, 6, 7, 7, 3, 7, 8)$$
The results listed in Tables 1, 2 and 3 indicate that the \(A_\mathrm{M}\) approximation was more suitable than the normal approximation \(A_\mathrm{N}\) for the distribution of \(S_n\). Moreover, the \(A_\mathrm{L}\) approximation was more accurate than the \(A_\mathrm{M}\) approximation in all cases tested. Therefore, we suggest estimating the probability with the \(A_\mathrm{L}\) approximation, particularly when n is large.
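As a quick, self-contained illustration of why the normal approximation struggles in the upper tail, the following Python sketch compares it with a Monte Carlo estimate. The parameter set, evaluation point, sample size, and seed are our own illustrative choices, not taken from the paper's tables.

```python
import math
import random

# Illustrative parameters (not one of the paper's cases).
alphas = [1.0, 2.5, 0.8, 1.5, 0.6]
betas = [2.0, 1.0, 3.0, 0.5, 1.2]

# Mean and standard deviation of S_n.
mu = sum(a * b for a, b in zip(alphas, betas))
sigma = math.sqrt(sum(a * b * b for a, b in zip(alphas, betas)))

def normal_upper(v):
    # Normal approximation to P(S_n > v) via the standard normal tail.
    return 0.5 * math.erfc((v - mu) / (sigma * math.sqrt(2.0)))

def mc_upper(v, n_draws=100_000, seed=12345):
    # Monte Carlo estimate of P(S_n > v) by direct simulation of S_n.
    rng = random.Random(seed)
    hits = sum(
        sum(rng.gammavariate(a, b) for a, b in zip(alphas, betas)) > v
        for _ in range(n_draws)
    )
    return hits / n_draws
```

At a point about two standard deviations above the mean, the simulated tail probability is noticeably larger than the normal value of roughly 0.023, reflecting the right skewness of the gamma sum.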
Concluding remarks
In this paper, we considered both saddlepoint and moment-based approximations to the distribution of the sum of i.n.i.d. gamma random variables. The saddlepoint approximation proved to be an accurate method for calculating this distribution: its precision was superior to that of both the normal and moment-based approximations.
References
Butler, R.W.: Saddlepoint Approximations with Applications. Cambridge University Press, Cambridge (2007)
Daniels, H.E.: Saddlepoint approximations in statistics. Ann. Math. Stat. 25, 631–650 (1954)
Daniels, H.E.: Tail probability approximations. Int. Stat. Rev. 55, 37–48 (1987)
Eisinga, R., Grotenhuis, M.T., Pelzer, B.: Saddlepoint approximation for the sum of independent non-identically distributed binomial random variables. Stat. Neerl. 67, 190–201 (2013)
Gillespie, C.S., Renshaw, E.: An improved saddlepoint approximation. Math. Biosci. 208, 359–374 (2007)
Ha, H.-T., Provost, S.B.: A viable alternative to resorting to statistical tables. Commun. Stat. Simul. Comput. 36, 1135–1151 (2007)
Huzurbazar, S.: Practical saddlepoint approximations. Am. Stat. 53, 225–232 (1999)
Jensen, J.L.: Saddlepoint Approximations. Oxford University Press, Oxford (1995)
Khodabin, M., Ahmadabadi, A.: Some properties of generalized gamma distribution. Math. Sci. 4, 9–28 (2010)
Kolassa, J.E.: Series Approximation Methods in Statistics. Springer-Verlag, New York (2006)
Lugannani, R., Rice, S.O.: Saddlepoint approximation for the distribution of the sum of independent random variables. Adv. Appl. Probab. 12, 475–490 (1980)
Mathai, A.M.: Storage capacity of a dam with gamma type inputs. Ann. Inst. Stat. Math. 34, 591–597 (1982)
Moschopoulos, P.G.: The distribution of the sum of independent gamma random variables. Ann. Inst. Stat. Math. 37, 541–544 (1985)
Murakami, H.: A saddlepoint approximation to the distribution of the sum of independent non-identically uniform random variables. Stat. Neerl. 68, 267–275 (2014)
Nadarajah, S., Jiang, X., Chu, J.: A saddlepoint approximation to the distribution of the sum of independent non-identically beta random variables. Stat. Neerl. 69, 102–114 (2015)
Acknowledgments
The author would like to thank the editor and the referee for their valuable comments and suggestions. This research was supported by the Grant-in-Aid for Young Scientists (B) of JSPS, KAKENHI Number 26730025.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Murakami, H. Approximations to the distribution of sum of independent non-identically gamma random variables. Math Sci 9, 205–213 (2015). https://doi.org/10.1007/s40096-015-0169-2