## Abstract

The integral \({\int }_{0}^{\infty }x^{m} e^{-\frac {1}{2}(x-a)^{2}}dx\) appears in likelihood ratios used to detect a change in the parameters of a normal distribution. As part of the *m*^{th} moment of a truncated normal distribution, this integral is known to satisfy a recursion relation, which has been used to calculate the first four moments of a truncated normal; use of higher-order moments has historically been rare. More recently, this integral has found important applications in methods of changepoint detection, with *m* reaching into the thousands. The standard recursion formula involves numbers whose values grow quickly with *m*, placing a low cap on computational feasibility. We address the resulting computational issues from several angles: asymptotics, recursion and approximation. We provide an example in a changepoint detection setting.


## References

Barr D R, Sherrill E T (1999) Mean and variance of truncated normal distributions. Amer Statist 53:357–361

Burkardt J (2014) The truncated normal distribution. https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf

Cook JD (2010) Upper bounds on non-central chi-square tails and truncated normal moments. https://www.johndcook.com/non_central_chi_square.pdf

Dhrymes P J (2005) Moments of truncated (normal) distributions. http://www.columbia.edu/~pjd1/papers.html

Gordon L, Pollak M (1995) A robust surveillance scheme for stochastically ordered alternatives. Ann Statist 23:1350–1375

Horrace WC (2015) Moments of the truncated normal distribution. J Prod Anal 43:133–138

Krieger AM, Pollak M, Yakir B (2003) Surveillance of a simple linear regression. J Amer Statist Assoc 98:1–15

Lee A (1914) Table of the Gaussian “tail” functions; when the “tail” is larger than the body. Biometrika 10:208–214

Liquet B, Nazarathy Y (2015) A dynamic view to moment matching of truncated distributions. Statist Prob Letts 104(53):87–93

MATLAB (2014) MATLAB and statistics toolbox release 2014b. The Mathworks Inc., Natick

Toolbox (2016) MATLAB multiprecision computing toolbox (ADVANPIX Release 2016). The MathWorks, Inc., Natick. http://www.advanpix

O’Connor AN (2011) Probability distributions used in reliability engineering. Reliability information analysis center (RIAC)

Pearson K, Lee A (1908) On the generalized probable error in multiple normal correlation. Biometrika 6:59–68

Pollak M (1987) Average run lengths of an optimal method of detecting a change in distribution. Ann Statist 15:749–779

Pollak M, Siegmund D (1991) Sequential detection of a change in a normal mean when the initial value is unknown. Ann Statist 19:394–416

Pollak M, Croarkin C, Hagwood C (1993) Surveillance schemes with application to mass calibration. NISTIR 5158 Technical Report, Statistical Division, The National Institute of Standards and Technology, Gaithersburg, MD 20899, USA

Quesenberry C (1991) SPC Q charts for start-up processes and short or long runs. J Qual Tech 23:213–224

van Dobben de Bruyn CS (1968) Cumulative Sum Tests: Theory and Practice. Griffin, London

## Acknowledgements

This work was supported by grant 1450/13 from the Israel Science Foundation and by the Marcy Bogen Chair of Statistics at the Department of Statistics, The Hebrew University of Jerusalem. The authors would like to thank the referees for their salient comments, which greatly improved the paper.


## Appendices

### Appendix A: Proof of Theorem 1

The case *a* = 0 is trivial. In the following we assume *a*≠ 0. Recall that

Hence by Stirling’s approximation

and so

We break the sum in Eq. 1 into pieces and analyze them separately.

Let *γ* > *e* and let \(M=a\sqrt {m}\gamma \). We first show that the sum from *M* to \(\infty \) in Eq. 1 is negligible. Again, by Stirling’s approximation

and so

Hence

For large enough *m* and \(j=M=|a|\sqrt {m}\gamma \), this expression equals

Let *ε* > 0. For *j* ≥ *M* the derivative with respect to *j* of

equals

It follows that as \(m\rightarrow {\infty }\)

For *j* ≤ *M*

It follows that

Note that if *X* ∼ Poisson(*λ*) then \(P(X>y)\leq \frac {Ee^{tX}}{e^{ty}}=\frac {e^{\lambda (e^{t}-1)}}{e^{ty}}\) for all *t* > 0, which is minimal at \(t=\log (\frac {y}{\lambda })\); hence \(P(X>y)<{e^{y-\lambda }}/{(\frac {y}{\lambda })^{y}}\). Recall that \(\frac {1}{2}<\eta <1\) and denote \(\lambda =a\sqrt {m}+\frac {1}{4}\gamma a^{2}\), \(y=a\sqrt {m}+(a\sqrt {m})^{\eta }\). Thus

If *X* ∼ Poisson(*λ*) then \(P(X<y)\leq \frac {Ee^{-tX}}{e^{-ty}}=\frac {e^{\lambda (e^{-t}-1)}}{e^{-ty}}\) for all *t* > 0, which is minimal at \(t=-\log (\frac {y}{\lambda })\); hence \(P(X<y)<{e^{y-\lambda }}/{(\frac {y}{\lambda })^{y}}\). Hence for \(j<a\sqrt {m}-(a\sqrt {m})^{\eta }\)

Finally, for \( a\sqrt {m}-(a\sqrt {m})^{\eta }\leq j\leq a\sqrt {m}+(a\sqrt {m})^{\eta }\)

Hence \(\frac {j^{2}}{m}=a^{2}[1+O((a\sqrt {m})^{\eta -1})]\) and

Combining Eqs. 1–6 establishes Theorem 1.
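The Poisson tail bounds invoked in the proof can be sanity-checked numerically. The sketch below is an illustration only (not part of the argument): it compares exact tails, computed by summing the Poisson pmf on the log scale, against the common Chernoff bound \(e^{y-\lambda }/(y/\lambda )^{y}\), which bounds \(P(X>y)\) for \(y>\lambda \) and \(P(X<y)\) for \(y<\lambda \).

```python
import math

def poisson_pmf(lam, k):
    # P(X = k) computed on the log scale for numerical stability
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def upper_tail(lam, y):
    # exact P(X > y) = 1 - P(X <= floor(y))
    return 1.0 - sum(poisson_pmf(lam, k) for k in range(math.floor(y) + 1))

def lower_tail(lam, y):
    # exact P(X < y) = P(X <= ceil(y) - 1)
    return sum(poisson_pmf(lam, k) for k in range(math.ceil(y)))

def chernoff(lam, y):
    # the bound e^{y - lam} / (y / lam)^y from the optimized exponent
    return math.exp(y - lam - y * math.log(y / lam))

lam = 10.0
assert upper_tail(lam, 20.0) <= chernoff(lam, 20.0)   # upper tail, y > lam
assert lower_tail(lam, 3.0) <= chernoff(lam, 3.0)     # lower tail, y < lam
```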

### Appendix B: Computing

This is a MATLAB (2014) program for calculating

Input: *m* > 2,*a*,*c*. Output: psi =\( \psi _{a,c}=\frac {{\int }_{c}^{\infty } x^{m} e^{-\frac {1}{2}(x-a)^{2}}dx}{ {\int }_{0}^{\infty } x^{m} e^{-\frac {1}{2}x^{2}}dx}e^{-a \sqrt {m}}\).
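The MATLAB listing itself is not reproduced here. As a stand-in illustration (not the paper's double recursion; all names below are ours), \(\psi _{a,c}(m)\) can be computed by trapezoidal integration of the numerator on the log scale, using the closed form \({\int }_{0}^{\infty } x^{m} e^{-\frac {1}{2}x^{2}}dx = 2^{(m-1)/2}{\Gamma }(\frac {m+1}{2})\) for the denominator:

```python
import math

def log_g00(m):
    # log of the denominator: integral_0^inf x^m e^{-x^2/2} dx = 2^((m-1)/2) Gamma((m+1)/2)
    return 0.5 * (m - 1) * math.log(2.0) + math.lgamma((m + 1) / 2.0)

def psi(m, a, c, n=20000):
    """Illustrative computation of psi_{a,c}(m): the numerator is integrated by
    the trapezoid rule on the log scale, then combined with e^{-a sqrt(m)}.
    Not the paper's recursion."""
    f = lambda x: m * math.log(x) - 0.5 * (x - a) ** 2   # log of the integrand
    x_star = (a + math.sqrt(a * a + 4.0 * m)) / 2.0      # peak: solves m/x = x - a
    sd = 1.0 / math.sqrt(m / x_star ** 2 + 1.0)          # local width at the peak
    lo = max(c, x_star - 12.0 * sd, 1e-12)               # integration window
    hi = max(x_star, c) + 12.0 * sd
    h = (hi - lo) / n
    fmax = f(x_star) if lo <= x_star else f(lo)          # for stable exponentiation
    s = sum((0.5 if i in (0, n) else 1.0) * math.exp(f(lo + i * h) - fmax)
            for i in range(n + 1))
    log_num = fmax + math.log(s * h)
    return math.exp(log_num - log_g00(m) - a * math.sqrt(m))
```

For *a* = 0, *c* = 0 the numerator and denominator coincide, so *ψ* = 1 exactly; this gives a quick correctness check of the quadrature.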

### Remark 1

The program above is intended to cover all cases of *c*, including *c* = 0. When *n* becomes large, \(\log ({\Gamma }(n/2))\) is calculated more precisely than Γ(*n*/2), so that when *c* ≠ 0 the last summands in both psi(*n*) and xi(*n*) should first be calculated entirely on the log scale and only then exponentiated.
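The point about the log scale can be illustrated in any language; as a minimal sketch (in Python rather than MATLAB), Γ(*n*/2) overflows double precision long before \(\log {\Gamma }(n/2)\) does:

```python
import math

n = 1000
try:
    g = math.gamma(n / 2)      # Gamma(500) ~ e^2605 overflows an IEEE double
except OverflowError:
    g = None                   # direct evaluation is infeasible
lg = math.lgamma(n / 2)        # log Gamma(500) ~ 2605.1 is perfectly representable
```

Accordingly, products and ratios of such terms are accumulated as sums of logs and exponentiated only at the end.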

### Appendix C: A Sequential Changepoint Detection Context

The need for calculating *ψ*_{0,a}(*m*) appears in a number of changepoint problems (cf. Pollak et al. 1993 and Krieger et al. 2003). For example, consider independent normally distributed random variables observed sequentially whose mean may (or may not) increase at an unknown time *ν* and one monitors the sequence to detect such a change. Formally,

where *X*_{1},*X*_{2},… are independent. Consider the case that *μ* and *σ* are unknown and one considers an increase of *δ* standard deviations to be of import. Define:

The sequence {*Z*_{i}} is invariant with respect to the unknown parameters *μ* and *σ*. Therefore, the likelihood ratio

of the first *n* − 1 invariant statistics *Z*_{2},…,*Z*_{n} (for a change occurring at the *ν* = *k*^{th} observation vs. no change ever occurring) can be calculated. Once done, the Shiryaev–Roberts statistic

can be used to declare that a change is in effect; an alarm would be raised if *R*_{n} crosses a pre-specified threshold *A*.
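For intuition only, here is a minimal sketch of the Shiryaev–Roberts procedure in the much simpler setting where *μ* and *σ* are known, so the invariant construction above is not needed: with pre-change law N(0,1) and post-change law N(*δ*,1), the one-observation likelihood ratio is \(\ell (x)=e^{\delta x-\delta ^{2}/2}\) and *R*_{n} obeys the recursion *R*_{n} = (1 + *R*_{n−1})ℓ(*X*_{n}). The function name and parameters are illustrative.

```python
import math

def shiryaev_roberts_alarm(xs, delta, A):
    """Return the first n at which R_n >= A, or None if no alarm is raised.
    Known-parameter case: N(0,1) before the change, N(delta,1) after."""
    R = 0.0
    for n, x in enumerate(xs, start=1):
        lr = math.exp(delta * x - 0.5 * delta * delta)  # one-observation LR
        R = (1.0 + R) * lr                              # SR recursion
        if R >= A:
            return n
    return None

# a single extreme observation raises an immediate alarm ...
assert shiryaev_roberts_alarm([10.0], delta=1.0, A=1000.0) == 1
# ... while in-control data keep R_n bounded well below a high threshold
assert shiryaev_roberts_alarm([0.0] * 100, delta=1.0, A=1e6) is None
```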

Since one cannot differentiate between the case that there is no change ever and the case that a change occurred at the very beginning, necessarily \({{\Lambda }_{1}^{n}}= 1\). Clearly,

Letting

a lengthy calculation (similar to that in Pollak et al. 1993) yields for 2 ≤ *k* ≤ *n*

(Note that \(\frac {g_{0,a}(m)}{g_{0,0}(m)}=\psi _{0,a}(m)e^{a\sqrt {m}}\). For the sake of precision, it is advisable to first calculate the log of the components of \({{\Lambda }_{k}^{n}}(\delta )\) and then exponentiate their sum.)

As an example where *m* can be very large, we consider the list of weights at birth of 196710 babies born at a large hospital in Israel between 13/10/2004 and 18/5/2017 whose weight at birth was between 2000 and 5000 grams. Figure 1 is a histogram of the data. A normal distribution seems to fit the data well.

During the last decades, worldwide, obesity and macrosomia have been on the rise. It would have been reasonable to monitor the weight of newborn infants for an increase in mean. Since a rise could be gradual, it would be reasonable to start by trying to detect a small change — for example’s sake, we choose *δ* = 0.1, an increase of one tenth of a standard deviation (ca. 46 grams). Figure 2 presents the sequence of surveillance statistics *R*_{n} for the first 5000 observations and Fig. 3 for the first 65000 observations.

The choice of the threshold *A* is made in light of the risk one is willing to take regarding a false alarm. Suppose one were willing to tolerate a false alarm on average once in 20 years. With an average of roughly 15000 babies born each year, this would mean a false alarm on average once in 300000 observations. Using a renewal-theoretic approximation for the ARL2FA (Pollak 1987), this means that the threshold *A* should be 300000/1.06 ∼ 283000. It is clear from Figs. 2 and 3 that such a change would not have been detected within the first 65000 observations.

Figure 4 presents the sequence of surveillance statistics *R*_{n} for the entire sequence. Clearly, *R*_{n} exceeds *A* = 283000 a short while after the 70000^{th} observation. In fact, the 70440^{th} is the first to carry *R*_{n} over the threshold. This means that it would have been declared after the 70440^{th} newborn that the mean weight had increased. Figure 5 depicts the loglikelihood function \(\log {\Lambda }^{70440}_{k}(\delta = 0.1)\), *k* = 1,…, 70440. This function attains its maximum at *k* = 65834, so 65834 can be regarded at the time of stopping (70440) as the maximum likelihood estimate of the changepoint. This could be interpreted as the increase being detected 4606 observations (ca. 4 months) after its occurrence.

In fact, the average weight of the first 65833 newborns is 3286 grams, whereas the average weight of newborns #65834–#67000 is 3332 grams (an increase of approximately 0.1 standard deviations).

Continuing with the ensuing newborns, *R*_{n} drops again, and until the end of the sequence does not cross the *A* = 283000 level again. In fact, the average weight of newborns #67001–#196710 is 3291 grams. So it seems as if the increase was either temporary or only apparent. Was the alarm false? After observing 67000 newborns, it would have seemed that the increase was real. With hindsight, one may either believe that the increase was real but that a subsequent change (a decrease) reverted the mean weight to its original level, or one may interpret the episode as having been a false alarm.

The calculation of *R*_{n} was done in the following way. *R*_{1},…,*R*_{5000} were calculated using the recursions of Section ??; ensuing *R*_{n}’s were calculated by applying Theorem 1. Rather than calculating \(\frac {g_{0,\delta U_{k}(n)}(n-2)}{g_{0,0}(n-2)}\) by the recursions separately for each *δ**U*_{k}(*n*), calculation time drops immensely by creating a fine enough grid of \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) and interpolating. Such a grid was created, in our example for *n* = 1,…, 5000 and approximately 500 values of *x*, spaced so that between adjacent *x*’s the function \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) is almost perfectly linear. The grid is calculated much faster by the recursions of Section ?? than by the method of Section ?? (after all, the recursion that generated \(\frac {g_{0,x}(5000)}{g_{0,0}(5000)}\) created all of \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\), *n* = 1,…, 5000 along the way, whereas by the method of Section ?? each \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) would have to be calculated separately).
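The grid-and-interpolate step can be sketched generically as follows (the names and grid values below are illustrative placeholders; the actual grid values come from the paper's recursions):

```python
import bisect

def make_linear_interpolator(xs, ys):
    """Piecewise-linear interpolation over a precomputed grid; xs must be
    sorted ascending. Queries outside the grid are clamped to the endpoints."""
    def f(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, x) - 1                 # left grid neighbour
        t = (x - xs[i]) / (xs[i + 1] - xs[i])              # position in the cell
        return (1.0 - t) * ys[i] + t * ys[i + 1]
    return f

# e.g. a grid of some ratio evaluated at a few points, queried at arbitrary x:
grid_x = [0.0, 0.5, 1.0, 1.5]
grid_y = [1.0, 1.3, 1.8, 2.6]          # placeholder values, not real g-ratios
g_ratio = make_linear_interpolator(grid_x, grid_y)
```

Since the grid is spaced finely enough that the function is almost linear between adjacent knots, this replaces one recursion run per query with a constant-time lookup.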

### Appendix D: Remarks

### Remark 1

A lower bound can be obtained by Jensen’s inequality:

### Remark 2

Cook (2010) presented an upper bound

for *c* > 0. On the log scale, Cook’s upper bound has an asymptotic (\(m \rightarrow \infty \)) order of magnitude *m*^{2}, whereas Theorem 1 posits an order of magnitude \(m\log {m}\).

### Remark 3

A similar type of analysis can be done for \({\int }_{-\infty }^{\infty }x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\) (cf. Pollak et al. 1993). Note that

$${\int }_{-\infty }^{\infty }x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx=(-1)^{m}{\int }_{0}^{\infty }x^{m}e^{-\frac {1}{2}(x+a)^{2}}dx+{\int }_{0}^{\infty }x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx$$

and (unless *a* = 0) that, depending on the sign of *a*, one of the two integrals is asymptotically negligible with respect to the other. Thus Theorem 1 can be applied to obtain an approximation to the *m*^{th} moment of a (non-truncated) normal distribution.

### Remark 4

A similar type of analysis can be done when the truncation is from both ends. To see this, it suffices to consider \({{\int }_{0}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\) for *c* > 0. Clearly, \(\frac {{{\int }_{0}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx}{{{\int }_{0}^{c}} x^{m}dx}\rightarrow e^{-\frac {1}{2}(c-a)^{2}}\) as \(m \rightarrow \infty \). It follows that

$${{\int }_{0}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\sim \frac {c^{m+1}}{m+1}\,e^{-\frac {1}{2}(c-a)^{2}} \qquad \text{as } m\rightarrow \infty .$$

Without loss of generality, assume that *c* > |*b*|. A recursion for \({{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\) that builds on this is

where \(h_{a,b,c}(m)=\frac {m + 1}{c^{m + 1}}{{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\). (Note that for |*b*| > *c* > *b*, \({{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx= (-1)^{m} {\int }_{-c}^{-b} x^{m}e^{-\frac {1}{2}(x+a)^{2}}dx\).)

### Remark 5

A similar recursion can be constructed for densities proportional to \(e^{{const} \times x^{2}}I(b<x<c)\) where const > 0 and *b*, *c* are finite.

### Remark 6

Tables relevant to the truncated normal distribution have been around for over a century (cf. Pearson and Lee 1908 and Lee 1914). Even today it is considered part and parcel of applied distributions (cf. O’Connor 2011) and papers have been devoted to estimation of its parameters (cf. Barr and Sherrill 1999; Horrace 2015; Liquet and Nazarathy 2015). For most practical purposes, the first four moments of the distribution (mean, variance, skewness and kurtosis) have been of applied interest. Higher-order moments may appear in quadrature methods (cf. Burkardt 2014); their order of magnitude would seem to be in the tens at most.


## About this article

### Cite this article

Pollak, M., Shauly-Aharonov, M. A Double Recursion for Calculating Moments of the Truncated Normal Distribution and its Connection to Change Detection. *Methodol Comput Appl Probab* **21**, 889–906 (2019). https://doi.org/10.1007/s11009-018-9622-7


### Keywords

- Changepoint
- On-line
- Shiryaev–Roberts
- Surveillance

### Mathematics Subject Classification (2010)

- 62L10
- 62E15
- 60E05