A Double Recursion for Calculating Moments of the Truncated Normal Distribution and its Connection to Change Detection


The integral \({\int }_{0}^{\infty }x^{m} e^{-\frac {1}{2}(x-a)^{2}}dx\) appears in likelihood ratios used to detect a change in the parameters of a normal distribution. As part of the mth moment of a truncated normal distribution, this integral is known to satisfy a recursion relation, which has been used to calculate the first four moments of a truncated normal; historically, use of higher-order moments was rare. More recently, this integral has found important applications in methods of changepoint detection, with m running into the thousands. The standard recursion formula involves numbers whose values grow quickly with m, imposing a low cap on computational feasibility. We present various aspects of dealing with the computational issues: asymptotics, recursion and approximation. We provide an example in a changepoint detection setting.



  1. Barr D R, Sherrill E T (1999) Mean and variance of truncated normal distributions. Amer Statist 53:357–361


  2. Burkardt J (2014) The truncated normal distribution. https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf

  3. Cook JD (2010) Upper bounds on non-central chi-square tails and truncated normal moments. https://www.johndcook.com/non_central_chi_square.pdf

  4. Dhrymes P J (2005) Moments of truncated (normal) distributions. http://www.columbia.edu/~pjd1/papers.html

  5. Gordon L, Pollak M (1995) A robust surveillance scheme for stochastically ordered alternatives. Ann Statist 23:1350–1375


  6. Horrace WC (2015) Moments of the truncated normal distribution. J Prod Anal 43:133–138


  7. Krieger AM, Pollak M, Yakir B (2003) Surveillance of a simple linear regression. J Amer Statist Assoc 98:1–15


  8. Lee A (1914) Table of the Gaussian “tail” functions; when the “tail” is larger than the body. Biometrika 10:208–214


  9. Liquet B, Nazarathy Y (2015) A dynamic view to moment matching of truncated distributions. Statist Prob Letts 104(53):87–93


  10. MATLAB (2014) MATLAB and statistics toolbox release 2014b. The Mathworks Inc., Natick


  11. Advanpix (2016) Multiprecision computing toolbox for MATLAB (Release 2016). http://www.advanpix


  12. O’Connor AN (2011) Probability distributions used in reliability engineering. Reliability information analysis center (RIAC)

  13. Pearson K, Lee A (1908) On the generalized probable error in multiple normal correlation. Biometrika 6:59–68


  14. Pollak M (1987) Average run lengths of an optimal method of detecting a change in distribution. Ann Statist 15:749–779


  15. Pollak M, Siegmund D (1991) Sequential detection of a change in a normal mean when the initial value is unknown. Ann Statist 19:394–416


  16. Pollak M, Croarkin C, Hagwood C (1993) Surveillance schemes with application to mass calibration. NISTIR 5158 Technical Report, Statistical Division, The National Institute of Standards and Technology, Gaithersburg, MD 20899, USA

  17. Quesenberry C (1991) SPC Q processes for start-up processes and short or long runs. J Qual Tech 23:213–224


  18. van Dobben de Bruyn CS (1968) Cumulative Sum Tests: Theory and Practice. Griffin, London




Acknowledgements

This work was supported by grant 1450/13 from the Israel Science Foundation and by the Marcy Bogen Chair of Statistics at the Department of Statistics, The Hebrew University of Jerusalem. The authors would like to thank the referees for their salient comments, which greatly improved the paper.

Author information



Corresponding author

Correspondence to Moshe Pollak.


Appendix A: Proof of Theorem 1

The case a = 0 is trivial. In the following we assume a ≠ 0. Recall that

$${\int}_{0}^{\infty}x^{m} e^{-\frac{1}{2}x^{2}}dx= 2^{\frac{1}{2}(m-1)}{\Gamma}(\frac{m + 1}{2}) . $$

Hence by Stirling’s approximation

$$\begin{array}{@{}rcl@{}} \frac{{\int}_{0}^{\infty}x^{m + 1} e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty}x^{m} e^{-\frac{1}{2}x^{2}}dx} &=&\frac{2^{\frac{1}{2}m}{\Gamma}(\frac{m + 2}{2})} {2^{\frac{1}{2}(m-1)}{\Gamma}(\frac{m + 1}{2})} \\ &=&2^{\frac{1}{2}}\frac{e^{-\frac{m + 2}{2}}(\frac{m + 2}{2})^{\frac{m + 1}{2}}}{e^{-\frac{m + 1}{2}}(\frac{m + 1}{2})^{\frac{m}{2}}}(1+O(\frac{1}{m^{2}})) \\ &=&e^{-\frac{1}{2}}(\frac{m + 2}{m + 1})^{(m + 1)\frac{1}{2}}(m + 1)^{\frac{1}{2}}(1+O(\frac{1}{m^{2}})) \\ &=&\sqrt{m + 1}(1+O(\frac{1}{m})) \end{array} $$

and so

$$\begin{array}{@{}rcl@{}} e^{\frac{1}{2}a^{2}} \frac{{\int}_{0}^{\infty}x^{m} e^{-\frac{1}{2}(x-a)^{2}}dx}{{\int}_{0}^{\infty} x^{m} e^{-\frac{1}{2}x^{2}} dx} &=& \frac{{\int}_{0}^{\infty}e^{ax}x^{m} e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty} x^{m} e^{-\frac{1}{2}x^{2}} dx} \\ &=&\sum\limits_{j = 0}^{\infty}\frac{{\int}_{0}^{\infty}\frac{(ax)^{j}}{j!}x^{m} e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty} x^{m} e^{-\frac{1}{2}x^{2}}dx} \\ &=&\sum\limits_{j = 0}^{\infty}\frac{a^{j}}{j!}\frac{{\int}_{0}^{\infty}x^{m+j} e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty} x^{m} e^{-\frac{1}{2}x^{2}} dx} \\ &=&\sum\limits_{j = 0}^{\infty}\frac{a^{j}}{j!}\prod\limits_{i = 1}^{j}\frac{{\int}_{0}^{\infty}x^{m+i} e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty} x^{m+i-1} e^{-\frac{1}{2}x^{2}}dx} \\ &=&\sum\limits_{j = 0}^{\infty}\frac{a^{j}}{j!}\prod\limits_{i = 1}^{j}(\sqrt{m+i}(1+O(\frac{1}{m}))) \\ &=&\sum\limits_{j = 0}^{\infty}\frac{(|a|\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}\sqrt{\prod\limits_{i = 1}^{j}(1+\frac{i}{m})} . \end{array} $$

We break the sum in Eq. 1 into pieces and analyze them separately.

Let γ > e and let \(M=a\sqrt {m}\gamma \). We first show that the sum from M to \(\infty \) in Eq. 1 is negligible. Again, by Stirling’s approximation

$$\begin{array}{@{}rcl@{}} \prod\limits_{i = 1}^{j} (1+\frac{i}{m}) &=&\frac{(m+j)!}{m!m^{j}} =\frac{e^{-(m+j)}(m+j)^{m+j+\frac{1}{2}}(1+O(\frac{1}{m+j}))}{m^{j}e^{-m}m^{m+\frac{1}{2}}(1+O(\frac{1}{m}))} \\ &=&e^{-j}(\frac{m+j}{m})^{m+j+\frac{1}{2}}(1+O(\frac{1}{m})) \end{array} $$

and so

$$\begin{array}{@{}rcl@{}} \frac{(|a|\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}\sqrt{\prod\limits_{i = 1}^{j} (1+\frac{i}{m})} &=&\frac{(|a|\sqrt{m})^{j}e^{-\frac{1}{2}j}\sqrt{(\frac{m+j}{m})^{m+j+\frac{1}{2}}}(1+O(\frac{1}{m}))^{j}}{\sqrt{2\pi}e^{-j}j^{j+\frac{1}{2}}} \\ && \times (1+O(\frac{1}{m})). \end{array} $$


Hence

$$\begin{array}{@{}rcl@{}} \log \{\frac{(|a|\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}\sqrt{\prod\limits_{i = 1}^{j} (1+\frac{i}{m})}\} &=&j\log(|a|\sqrt{m})+\frac{1}{2}(m+j+\frac{1}{2})\log(\frac{m+j}{m})-\frac{1}{2}\log(2\pi) +\frac{1}{2}j-(j+\frac{1}{2})\log(j) \\ &&\quad \quad +j\log(1+o(\frac{1}{m})) \\ &=&-(j+\frac{1}{2})\log(j)+\frac{1}{2}(m+j+\frac{1}{2})\log(m+j)-\frac{1}{2}(m+j+\frac{1}{2})\log(m)+j\log(|a|\sqrt{m}) \\ &&\quad \quad+\frac{1}{2}j +j\log(1+o(\frac{1}{m}))-\frac{1}{2}\log(2\pi). \end{array} $$

For large enough m and \(j=M=|a|\sqrt {m}\gamma \), this expression equals

$$-|a|\sqrt{m}\gamma(\log(\gamma)-1+o(1)) +\frac{1}{4}a^{2}\gamma^{2}(1+o(1))\quad<\quad -\frac{1}{2}a\sqrt{m}\gamma(\log(\gamma)-1). $$

Let ε > 0. For \(j\geq M\) the derivative with respect to j of

$$-(j+\frac{1}{2})\log(j)+\frac{1}{2}(m+j+\frac{1}{2})\log(m+j)-\frac{1}{2}(m+j+\frac{1}{2})\log(m)+j\log(|a|\sqrt{m})+\frac{1}{2}j $$
$$+j\log(1+\frac{\varepsilon}{m}) $$

is

$$-\log(\frac{j}{|a|\sqrt{m}})-\frac{2m+j}{4j(m+j)}+\frac{1}{2}\log(1+\frac{j}{m})+\log(1+\frac{\varepsilon}{m}) \quad \leq \quad -\frac{1}{2}\log(\gamma). $$

It follows that as \(m\rightarrow {\infty }\)

$$\begin{array}{@{}rcl@{}} \sum\limits_{j=M}^{\infty}\frac{(a\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}\sqrt{\prod\limits_{i = 1}^{j}(1+\frac{i}{m})} &\leq& e^{-\frac{1}{2}a\sqrt{m}\gamma(\log(\gamma)-1)} \sum\limits_{j = 0}^{\infty}e^{-\frac{1}{2}\log(\gamma)j} \\ &=&\frac{ e^{-\frac{1}{2}a\sqrt{m}\gamma(\log(\gamma)-1)}}{1- e^{-\frac{1}{2}\log(\gamma)}}\rightarrow 0. \end{array} $$

For \(j\leq M\)

$$\begin{array}{@{}rcl@{}} \log(\prod\limits_{i = 1}^{j}(1+\frac{i}{m}))&=&\sum\limits_{i = 1}^{j}[\frac{i}{m}-\frac{1}{2}\frac{i^{2}}{m^{2}}+\frac{1}{3}\frac{i^{3}}{m^{3}} {\ldots} ] \\ &=&\frac{j(j + 1)}{2m}-\sum\limits_{i = 1}^{j}\frac{i^{2}}{m^{2}} (\frac{1}{2}-\frac{1}{3}\frac{i}{m}+\frac{1}{4}\frac{i^{2}}{m^{2}} {\ldots} ) \\ &=& j(\frac{1}{2}\frac{j}{m}) -O(\frac{|a|^{3}}{\sqrt{m}}) . \end{array} $$

It follows that

$$\begin{array}{@{}rcl@{}} \sum\limits_{j = 0}^{M}\frac{(a\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}\sqrt{\prod\limits_{i = 1}^{j}(1+\frac{i}{m})} &=&\sum\limits_{j = 0}^{M}\frac{(a\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}e^{\frac{1}{4}\frac{j^{2}}{m}-O(\frac{|a|^{3}}{\sqrt{m}})}. \end{array} $$

Note that if \(X\sim \text{Poisson}(\lambda)\) then \(P(X>y)\leq \frac {Ee^{tX}}{e^{ty}}=\frac {e^{\lambda (e^{t}-1)}}{e^{ty}}\) for all t > 0, which is minimal at \(t=\log (\frac {y}{\lambda })\); hence \(P(X>y)\leq {e^{y-\lambda }}/{(\frac {y}{\lambda })^{y}}\). Recall that \(\frac {1}{2}<\eta <1\) and denote \(\lambda =a\sqrt {m}+\frac {1}{4}\gamma a^{2}\), \(y=a\sqrt {m}+(a\sqrt {m})^{\eta }\). Thus

$$\sum\limits_{j=a\sqrt{m}+(a\sqrt{m})^{\eta}}^{M=\gamma a\sqrt{m}}\frac{(a\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}e^{\frac{1}{4}\frac{j^{2}}{m}}$$
$$\begin{array}{@{}rcl@{}} &&< e^{a\sqrt{m}+\frac{1}{4} a^{2}}e^{\frac{1}{4} a^{2}(\gamma-1)}{P}(Poisson(\lambda)>y) \\ &&\leq e^{a\sqrt{m}+\frac{1}{4} a^{2}}e^{\frac{1}{4} a^{2}(\gamma-1)}e^{\log((a\sqrt{m})^{\eta}-\frac{1}{4}\gamma a^{2})-(a\sqrt{m}+(a\sqrt{m})^{\eta})\log(\frac{a\sqrt{m}+(a\sqrt{m})^{\eta}}{a\sqrt{m}+\frac{1}{4}\gamma a^{2}})} \\ &&\leq e^{a\sqrt{m}+\frac{1}{4} a^{2}}O(e^{-(a\sqrt{m})^{\eta}}). \end{array} $$

If \(X\sim \text{Poisson}(\lambda)\) then \(P(X<y)\leq \frac {Ee^{-tX}}{e^{-ty}}=\frac {e^{\lambda (e^{-t}-1)}}{e^{-ty}}\) for all t > 0, which is minimal at \(t=\log (\frac {\lambda }{y})\); hence \(P(X<y)\leq {e^{y-\lambda }}/{(\frac {y}{\lambda })^{y}}\). Hence for \(j<a\sqrt {m}-(a\sqrt {m})^{\eta }\)

$$\sum\limits_{j = 0}^{a\sqrt{m}-(a\sqrt{m})^{\eta}}\frac{(a\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}e^{\frac{1}{4}\frac{j^{2}}{m}}$$
$$\begin{array}{@{}rcl@{}} &&<e^{a\sqrt{m}+\frac{1}{4}a^{2}} \times{{P}(Poisson}(a\sqrt{m}(1+O(\frac{1}{m})))<a\sqrt{m}-(a\sqrt{m})^{\eta}) \\ &&=e^{a\sqrt{m}+\frac{1}{4}a^{2}}O(e^{-(a\sqrt{m})^{2\eta-1}}). \end{array} $$

Finally, for \( a\sqrt {m}-(a\sqrt {m})^{\eta }\leq j\leq a\sqrt {m}+(a\sqrt {m})^{\eta }\)

$$a^{2}[1-2a^{\eta-1}m^{\frac{1}{2}(\eta-1)}+a^{2(\eta-1)}m^{(\eta-1)}]\leq \frac{j^{2}}{m}\leq a^{2}[1 + 2a^{\eta-1}m^{\frac{1}{2}(\eta-1)}+a^{2(\eta-1)}m^{(\eta-1)}]. $$

Hence \(\frac {j^{2}}{m}=a^{2}[1+O((a\sqrt {m})^{\eta -1})]\) and

$$ \sum\limits_{j=a\sqrt{m}-(a\sqrt{m})^{\eta}}^{ a\sqrt{m}+(a\sqrt{m})^{\eta}}\frac{(a\sqrt{m}(1+O(\frac{1}{m})))^{j}}{j!}e^{\frac{1}{4}\frac{j^{2}}{m}}=e^{a\sqrt{m}+\frac{1}{4}a^{2}[1+O((a\sqrt{m})^{\eta-1})]}. $$

Combining Equations (1)–(6) establishes Theorem 1.
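Theorem 1's conclusion can be sanity-checked numerically by summing the series in Eq. 1 directly on the log scale. The following Python sketch is our own illustration (not part of the paper; `log_h` and `log_ratio` are names we introduce): it evaluates \(\log \sum _{j}\frac {a^{j}}{j!}\,h(m+j)/h(m)\), where \(h(m)={\int }_{0}^{\infty }x^{m}e^{-\frac {1}{2}x^{2}}dx\), which should behave like \(a\sqrt {m}+a^{2}/4\) for large m.

```python
import math

def log_h(m):
    # log of h(m) = ∫_0^∞ x^m e^{-x²/2} dx = 2^{(m-1)/2} Γ((m+1)/2)
    return 0.5 * (m - 1) * math.log(2.0) + math.lgamma(0.5 * (m + 1))

def log_ratio(a, m, j_max=400):
    # log of Σ_j (a^j / j!) · h(m+j)/h(m), i.e. the series representation of
    # e^{a²/2} ∫_0^∞ x^m e^{-(x-a)²/2} dx / h(m); summed stably on the log scale
    logs = [j * math.log(a) - math.lgamma(j + 1) + log_h(m + j) - log_h(m)
            for j in range(j_max)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

# Theorem 1 predicts log_ratio(a, m) ≈ a·√m + a²/4 for large m
```

The dominant terms of the series sit near \(j\approx a\sqrt {m}\), so `j_max` need only be a few multiples of that.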

Appendix B: Computing

This is a MATLAB (2014) program for calculating

$$\frac{{\int}_{c}^{\infty} x^{m} e^{-\frac{1}{2}(x-a)^{2}}dx}{ {\int}_{0}^{\infty} x^{m} e^{-\frac{1}{2}x^{2}}dx}e^{-a \sqrt{m}} $$

Input: m > 2,a,c. Output: psi =\( \psi _{a,c}=\frac {{\int }_{c}^{\infty } x^{m} e^{-\frac {1}{2}(x-a)^{2}}dx}{ {\int }_{0}^{\infty } x^{m} e^{-\frac {1}{2}x^{2}}dx}e^{-a \sqrt {m}}\).
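The MATLAB listing itself does not survive in this version of the text. As a hedged substitute, here is a minimal Python sketch of one way to compute \(\psi _{a,c}(m)\) (our own illustration, not the authors' program; `psi` and `log_h` are names we introduce). It uses the integration-by-parts recursion \(g(m)=a\,g(m-1)+(m-1)\,g(m-2)+c^{m-1}e^{-\frac {1}{2}(c-a)^{2}}\) for \(g(m)={\int }_{c}^{\infty }x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\), normalized at every step by \(h(m)={\int }_{0}^{\infty }x^{m}e^{-\frac {1}{2}x^{2}}dx\), so that the identity \(h(m)=(m-1)h(m-2)\) cancels the factorial growth:

```python
import math

def log_h(m):
    # log of h(m) = ∫_0^∞ x^m e^{-x²/2} dx = 2^{(m-1)/2} Γ((m+1)/2)
    return 0.5 * (m - 1) * math.log(2.0) + math.lgamma(0.5 * (m + 1))

def psi(a, c, m):
    # ψ_{a,c}(m) = e^{-a√m} · ∫_c^∞ x^m e^{-(x-a)²/2} dx / h(m),
    # computed through the normalized ratio r(n) = g(n)/h(n)
    # seeds: g(0) = √(2π)·Φ(a-c),  g(1) = e^{-(c-a)²/2} + a·g(0)
    g0 = math.sqrt(2 * math.pi) * 0.5 * (1 + math.erf((a - c) / math.sqrt(2)))
    g1 = math.exp(-0.5 * (c - a) ** 2) + a * g0
    r_prev2, r_prev1 = g0 / math.exp(log_h(0)), g1 / math.exp(log_h(1))
    if m == 0:
        return r_prev2
    for n in range(2, m + 1):
        rho = math.exp(log_h(n - 1) - log_h(n))    # h(n-1)/h(n), about 1/√n
        bdry = 0.0                                 # boundary term c^{n-1}e^{-(c-a)²/2}/h(n)
        if c > 0:
            bdry = math.exp((n - 1) * math.log(c) - 0.5 * (c - a) ** 2 - log_h(n))
        # g(n) = a·g(n-1) + (n-1)·g(n-2) + c^{n-1}e^{-(c-a)²/2}, divided by h(n);
        # the middle term simplifies because (n-1)·h(n-2) = h(n)
        r_prev2, r_prev1 = r_prev1, a * rho * r_prev1 + r_prev2 + bdry
    return r_prev1 * math.exp(-a * math.sqrt(m))
```

Because the iterates stay on the scale of \(e^{a\sqrt {m}}\) rather than of \(g(m)\) itself, the sketch remains in double-precision range for m well into the thousands when a is of moderate size; the authors' program presumably differs in detail (cf. Remark 1 below on log-scale exponentiation).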


Remark 1

The program above is intended to cover all cases of c, including c = 0. When n becomes large, \(\log ({\Gamma }(n/2))\) is calculated more precisely than Γ(n/2), so that when c ≠ 0 the last summands in both psi(n) and xi(n) should be calculated completely on the log scale first and only then exponentiated.

Appendix C: A Sequential Changepoint Detection Context

The need for calculating ψ0,a(m) appears in a number of changepoint problems (cf. Pollak et al. 1993 and Krieger et al. 2003). For example, consider independent normally distributed random variables observed sequentially whose mean may (or may not) increase at an unknown time ν and one monitors the sequence to detect such a change. Formally,

$$X_{1},\ldots,X_{\nu-1} \sim {Normal}(\mu,\sigma^{2}) $$
$$X_{\nu},{\ldots} \quad \sim {Normal}(\mu+\delta \sigma, \sigma^{2}) $$

where X1,X2,… are independent. Consider the case that μ and σ are unknown and one considers an increase of δ standard deviations to be of import. Define:

$$Y_{i}=(X_{i}-\bar{X}_{i-1})\sqrt{\frac{i-1}{i}} $$
$$Z_{i}=\frac{Y_{i}}{|Y_{2}|} . $$
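As a small illustration (ours, not from the paper; `invariant_stats` is a name we introduce), the statistics above can be computed in a single pass over the data:

```python
import math

def invariant_stats(xs):
    # Y_i = (X_i - mean(X_1,...,X_{i-1})) · √((i-1)/i) and Z_i = Y_i/|Y_2|,
    # for i = 2,...,n (1-based indexing, as in the text)
    ys = []
    running_sum = xs[0]
    for i in range(2, len(xs) + 1):
        xbar_prev = running_sum / (i - 1)     # mean of X_1,...,X_{i-1}
        ys.append((xs[i - 1] - xbar_prev) * math.sqrt((i - 1) / i))
        running_sum += xs[i - 1]
    return [y / abs(ys[0]) for y in ys]       # Z_i = Y_i / |Y_2|
```

Replacing each \(X_{i}\) by \(\sigma X_{i}+\mu \) with σ > 0 multiplies every \(Y_{i}\) by σ and therefore leaves every \(Z_{i}\) unchanged.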

The sequence {Zi} is invariant with respect to the unknown parameters μ and σ. Therefore, the likelihood ratio

$${{\Lambda}_{k}^{n}}=\frac{f_{\nu=k}(Z_{2},\ldots,Z_{n})}{f_{\nu=\infty}(Z_{2},\ldots,Z_{n})} $$

of the first n − 1 invariant statistics Z2,…,Zn (for a change occurring at the ν = kth observation vs. no change ever occurring) can be calculated. Once done, the Shiryaev–Roberts statistic

$$R_{n}=\sum\limits_{k = 1}^{n}{{\Lambda}_{k}^{n}} $$

can be used to declare that a change is in effect; an alarm would be raised if Rn crosses a pre-specified threshold A.

Since one cannot differentiate between the case that there is no change ever and the case that a change occurred at the very beginning, necessarily \({{\Lambda }_{1}^{n}}= 1\). Clearly,

$${{\Lambda}_{2}^{2}}(\delta)=\{{\Phi}(\frac{\delta}{\sqrt{2}})I(Z_{2}= 1)+[1-{\Phi}(\frac{\delta}{\sqrt{2}})]I(Z_{2}=-1)\}/\frac{1}{2} $$
$$= 1+[2{\Phi} (\frac{\delta}{\sqrt{2}})-1]\text{sign}(Z_{2}) . $$


Defining

$$U_{k}(n)=(k-1) \frac{\sum\limits_{i=k}^{n} \frac{Z_{i}}{\sqrt{i(i-1)}}}{\sqrt{\sum\limits_{i = 2}^{n} {Z_{i}^{2}}}}=(k-1) \frac{\sum\limits_{i=k}^{n} \frac{Y_{i}}{\sqrt{i(i-1)}}}{\sqrt{\sum\limits_{i = 2}^{n} {Y_{i}^{2}}}} $$

a lengthy calculation (similar to that in Pollak et al. 1993) yields for 2 ≤ kn

$${{\Lambda}_{k}^{n}}(\delta)= \{{{\Lambda}_{2}^{2}}I\{k = 2\}+I\{k \neq 2\}\}e^{-\frac{1}{2}(k-1)^{2} \delta^{2} [\frac{1}{k-1}-\frac{1}{n}]}e^{\frac{1}{2}\delta^{2} {U_{k}^{2}}(n)} \frac{g_{0,\delta U_{k}(n)}(n-2)}{g_{0,0}(n-2)} . $$

(Note that \(\frac {g_{0,a}(m)}{g_{0,0}(m)}=\psi _{0,a}(m)e^{a\sqrt {m}}\). For the sake of precision, it is advisable to first calculate the log of the components of \({{\Lambda }_{k}^{n}}(\delta )\) and then exponentiate their sum.)

As an example where m can be very large, we consider the list of weights at birth of 196710 babies born at a large hospital in Israel between 13/10/2004 and 18/5/2017 whose weight at birth was between 2000 and 5000 grams. Figure 1 is a histogram of the data. A normal distribution seems to fit the data well.

Fig. 1

Weight (in grams) at birth of 196710 infants born at a large hospital in Israel between 13/10/2004 and 18/5/2017

During the last decades, worldwide, obesity and macrosomia have been on the rise. It would have been reasonable to monitor the weight of newborn infants for an increase in mean. Since a rise could be gradual, it would be reasonable to start by trying to detect a small change — for example’s sake, we choose δ = 0.1, an increase of one tenth of a standard deviation (ca. 46 grams). Figure 2 presents the sequence of surveillance statistics Rn for the first 5000 observations and Fig. 3 for the first 65000 observations.

Fig. 2

A Shiryaev–Roberts control chart for the weights of the first 5000 babies described in Fig. 1 born after 13/10/2004

Fig. 3

A Shiryaev–Roberts control chart for the weights of the first 65000 babies described in Fig. 1 born after 13/10/2004

The choice of the threshold A is made in light of the risk one is willing to take regarding a false alarm. Suppose one were willing to tolerate a false alarm on average once in 20 years. With an average of roughly 15000 babies born each year, this would mean a false alarm on average once in 300000 observations. Using a renewal-theoretic approximation for the ARL2FA (Pollak 1987), this means that the threshold A should be 300000/1.06 ≈ 283000. It is clear from Figs. 2 and 3 that such a change would not have been detected within the first 65000 observations.

Figure 4 presents the sequence of surveillance statistics Rn for the entire sequence. Clearly, Rn exceeds A = 283000 a short while after the 70000th observation. In fact, the 70440th observation is the first to carry Rn over the threshold. This means that it would have been declared after the 70440th newborn that the mean weight has increased. Figure 5 depicts the loglikelihood function \(\log {\Lambda }^{70440}_{k}(\delta = 0.1)\), k = 1,…, 70440. This function attains its maximum at k = 65834, so 65834 can be regarded at the time of stopping (70440) as the maximum likelihood estimate of the changepoint. This could be interpreted as the increase being detected 4606 observations (ca. 4 months) after its occurrence.

Fig. 4

A Shiryaev–Roberts control chart for the weights of all of the babies described in Fig. 1 born between 13/10/2004 and 18/5/2017

Fig. 5

The loglikelihood ratio of {increase at k} vs. {no increase} after declaring at observation #70440 that an increase is in effect

In fact, the average weight of the first 65833 newborns is 3286 grams, whereas the average weight of newborns #65834 − #67000 is 3332 grams (an increase of approximately 0.1 standard deviations).

Continuing with the ensuing newborns, Rn drops again, and until the end of the sequence it does not cross the A = 283000 level again. In fact, the average weight of newborns #67001–#196710 is 3291 grams. So it seems as if the increase was either temporary or only apparent. Was the alarm false? After observing 67000 newborns, it would have seemed that the increase was real. With hindsight, one may either believe that the increase was real, but that a change (a decrease) took place thereafter and the mean weight reverted to its original level, or one may interpret the episode as having been a false alarm.

The calculation of Rn was done in the following way. R1,…,R5000 were calculated using the recursions of Section ??; ensuing Rn's were calculated by applying Theorem 1. Rather than calculating \(\frac {g_{0,\delta U_{k}(n)}(n-2)}{g_{0,0}(n-2)}\) by the recursions separately for each δUk(n), calculation time drops immensely by creating a fine enough grid of \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) and interpolating. Such a grid was created for n = 1,…, 5000 and approximately 500 values of x, spaced so that between adjacent x's the function \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) is almost perfectly linear. The grid is calculated much faster by the recursions of Section ?? than by the method of Section ?? (after all, the recursion that generated \(\frac {g_{0,x}(5000)}{g_{0,0}(5000)}\) created all of \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\), n = 1,…, 5000 along the way, whereas by the method of Section ?? each \(\frac {g_{0,x}(n)}{g_{0,0}(n)}\) has to be calculated separately).

Appendix D: Remarks

Remark 1

A lower bound can be obtained by Jensen’s inequality:

$$\begin{array}{@{}rcl@{}} \log(e^{\frac{1}{2}a^{2}}\frac{{\int}_{0}^{\infty}x^{m} e^{-\frac{1}{2}(x-a)^{2}}dx}{{\int}_{0}^{\infty}x^{m} e^{-\frac{1}{2}x^{2}}dx}) &=&\log({\int}_{0}^{\infty}e^{ax}\frac{x^{m}e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty}x^{m}e^{-\frac{1}{2}x^{2}}dx}) \\ &\geq& {\int}_{0}^{\infty}ax\frac{x^{m}e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty}x^{m}e^{-\frac{1}{2}x^{2}}dx} \\ &=&a\frac{{\int}_{0}^{\infty}x^{m + 1}e^{-\frac{1}{2}x^{2}}dx}{{\int}_{0}^{\infty}x^{m}e^{-\frac{1}{2}x^{2}}dx} \\ &=&a\frac{2^{\frac{1}{2}m}{\Gamma}(\frac{m + 2}{2})} {2^{\frac{1}{2}(m-1)}{\Gamma}(\frac{m + 1}{2})} =a\frac{\sqrt{2}{\Gamma}(\frac{m + 2}{2})} {{\Gamma}(\frac{m + 1}{2})} \\ &=&a\sqrt{m}(1+O(\frac{1}{m^{2}})) \quad . \end{array} $$
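The exact value of this lower bound is easy to evaluate on the log scale. A small Python sketch (our illustration; `jensen_lower_bound` is a name we introduce):

```python
import math

def jensen_lower_bound(a, m):
    # a·√2·Γ((m+2)/2) / Γ((m+1)/2), the exact bound derived above,
    # evaluated via log-gamma to avoid overflow for large m
    return a * math.sqrt(2.0) * math.exp(
        math.lgamma(0.5 * (m + 2)) - math.lgamma(0.5 * (m + 1)))
```

For large m the bound is indistinguishable from \(a\sqrt {m}\), and it sits slightly above \(a\sqrt {m}\) for every m.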

Remark 2

Cook (2010) presented an upper bound

$$ {\int}_{c}^{\infty}x^{m}e^{-\frac{1}{2}(x-a)^{2}}dx \leq \frac{\pi}{2\sqrt{2}}e^{-c^{2}-m+\frac{ma}{c}+\frac{m^{2}}{2c^{2}}}\frac{c^{m}}{c+\sqrt{c^{2}+\frac{4}{\pi}}} $$

for c > 0. On the log scale, Cook's upper bound has an asymptotic (\(m \rightarrow \infty \)) order of magnitude \(m^{2}\), whereas Theorem 1 posits an order of magnitude \(m\log {m}\).

Remark 3

A similar type of analysis can be done for \({\int }_{-\infty }^{\infty }x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\) (cf. Pollak et al. 1993). Note that

$$\begin{array}{@{}rcl@{}} {\int}_{-\infty}^{\infty}x^{m}e^{-\frac{1}{2}(x-a)^{2}}dx &=&{\int}_{-\infty}^{0}x^{m}e^{-\frac{1}{2}(x-a)^{2}}dx+{\int}_{0}^{\infty}x^{m}e^{-\frac{1}{2}(x-a)^{2}}dx \\ &=&(-1)^{m}{\int}_{0}^{\infty}x^{m}e^{-\frac{1}{2}(x+a)^{2}}dx+{\int}_{0}^{\infty}x^{m}e^{-\frac{1}{2}(x-a)^{2}}dx \end{array} $$

and that (unless a = 0) one of the two integrals (which one depending on the sign of a) is asymptotically negligible with respect to the other. Thus Theorem 1 can be applied to obtain an approximation to the mth moment of a (non-truncated) normal distribution.

Remark 4

A similar type of analysis can be done when the truncation is from both ends. To see this, it suffices to consider \({{\int }_{0}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\) for c > 0. Clearly, \(\frac {{{\int }_{0}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx}{{{\int }_{0}^{c}} x^{m}dx}\rightarrow e^{-\frac {1}{2}(c-a)^{2}}\) as \(m \rightarrow \infty \). It follows that

$${{\int}_{0}^{c}} x^{m}e^{-\frac{1}{2}(x-a)^{2}}dx=e^{-\frac{1}{2}(c-a)^{2}}\frac{c^{m + 1}}{m + 1}(1+o(1)) . $$

Without loss of generality, assume that c > |b|. A recursion for \({{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\) that builds on this is

$$h_{a,b,c}(m)=\frac{m + 1}{c^{2}}[h_{a,b,c}(m-2)+\frac{ac}{m}h_{a,b,c}(m-1)+(\frac{b}{c})^{m-1}e^{-\frac{1}{2}(b-a)^{2}}-e^{-\frac{1}{2}(c-a)^{2}}] $$

where \(h_{a,b,c}(m)=\frac {m + 1}{c^{m + 1}}{{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx\). (Note that for |b| > c > b, \({{\int }_{b}^{c}} x^{m}e^{-\frac {1}{2}(x-a)^{2}}dx= (-1)^{m} {\int }_{-c}^{-b} x^{m}e^{-\frac {1}{2}(x+a)^{2}}dx\).)
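This recursion can be validated numerically against direct quadrature. A minimal Python sketch (ours, not from the paper; `H_direct` and `h_recursion` are names we introduce), seeding the recursion with quadrature values at m = 0, 1:

```python
import math

def H_direct(a, b, c, m, n_grid=100000):
    # midpoint-rule approximation of ∫_b^c x^m e^{-(x-a)²/2} dx
    dx = (c - b) / n_grid
    return sum((b + (i + 0.5) * dx) ** m
               * math.exp(-0.5 * (b + (i + 0.5) * dx - a) ** 2)
               for i in range(n_grid)) * dx

def h_recursion(a, b, c, m):
    # h_{a,b,c}(m) = ((m+1)/c^{m+1}) ∫_b^c x^m e^{-(x-a)²/2} dx,
    # computed by the recursion displayed above, seeded at m = 0, 1
    eb = math.exp(-0.5 * (b - a) ** 2)
    ec = math.exp(-0.5 * (c - a) ** 2)
    h = [H_direct(a, b, c, 0) / c, 2.0 * H_direct(a, b, c, 1) / c ** 2]
    for n in range(2, m + 1):
        h.append((n + 1) / c ** 2
                 * (h[n - 2] + a * c / n * h[n - 1]
                    + (b / c) ** (n - 1) * eb - ec))
    return h[m]
```

For moderate m the two agree to several digits; for very large m the subtraction of \(e^{-\frac {1}{2}(c-a)^{2}}\) at every step would call for the same log-scale care discussed in Appendix B.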

Remark 5

A similar recursion can be constructed for densities proportional to \(e^{{const} \times x^{2}}I(b<x<c)\) where const > 0 and b, c are finite.

Remark 6

Tables relevant to the truncated normal distribution have been around for over a century (cf. Pearson and Lee 1908 and Lee 1914). Even today it is considered part and parcel of applied distributions (cf. O’Connor 2011), and papers have been devoted to estimation of its parameters (cf. Barr and Sherrill 1999; Horrace 2015; Liquet and Nazarathy 2015). For most practical purposes, the first four moments of the distribution (mean, variance, skewness and kurtosis) have been of applied interest. Higher-order moments may appear in quadrature methods (cf. Burkardt 2014); their order of magnitude would seem to be in the tens at most.


Cite this article

Pollak, M., Shauly-Aharonov, M. A Double Recursion for Calculating Moments of the Truncated Normal Distribution and its Connection to Change Detection. Methodol Comput Appl Probab 21, 889–906 (2019). https://doi.org/10.1007/s11009-018-9622-7



Keywords

  • Changepoint
  • On-line
  • Shiryaev–Roberts
  • Surveillance

Mathematics Subject Classification (2010)

  • 62L10
  • 62E15
  • 60E05