Probability distribution as a path and its action integral

  • Original Paper
Japanese Journal of Statistics and Data Science

Abstract

To describe the convergence in law of a sequence of probability distributions, “the principle of least action” is introduced nonparametrically into statistics. To apply the calculus of variations, a probability measure is treated as a path (in a suitable sense), and it is shown that saddlepoints, which appear in the method of saddlepoint approximations, play a crucial role. An action integral, i.e., a functional of the saddlepoint, is defined as a definite integral of entropy. As a saddlepoint equation appears naturally in the Gâteaux derivative of that integral, a unique saddlepoint may be found as an optimal path for this variational problem. Consequently, by virtue of the one-to-one correspondence between probability measures and saddlepoints, convergence in law is described by a decreasing sequence of action integrals. Thereby, a new criterion for evaluating the convergence is introduced into statistics, and a novel interpretation of saddlepoints is provided.

Acknowledgements

The author would like to express his sincere thanks to the referees for their insightful comments.

Author information

Corresponding author

Correspondence to Hiroyuki Takeuchi.

Appendix

Lemma 8.1

Let \(\alpha (t)\) be the saddlepoint of a probability distribution \(F \in {{\mathcal {P}}}\). The i-th derivative of \(\alpha (t)\) at \(t = \mu\) and the i-th cumulant \(\kappa _i\) of F are related as follows:

$$\begin{aligned} \alpha (\mu )&= 0, \quad \alpha '(\mu ) = \frac{1}{\kappa _2}, \quad \alpha ''(\mu ) = - \frac{\kappa _3}{\kappa _2^3}, \quad \alpha ^{(3)}(\mu ) = 3 \frac{\kappa _3^2}{\kappa _2^5} - \frac{\kappa _4}{\kappa _2^4}, \\ \alpha ^{(4)}(\mu )&= - 15 \frac{\kappa _3^3}{\kappa _2^7} + 10 \frac{\kappa _3 \kappa _4}{\kappa _2^6} - \frac{\kappa _5}{\kappa _2^5}, \\ \alpha ^{(5)}(\mu )&= 105 \frac{\kappa _3^4}{\kappa _2^9} - 105 \frac{\kappa _3^2 \kappa _4}{\kappa _2^8} + \frac{15 \kappa _3 \kappa _5 + 10 \kappa _4^2}{\kappa _2^7} - \frac{\kappa _6}{\kappa _2^6}, \quad \cdots \end{aligned}$$
(23)

Conversely, let \(\alpha ^{(i)} :=\alpha ^{(i)}(\mu )\). Then the cumulants are as follows.

$$\begin{aligned} \kappa _1 & = \alpha ^{-1}(0), \quad \kappa _2 = \frac{1}{\alpha '}, \quad \kappa _3 = - \frac{\alpha ''}{(\alpha ')^3}, \quad \kappa _4 = 3 \frac{(\alpha '')^2}{(\alpha ')^5} - \frac{\alpha ^{(3)}}{(\alpha ')^4}, \nonumber \\ \kappa _5 & = - 15 \frac{(\alpha '')^3}{(\alpha ')^7} + 10 \frac{\alpha ^{(3)} \alpha ''}{(\alpha ')^6} - \frac{\alpha ^{(4)}}{(\alpha ')^5}, \ \cdots \end{aligned}$$
(24)

Proof

(23) is obtained by differentiating the saddlepoint equation (2) repeatedly with respect to t and evaluating at \(t = \mu\). (24) follows from \(\alpha (\mu ) = 0\) and from solving (23) successively for \(\kappa _2, \kappa _3, \ldots\). \(\square\)
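As a sanity check of (23) and (24), the following minimal sketch evaluates both sets of formulas in exact rational arithmetic for a Poisson distribution with mean \(\lambda\), whose cumulants all equal \(\lambda\). It assumes that the saddlepoint equation (2) has the standard form \(K'(\alpha (t)) = t\) with K the cumulant generating function, so that \(\alpha (t) = \log (t / \lambda )\) in this case; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import factorial


def saddlepoint_derivatives(k2, k3, k4, k5, k6):
    """Return [alpha'(mu), ..., alpha^(5)(mu)] from the cumulants, as in (23)."""
    a1 = 1 / k2
    a2 = -k3 / k2**3
    a3 = 3 * k3**2 / k2**5 - k4 / k2**4
    a4 = -15 * k3**3 / k2**7 + 10 * k3 * k4 / k2**6 - k5 / k2**5
    a5 = (105 * k3**4 / k2**9 - 105 * k3**2 * k4 / k2**8
          + (15 * k3 * k5 + 10 * k4**2) / k2**7 - k6 / k2**6)
    return [a1, a2, a3, a4, a5]


def cumulants_from_derivatives(a1, a2, a3, a4):
    """Return [kappa_2, ..., kappa_5] from the derivatives at mu, as in (24)."""
    k2 = 1 / a1
    k3 = -a2 / a1**3
    k4 = 3 * a2**2 / a1**5 - a3 / a1**4
    k5 = -15 * a2**3 / a1**7 + 10 * a3 * a2 / a1**6 - a4 / a1**5
    return [k2, k3, k4, k5]


# Poisson example (assumption: alpha(t) = log(t / lam), so mu = lam).
lam = Fraction(3, 2)
derivs = saddlepoint_derivatives(lam, lam, lam, lam, lam)

# Closed form: the i-th derivative of log(t / lam) at t = lam
# is (-1)^(i-1) (i-1)! / lam^i.
closed_form = [(-1) ** (i - 1) * factorial(i - 1) / lam ** i for i in range(1, 6)]

# Round trip through (24) should return the Poisson cumulants.
recovered = cumulants_from_derivatives(*derivs[:4])
print(derivs == closed_form, recovered == [lam] * 4)
```

Passing `Fraction` values keeps every quotient exact, so the comparison with the closed form does not depend on a floating-point tolerance.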

Hereafter, we suppose that the support of F is contained in a bounded interval \([-L, L] \subset {{\mathbb {R}}}\). Let \(F_n\) be the empirical distribution function of a sample of size n from F, and let \(M_n\) and M be the moment generating functions of \(F_n\) and F, respectively.

Lemma 8.2

We suppose that a function \(\alpha : t \in {{\mathbb {R}}} \mapsto \alpha (t) \in {{\mathbb {R}}}\) belongs to the \(C^1\) class and is strictly increasing on a neighborhood of \(t = \mu\). If we define

$$\begin{aligned} \Delta (\varepsilon ) :=\sup _{t \in I_{\delta }} \{ \alpha (t + \varepsilon ) - \alpha (t), \, \alpha (t) - \alpha (t - \varepsilon ) \}, \end{aligned}$$
(25)

for some \(\delta > 0\), then we have the following for sufficiently small \(\varepsilon > 0\):

  (i) \(\Delta (\varepsilon ) > 0\).

  (ii) There exists a \(\delta ' > \delta\) such that \(\Delta (\varepsilon ) \le \varepsilon \max _{t \in I_{\delta '}} \alpha '(t)\).

Proof

By assumption, there exists some \(\delta ' > 0\) such that \(\alpha = \alpha (t)\) is strictly increasing on \(I_{\delta '}\). We fix any \(\delta\) with \(0< \delta < \delta '\), and for any \(\varepsilon\) such that \(0< \varepsilon < \delta ' - \delta\) we have \(I_{\delta + \varepsilon } \subset I_{\delta '}\). Therefore, \(\alpha = \alpha (t)\), \(\alpha = \alpha (t + \varepsilon )\), and \(\alpha = \alpha (t - \varepsilon )\) are well defined; furthermore, they are uniformly continuous and strictly increasing on \(I_{\delta }\). We define \(\Delta ^+ (\varepsilon )\) and \(\Delta ^-(\varepsilon )\) by

$$\Delta ^+ (\varepsilon ) :=\sup _{t \in I_{\delta }} \{ \alpha (t + \varepsilon ) - \alpha (t) \} , \quad \Delta ^-(\varepsilon ) :=\sup _{t \in I_{\delta }} \{ \alpha (t) - \alpha (t - \varepsilon ) \}.$$

(i) If \(t' \in I_{\delta }\) and \(\varepsilon > 0\), then \(\Delta ^+ (\varepsilon ) \ge \alpha (t' + \varepsilon ) - \alpha (t') > 0.\) Likewise, \(\Delta ^-(\varepsilon ) > 0\) is also true. Hence, we have \(\Delta (\varepsilon ) = \max \{ \Delta ^+ (\varepsilon ), \, \Delta ^- (\varepsilon ) \} > 0\).

(ii) By the mean value theorem, \(\alpha (t) = \alpha (t - \varepsilon ) + \varepsilon \alpha '(t - \theta \varepsilon )\) for some \(\theta\) \((0< \theta < 1)\), so that \(\alpha (t) - \alpha (t - \varepsilon ) \le \varepsilon \max _{t \in I_{\delta + \varepsilon }} \alpha '(t)\) and, in the same way, \(\alpha (t + \varepsilon ) - \alpha (t) \le \varepsilon \max _{t \in I_{\delta + \varepsilon }} \alpha '(t)\) for \(t \in I_{\delta }\). Hence, \(\Delta (\varepsilon ) \le \varepsilon \max _{t \in I_{\delta + \varepsilon }} \alpha '(t) \le \varepsilon \max _{t \in I_{\delta '}} \alpha '(t).\) \(\square\)
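The two claims of Lemma 8.2 can be illustrated numerically. The sketch below approximates \(\Delta (\varepsilon )\) in (25) on a grid, assuming \(I_{\delta } = [\mu - \delta , \mu + \delta ]\) (consistent with the set used in (28)) and taking the Poisson saddlepoint \(\alpha (t) = \log (t / \lambda )\) with \(\mu = \lambda\) as a hypothetical example; the helper name `delta_eps` and all parameter values are illustrative choices.

```python
import math


def delta_eps(alpha, mu, dlt, eps, grid=2001):
    """Grid approximation of Delta(eps) in (25) over [mu - dlt, mu + dlt]."""
    ts = [mu - dlt + 2 * dlt * i / (grid - 1) for i in range(grid)]
    forward = max(alpha(t + eps) - alpha(t) for t in ts)
    backward = max(alpha(t) - alpha(t - eps) for t in ts)
    return max(forward, backward)


lam = 2.0


def alpha(t):
    # Assumed example: saddlepoint of a Poisson distribution with mean lam.
    return math.log(t / lam)


# eps must satisfy 0 < eps < dlt_prime - dlt, as in the proof.
mu, dlt, dlt_prime, eps = lam, 0.5, 1.0, 0.1

d_eps = delta_eps(alpha, mu, dlt, eps)

# alpha'(t) = 1 / t is decreasing, so its maximum over I_{delta'} is
# attained at the left endpoint mu - dlt_prime.
bound = eps * (1.0 / (mu - dlt_prime))
print(d_eps, bound)
```

A grid maximum can only underestimate the supremum, so the check of (ii) here is indicative rather than a proof; for this particular \(\alpha\) the supremum is attained at the left endpoint of \(I_{\delta }\), which belongs to the grid.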

Lemma 8.3

We assume that a function \(t : \alpha \in {{\mathbb {R}}} \mapsto t(\alpha ) \in {{\mathbb {R}}}\) belongs to the \(C^1\) class and is strictly increasing on a neighborhood of the origin with \(t(0) = \mu\). Furthermore, for a sequence of continuous functions \(\{ t_n \}_{n \in {{\mathbb {N}}}}\) we assume that

$$\lim _{n \rightarrow \infty } \sup _{|\alpha | \le \eta } |t_n(\alpha ) - t(\alpha )| = 0,$$
(26)

for some \(\eta > 0\). Then there exists a \(\delta > 0\) such that

$$\lim _{n \rightarrow \infty } \sup _{t \in I_{\delta }} |\alpha _n(t) - \alpha (t)| = 0,$$
(27)

where \(\alpha _n(t)\) and \(\alpha (t)\) are the inverse functions of \(t_n(\alpha )\) and \(t(\alpha )\), respectively.

Proof

By (26), for any \(\varepsilon > 0\) there exists an integer \(N_0\) such that if \(n \ge N_0\), then we have \((\alpha , t_n(\alpha )) \in \{ (\alpha , t) : |\alpha | \le \eta , \, |t - t(\alpha )| < \varepsilon \}.\) Moreover, there exists a positive constant \(\delta\), which does not depend on \(\varepsilon\), such that \(t(-\eta ) + \varepsilon< \mu - \delta< \mu< \mu + \delta < t(+\eta ) - \varepsilon .\) With this \(\delta\), we have

$$(t, \alpha _n(t)) \in \{ (t, \alpha ) : |t - \mu | \le \delta , \ \alpha (t - \varepsilon )< \alpha < \alpha (t + \varepsilon ) \},$$
(28)

for \(n \ge N_0\). As the quantity \(\Delta (\varepsilon )\) defined in Lemma 8.2 is positive and independent of t, we have for \(n \ge N_0\)

$$(t, \alpha _n(t)) \in \{ (t, \alpha ) : |t - \mu | \le \delta , \ |\alpha - \alpha (t)| < \Delta (\varepsilon ) \} ,$$

by (28). Hence,

$$\limsup _{n \rightarrow \infty } \sup _{|t - \mu | \le \delta } |\alpha _n(t) - \alpha (t)| \le \Delta (\varepsilon ),$$

and by Lemma 8.2 (ii), the conclusion follows upon letting \(\varepsilon \rightarrow 0\). \(\square\)
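The statement of Lemma 8.3 can be illustrated with a small numerical experiment. The sketch below takes \(t(\alpha ) = \lambda e^{\alpha }\) (so that \(\alpha (t) = \log (t / \lambda )\) and \(\mu = \lambda\)) and the hypothetical perturbation \(t_n(\alpha ) = t(\alpha ) + \alpha ^2 / n\), which satisfies (26) with \(\eta = 1\) and is strictly increasing on \([-\eta , \eta ]\) for \(n \ge 4\). The inverses \(\alpha _n\) are computed by bisection, and the supremum in (27) is approximated on a grid over \(I_{\delta } = [\mu - \delta , \mu + \delta ]\), which is an assumed form of \(I_{\delta }\).

```python
import math


def invert(f, y, lo, hi, iters=80):
    """Bisection inverse of a strictly increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam, eta, dlt = 2.0, 1.0, 0.5


def t(a):
    return lam * math.exp(a)


def alpha(s):
    # Exact inverse of t.
    return math.log(s / lam)


# Grid over I_delta = [mu - dlt, mu + dlt] with mu = lam.
ts = [lam - dlt + 2 * dlt * i / 200 for i in range(201)]


def sup_inverse_error(n):
    """Grid approximation of sup over I_delta of |alpha_n(t) - alpha(t)|."""
    def t_n(a):
        return t(a) + a * a / n  # hypothetical perturbation of t
    return max(abs(invert(t_n, s, -eta, eta) - alpha(s)) for s in ts)


errors = [sup_inverse_error(n) for n in (4, 16, 64, 256)]
print(errors)
```

The printed errors shrink as n grows, in line with (27); the choice of perturbation only serves to make the uniform distance in (26) equal to \(\eta ^2 / n\).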

The following corollary is immediate from Lemma 8.3.

Corollary 8.1

If the roles of \(t(\alpha )\) and its inverse \(\alpha (t)\) are interchanged in Lemma 8.3, then the corresponding conclusion holds under the condition \(\alpha (\mu ) = 0\).

Lemma 8.4

For any fixed \(\eta > 0\), we have the following:

$$\begin{aligned}&\mathrm{(i)} \sup _{|\alpha | \le \eta } |M_n(\alpha ) - M(\alpha )| \le 4e^{\eta L} \sup _{|x| \le L} |F_n(x) - F(x)|, \quad \text{ a.s. } \\&\mathrm{(ii)} \sup _{|\alpha | \le \eta } |M'_n(\alpha ) - M'(\alpha )| \le 4L e^{\eta L} \sup _{|x| \le L} |F_n(x) - F(x)|, \quad \text{ a.s. } \end{aligned}$$

Proof

(i) As \(F_n(x) - F(x)\) is of bounded variation for \(|x| \le L\),

$$\begin{aligned}&\sup _{|\alpha | \le \eta } |M_n(\alpha ) - M(\alpha )| \\&\quad = \sup _{|\alpha | \le \eta } \left| \bigl [ e^{\alpha x} \{ F_n(x) - F(x) \} \bigr ]_{-L}^{L} - \int _{-L}^{L} \{ F_n(x) - F(x) \} \, {\text{d}}e^{\alpha x} \right| \\&\quad \le 4e^{\eta L} \sup _{|x| \le L} |F_n(x) - F(x)|, \quad \text{ a.s. } \end{aligned}$$

(ii) \(\left| \frac{\partial }{\partial \alpha } e^{\alpha x} \right|\) is F-integrable, as for \(|\alpha | \le \eta\) we have \(\left| \frac{\partial }{\partial \alpha } e^{\alpha x} \right| \le |x| e ^{\eta |x|}\) and

$$\int _{-L}^{L} |x| e ^{\eta |x|} \, {\text{d}}F(x) \le L e^{\eta L}.$$

Therefore, we can interchange differentiation and integration:

$$M'(\alpha ) = \frac{\partial }{\partial \alpha } \int _{-L}^{L} e^{\alpha x} \, {\text{d}}F(x) = \int _{-L}^{L} x e^{\alpha x} \, {\text{d}}F(x).$$

Hence,

$$\begin{aligned}&\sup _{|\alpha | \le \eta } |M'_n(\alpha ) - M'(\alpha )| \\&\quad = \sup _{|\alpha | \le \eta } \left| \bigl [ x e^{\alpha x} \{ F_n(x) - F(x) \} \bigr ]_{-L}^{L} - \int _{-L}^{L} \{ F_n(x) - F(x) \} \, {\text{d}} \bigl ( x e^{\alpha x} \bigr ) \right| \\&\quad \le 4L e^{\eta L} \sup _{|x| \le L} |F_n(x) - F(x)|, \end{aligned}$$

with probability one. \(\square\)
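The inequalities of Lemma 8.4 can be checked on simulated data. The sketch below assumes, purely for illustration, that F is the uniform distribution on \([-L, L]\) with \(L = 1\), for which \(M(\alpha ) = \sinh (\alpha L) / (\alpha L)\); the sample size, the seed, and the grid of \(\alpha\) values are arbitrary choices. The Kolmogorov distance \(\sup _{|x| \le L} |F_n(x) - F(x)|\) is computed exactly from the order statistics, while the suprema over \(\alpha\) are approximated on a grid.

```python
import math
import random

L, eta, n = 1.0, 1.0, 500          # support bound, range of alpha, sample size
rng = random.Random(0)             # fixed seed so that the run is reproducible
xs = sorted(rng.uniform(-L, L) for _ in range(n))


def F(x):
    """Distribution function of the uniform law on [-L, L] (assumed example)."""
    return (x + L) / (2 * L)


def M(a):
    """Moment generating function of the uniform law on [-L, L]."""
    return 1.0 if abs(a) < 1e-12 else math.sinh(a * L) / (a * L)


def dM(a):
    """Derivative M'(a) of the moment generating function."""
    if abs(a) < 1e-12:
        return 0.0
    return (a * L * math.cosh(a * L) - math.sinh(a * L)) / (a * a * L)


# Exact Kolmogorov distance between F_n and F from the sorted sample.
ks = max(max((i + 1) / n - F(x), F(x) - i / n) for i, x in enumerate(xs))

alphas = [-eta + 2 * eta * j / 100 for j in range(101)]

# Left-hand sides of (i) and (ii), approximated on the alpha grid.
lhs_i = max(abs(sum(math.exp(a * x) for x in xs) / n - M(a)) for a in alphas)
lhs_ii = max(abs(sum(x * math.exp(a * x) for x in xs) / n - dM(a)) for a in alphas)

# Right-hand sides of (i) and (ii).
rhs_i = 4 * math.exp(eta * L) * ks
rhs_ii = 4 * L * math.exp(eta * L) * ks
print(lhs_i <= rhs_i, lhs_ii <= rhs_ii)
```

Since the Kolmogorov distance tends to zero almost surely by the Glivenko–Cantelli theorem, the same bounds give the uniform convergence of \(M_n\) and \(M'_n\) on \(|\alpha | \le \eta\).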

Lemma 8.5

For some \(\eta > 0\), we have \(\inf _{|\alpha | \le \eta } M(\alpha ) > 0.\) Furthermore, there exists \(C > 0\) such that \(\inf _{|\alpha | \le \eta } M_n(\alpha ) \ge C\) for sufficiently large n with probability one.

Proof

As M is continuous with \(M(0) = 1\), for any sufficiently small \(\varepsilon > 0\) there exists an \(\eta > 0\) such that if \(|\alpha | \le \eta\), then \(M(\alpha ) > 1 - \varepsilon\). Thus, \(\inf _{|\alpha | \le \eta } M(\alpha ) \ge 1 - \varepsilon > 0.\) By Lemma 8.4 and the Glivenko–Cantelli theorem, for any \(\varepsilon ' > 0\) there exists, with probability one, an \(N_0 \ge 1\) such that if \(n \ge N_0\), then \(M_n(\alpha ) \ge M(\alpha ) - \varepsilon '\) for all \(|\alpha | \le \eta\). Hence, if \(\varepsilon ' < 1 - \varepsilon\) and \(C :=1 - \varepsilon - \varepsilon '\), then

$$\begin{aligned} \inf _{|\alpha | \le \eta } M_n(\alpha ) \ge \inf _{|\alpha | \le \eta } M(\alpha ) - \varepsilon ' \ge 1 - \varepsilon - \varepsilon ' = C > 0, \end{aligned}$$

for sufficiently large n with probability one. \(\square\)

About this article

Cite this article

Takeuchi, H. Probability distribution as a path and its action integral. Jpn J Stat Data Sci 3, 485–511 (2020). https://doi.org/10.1007/s42081-019-00062-y
