Abstract
To describe the convergence in law of a sequence of probability distributions, the principle of least action is introduced into statistics in a nonparametric manner. To apply the calculus of variations, a probability measure is treated as a path (in a suitable sense), and it is shown that saddlepoints, which appear in the method of saddlepoint approximations, play a crucial role. An action integral, i.e., a functional of the saddlepoint, is defined as a definite integral of entropy. Because a saddlepoint equation appears naturally in the Gâteaux derivative of that integral, a unique saddlepoint is found as the optimal path for this variational problem. Consequently, by virtue of the one-to-one correspondence between probability measures and saddlepoints, convergence in law is described by a decreasing sequence of action integrals. This introduces a new criterion for evaluating convergence into statistics and provides a novel interpretation of saddlepoints.
Acknowledgements
The author would like to express his sincere thanks to the referees for their insightful comments.
Appendix
Lemma 8.1
Let \(\alpha (t)\) be the saddlepoint of a probability distribution \(F \in {{\mathcal {P}}}\). The relationship between the i-th differential coefficient of \(\alpha (t)\) at \(t = \mu\) and the i-th cumulant \(\kappa _i\) of F is as follows:
Conversely, let \(\alpha ^{(i)} :=\alpha ^{(i)}(\mu )\). Then the cumulants are as follows.
Proof
Relation (23) is obtained by successively differentiating the saddlepoint equation (2) with respect to t, and (24) follows immediately by solving (23) for the cumulants. \(\square\)
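The relationship in Lemma 8.1 can be checked numerically. The following is a minimal sketch, assuming that the saddlepoint equation (2) has the usual form \(K'(\alpha (t)) = t\) with \(K = \log M\) the cumulant generating function; under that assumption, differentiating once and twice at \(t = \mu\) gives \(\alpha (\mu ) = 0\), \(\alpha '(\mu ) = 1/\kappa _2\) and \(\alpha ''(\mu ) = -\kappa _3/\kappa _2^3\). The three-point distribution and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical discrete distribution with bounded support (not from the paper).
XS = [-1.0, 0.0, 2.0]
PS = [0.3, 0.5, 0.2]

def tilted_mean(alpha):
    """K'(alpha) = M'(alpha) / M(alpha) for the distribution above."""
    weights = [p * math.exp(alpha * x) for x, p in zip(XS, PS)]
    return sum(w * x for w, x in zip(weights, XS)) / sum(weights)

def saddlepoint(t, lo=-50.0, hi=50.0, iters=200):
    """Solve the assumed saddlepoint equation K'(alpha) = t by bisection.

    K' is strictly increasing, so bisection on a wide bracket suffices
    for t strictly inside the support (-1, 2).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cumulants():
    """Return (mu, kappa_2, kappa_3); the 2nd and 3rd cumulants equal
    the 2nd and 3rd central moments."""
    mu = sum(p * x for x, p in zip(XS, PS))
    k2 = sum(p * (x - mu) ** 2 for x, p in zip(XS, PS))
    k3 = sum(p * (x - mu) ** 3 for x, p in zip(XS, PS))
    return mu, k2, k3

def saddlepoint_derivatives(h=1e-3):
    """Central finite differences for alpha(mu), alpha'(mu), alpha''(mu)."""
    mu, _, _ = cumulants()
    a_minus = saddlepoint(mu - h)
    a_0 = saddlepoint(mu)
    a_plus = saddlepoint(mu + h)
    d1 = (a_plus - a_minus) / (2.0 * h)
    d2 = (a_plus - 2.0 * a_0 + a_minus) / (h * h)
    return a_0, d1, d2

if __name__ == "__main__":
    mu, k2, k3 = cumulants()
    a0, d1, d2 = saddlepoint_derivatives()
    print("alpha(mu)   =", a0)
    print("alpha'(mu)  =", d1, "vs 1/kappa_2 =", 1.0 / k2)
    print("alpha''(mu) =", d2, "vs -kappa_3/kappa_2^3 =", -k3 / k2 ** 3)
```

Bisection is used instead of Newton's method only because it needs no derivative of \(K'\) and cannot diverge; any root finder for a strictly increasing function would serve.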
Hereafter, we suppose that F has bounded support contained in \([-L, L] \subset {{\mathbb {R}}}\). Let \(F_n\) be the empirical distribution function of a sample of size n from F, and let \(M_n\) and M be the moment generating functions of \(F_n\) and F, respectively.
Lemma 8.2
We suppose that a function \(\alpha : t \in {{\mathbb {R}}} \mapsto \alpha (t) \in {{\mathbb {R}}}\) belongs to the \(C^1\) class and is strictly increasing on a neighborhood of \(t = \mu\). If we define
for some \(\delta > 0\), then we have the following for sufficiently small \(\varepsilon > 0\):
(i) \(\Delta (\varepsilon ) > 0\).
(ii) There exists a \(\delta ' > \delta\) such that \(\Delta (\varepsilon ) \le \varepsilon \max _{t \in I_{\delta '}} \alpha '(t)\).
Proof
By assumption, there exists some \(\delta ' > 0\) such that \(\alpha = \alpha (t)\) is strictly increasing on \(I_{\delta '}\). We fix any \(\delta\) with \(0< \delta < \delta '\), and for any \(\varepsilon\) such that \(0< \varepsilon < \delta ' - \delta\) we have \(I_{\delta + \varepsilon } \subset I_{\delta '}\). Therefore, \(\alpha = \alpha (t)\), \(\alpha = \alpha (t + \varepsilon )\), and \(\alpha = \alpha (t - \varepsilon )\) are well defined; furthermore, they are uniformly continuous and strictly increasing on \(I_{\delta }\). We define \(\Delta ^+ (\varepsilon )\) and \(\Delta ^-(\varepsilon )\) by
(i) If \(t' \in I_{\delta }\) and \(\varepsilon > 0\), then \(\Delta ^+ (\varepsilon ) \ge \alpha (t' + \varepsilon ) - \alpha (t') > 0.\) Likewise, \(\Delta ^-(\varepsilon ) > 0\) is also true. Hence, we have \(\Delta (\varepsilon ) = \max \{ \Delta ^+ (\varepsilon ), \, \Delta ^- (\varepsilon ) \} > 0\).
(ii) As \(\alpha (t) = \alpha (t - \varepsilon ) + \varepsilon \alpha '(t - \theta \varepsilon )\) for some \(\theta\) \((0< \theta < 1)\), we have \(\alpha (t) - \alpha (t - \varepsilon ) \le \varepsilon \max _{t \in I_{\delta + \varepsilon }} \alpha '(t)\) and \(\alpha (t + \varepsilon ) - \alpha (t) \le \varepsilon \max _{t \in I_{\delta + \varepsilon }} \alpha '(t)\) for \(t \in I_{\delta }\). Hence, \(\Delta (\varepsilon ) \le \varepsilon \max _{t \in I_{\delta + \varepsilon }} \alpha '(t) \le \varepsilon \max _{t \in I_{\delta '}} \alpha '(t).\) \(\square\)
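The two claims of Lemma 8.2 can be illustrated on a grid. This is a minimal sketch under stated assumptions: following the proof, \(\Delta (\varepsilon )\) is taken to be the larger of \(\Delta ^+(\varepsilon ) = \max _{t \in I_\delta } \{ \alpha (t + \varepsilon ) - \alpha (t) \}\) and \(\Delta ^-(\varepsilon ) = \max _{t \in I_\delta } \{ \alpha (t) - \alpha (t - \varepsilon ) \}\), with \(I_\delta = [\mu - \delta , \mu + \delta ]\). The test function \(\alpha (t) = t + t^3\), the values \(\mu = 0\), \(\delta = 0.5\), \(\delta ' = 1\), and the function names are hypothetical.

```python
def alpha(t):
    """Hypothetical C^1, strictly increasing test function."""
    return t + t ** 3

def alpha_prime(t):
    """Derivative of the test function above."""
    return 1.0 + 3.0 * t * t

def grid(a, b, m=2001):
    """Equally spaced points on [a, b], endpoints included."""
    return [a + (b - a) * i / (m - 1) for i in range(m)]

def delta_mod(eps, mu=0.0, delta=0.5):
    """Grid approximation of Delta(eps) = max{Delta^+(eps), Delta^-(eps)}."""
    ts = grid(mu - delta, mu + delta)
    plus = max(alpha(t + eps) - alpha(t) for t in ts)
    minus = max(alpha(t) - alpha(t - eps) for t in ts)
    return max(plus, minus)

def upper_bound(eps, mu=0.0, delta_prime=1.0):
    """Right-hand side of (ii): eps times the maximum of alpha' on I_{delta'}."""
    ts = grid(mu - delta_prime, mu + delta_prime)
    return eps * max(alpha_prime(t) for t in ts)

if __name__ == "__main__":
    # eps must satisfy 0 < eps < delta' - delta = 0.5, as in the proof.
    for eps in (0.2, 0.1, 0.01):
        print(eps, delta_mod(eps), upper_bound(eps))
```

The grid maximum can only underestimate the true maximum over \(I_\delta\), so the inequality in (ii) remains a valid check; the bound itself is exact here because \(\alpha '\) attains its maximum at the endpoints of \(I_{\delta '}\), which lie on the grid.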
Lemma 8.3
We assume that a function \(t : \alpha \in {{\mathbb {R}}} \mapsto t(\alpha ) \in {{\mathbb {R}}}\) belongs to the \(C^1\) class and is strictly increasing on a neighborhood of the origin with \(t(0) = \mu\). Furthermore, for a sequence of continuous functions \(\{ t_n \}_{n \in {{\mathbb {N}}}}\) we assume that
for some \(\eta > 0\). Then there exists a \(\delta > 0\) such that
where \(\alpha _n(t)\) and \(\alpha (t)\) are the inverse functions of \(t_n(\alpha )\) and \(t(\alpha )\), respectively.
Proof
By (26), for any \(\varepsilon > 0\) there exists an integer \(N_0\) such that if \(n \ge N_0\), then we have \((\alpha , t_n(\alpha )) \in \{ (\alpha , t) : |\alpha | \le \eta , \, |t - t(\alpha )| < \varepsilon \}.\) Moreover, there exists a positive constant \(\delta\), which does not depend on \(\varepsilon\), such that \(t(-\eta ) + \varepsilon< \mu - \delta< \mu< \mu + \delta < t(+\eta ) - \varepsilon .\) With this \(\delta\), we have
for \(n \ge N_0\). As the quantity \(\Delta (\varepsilon )\) defined in Lemma 8.2 is positive and independent of t, we have for \(n \ge N_0\)
by (28). Hence,
and by Lemma 8.2 (ii), the conclusion follows upon letting \(\varepsilon \rightarrow 0\). \(\square\)
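The mechanism of Lemma 8.3 (uniform convergence of functions carrying over to their inverses) can be sketched numerically. The sketch assumes that hypothesis (26) is uniform convergence of \(t_n\) to \(t\) on \(|\alpha | \le \eta\) and that the conclusion is uniform convergence of \(\alpha _n\) to \(\alpha\) on \(|t - \mu | \le \delta\). The functions \(t(\alpha ) = \alpha + \alpha ^3\) (so \(\mu = 0\)) and \(t_n(\alpha ) = t(\alpha ) + \sin (5\alpha )/n\), together with \(\eta = 1\) and \(\delta = 0.5\), are hypothetical choices; \(t_n\) is strictly increasing for \(n > 5\).

```python
import math

def t_limit(a):
    """Hypothetical limit function t(alpha), strictly increasing, t(0) = 0."""
    return a + a ** 3

def t_n(a, n):
    """Hypothetical perturbation with sup |t_n - t| <= 1/n."""
    return t_limit(a) + math.sin(5.0 * a) / n

def invert(f, t, lo, hi, iters=100):
    """Invert a strictly increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sup_error(n, eta=1.0, delta=0.5, m=401):
    """Grid approximation of sup over |t| <= delta of |alpha_n(t) - alpha(t)|."""
    ts = [-delta + 2.0 * delta * i / (m - 1) for i in range(m)]
    worst = 0.0
    for t in ts:
        a_n = invert(lambda a: t_n(a, n), t, -eta, eta)
        a = invert(t_limit, t, -eta, eta)
        worst = max(worst, abs(a_n - a))
    return worst

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, sup_error(n))
```

In this example \(t' \ge 1\), so the error of the inverse is at most \(1/n\); in general the constant is governed by the slope of the limit function, which is the role played by Lemma 8.2 (ii) in the proof.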
The following corollary is an immediate consequence of Lemma 8.3.
Corollary 8.1
If we replace \(t(\alpha )\) with its inverse \(\alpha (t)\) in Lemma 8.3, then the same conclusion follows under the condition \(\alpha (\mu ) = 0\).
Lemma 8.4
For any fixed \(\eta > 0\), we have the following:
Proof
(i) As \(F_n(x) - F(x)\) is of bounded variation for \(|x| \le L\),
(ii) \(\left| \frac{\partial }{\partial \alpha } e^{\alpha x} \right|\) is F-integrable, as for \(|\alpha | \le \eta\) we have \(\left| \frac{\partial }{\partial \alpha } e^{\alpha x} \right| \le |x| e ^{\eta |x|}\) and
Therefore, we can interchange differentiation and integration:
Hence,
with probability one. \(\square\)
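The convergence used in Lemma 8.4 can be observed in a small simulation. This is a minimal sketch, assuming that the lemma asserts uniform convergence on \(|\alpha | \le \eta\) of \(M_n\) to \(M\) in (i) and of \(M_n'\) to \(M'\) in (ii). The choice of F as the uniform distribution on \([-1, 1]\) (so \(L = 1\) and \(M(\alpha ) = \sinh (\alpha )/\alpha\)), the seed, and the function names are hypothetical.

```python
import math
import random

def sample_uniform(n, seed=0):
    """Draw n points from the uniform law on [-1, 1] with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def mgf_exact(a):
    """M(a) = sinh(a)/a for the uniform law on [-1, 1], with M(0) = 1."""
    return 1.0 if abs(a) < 1e-8 else math.sinh(a) / a

def mgf_exact_prime(a):
    """M'(a) = (a cosh a - sinh a)/a^2, with M'(0) = 0 (the mean)."""
    if abs(a) < 1e-8:
        return 0.0
    return (a * math.cosh(a) - math.sinh(a)) / (a * a)

def mgf_empirical(a, xs):
    """M_n(a): sample average of exp(a x)."""
    return sum(math.exp(a * x) for x in xs) / len(xs)

def mgf_empirical_prime(a, xs):
    """M_n'(a): sample average of x exp(a x)."""
    return sum(x * math.exp(a * x) for x in xs) / len(xs)

def sup_errors(n, eta=1.0, m=41, seed=0):
    """Grid approximations of sup |M_n - M| and sup |M_n' - M'| on |a| <= eta."""
    xs = sample_uniform(n, seed=seed)
    grid = [-eta + 2.0 * eta * i / (m - 1) for i in range(m)]
    err_m = max(abs(mgf_empirical(a, xs) - mgf_exact(a)) for a in grid)
    err_dm = max(abs(mgf_empirical_prime(a, xs) - mgf_exact_prime(a)) for a in grid)
    return err_m, err_dm

if __name__ == "__main__":
    for n in (200, 2000, 20000):
        print(n, sup_errors(n))
```

A single simulated sample only illustrates the almost-sure statement; the errors for one seed need not decrease monotonically in n.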
Lemma 8.5
For some \(\eta > 0\), we have \(\inf _{|\alpha | \le \eta } M(\alpha ) > 0.\) Furthermore, there exists \(C > 0\) such that \(\inf _{|\alpha | \le \eta } M_n(\alpha ) \ge C\) for sufficiently large n with probability one.
Proof
As M is continuous with \(M(0) = 1\), for any sufficiently small \(\varepsilon > 0\) there exists an \(\eta > 0\) such that if \(|\alpha | \le \eta\), then \(M(\alpha ) > 1 - \varepsilon\). Thus, \(\inf _{|\alpha | \le \eta } M(\alpha ) \ge 1 - \varepsilon > 0.\) By Lemma 8.4, for any \(\varepsilon ' > 0\), there exists an \(N_0 \ge 1\) such that if \(n \ge N_0\), then \(M_n(\alpha ) \ge M(\alpha ) - \varepsilon '\) for all \(|\alpha | \le \eta\). Hence, if \(\varepsilon ' < 1 - \varepsilon\) and \(C :=1 - \varepsilon - \varepsilon '\), then
\[\inf _{|\alpha | \le \eta } M_n(\alpha ) \ge \inf _{|\alpha | \le \eta } M(\alpha ) - \varepsilon ' \ge 1 - \varepsilon - \varepsilon ' = C > 0\]
for sufficiently large n with probability one. \(\square\)
Cite this article
Takeuchi, H. Probability distribution as a path and its action integral. Jpn J Stat Data Sci 3, 485–511 (2020). https://doi.org/10.1007/s42081-019-00062-y
DOI: https://doi.org/10.1007/s42081-019-00062-y
Keywords
- Action integral
- Analytic characteristic function
- Calculus of variations
- Gâteaux derivative
- Principle of least action
- Saddlepoint