# Probability distribution as a path and its action integral

Hiroyuki Takeuchi
Original Paper

## Abstract

To describe the convergence in law of a sequence of probability distributions, “the principle of least action” is introduced into statistics nonparametrically. A probability measure is treated as a path (in a suitable sense) so that the calculus of variations can be applied, and it is shown that saddlepoints, which appear in the method of saddlepoint approximations, play a crucial role. An action integral, i.e., a functional of the saddlepoint, is defined as a definite integral of entropy. As the saddlepoint equation naturally appears in the Gâteaux derivative of that integral, the unique saddlepoint is found as the optimal path of this variational problem. Consequently, by virtue of the one-to-one correspondence between probability measures and saddlepoints, convergence in law is described by a decreasing sequence of action integrals. Thereby, a new criterion for evaluating convergence is introduced into statistics and a novel interpretation of saddlepoints is provided.

## Keywords

Action integral, Analytic characteristic function, Calculus of variations, Gâteaux derivative, Principle of least action, Saddlepoint

## Mathematics Subject Classification

62G20, 62G30, 62G99

## 1 Introduction

There are several criteria for the discrepancy between probability distributions, such as the Lévy–Prokhorov metric, the Kolmogorov distance, and the Kullback–Leibler divergence, used in statistics, probability, and information theory (Amari 1990; Hall 1992; Serfling 1980; Shiryaev 1996). The Edgeworth expansion may also be effective in capturing the difference between the distribution of a normalized sample mean and the standard normal. Nevertheless, these criteria appear somewhat arbitrary, and it is natural to ask whether they can be unified. In this study, entropy is used as the criterion in place of these distances, by introducing “the principle of least action” into statistics; that is, discrepancy will be measured by entropy. The principle of least action has been used widely and successfully in physics, control theory, and statistics. Although statistics already has its own likelihood principle, the principle of least action will be applied here nonparametrically.

The action integral is defined as a definite integral of the Lagrangian with respect to the time parameter. Through it, the convergence in law $$F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F$$ may be regarded as a process in which the action integral decreases. Furthermore, the speed of convergence may be evaluated explicitly through an asymptotic expansion of that integral. The action integral is a functional, and a stationary point that minimizes it, if one exists, is called an optimal path.

To define the action integral for a sequence of probability distributions, the following should be considered. First, to apply the calculus of variations, it must be shown that there is a class of probability distributions that can be treated as paths. Second, the limit distribution F should be the unique stationary point of the integral. Finally, the action integral of F should be minimal compared with those of the $$F_n$$, expressing that F is the most stable state.

In Sect. 2, the Legendre transform of the cumulant generating function, which is called entropy in statistical mechanics (Ellis 2006), plays a key role. The action integral of a distribution is then defined as the definite integral of this entropy, which is a functional of the saddlepoint. It is shown in Takeuchi (2013) that the saddlepoint is a strictly increasing function that uniquely corresponds to the probability distribution through its behavior on a neighborhood of the expectation. Some of its properties resemble those of the analytic characteristic function of a distribution; they are obtained in Theorems 2.1 and 2.2.

In Sect. 3, the relationship between the Gâteaux derivative of an action integral and the saddlepoint equation is derived from the fundamental lemma of calculus of variations. A unique saddlepoint is obtained as an optimal solution, i.e., optimal path, of this problem. This ensures that the corresponding probability distribution can be regarded as a path. In Theorem 3.1, the main result of this paper, it is shown that the action integral is globally minimized by a unique saddlepoint.

In Sect. 4, it is shown that the local uniform convergence of the saddlepoint implies convergence of the action integral as well as convergence in law. The action integral of the distribution of a normalized sum of sample variables is evaluated for the central limit theorem. Similarly, the de Moivre–Laplace theorem, the normal approximation to the Poisson distribution, and Poisson’s law of small numbers are also obtained. The main concern in the calculus of variations is to derive a stationary point, i.e., an optimal path, of an action integral, so the valuable information carried by the value of the integral itself may be overlooked. However, as a scalar quantity, the integral can be used effectively to describe convergence in law, as asymptotic expansions and numerical examples illustrate.

A theory based on the principle of least action has several applications in statistics. For example, a new expression for the discrepancy of conjugate distributions is obtained in Sect. 5 by means of the Gâteaux derivative of the action integral with respect to the distribution.

In Sect. 6, the action integral of an empirical distribution is evaluated. Here, it is assumed that F has finite support. The lemmas required in this section and Sect. 4 are collected in Sect. 8.

On the complex plane, a saddlepoint is the point of intersection of the real axis and an integral path that localizes the integrand of Lévy’s inversion formula in the method of saddlepoint approximations (Daniels 1954; de Bruijn 1970; Field and Ronchetti 1990). Despite its importance in saddlepoint approximations, the steepest descent method, and the large deviation principle, this interesting function has not received proper attention. A saddlepoint looks like a “Kuroko”, a stage assistant attired and hooded in black at Japanese Kabuki, and will effectively replace probability distributions.

## 2 Action integral of probability distributions

Gibbs’ variational principle asserts that the probability distribution describing the equilibrium state is given by the solution to a problem in the calculus of variations. In this problem, the large deviation principle plays a crucial role, and the Legendre transform of the cumulant generating function $$K^*_F(t) = \sup _{ \alpha \in {{\mathbb {R}}}} \{ t \alpha - K_F(\alpha ) \}$$ is called entropy; see Ellis (2006). Bahadur (1971) used the Cramér-type large deviation theorem $$\lim _{n \rightarrow \infty } \frac{1}{n} \log \mathrm{Pr} \left\{ \frac{1}{n} \sum _{i = 1}^n X_i \in A \right\} = - \inf _{t \in A} I(t)$$ in an asymptotic theory of statistics. The rate function $$I(t)$$ is the entropy (Ellis 2006) and is given by the Legendre transform of the cumulant generating function of the distribution F.

The maximizer $$\alpha = \alpha (t)$$ on the right-hand side of $$K^*_F(t)$$, if it exists, is called a saddlepoint of F. Under conditions (1) below, Theorem 2.1 establishes the existence of a saddlepoint together with some of its fundamental properties; its proof is considerably simpler than that in Takeuchi (2013).
\begin{aligned}&\text{(a) } F \text{ has an analytic characteristic function,} \nonumber \\&\text{(b) } F \text{ is not degenerate to one point.} \end{aligned}
(1)
The reader is referred to Laha and Rohatgi (1979) and Lukacs (1970) for the definition and properties of analytic characteristic functions. The following equation is called the saddlepoint equation:
$$\frac{{\text{d}}}{{{\text{d}}\alpha }}K_{F} (\alpha ) = t.$$
(2)
In general, the large deviation principle and the Laplace principle are equivalent and share the same rate function. In particular, in Cramér’s theorem this function is given by $$K^*_F(t)$$ (Dembo and Zeitouni 1998; Dupuis and Ellis 1997). In this study, $$K^*_F(t)$$ is referred to as the entropy of the probability distribution F, as in Ellis (2006).

Field (1985) and Field and Ronchetti (1990) mentioned that the asymptotic normality of the M-estimator can be obtained using saddlepoints. Takeuchi (2014) showed that the curvature of a saddlepoint around the origin describes the asymptotic normality of a statistic. The central limit theorem can be characterized by a projection from the set of saddlepoints to a Riemannian manifold with the Fisher–Rao metric, and the projection does not depend on the coordinate system (Takeuchi 2015). For other properties and applications of saddlepoints, see Takeuchi (2006, 2016, 2017). For the methods of steepest descent and saddlepoint approximations, refer to Brazzale et al. (2007), de Bruijn (1970), Butler (2007), DasGupta (2008), Field and Ronchetti (1990), Jensen (1995) and Kolassa (1997).

As $$K_F : \alpha \in {{\mathbb {R}}} \mapsto K_F(\alpha ) \in {{\mathbb {R}}} \cup \{ +\infty \}$$ is a convex function, its Legendre transform may be written as follows:
\begin{aligned} K^*_F(t) = \sup _{ \alpha \in \mathrm{dom}{} { K_F}} \{ t \alpha - K_F(\alpha ) \}, \end{aligned}
(3)
where $$\mathrm{dom} K_F :=\{ \alpha : K_F(\alpha ) < \infty \}$$ is called the effective domain of $$K_F$$.

### Theorem 2.1

(Takeuchi 2013) Under condition (1), the following hold on a neighborhood of $$\mu$$, that is, for $$t \in \{ t : |t - \mu | < \delta \}$$ with sufficiently small $$\delta > 0$$, where $$\mu$$ and $$\sigma ^2$$ are the mean and variance of a distribution F, respectively.
1. (i)

The saddlepoint equation (2) has a unique solution $$\alpha = \alpha _t$$.

2. (ii)

The saddlepoint $$\alpha (t)$$ exists and coincides with the solution $$\alpha _t$$.

3. (iii)

$$(K^*_F)'(t) = \alpha (t)$$.

4. (iv)

The inverse of the saddlepoint $$\alpha = \alpha (t)$$ is given by $$t = K'_F(\alpha )$$.

5. (v)

$$t = \mu$$, $$\alpha (t) = 0$$, and $$K^*_F(t) = 0$$ are equivalent.

### Proof

It should be noted that (1) implies that $$K_F(\alpha ) \in C^{\infty }$$ for $$\alpha \in \mathrm{dom} K_F$$, and $$K_F$$ uniquely corresponds to the distribution F through the behavior on a neighborhood of the origin in $$\mathrm{dom} K_F$$ (Lukacs 1970).
1. (i)

As (1) implies that $$K''_F(0) = \sigma ^2 > 0$$, a continuous function $$t = K'_F(\alpha )$$ is strictly increasing on a neighborhood of the origin. Hence, $$K'_F(0) = \mu$$ implies that there exists some $$\delta > 0$$ such that if $$|t - \mu | < \delta$$ then the saddlepoint equation (2) has a unique solution $$\alpha _t = (K'_F)^{-1}(t)$$.

2. (ii)

As $$g(\alpha ) :=t \alpha - K_F(\alpha )$$ satisfies $$g''(\alpha ) = - K''_F(\alpha ) < 0$$ on $$\mathrm{dom} K_F$$ (in particular $$g''(0) = - \sigma ^2 < 0$$), because $$K''_F(\alpha )$$ is the variance of the exponentially tilted distribution and F is not degenerate, g is continuous and strictly concave. We have $$g'(\alpha _t) = t - K_F'(\alpha _t) = 0$$ from (i), and thus $$\alpha _t \in \mathrm{dom} K_F$$ is the unique maximizer of $$g(\alpha )$$. Hence, by the definition of the saddlepoint, we have $$\alpha (t) = \alpha _t$$.

3. (iii)

By the proof of (ii), we have $$K^*_F(t) = \max _{ \alpha \in \mathrm{dom}{} { K_F}} \{ t \alpha - K_F(\alpha ) \} = t \alpha (t) - K_F(\alpha (t))$$. As $$\alpha (t)$$ is differentiable, differentiation gives $$(K^*_F)'(t) = \alpha (t) + \{ t - K'_F(\alpha (t)) \} \alpha '(t)$$, and the conclusion follows from (2).

4. (iv)

It is obvious from the relation $$K'_F(\alpha _F(t)) = t$$.

5. (v)

It follows immediately from the properties of $$g(\alpha )$$ in (ii) above, the fact that $$K'_F(\alpha )$$ is strictly increasing, $$K^*_F(t) = t \alpha (t) - K_F(\alpha (t))$$ in (iii), and (2).

$$\square$$
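As a numerical illustration of Theorem 2.1, the following minimal Python sketch solves the saddlepoint equation (2) by Newton's method for the Poisson distribution with mean λ, whose cumulant generating function is K(α) = λ(e^α − 1). The helper names and the value λ = 2 are illustrative choices, not part of the paper. The sketch checks (v) at t = μ and (iii) with a central finite difference.

```python
import math

LAM = 2.0  # Poisson mean (illustrative); here mu = LAM


def K(a):
    """Cumulant generating function of Po(LAM)."""
    return LAM * (math.exp(a) - 1.0)


def dK(a):
    """First derivative K'(a)."""
    return LAM * math.exp(a)


def d2K(a):
    """Second derivative K''(a) > 0, so K' is strictly increasing."""
    return LAM * math.exp(a)


def saddlepoint(t, a0=0.0, tol=1e-12, max_iter=100):
    """Solve the saddlepoint equation K'(a) = t (t > 0) by Newton's method."""
    a = a0
    for _ in range(max_iter):
        step = (dK(a) - t) / d2K(a)
        a -= step
        if abs(step) < tol:
            return a
    raise RuntimeError("Newton iteration did not converge")


def entropy(t):
    """K*(t) = t * alpha(t) - K(alpha(t)), as in the proof of Theorem 2.1 (iii)."""
    a = saddlepoint(t)
    return t * a - K(a)


# Theorem 2.1 (v): at t = mu both the saddlepoint and the entropy vanish.
print(saddlepoint(LAM), entropy(LAM))

# Theorem 2.1 (iii): (K*)'(t) agrees with alpha(t) up to finite-difference error.
t, h = 2.5, 1e-5
fd = (entropy(t + h) - entropy(t - h)) / (2 * h)
print(fd, saddlepoint(t), math.log(t / LAM))
```

Newton's method is a natural choice here because K'' > 0, so K' is strictly increasing and the equation has a unique root; any bracketing root finder would serve equally well.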
Let $$\overline{{\mathbb {R}}}_{+}$$ be the extended interval $$(0, +\infty ]$$. A probability distribution F has an analytic characteristic function if and only if there exist some $$a, b \in \overline{{\mathbb {R}}}_{+}$$ such that $$\mathrm{dom} K_F = (-a, b)$$ (Lukacs 1970). Then by Takeuchi (2013), the corresponding saddlepoint $$\alpha _F$$ is a real-valued function such that
\begin{aligned} \alpha _F : t \in A_{\mathrm{dom}{} { K_F}} \mapsto \alpha _F(t) \in \mathrm{dom} K_F, \end{aligned}
where $$A_{\mathrm{dom}{} { K_F}}$$ is the open interval $$\bigl ( K'_F(-a + 0), \, K'_F(b -0) \bigr )$$. It should be noted that $$\mu \in A_{\mathrm{dom}{} { K_F}}$$. The relationship between the saddlepoint and the moments of the corresponding distribution is given by Lemma 8.1.

The uniqueness of the correspondence between a probability distribution and its saddlepoint is given by Theorem 2.2, which was first proved in Takeuchi (2013); an improved proof is provided here.

### Theorem 2.2

(Takeuchi 2013) We assume that (1) is satisfied. The behavior of the saddlepoint on a neighborhood of its zero point uniquely determines the corresponding probability distribution. Moreover, we have the following inversion formula:
$$K_{F} (\alpha ) = \left( {\int_{{\alpha ^{{ - 1}} (0)}}^{t} {\alpha _{F} } (u){\text{d}}u} \right)^{*} (\alpha )\quad {\text{for}}\quad t \in A_{{{\text{dom}}K_{F} }} .$$
(4)

### Proof

By Theorem 2.1 (iii), $$K^*_F(t)$$ is a primitive of $$\alpha _F(t)$$, and thus, with $$K^*_F(\mu ) = 0$$, we have
$$K_{F}^{*} (t) = \int_{\mu }^{t} {\alpha _{F} } (u){\text{d}}u,$$
(5)
for t in a neighborhood of $$t = \mu$$. This implies that there is a one-to-one correspondence between $$\alpha _F(t)$$ and $$K^*_F(t)$$ on this neighborhood. Furthermore, $$K^*_F(t)$$ uniquely corresponds to $$K_F(\alpha )$$ on a neighborhood of $$\alpha = 0$$ by the uniqueness of the Legendre transform for strictly convex functions. By the property of analytic characteristic functions, (5) uniquely determines the distribution F. Hence, the first part of the theorem is proved. Finally, we have (4) by the Legendre transform of (5) using $$\mu = \alpha ^{-1}(0)$$. $$\square$$
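Relation (5) can be checked numerically. The sketch below is a minimal illustration, assuming the gamma distribution Γ(λ, r) of Example 2.3 with the illustrative values λ = 0.5 and r = 3 (so μ = rλ = 1.5). It integrates the closed-form saddlepoint α_Γ(u) = 1/λ − r/u from μ to t with a composite Simpson rule (our choice of quadrature) and compares the result with the closed-form entropy K*_Γ(t).

```python
import math

LAM, R = 0.5, 3.0          # scale and shape of Gamma(LAM, R); illustrative values
MU = R * LAM               # mean of Gamma(LAM, R)


def alpha_gamma(u):
    """Closed-form saddlepoint of Gamma(LAM, R), valid for u > 0."""
    return 1.0 / LAM - R / u


def entropy_gamma(t):
    """Closed-form Legendre transform K*(t) of Gamma(LAM, R), t > 0."""
    return t / LAM + R * math.log(R * LAM / t) - R


def simpson(f, a, b, n=200):
    """Composite Simpson rule with n (even) subintervals; b < a is allowed."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


for t in (1.0, 1.4, 2.0):
    lhs = entropy_gamma(t)
    rhs = simpson(alpha_gamma, MU, t)   # integral of the saddlepoint from mu to t
    print(t, lhs, rhs)
```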

Field and Ronchetti (1990) mentioned the uniqueness and applied the saddlepoint to the central limit theorem.

Let $$t = t(\alpha )$$ be the inverse function of the saddlepoint $$\alpha = \alpha (t)$$. The inversion formula with respect to $$t(\alpha )$$ and $$K^*(t)$$ is given in Takeuchi (2013).

### Corollary 2.1

(Takeuchi 2013) We assume that (1) is satisfied. Then,
$$K_{F} (\alpha ) = \int_{0}^{\alpha } t (u){\text{d}}u\quad {\text{for}}\quad \alpha \in {\text{dom}}K_{F} .$$
(6)
Takeuchi (2013) obtained a dual relationship between these functions, as shown in Fig. 1.
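Formula (6) admits a similar check. This minimal sketch assumes the inverse Gaussian IG(μ, λ) of Example 2.2 with the illustrative values μ = 1 and λ = 2. The inverse of the saddlepoint, t(α) = K'_IG(α) = μ(1 − 2μ²α/λ)^{−1/2}, is obtained by differentiating K_IG, and its integral from 0 to α is compared with K_IG(α).

```python
import math

MU, LAM = 1.0, 2.0          # IG(MU, LAM); illustrative values
A_MAX = LAM / (2 * MU**2)   # K_IG(alpha) is finite only for alpha < A_MAX


def K_ig(a):
    """Cumulant generating function of IG(MU, LAM), Example 2.2."""
    return LAM / MU * (1 - math.sqrt(1 - 2 * MU**2 * a / LAM))


def t_of_alpha(a):
    """Inverse of the saddlepoint, t(alpha) = K'_IG(alpha) (Theorem 2.1 (iv))."""
    return MU / math.sqrt(1 - 2 * MU**2 * a / LAM)


def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals; b < a is allowed."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


for a in (-1.0, 0.3, 0.6):
    assert a < A_MAX
    # Eq. (6): K(a) should equal the integral of t(u) from 0 to a.
    print(a, K_ig(a), simpson(t_of_alpha, 0.0, a))
```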

In what follows, we provide examples of the cumulant generating function, the Legendre transform, and the corresponding saddlepoint for the univariate normal, the inverse Gaussian, the gamma, the binomial, the Poisson, and the multivariate normal distributions. As each effective domain $$\mathrm{dom} K_F$$ contains the origin $$\alpha = 0$$ as an interior point, these distributions have analytic characteristic functions (Lukacs 1970). Hence, by Theorem 2.2, each distribution and its saddlepoint are in one-to-one correspondence. It should be noted that Daniels (1980) showed that the normal, the gamma, and the inverse Gaussian are the only distributions for which the saddlepoint approximations are exact up to normalization.

### Example 2.1

Univariate normal distribution. $$N(\mu , \sigma ^2)$$
\begin{aligned} K_N(\alpha ) & = \mu \alpha + \sigma ^2 \alpha ^2 / 2 \quad \text{ for } \quad \alpha \in {{\mathbb {R}}}, \\ K^*_N(t) & = (t - \mu )^2 / (2 \sigma ^2), \quad \alpha _N(t) = (t - \mu ) / \sigma ^2 \quad \text{ for } \quad t \in {{\mathbb {R}}}. \end{aligned}

### Example 2.2

Inverse Gaussian distribution. $${\text{IG}}(\mu ,\lambda )$$ $$(\mu> 0, \ \lambda > 0)$$
\begin{aligned} K_{\text{IG}}(\alpha ) & = \frac{\lambda }{\mu } \left\{ 1 - \left( 1 - \frac{2 \mu ^2 \alpha }{\lambda } \right) ^{ 1 / 2} \right\} \quad \text{ for } \quad \alpha < \frac{\lambda }{2 \mu ^2}, \\ K^*_{\text{IG}}(t) & = \frac{\lambda }{2} \left( \frac{t}{\mu ^2} + \frac{1}{t} - \frac{2}{\mu } \right) , \quad \alpha _{\text{IG}}(t) = -\frac{\lambda }{2} \left( \frac{1}{t^2} - \frac{1}{\mu ^2} \right) \quad \text{ for } \quad t > 0. \end{aligned}

### Example 2.3

Gamma distribution with scale $$\lambda$$ and shape r. $${\varGamma }(\lambda , r)$$ $$(\lambda> 0, \ r > 0)$$
\begin{aligned} K_{\varGamma }(\alpha ) & = -r \log \left( 1 - \lambda \alpha \right) \quad \text{ for } \quad \alpha < \frac{1}{\lambda }, \\ K^*_{\varGamma }(t) & = \frac{t}{\lambda } + r \log \frac{r \lambda }{t} - r, \quad \alpha _{\varGamma }(t) = -\frac{r}{t} + \frac{1}{\lambda } \quad \text{ for } \quad t > 0. \end{aligned}

### Example 2.4

Binomial distribution. $$B(n, p)$$ $$(p, q > 0, \, p + q = 1)$$
\begin{aligned} K_B(\alpha ) & = n \log (pe^{\alpha } + q) \quad \text{ for } \quad \alpha \in {{\mathbb {R}}}, \\ K^*_B(t) & = t \log \frac{t}{p} + (n - t) \log \frac{n - t}{q} - n \log n \quad \text{ for } \quad 0< t< n, \\ \alpha _B(t) & = \log \left[ \frac{q t}{p(n - t)} \right] \quad \text{ for } \quad 0< t < n. \end{aligned}

### Example 2.5

Poisson distribution. $${\text{Po}}(\lambda )$$ $$(\lambda > 0)$$
\begin{aligned} K_{\text{Po}}(\alpha ) & = \lambda (e^{\alpha } - 1) \quad \text{ for } \quad \alpha \in {{\mathbb {R}}}, \\ K^*_{\text{Po}}(t) & = t \left( \log \frac{t}{\lambda } - 1 \right) + \lambda , \quad \alpha _{\text{Po}}(t) = \log \frac{t}{\lambda } \quad \text{ for } \quad t > 0. \end{aligned}
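The closed forms in Examples 2.1 to 2.5 can be sanity-checked mechanically. The Python sketch below uses illustrative parameter values (our choice) and, at a few points t inside each stated domain, measures the residual of the saddlepoint equation K'(α(t)) = t, with K' approximated by a central difference, and of the identity K*(t) = tα(t) − K(α(t)). It is a consistency check of the formulas as printed, not a derivation.

```python
import math

# Illustrative parameter values (our choice, not from the paper).
mu, s2 = 1.0, 4.0           # N(mu, s2)
m_ig, l_ig = 1.0, 2.0       # IG(m_ig, l_ig)
l_g, r_g = 0.5, 3.0         # Gamma(l_g, r_g), l_g is the scale
n_b, p_b = 10, 0.3          # B(n_b, p_b)
q_b = 1.0 - p_b
l_po = 2.0                  # Po(l_po)

# name: (K, alpha, K_star, sample points t inside the stated domain)
CASES = {
    "normal": (
        lambda a: mu * a + s2 * a * a / 2,
        lambda t: (t - mu) / s2,
        lambda t: (t - mu) ** 2 / (2 * s2),
        (-1.0, 1.0, 3.0),
    ),
    "inverse_gaussian": (
        lambda a: l_ig / m_ig * (1 - math.sqrt(1 - 2 * m_ig**2 * a / l_ig)),
        lambda t: -l_ig / 2 * (1 / t**2 - 1 / m_ig**2),
        lambda t: l_ig / 2 * (t / m_ig**2 + 1 / t - 2 / m_ig),
        (0.5, 1.0, 2.0),
    ),
    "gamma": (
        lambda a: -r_g * math.log(1 - l_g * a),
        lambda t: -r_g / t + 1 / l_g,
        lambda t: t / l_g + r_g * math.log(r_g * l_g / t) - r_g,
        (0.5, 1.5, 4.0),
    ),
    "binomial": (
        lambda a: n_b * math.log(p_b * math.exp(a) + q_b),
        lambda t: math.log(q_b * t / (p_b * (n_b - t))),
        lambda t: t * math.log(t / p_b) + (n_b - t) * math.log((n_b - t) / q_b) - n_b * math.log(n_b),
        (1.0, 3.0, 7.0),
    ),
    "poisson": (
        lambda a: l_po * (math.exp(a) - 1),
        lambda t: math.log(t / l_po),
        lambda t: t * (math.log(t / l_po) - 1) + l_po,
        (0.5, 2.0, 5.0),
    ),
}


def check(K, alpha, K_star, t, h=1e-6):
    """Residuals of K'(alpha(t)) = t and of K*(t) = t*alpha(t) - K(alpha(t))."""
    a = alpha(t)
    dK = (K(a + h) - K(a - h)) / (2 * h)      # central difference for K'
    return abs(dK - t), abs(K_star(t) - (t * a - K(a)))


for name, (K, alpha, K_star, ts) in CASES.items():
    worst = max(max(check(K, alpha, K_star, t)) for t in ts)
    print(f"{name:17s} max residual = {worst:.2e}")
```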

### Example 2.6

Multivariate normal distribution. $$N(\varvec{\mu }, \varvec{\varSigma })$$
\begin{aligned} K_N(\varvec{\alpha })= & {} {}^t \varvec{\mu } \varvec{\alpha } + \frac{1}{2} \, {}^t \varvec{\alpha } \varvec{\varSigma } \varvec{\alpha } \quad \text{ for } \quad \varvec{\alpha } \in {{\mathbb {R}}}^n, \\ K^*_N(\varvec{t})= & {} \frac{1}{2} {}^t (\varvec{t} - \varvec{\mu })\varvec{\varSigma }^{-1} (\varvec{t} - \varvec{\mu }), \quad \varvec{\alpha }_N(\varvec{t}) = \varvec{\varSigma }^{-1} (\varvec{t} - \varvec{\mu }) \quad \text{ for } \quad \varvec{t} \in {{\mathbb {R}}}^n. \end{aligned}

In saddlepoint approximations, a solution to Eq. (2) has been used as a saddlepoint $$\alpha (t)$$ for fixed t. In this study, $$\alpha = \alpha (t)$$ will be treated as a path that belongs to a function space. Consequently, probability distributions will be treated as paths, and convergence in law will be described by an action integral. To use calculus of variations, an action integral will be defined as follows:

### Definition 2.1

Let $$\alpha _G$$ be the saddlepoint of a probability distribution $$G \in {{\mathcal {P}}}$$. A functional
$$A_{F} [\alpha _{G} ] = - \int_{{\mu - \delta }}^{{\mu + \delta }} \{ t\alpha _{G} (t) - K_{F} (\alpha _{G} (t))\} {\text{d}}t$$
(7)
is called the action integral of the distribution G with respect to F, where $$\mu$$ is the mean of F and $$\delta$$ is a sufficiently small positive number. In particular, $$A_{F}[\alpha _{F}]$$ is called the action integral of F.
Hereafter, we restrict ourselves to the class of distributions satisfying (1), denoted by $${\mathcal {P}}$$. Setting
$${{\mathcal {A}}} = \{ \alpha _F : F \in {{\mathcal {P}}} \},$$
we obtain a one-to-one mapping $$T : {{\mathcal {P}}} \rightarrow {{\mathcal {A}}}$$ by Theorem 2.2.

The action integral (7) is a functional of saddlepoints in $${\mathcal {A}}$$. By Theorems 2.1 and 2.2, it is legitimate to restrict the integral (7) to the interval $$(\mu - \delta , \mu + \delta )$$: the saddlepoint $$\alpha _F$$ exists there, and its behavior on this interval determines the distribution uniquely. As the action integral $$A_{F}[\alpha ]$$ is minimized if and only if $$\alpha = \alpha _F(t)$$ by Theorem 3.1, it is quite natural to use saddlepoints as the argument of the action integral.

By Theorem 3.1, the inequality $$A_F[\alpha _{F_n}] \ge A_F[\alpha _F]$$ holds for any sequence of probability distributions $$\{ F_n \}_{n \in {{\mathbb {N}}}}$$. This means that the sequence $$\{ A_F[\alpha _{F_n}] \}_{n \in {{\mathbb {N}}}}$$ takes its smallest possible value at the limit distribution F, and this minimum is given by Proposition 2.1 (i) below. This is why a minus sign is needed in the definition of the action integral (7); it is essential for describing the convergence in law $$F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F$$ by the action integral. Furthermore, some fundamental properties are obtained in Proposition 2.1. In what follows, the notation $$A_X[\alpha _X]$$ denotes $$A_F[\alpha _F]$$ when a random variable X has distribution F.
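The inequality above can be seen concretely for normal distributions. For F = N(0, 1) one has K_F(α) = α²/2, so the integrand of (7) is tα − α²/2 = t²/2 − (t − α)²/2, and therefore A_F[α] − A_F[α_F] = ½∫(t − α(t))² dt ≥ 0. The Python sketch below (helper names and δ = 0.5 are illustrative) evaluates A_F[α_G] for a few G = N(m, v) by Simpson's rule and compares the excess over A_F[α_F] = −δ³/3 with that half squared L² distance.

```python
import math

DELTA = 0.5  # half-width of the integration interval; illustrative


def simpson(f, a, b, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def K_F(a):
    """Cumulant generating function of F = N(0, 1)."""
    return a * a / 2.0


def action(alpha, mu=0.0, delta=DELTA):
    """A_F[alpha] = -int_{mu-delta}^{mu+delta} {t*alpha(t) - K_F(alpha(t))} dt, Eq. (7)."""
    return -simpson(lambda t: t * alpha(t) - K_F(alpha(t)), mu - delta, mu + delta)


def alpha_normal(m, v):
    """Saddlepoint of G = N(m, v), Example 2.1."""
    return lambda t: (t - m) / v


A_min = action(alpha_normal(0.0, 1.0))        # alpha_G = alpha_F
print("A_F[alpha_F] =", A_min, "exact:", -DELTA**3 / 3)
for m, v in [(0.2, 1.0), (0.0, 2.0), (-0.1, 0.5)]:
    A = action(alpha_normal(m, v))
    gap = 0.5 * simpson(lambda t: (t - alpha_normal(m, v)(t)) ** 2, -DELTA, DELTA)
    print(f"G = N({m}, {v}): excess = {A - A_min:.6f}, half L2 gap = {gap:.6f}")
```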

### Proposition 2.1

The action integral $$A_F[\alpha _F]$$ has the following properties.
1. (i)

$$A_{F}[\alpha _{F}] = - \int _{\mu -\delta }^{\mu + \delta } K^{*}_{F}(t) \, {\text{d}}t = - \int _{\mu -\delta }^{\mu + \delta } \left( \int _{\mu }^{t} \alpha _{F}(u) \, {\text{d}}u \right) {\text{d}}t.$$

2. (ii)

$$-\infty = \inf _{F \in {{\mathcal {P}}}} A_F[\alpha _F]< A_F[\alpha _F] < 0$$.

3. (iii)

$$A_{X+c}[\alpha_{X+c}]=A_X[\alpha_X]$$ for any fixed $$c \in {{\mathbb {R}}}$$.

4. (iv)

$$A_{cX}[\alpha_{cX}]>A_X[\alpha_X]$$ for $$c > 1$$.

### Proof

1. (i)

The first relation follows from (7) and the proof of Theorem 2.1 (iii). The second is obvious from (5).

2. (ii)

As $$K^*_F(t) = t \alpha _F(t) - K_F(\alpha _F(t))$$ is a continuous function on a closed interval $$[\mu - \delta , \mu + \delta ]$$ for sufficiently small $$\delta > 0$$, the integral $$\int_{{\mu - \delta }}^{{\mu + \delta }} {K_{F}^{*} } (t){\text{d}}t$$ exists. Furthermore, Example 2.7 implies that $$A_N[\alpha _N] \searrow -\infty$$ as $$\sigma \searrow 0$$. Hence in general, $$\inf _{F} A_F[\alpha _F] = -\infty$$. By the property of the Legendre transform with respect to convex functions, $$K^*_F(t) > 0$$ for $$t \ne \mu$$, whereas $$K^*_F(t) = 0$$ for $$t = \mu$$. Then (i) implies that $$A_F[\alpha _F] < 0$$.

3. (iii)
The Legendre transform of the cumulant generating function for $$X + c \sim F(x - c)$$ is given by $$K^*_{X + c}(t) = K^*_{F} (t - c)$$. By (i), we have
\begin{aligned} A_{X + c}[\alpha _{X + c}] = - \int _{(\mu + c) - \delta }^{(\mu + c) + \delta } K^*_{F}(t - c) \, {\text{d}}t = - \int _{\mu - \delta }^{\mu + \delta } K^*_F(u) \, {\text{d}}u = A_{F}[\alpha _{F}]. \end{aligned}

4. (iv)
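The Legendre transform of the cumulant generating function for $$cX \sim F(x / c)$$ is $$K^*_{cX}(t) = K^*_F(t / c)$$. Suppose first that $$\mu = 0$$. As $$K^*_F$$ is convex with $$K^*_F(0) = 0$$, and $$K^*_F(t) > 0$$ for $$t \ne 0$$ by the proof of (ii), we have $$K^*_{F}(t / c) \le K^*_{F}(t) / c < K^*_{F}(t)$$ for $$t \ne 0$$. Hence, (i) implies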
\begin{aligned} A_X[\alpha _X]< & {} - \int _{-\delta }^{\delta } K^*_F(t / c) \, {\text{d}}t = - \int _{-\delta }^{\delta } K^*_{cX}(t) \, {\text{d}}t = A_{cX}[\alpha _{cX}]. \end{aligned}
Finally, by (iii), we have
\begin{aligned} A_X[\alpha _X] = A_{X - \mu }[\alpha _{X - \mu }] < A_{c(X - \mu )}[\alpha _{c(X - \mu )}] = A_{cX}[\alpha _{cX}], \end{aligned}
for the case $$\mu \ne 0$$. $$\square$$
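Properties (iii) and (iv) can be illustrated numerically from Proposition 2.1 (i). The sketch below is a minimal example, assuming X ~ Γ(λ, r) from Example 2.3 with illustrative values λ = 0.5, r = 3, δ = 0.5; it uses K*_{X+c}(t) = K*_F(t − c) and K*_{cX}(t) = K*_F(t/c), as in the proof, and integrates over the interval centred at the corresponding mean.

```python
import math

LAM, R = 0.5, 3.0      # Gamma(LAM, R), illustrative
MU = R * LAM           # mean of X
DELTA = 0.5            # chosen so that MU - DELTA > 0


def k_star(t):
    """Entropy K*_F(t) of Gamma(LAM, R), Example 2.3 (t > 0)."""
    return t / LAM + R * math.log(R * LAM / t) - R


def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def action_from_entropy(ks, mean, delta=DELTA):
    """Proposition 2.1 (i): A = -int_{mean-delta}^{mean+delta} K*(t) dt."""
    return -simpson(ks, mean - delta, mean + delta)


A_X = action_from_entropy(k_star, MU)

c = 0.7   # (iii): translation X -> X + c, mean MU + c
A_shift = action_from_entropy(lambda t: k_star(t - c), MU + c)

c2 = 2.0  # (iv): scaling X -> c2 * X with c2 > 1, mean c2 * MU
A_scale = action_from_entropy(lambda t: k_star(t / c2), c2 * MU)

print(A_X, A_shift, A_scale)
```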

Some examples of action integrals are now provided.

### Example 2.7

Univariate normal distribution. $$N(\mu , \sigma ^2)$$
\begin{aligned} A_N[\alpha _N] = - \frac{\delta ^3}{3 \sigma ^2} \quad \text{ for } \quad \delta > 0. \end{aligned}

### Example 2.8

Binomial distribution. $$B(n, p)$$
\begin{aligned} A_B[\alpha _B] = \frac{1}{2} \log \frac{(np - \delta )^{(np - \delta )^2}(nq - \delta )^{(nq - \delta )^2}}{(np + \delta )^{(np + \delta )^2}(nq + \delta )^{(nq + \delta )^2}} + \delta n \{ 2 \log (n p^p q^q) + 1 \} \end{aligned}
for $$0< \delta < n \cdot \min \{ p, q \}$$, where $$q = 1 - p$$.

### Example 2.9

Poisson distribution. $${\text{Po}}(\lambda )$$
\begin{aligned} A_{\text{Po}}[\alpha _{\text{Po}}] = \frac{1}{2} \log \frac{(\lambda - \delta )^{(\lambda - \delta )^2}}{(\lambda + \delta )^{(\lambda + \delta )^2}} + \delta \lambda (2 \log \lambda + 1) \quad \text{ for } \quad 0< \delta < \lambda . \end{aligned}

### Example 2.10

Exponential distribution. $${\varGamma }( 1 / \lambda , 1 )$$
$$A_{{\text{e}}} [\alpha _{{\text{e}}} ] = \log \frac{{(1/\lambda + \delta )^{{1/\lambda + \delta }} }}{{(1/\lambda - \delta )^{{1/\lambda - \delta }} }} + 2\delta (\log \lambda - 1)\quad {\text{for}}\quad 0 < \delta < \frac{1}{\lambda }.$$
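The closed forms of Examples 2.7 to 2.10 can be compared with a direct numerical evaluation of Proposition 2.1 (i), A = −∫K*(t) dt over [μ − δ, μ + δ]. The Python sketch below uses illustrative parameters (our choice) within the stated ranges of δ, and writes the logarithms of powers in Examples 2.8 and 2.9 as sums of x² log x terms to avoid overflow.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def xlogx(x):
    return x * math.log(x)


def x2logx(x):
    return x * x * math.log(x)


def action(k_star, mean, delta):
    """Proposition 2.1 (i): A = -int_{mean-delta}^{mean+delta} K*(t) dt."""
    return -simpson(k_star, mean - delta, mean + delta)


RESULTS = {}  # name -> (numerical value, closed form)

# Example 2.7: N(mu, s2)
mu, s2, d = 0.0, 2.0, 0.5
RESULTS["normal"] = (action(lambda t: (t - mu) ** 2 / (2 * s2), mu, d), -d**3 / (3 * s2))

# Example 2.8: B(n, p) with 0 < d < n * min(p, q)
n, p, d = 10, 0.3, 1.0
q = 1 - p
ks_b = lambda t: t * math.log(t / p) + (n - t) * math.log((n - t) / q) - n * math.log(n)
cf_b = 0.5 * (x2logx(n * p - d) + x2logx(n * q - d) - x2logx(n * p + d) - x2logx(n * q + d)) \
    + d * n * (2 * math.log(n * p**p * q**q) + 1)
RESULTS["binomial"] = (action(ks_b, n * p, d), cf_b)

# Example 2.9: Po(lam) with 0 < d < lam
lam, d = 2.0, 1.0
ks_po = lambda t: t * (math.log(t / lam) - 1) + lam
cf_po = 0.5 * (x2logx(lam - d) - x2logx(lam + d)) + d * lam * (2 * math.log(lam) + 1)
RESULTS["poisson"] = (action(ks_po, lam, d), cf_po)

# Example 2.10: exponential Gamma(1/lam_e, 1), mean m = 1/lam_e, 0 < d < m
lam_e, d = 2.0, 0.2
m = 1 / lam_e
ks_e = lambda t: lam_e * t - math.log(lam_e * t) - 1
cf_e = xlogx(m + d) - xlogx(m - d) + 2 * d * (math.log(lam_e) - 1)
RESULTS["exponential"] = (action(ks_e, m, d), cf_e)

for name, (numeric, closed) in RESULTS.items():
    print(f"{name:12s} numeric = {numeric:.10f}  closed form = {closed:.10f}")
```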

### Example 2.11

d-dimensional multivariate normal distribution. $$N(\varvec{\mu }, \varvec{\varSigma })$$
$$A_{N} [\varvec{\alpha }_{N} ] = - \frac{2^{d - 1}}{3} \, \delta ^{d + 2} \sum\limits_{i = 1}^{d} \frac{1}{\lambda _{i}} \quad {\text{for}}\quad \delta > 0,$$
where $$\lambda _1, \ldots , \lambda _d$$ are the eigenvalues of the covariance matrix $$\varvec{\varSigma }$$, counted with multiplicity, so that the sum equals $$\mathrm{tr} \, \varvec{\varSigma }^{-1}$$.
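The multivariate formula can be checked for d = 2. Definition 2.1 is stated for d = 1, so this sketch assumes (our reading) that the d-dimensional integral is taken over the cube ∏ᵢ[μᵢ − δ, μᵢ + δ]; cross terms of the quadratic form then vanish by symmetry, and one expects −(2^{d−1}/3)δ^{d+2} Σ 1/λᵢ with the eigenvalues counted with multiplicity. The covariance matrix is illustrative, and a tensor-product Simpson rule (exact for quadratics) evaluates the integral.

```python
import math

DELTA = 0.5
S = [[2.0, 0.5], [0.5, 1.0]]       # covariance matrix; illustrative
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

# Eigenvalues of the symmetric 2x2 matrix S.
tr = S[0][0] + S[1][1]
disc = math.sqrt(tr * tr - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]


def k_star(u1, u2):
    """Entropy of N(mu, S) at t = mu + u (Example 2.6)."""
    return 0.5 * (u1 * (S_inv[0][0] * u1 + S_inv[0][1] * u2)
                  + u2 * (S_inv[1][0] * u1 + S_inv[1][1] * u2))


def simpson_weights(n):
    """Simpson weights 1, 4, 2, ..., 4, 1 for n (even) subintervals."""
    return [1 if k in (0, n) else (4 if k % 2 else 2) for k in range(n + 1)]


n = 20
h = 2 * DELTA / n
w = simpson_weights(n)
grid = [-DELTA + k * h for k in range(n + 1)]
integral = sum(w[i] * w[j] * k_star(grid[i], grid[j])
               for i in range(n + 1) for j in range(n + 1)) * (h / 3) ** 2

numeric = -integral
d = 2
closed = -(2 ** (d - 1) / 3) * DELTA ** (d + 2) * sum(1 / lam for lam in eigs)
print(numeric, closed)
```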

## 3 Probability distribution as a path

In this section, an optimality property of saddlepoints will be derived from the principle of least action. Since, by Definition 2.1, the action integral $$A_F[\alpha ]$$ is a functional of the saddlepoint, $$\alpha = \alpha (t)$$ is treated as a path belonging to the set $${{\mathcal {A}}}$$. Let $$I_{\delta }$$ denote the closed interval $$\{ t : |t - \mu | \le \delta \}$$. To apply the calculus of variations to the functional, we assume the following two conditions: (1) the real-valued function $$\beta (t)$$ is continuous on $$I_{\delta }$$; (2) there exists an integrable function g such that $$|K_F(\alpha (t) + c \beta (t)) - K_F(\alpha (t))| \le |c| g(t)$$ for almost all $$t \in I_{\delta }$$ and all sufficiently small $$|c|$$.

### Theorem 3.1

Let F be a probability distribution in the class $${{\mathcal {P}}}$$. Then the corresponding saddlepoint $$\alpha _F \in {{\mathcal {A}}}$$ is the unique path that minimizes the action integral $$A_F[\alpha ]$$. The minimum $$\min _{\alpha \in {{\mathcal {A}}}} A_F[\alpha ]$$ is
$$A_F[\alpha _F] = - \int _{t \in I_\delta } K^*_F(t) \, {\text{d}}t.$$
(8)

### Proof

The first variation of $$A_F[\alpha ]$$ is given by the Gâteaux derivative in the $$\beta$$-direction as follows:
\begin{aligned} \frac{{\text{d}} A_F}{{\text{d}} \alpha } [\alpha ; \beta ]: & = \lim _{c \rightarrow 0} \frac{A_F[\alpha + c \beta ] - A_F[\alpha ]}{c} \\ & = \int _{t \in I_{\delta } \cap \{ t : \beta (t) \ne 0 \}} \left\{ \lim _{c \rightarrow 0} \frac{K_{F}(\alpha + c \beta (t)) - K_{F}(\alpha )}{c \beta (t)} - t \right\} \beta (t) \, {\text{d}}t, \end{aligned}
since the integrand vanishes for $$t \in \{ t : \beta (t) = 0\}$$. Interchanging the limit and the integral, which is justified by condition (2) and the dominated convergence theorem for the Lebesgue integral, we obtain
$$\frac{{\text{d}} A_F}{{\text{d}} \alpha } [\alpha ; \beta ] = \int _{t \in I_{\delta }} \{ K'_F(\alpha ) - t \} \beta (t) \, {\text{d}}t.$$
(9)
By (9), a path $$\alpha$$ satisfies $$\frac{{\text{d}} A_F}{{\text{d}} \alpha } [\alpha ; \beta ] = 0$$ for every admissible $$\beta$$ if and only if it satisfies the saddlepoint equation (2) on $$I_{\delta }$$; this follows from the du Bois–Reymond lemma, the fundamental lemma of the calculus of variations. That is, $$K'_F(\alpha (t)) - t = 0$$ for any $$t \in I_{\delta }$$. By Theorem 2.1, the saddlepoint equation has a unique solution $$\alpha _t$$ for each fixed $$t \in I_{\delta }$$, and it coincides with the saddlepoint $$\alpha _F(t)$$. Hence, $$\alpha _F$$ is a stationary point of the functional $$A_F[\alpha ]$$ and can be treated as a path in the class $${{\mathcal {A}}}$$ by (9).
Moreover, by the definition of the Legendre transform,
\begin{aligned} A_F[\alpha ] \ge - \int _{t \in I_{\delta }} \sup _{\alpha \in \mathrm{dom}{} { K_F}} \bigl \{ t \alpha - K_{F}(\alpha ) \bigr \} \, {\text{d}}t = - \int _{t \in I_{\delta }} K^*_F(t) \, {\text{d}}t, \end{aligned}
for any $$\alpha \in {{\mathcal {A}}}$$. The proof of Theorem 2.1 (iii) gives $$K^*_F(t) = t \alpha _F(t) - K_F(\alpha _F(t))$$, so equality holds for $$\alpha = \alpha _F$$, and (8) follows from (7). That is, the saddlepoint $$\alpha = \alpha _F(t)$$ globally minimizes the action integral $$A_F[\alpha ]$$. $$\square$$
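Formula (9) and the conclusion of Theorem 3.1 can be checked numerically. The sketch below assumes F = Po(λ) with the illustrative values λ = 2, δ = 1 and the direction β(t) = cos t (our choices). It compares a central-difference approximation of the Gâteaux derivative with the right-hand side of (9), once for an arbitrary path and once for the saddlepoint α_F(t) = log(t/λ), where both should vanish, and also compares the action values.

```python
import math

LAM, DELTA = 2.0, 1.0      # F = Po(LAM): mu = LAM, I_delta = [1, 3]; illustrative
MU = LAM


def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def K(a):
    """Cumulant generating function of Po(LAM)."""
    return LAM * (math.exp(a) - 1.0)


def dK(a):
    return LAM * math.exp(a)


def action(alpha):
    """A_F[alpha] from Eq. (7)."""
    return -simpson(lambda t: t * alpha(t) - K(alpha(t)), MU - DELTA, MU + DELTA)


def gateaux_fd(alpha, beta, c=1e-5):
    """Central difference (A_F[alpha + c beta] - A_F[alpha - c beta]) / (2c)."""
    plus = action(lambda t: alpha(t) + c * beta(t))
    minus = action(lambda t: alpha(t) - c * beta(t))
    return (plus - minus) / (2 * c)


def gateaux_formula(alpha, beta):
    """Right-hand side of Eq. (9): int {K'(alpha(t)) - t} beta(t) dt."""
    return simpson(lambda t: (dK(alpha(t)) - t) * beta(t), MU - DELTA, MU + DELTA)


beta = math.cos                          # a continuous direction on I_delta
alpha_other = lambda t: 0.3 * (t - MU)   # an arbitrary path, not the saddlepoint of F
alpha_F = lambda t: math.log(t / LAM)    # saddlepoint of Po(LAM), Example 2.5

print(gateaux_fd(alpha_other, beta), gateaux_formula(alpha_other, beta))
print(gateaux_fd(alpha_F, beta), gateaux_formula(alpha_F, beta))
print(action(alpha_other), ">", action(alpha_F))
```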
The second variation of the action integral in the $$\beta$$-direction is given by
\begin{aligned} \frac{{\text{d}}^2 A_{F}}{{\text{d}} \alpha ^2}[\alpha ; \beta ] :=\int _{t \in I_{\delta }} \frac{{\text{d}}^2 K_{F}}{{\text{d}} \alpha ^2}(\alpha ) \{ \beta (t) \}^2 \, {\text{d}}t. \end{aligned}
As F is not a degenerate distribution, $$K''_F(\alpha )$$ is positive on a neighborhood of the origin. If $$\delta > 0$$ is sufficiently small, then $$K''_F(\alpha _F(t)) \approx \sigma ^2_F > 0$$ for $$t \in I_{\delta }$$, and hence
$$\frac{{{\text{d}}^{2} A_{F} }}{{{\text{d}}\alpha ^{2} }}[\alpha _{F} ;\beta ] > 0,$$
(10)
for $$\beta \ne 0$$. As the second variation provides local information about the stationary point, (10) implies that $$A_F[\alpha _F]$$ is a local minimum.
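The second variation can be illustrated in the same setting. Assuming again F = Po(λ) with λ = 2, δ = 1, and β(t) = cos t (illustrative choices), the symmetric second difference [A_F[α_F + cβ] + A_F[α_F − cβ] − 2A_F[α_F]]/c² should approach ∫K''_F(α_F(t))β(t)² dt; for the Poisson case K''_F(α_F(t)) = t, so the limit is positive.

```python
import math

LAM, DELTA = 2.0, 1.0      # F = Po(LAM): mu = LAM, I_delta = [1, 3]; illustrative
MU = LAM


def simpson(f, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


def K(a):
    """Cumulant generating function of Po(LAM)."""
    return LAM * (math.exp(a) - 1.0)


def d2K(a):
    return LAM * math.exp(a)


def action(alpha):
    """A_F[alpha] from Eq. (7)."""
    return -simpson(lambda t: t * alpha(t) - K(alpha(t)), MU - DELTA, MU + DELTA)


alpha_F = lambda t: math.log(t / LAM)   # saddlepoint of Po(LAM), Example 2.5
beta = math.cos                          # a continuous direction on I_delta
c = 1e-3

# Symmetric second difference in c; the odd-order Taylor terms cancel.
second_fd = (action(lambda t: alpha_F(t) + c * beta(t))
             + action(lambda t: alpha_F(t) - c * beta(t))
             - 2 * action(alpha_F)) / c**2
second_formula = simpson(lambda t: d2K(alpha_F(t)) * beta(t) ** 2, MU - DELTA, MU + DELTA)
print(second_fd, second_formula)
```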

In Sect. 2, the action integral (7) of a probability distribution $$F \in {{\mathcal {P}}}$$ was defined using entropy. In the proof of Theorem 3.1, the saddlepoint equation (2) naturally appeared in the Gâteaux derivative (9). Moreover, the saddlepoint as the solution to this variations problem is shown to be globally optimal by the principle of least action. Therefore, (7) is quite appropriate as a definition of the action integral for a probability distribution. It should be noted that the saddlepoint equation (2) corresponds to the Euler–Lagrange equation. For calculus of variations and related fields, the reader is referred to Gelfand (1963), for example.

The solution to equation (2) was defined to be a saddlepoint $$\alpha (t)$$ for fixed t; however, by Theorem 3.1, it can be treated as a path $$\alpha _F$$ in the class $${\mathcal {A}}$$. This provides a completely new interpretation of saddlepoints.

In the following, the left-hand side of (7) will be written as
\begin{aligned} A_F[G] :=A_F[\alpha _G]. \end{aligned}
This means that a probability distribution in $${\mathcal {P}}$$ can be regarded as a path through its unique correspondence with a saddlepoint; this is justified by the principle of least action.

## 4 Convergence in law by action integral

By Theorem 3.1, the convergence in law $$F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F$$ may be regarded as $$F_n$$ reducing its action integral and reaching the minimal state F. This resembles a marble released from the rim of a bowl, rolling along its surface, and finally coming to rest. The relation $$A_F[F_n] \ge A_F[F]$$ expresses that the final state is the lowest.

In this section, it will be shown that an action integral can clearly describe convergence in law: the local uniform convergence of a sequence of saddlepoints implies the convergence of the action integral as well as convergence in law. Throughout, $$\mu$$ and $$\sigma ^2$$ denote the mean and the variance of F, respectively.

### Theorem 4.1

Let $$\{ F_n \}_{n \in {{\mathbb {N}}}}$$ be a sequence of probability distributions, and $$\{ \alpha _n \}_{n \in {{\mathbb {N}}}}$$ the corresponding sequence of saddlepoints. We assume that there exists some $$\delta > 0$$ such that
$$\mathop {\sup }\limits_{{t \in I_{\delta } }} |\alpha _{n} (t) - \alpha _{F} (t)| \to 0\quad {\text{as}}\quad n \to \infty .$$
(11)
Then the following hold:
1. (i)

$$F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F$$.

2. (ii)

$$A_F[F_n]$$ converges to $$A_F[F]$$ from above.

### Proof

(i) By Theorem 2.1 (iv), the inverse functions of $$\alpha _n(t)$$ and $$\alpha _F(t)$$ are $$K'_n(\alpha )$$ and $$K'_F(\alpha )$$, respectively. By (11) and Corollary 8.1, for some $$\eta > 0$$ we have $$\sup _{|\alpha | \le \eta } |K'_n(\alpha ) - K'_F(\alpha )| \rightarrow 0$$ as $$n \rightarrow \infty$$. Then, since $$K_n(0) = K_F(0) = 0$$, $$K_n(\alpha ) \rightarrow K_F(\alpha )$$ holds by the bounded convergence theorem for each $$\alpha$$ in a neighborhood of the origin, and thus the conclusion follows from the continuity theorem for analytic characteristic functions (Kotz et al. 1983).

(ii) By Theorem 3.1, $$A_F[F_n] \ge A_F[F]$$ holds for any $$n \in {{\mathbb {N}}}$$. As $$\alpha _F(t)$$ is continuous, and hence bounded, on $$I_{\delta }$$ for sufficiently small $$\delta > 0$$, (11) implies that there exists an $$L > 0$$ such that $$\sup _{t \in I_{\delta }} |\alpha _n(t)| < L$$ for all sufficiently large n. Hence, by the continuity of $$K_F$$ and the bounded convergence theorem,
$$\int_{t \in I_{\delta }} t \alpha _{n} (t)\,{\text{d}}t \to \int_{t \in I_{\delta }} t \alpha _{F} (t)\,{\text{d}}t \quad \text{and} \quad \int_{t \in I_{\delta }} K_{F} (\alpha _{n} (t))\,{\text{d}}t \to \int_{t \in I_{\delta }} K_{F} (\alpha _{F} (t))\,{\text{d}}t$$
as $$n \rightarrow \infty$$; thus, the conclusion follows from (7). $$\square$$
Let $$S_n = (X_1 + \cdots + X_n - n \mu ) / (\sigma \sqrt{n})$$ be the normalized sum of independent random variables $$X_1, \ldots , X_n$$ with common distribution $$F \in {{\mathcal {P}}}$$. Then the corresponding saddlepoint is given by
$$\alpha _{n} (t) = \sigma \sqrt n \alpha _{F} \left( {\mu + \frac{\sigma }{{\sqrt n }}t} \right),$$
(12)
see Field and Ronchetti (1990). Takeuchi (2013) showed that $$\sup _{|t| \le \delta } |\alpha _n(t) - t| \rightarrow 0$$ as $$n \rightarrow \infty$$ under mild conditions. Since $$\alpha (t) = t$$ is the saddlepoint of the standard normal, Theorem 4.1 shows that this uniform convergence is a sufficient condition for the central limit theorem $$S_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} N$$. Example 4.1 illustrates the convergence $$\alpha _n(t) \rightarrow t$$. The same approach is found in Field (1985) and Field and Ronchetti (1990). Alternatively, the theorem can be explained by the curvature of a saddlepoint around the origin (Takeuchi 2014).

### Example 4.1

(Takeuchi 2014) The central limit theorem $$S_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} N$$ can be seen graphically by observing $$\alpha _n(t) \rightarrow t$$. Figure 2 illustrates the local uniform convergence $$\alpha _n(t) \rightarrow t$$ when $$S_n$$ is the normalized sum of a sample from the inverse Gaussian distribution $$IG (\mu , \lambda )$$ with $$(\mu , \lambda ) = (1, 1)$$. By (12), the saddlepoint is
$$\alpha _{n} (t) = \frac{{\sqrt {n\lambda } }}{{2\sqrt \mu }}\left\{ {1 - \frac{{n\lambda }}{{(\sqrt \mu t + \sqrt {n\lambda } )^{2} }}} \right\},$$
for $$t > -(n \lambda / \mu )^{1 / 2}$$. At each point of a neighborhood of the origin other than $$t = 0$$, the convergence is strictly monotone increasing in n.
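The closed form above can be cross-checked against the general relation (12). The Python sketch below is a minimal illustration under stated assumptions: it takes the inverse Gaussian saddlepoint to be $$\alpha _F(t) = \frac{\lambda }{2\mu ^2}(1 - \mu ^2/t^2)$$ for $$t > 0$$ (the root of $$K'(s) = \mu (1 - 2\mu ^2 s/\lambda )^{-1/2} = t$$) and uses $$\sigma ^2 = \mu ^3/\lambda$$. All function names are hypothetical. It also evaluates $$\sup _{|t| \le \delta } |\alpha _n(t) - t|$$ on a grid, which should shrink as n grows.

```python
import math

def alpha_ig(t, mu=1.0, lam=1.0):
    """Saddlepoint of IG(mu, lam) for t > 0: the root s of mu * (1 - 2 mu^2 s / lam)**-0.5 = t."""
    return lam / (2.0 * mu**2) * (1.0 - mu**2 / t**2)

def alpha_n_via_12(t, n, mu=1.0, lam=1.0):
    """Saddlepoint of S_n from (12): sigma * sqrt(n) * alpha_F(mu + sigma * t / sqrt(n))."""
    sigma = math.sqrt(mu**3 / lam)  # variance of IG(mu, lam) is mu^3 / lam
    rn = math.sqrt(n)
    return sigma * rn * alpha_ig(mu + sigma * t / rn, mu, lam)

def alpha_n_closed(t, n, mu=1.0, lam=1.0):
    """Closed form displayed in Example 4.1, valid for t > -sqrt(n * lam / mu)."""
    r = math.sqrt(n * lam)
    return r / (2.0 * math.sqrt(mu)) * (1.0 - n * lam / (math.sqrt(mu) * t + r) ** 2)

def sup_gap(n, delta=0.5, m=201, mu=1.0, lam=1.0):
    """Grid approximation of sup_{|t| <= delta} |alpha_n(t) - t|."""
    ts = [-delta + 2.0 * delta * k / (m - 1) for k in range(m)]
    return max(abs(alpha_n_closed(t, n, mu, lam) - t) for t in ts)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, sup_gap(n))
```

At a fixed point such as t = 0.3, the values $$\alpha _{10}(t) < \alpha _{100}(t) < \alpha _{1000}(t) < t$$ show the monotone approach from below described above.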

Condition (11) concerns the convergence of whole paths, whereas the action integral (7) compresses the information of a distribution into a single scalar. Therefore, to capture the asymptotic normality of $$S_n$$, the method of Example 4.1 can be refined by observing the action integral instead, that is, $$A_N[S_n] \searrow A_N[N]$$. This is demonstrated in Theorem 4.2 and Propositions 4.1, 4.2, and 4.3.

Let $$F_n$$ be the probability distribution of the statistic $$S_n$$. Theorem 4.2 resembles the Edgeworth expansion but has a different interpretation. The expansion (13) shows how the sequence of paths $$\{ F_n \}_{n \in {{\mathbb {N}}}}$$ reduces its action integral and reaches the least state N, in terms of the sample size n and the cumulants $$\kappa _i$$ ($$i = 2,\ldots,5$$). That is, the central limit theorem can be obtained by observing the convergence $$F_n \in {{\mathcal {P}}} \rightarrow N \in {{\mathcal {P}}}$$ from the perspective of the principle of least action.

### Theorem 4.2

We have the following expansion:
\begin{aligned} A_{N}[F_n] & = A_{N}[N] + \frac{1}{20 n} \frac{\kappa _3^2}{\kappa _2^3} \delta ^5 \\&\quad + \frac{1}{n^2} \left\{ \frac{1}{8} \frac{\kappa _3^4}{\kappa _2^6} - \frac{1}{12}\frac{\kappa _3^2 \kappa _4}{\kappa _2^5} + \frac{2 \kappa _4^2 + 3 \kappa _3 \kappa _5}{504 \kappa _2^4} \right\} \delta ^7 + O \left( \frac{1}{n^3} \right) . \end{aligned}
(13)

### Proof

Note that the saddlepoint of $$F_n$$ is given by (12). Condition (11) is satisfied because $$\sup _{|t| \le \delta } |\alpha _n(t) - t| \rightarrow 0$$, so $$A_N[F_n] \searrow A_N[N]$$ by Theorem 4.1. By Lemma 8.1, $$\alpha _n(0) = 0$$ and $$\alpha '_n(0) = 1$$; hence the Taylor expansion of $$\alpha _n(t)$$ around $$t = 0$$ is
\begin{aligned} \alpha _n(t) & = t + \frac{\sigma ^3}{2 n^{1 / 2}} \alpha ''(\mu ) \, t^2 + \frac{\sigma ^4}{6 n} \alpha ^{(3)}(\mu ) \, t^3 \\&\quad + \frac{\sigma ^5}{24 n^{3 / 2}} \, \alpha ^{(4)}(\mu ) \,t^4 + \frac{\sigma ^6}{120 n^2} \, \alpha ^{(5)} \left( \mu + \frac{\sigma }{\sqrt{n}} \, \theta t \right) t^5, \end{aligned}
where $$0< \theta < 1$$. Let $$K_N(\alpha )$$ be the cumulant generating function of the standard normal. Then the integrand of $$A_N[F_n]$$ is expanded as
\begin{aligned}&t \alpha _n(t) - K_{N}(\alpha _n(t)) \\&\quad = \frac{1}{2} t^2 - \frac{\sigma ^6}{8 n} \{\alpha ''(\mu )\}^2 \, t^4 - \frac{\sigma ^7}{12 n^{3 / 2}} \alpha ''(\mu ) \, \alpha ^{(3)}(\mu ) \, t^5 \\&\qquad - \frac{\sigma ^8}{n^2} \left\{ \frac{1}{72} \{\alpha ^{(3)}(\mu )\}^2 + \frac{1}{48} \alpha ''(\mu ) \, \alpha ^{(4)}(\mu ) \right\} t^6 \\&\qquad - \frac{\sigma ^9}{n^{5 / 2}} \left\{ \frac{1}{240} \alpha ''(\mu ) \, \alpha ^{(5)} \left( \mu + \frac{\sigma }{\sqrt{n}} \theta t \right) + \frac{1}{144} \alpha ^{(3)}(\mu ) \alpha ^{(4)}(\mu ) \right\} t^7 \\&\qquad - \frac{\sigma ^{10}}{n^3} \left\{ \frac{1}{1152} \{ \alpha ^{(4)}(\mu ) \}^2 + \frac{1}{720} \alpha ^{(3)}(\mu ) \, \alpha ^{(5)} \left( \mu + \frac{\sigma }{\sqrt{n}} \theta t \right) \right\} t^8 \\&\qquad - \frac{\sigma ^{11}}{2880 n^{7 / 2}} \alpha ^{(4)}(\mu ) \, \alpha ^{(5)} \left( \mu + \frac{\sigma }{\sqrt{n}} \, \theta t \right) t^9 - \frac{\sigma ^{12}}{28800 n^4} \, \left\{ \alpha ^{(5)} \left( \mu + \frac{\sigma }{\sqrt{n}} \, \theta t \right) \right\} ^2 t^{10}. \end{aligned}
Hence, integrating both sides over $$|t| < \delta$$, with $$A_N[\alpha _N] = - \int _{-\delta }^{\delta } t^2 {\text{d}}t / 2$$, and using Proposition 2.1 (i), we have
\begin{aligned} A_{N}[F_n] & = A_{N}[N] + \frac{\sigma ^6}{20 n} \{\alpha ''(\mu )\}^2 \delta ^5 \\&\quad + \frac{\sigma ^8}{n^2} \left\{ \frac{1}{252} \{\alpha ^{(3)}(\mu )\}^2 + \frac{1}{168} \alpha ''(\mu ) \, \alpha ^{(4)}(\mu ) \right\} \delta ^7 + O \left( \frac{1}{n^3} \right) . \end{aligned}
Thus, we obtain (13) from Lemma 8.1. $$\square$$
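Because $$K_N(\alpha ) = \alpha ^2/2$$, the integrand satisfies $$t \alpha - K_N(\alpha ) = t^2/2 - (t - \alpha )^2/2$$, so that $$A_N[F_n] - A_N[N] = \frac{1}{2} \int _{-\delta }^{\delta } \{ t - \alpha _n(t) \}^2 \, {\text{d}}t \ge 0$$. This makes (13) easy to check numerically. The Python sketch below assumes, for illustration only, that F is the exponential distribution Exp(1), for which $$\mu = \sigma = 1$$, $$\kappa _k = (k - 1)!$$, $$\alpha _F(t) = 1 - 1/t$$, and (12) gives $$\alpha _n(t) = t / (1 + t/\sqrt{n})$$ for $$t > -\sqrt{n}$$. The Simpson-rule helper and all function names are hypothetical; the sketch compares the numerical integral with the right-hand side of (13) truncated at order $$1/n^2$$.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def action_gap_normal(alpha, delta):
    """A_N[F] - A_N[N] = (1/2) int_{-delta}^{delta} (t - alpha(t))^2 dt, using K_N(a) = a^2 / 2."""
    return 0.5 * simpson(lambda t: (t - alpha(t)) ** 2, -delta, delta)

def expansion_13(kappa, n, delta):
    """Terms of (13) after A_N[N], truncated at order 1/n^2; kappa = (k2, k3, k4, k5)."""
    k2, k3, k4, k5 = kappa
    c1 = k3**2 / (20.0 * k2**3)
    c2 = (k3**4 / (8.0 * k2**6) - k3**2 * k4 / (12.0 * k2**5)
          + (2.0 * k4**2 + 3.0 * k3 * k5) / (504.0 * k2**4))
    return c1 * delta**5 / n + c2 * delta**7 / n**2

def alpha_exp_sum(n):
    """Saddlepoint (12) of the normalized sum of n Exp(1) variables, for t > -sqrt(n)."""
    rn = math.sqrt(n)
    return lambda t: t / (1.0 + t / rn)

if __name__ == "__main__":
    delta, kappa = 3.0, (1.0, 2.0, 6.0, 24.0)  # cumulants (k-1)! of Exp(1)
    for n in (100, 1000, 10000):
        exact = action_gap_normal(alpha_exp_sum(n), delta)
        print(n, exact, expansion_13(kappa, n, delta))
```

For Exp(1), the coefficients of (13) reduce to $$\delta ^5/(5n) + 3\delta ^7/(7n^2)$$, and the relative error of the truncation shrinks roughly like $$1/n$$.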

It is well known that the convergence in law of $$S_n$$ is faster when F is symmetric. The same holds for the action integral: setting $$\kappa _3 = 0$$ in (13) removes the $$O(1/n)$$ term, so that $$A_N[F_n] - A_N[N] = O(1/n^2)$$.

The de Moivre–Laplace theorem is obtained similarly.

### Proposition 4.1

Let $$B_n$$ be the distribution of the normalized sum of a sample of size n from B(1, p). Then $$A_N[B_n] \searrow A_N[N]$$ as $$n \rightarrow \infty$$, and the following expansion holds:
\begin{aligned} A_{N}[B_n] & = A_{N}[N] + \frac{1}{20 n} \frac{(p - q)^2}{pq} \delta ^5 \nonumber \\&\quad + \frac{1}{252 n^2} \frac{108 p^4 - 216 p^3 + 186 p^2 - 78 p + 13}{p^2 q^2} \delta ^7 + O \left( \frac{1}{n^3} \right) , \end{aligned}
(14)
where $$p, q > 0$$ and $$p + q = 1$$.

### Proof

The saddlepoint of $$B_n$$ is given by
$$\alpha _{n} (t) = \sqrt {npq} \log \frac{{q(\sqrt n p + \sqrt {pq} t)}}{{p(\sqrt n q - \sqrt {pq} t)}}\quad {\text{for}}\quad - \sqrt {np/q} < t < \sqrt {nq/p} .$$
(15)
We have $$A_N[B_n] \searrow A_N[N]$$ as in Theorem 4.2. Substituting the cumulants of B(1, p), i.e., $$\kappa _2 = pq, \ \kappa _3 = -pq(p - q), \ \kappa _4 = pq(6p^2 - 6p + 1), \ \kappa _5 = -pq(p - q)(12 p^2 - 12 p + 1)$$, into (13), we obtain the conclusion. $$\square$$
Using (15), the action integral of $$B_n$$ with respect to the standard normal distribution is
$$A_{N} [B_{n} ] = - \int_{{ - \delta }}^{\delta } {\left\{ {t - \frac{1}{2}\alpha _{n} (t)} \right\}} \alpha _{n} (t){\text{d}}t.$$
In the following numerical examples, we set $$\delta = 3$$ in the action integral (7). The numerical values of $$A_N[B_n]$$ and its approximation (14) are shown in Figs. 3 and 4, respectively, for the parameter values $$p = 0.1, \, 0.2, \, 0.3, \, 0.4, \, 0.5$$ of the binomial distribution. Note that, by (14), the limit value of $$A_N[B_n]$$ is $$A_N[N] = -\delta ^3/3 = -9$$, as in Example 2.7, which does not depend on the parameter p.
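A minimal Python sketch of this calculation follows. It assumes a composite Simpson rule (a hypothetical helper, not the author's code) for the integral displayed above with the saddlepoint (15), and compares the result with the expansion (14) without its $$O(1/n^3)$$ remainder, for $$\delta = 3$$. Function names are illustrative only.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def alpha_bn(n, p):
    """Saddlepoint (15) of B_n, valid for -sqrt(n p / q) < t < sqrt(n q / p)."""
    q = 1.0 - p
    s, rn = math.sqrt(p * q), math.sqrt(n)
    return lambda t: math.sqrt(n * p * q) * math.log(q * (rn * p + s * t) / (p * (rn * q - s * t)))

def action_normal(alpha, delta):
    """A_N = -int_{-delta}^{delta} {t - alpha(t) / 2} alpha(t) dt, as displayed above."""
    return -simpson(lambda t: (t - 0.5 * alpha(t)) * alpha(t), -delta, delta)

def expansion_14(n, p, delta):
    """Right-hand side of (14) without the O(1/n^3) remainder; A_N[N] = -delta^3 / 3."""
    q = 1.0 - p
    c1 = (p - q) ** 2 / (20.0 * p * q)
    c2 = (108 * p**4 - 216 * p**3 + 186 * p**2 - 78 * p + 13) / (252.0 * p**2 * q**2)
    return -delta**3 / 3.0 + c1 * delta**5 / n + c2 * delta**7 / n**2

if __name__ == "__main__":
    delta, n = 3.0, 1000
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(p, action_normal(alpha_bn(n, p), delta), expansion_14(n, p, delta))
```

The sample size must be large enough that $$[-\delta , \delta ]$$ lies inside the domain of (15); for $$\delta = 3$$ and $$p = 0.1$$ this requires $$np/q > 9$$.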

The convergence in law of a normalized Poisson to the standard normal is obtained as follows.

### Proposition 4.2

Let $$P_{\lambda }$$ be the distribution of a normalized Poisson random variable. Then we have $$A_{N}[P_{\lambda }] \searrow A_{N}[N]$$ and
$$A_{N} [P_{\lambda } ] = A_{N} [N] + \frac{1}{{20\lambda }}\delta ^{5} + O\left( {\frac{1}{{\lambda ^{2} }}} \right)\quad {\text{as}}\quad \lambda \to \infty .$$
(16)

### Proof

Let $$X \sim {\text{Po}}(\lambda )$$. The saddlepoint of the normalized random variable $$(X - \lambda ) / \sqrt{\lambda }$$ is $$\alpha _{\lambda } (t) = \sqrt \lambda \alpha _{{{\text{Po}}}} (\lambda + \sqrt \lambda t)$$, where $$\alpha _{\text{Po}}(t) = \log (t / \lambda )$$ ($$t > 0$$) is the saddlepoint of the Poisson distribution $${\text{Po}}(\lambda )$$ ($$\lambda > 0$$). If we fix any sequence $$\{ \lambda _n \}_{n \in {{\mathbb {N}}}}$$ such that $$\lambda _n \rightarrow \infty$$, then for sufficiently small $$\delta > 0$$ we have $$\sup _{|t| \le \delta } |\alpha _{\lambda _n}(t) - \alpha _N(t)| \rightarrow 0$$ as $$n \rightarrow \infty$$. Hence, $$A_{N}[P_{\lambda }] \searrow A_{N}[N]$$ by Theorem 4.1. Since, in a neighborhood of the origin $$t = 0$$,
\begin{aligned} \alpha _{\lambda }(t) = t - \frac{1}{2 \sqrt{\lambda }} t^2 + \frac{1}{3 \lambda } t^3 + o \left( \frac{1}{\lambda } \right) \quad \text{ as } \quad \lambda \rightarrow \infty , \end{aligned}
the integrand of the difference $$A_{N}[P_{\lambda }] - A_N[N]$$ is written as
\begin{aligned} - \{ t \alpha _{\lambda }(t) - K_N(\alpha _{\lambda }(t)) \} - \{ - K^*_N(t) \} = \frac{1}{8 \lambda } t^4 - \frac{1}{6 \lambda ^{3 / 2}} t^5 + \frac{13}{72 \lambda ^2} t^6 + O \left( \frac{1}{\lambda ^{5 / 2}} \right) , \end{aligned}
by using $$K_N^*(t) = t^2 / 2$$. Finally, we obtain (16) by integrating both sides for $$|t| < \delta$$. $$\square$$
If a random variable X follows the Poisson distribution $${\text{Po}}(\lambda )$$, then the saddlepoint of its normalization is given by
$$\frac{{X - \lambda }}{{\sqrt \lambda }}\mathop \sim \limits^{{{\text{sp}}{\text{.}}}} \alpha _{\lambda } (t) = \sqrt \lambda \log \left( {1 + \frac{t}{{\sqrt \lambda }}} \right),$$
where the notation $$X\mathop \sim \limits^{{{\text{sp}}{\text{.}}}} \alpha$$ means that $$\alpha$$ is the saddlepoint of the distribution of the random variable X. With the cumulant generating function $$K_N(\alpha ) = \alpha ^2 / 2$$, the action integral of a normalized Poisson random variable with respect to the standard normal distribution is
$$A_{N} [P_{\lambda } ] = - \sqrt \lambda \int_{{ - \delta }}^{\delta } {\left\{ {t - \frac{{\sqrt \lambda }}{2}\log \left( {1 + \frac{t}{{\sqrt \lambda }}} \right)} \right\}} \log \left( {1 + \frac{t}{{\sqrt \lambda }}} \right){\text{d}}t.$$
The numerical values of $$A_N[P_{\lambda }]$$ and its approximation (16) are shown in Fig. 5. As $$\lambda \rightarrow \infty$$, $$A_N[P_{\lambda }]$$ tends to $$A_N[N] = -9$$.
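The integral above can be evaluated numerically and compared with the leading term of (16). The sketch below is illustrative: it assumes $$\delta = 3$$ as in the figures, uses a hypothetical Simpson-rule helper, and compares the gap $$A_N[P_{\lambda }] - A_N[N]$$ (with $$A_N[N] = -\delta ^3/3$$) against $$\delta ^5/(20\lambda )$$.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def action_poisson(lam, delta):
    """A_N[P_lam] from the displayed integral; requires delta < sqrt(lam)."""
    rl = math.sqrt(lam)
    log_term = lambda t: math.log1p(t / rl)  # log(1 + t / sqrt(lam))
    return -rl * simpson(lambda t: (t - 0.5 * rl * log_term(t)) * log_term(t), -delta, delta)

def leading_16(lam, delta):
    """Leading term delta^5 / (20 lam) of A_N[P_lam] - A_N[N] in (16)."""
    return delta**5 / (20.0 * lam)

if __name__ == "__main__":
    delta = 3.0
    for lam in (100.0, 1000.0, 10000.0):
        gap = action_poisson(lam, delta) + delta**3 / 3.0
        print(lam, gap, leading_16(lam, delta))
```

The relative discrepancy between the numerical gap and the leading term should decay roughly like $$1/\lambda$$, consistent with the $$O(1/\lambda ^2)$$ remainder in (16).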

For a non-normal limit, we give an action-integral version of Poisson’s law of small numbers.

### Proposition 4.3

The action integral $$A_{\text{Po}}[B]$$ of the binomial distribution B(n, p) with respect to the Poisson distribution converges to $$A_{\text{Po}}[{\text{Po}}]$$ as $$n \rightarrow \infty$$ with $$\lambda = np$$ held constant. Furthermore,
$$A_{\text{Po}}[B] = A_{\text{Po}}[{\text{Po}}] + \frac{\lambda }{3(n - \lambda )^2} \, \delta ^3 + O \left( \frac{1}{n^3} \right) \quad \text{ as } \quad n \rightarrow \infty .$$
(17)

### Proof

The saddlepoint $$\alpha _B(t)$$ of B(n, p) is given in Example 2.4. For $$\lambda = np$$ constant,
\begin{aligned} \sup _{|t - \lambda | \le \delta } |\alpha _B(t) - \alpha _{\text{Po}}(t)| = \sup _{|t - \lambda | \le \delta } \left| \log \frac{n - \lambda }{n - t} \right| \rightarrow 0 \quad \text{ as } \quad n \rightarrow \infty \end{aligned}
holds for sufficiently small $$\delta > 0$$. Thus, we have $$A_{\text{Po}}[B] \searrow A_{\text{Po}}[{\text{Po}}]$$ by Theorem 4.1. The integrand of the difference $$A_{\text{Po}}[B] - A_{\text{Po}}[{\text{Po}}]$$ is written as follows:
\begin{aligned}&- \{ t \alpha _B(t) - K_{\text{Po}}(\alpha _B(t)) \} - \{ - K^*_{\text{Po}}(t) \} \\&\quad = \frac{\lambda }{2(n - \lambda )^2} (t - \lambda )^2 + \frac{3n + \lambda }{6(n - \lambda )^3} (t - \lambda )^3 + O \left( \frac{1}{n^3} \right) \quad \text{ as } \quad n \rightarrow \infty . \end{aligned}
Finally, we obtain (17) by integrating both sides for $$|t - \lambda | < \delta$$. $$\square$$
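The difference can also be evaluated numerically. The sketch below writes $$\alpha _B(t) = \alpha _{\text{Po}}(t) + \log \{ (n - \lambda ) / (n - t) \}$$, which follows from the display above. It also uses the standard Poisson cumulant generating function $$K_{\text{Po}}(\alpha ) = \lambda (e^{\alpha } - 1)$$ and $$K^*_{\text{Po}}(t) = t \log (t / \lambda ) - t + \lambda$$. It integrates the left-hand side of the last display over $$|t - \lambda | \le \delta$$ and compares the result with the leading term of (17). The choices $$\lambda = 4$$, $$\delta = 3$$ and all function names are illustrative assumptions.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def integrand_17(t, n, lam):
    """-{t alpha_B(t) - K_Po(alpha_B(t))} + K*_Po(t), the integrand of A_Po[B] - A_Po[Po]."""
    alpha_po = math.log(t / lam)                        # Poisson saddlepoint
    alpha_b = alpha_po + math.log((n - lam) / (n - t))  # binomial saddlepoint with lam = n p
    k_po = lam * math.expm1(alpha_b)                    # K_Po(alpha_B(t))
    k_star = t * alpha_po - t + lam                     # K*_Po(t)
    return -(t * alpha_b - k_po) + k_star

def gap_17(n, lam, delta):
    """A_Po[B] - A_Po[Po] by numerical integration over [lam - delta, lam + delta]."""
    return simpson(lambda t: integrand_17(t, n, lam), lam - delta, lam + delta)

def leading_17(n, lam, delta):
    """Leading term lam * delta^3 / (3 (n - lam)^2) of (17)."""
    return lam * delta**3 / (3.0 * (n - lam) ** 2)

if __name__ == "__main__":
    lam, delta = 4.0, 3.0
    for n in (100, 1000, 10000):
        print(n, gap_17(n, lam, delta), leading_17(n, lam, delta))
```

The cubic term in the integrand is odd in $$t - \lambda$$ and integrates to zero, which is why only the quadratic term survives in (17).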
By (7), the action integral of B with respect to Po is
$$A_{{{\text{Po}}}} [B] = - \int_{{\lambda - \delta }}^{{\lambda + \delta }} {\{ t\alpha _{B} (t) - K_{{{\text{Po}}}} (} \alpha _{B} (t))\} {\text{d}}t\quad {\text{for}}\quad 0 < \delta < \lambda .$$
The numerical values of $$A_{\text{Po}}[B]$$ and its approximation (17) are shown in Figs. 6 and 7, respectively, for the Poisson parameters $$\lambda = 4, \, 6, \, 8, \, 10, \, 12$$. The limit value of the action integral $$A_{\text{Po}}[B]$$ is $$A_{\text{Po}}[{\text{Po}}]$$, which depends on the parameter $$\lambda$$; its values are shown in Table 1.
Table 1 Action integral of Poisson

| $$\lambda$$ | 4 | 6 | 8 | 10 | 12 |
| --- | --- | --- | --- | --- | --- |
| $$A_{\text{Po}}[{\text{Po}}]$$ | −2.4037 | −1.5405 | −1.1415 | −0.9083 | −0.7548 |
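The entries of Table 1 have a closed form. Since $$A_{\text{Po}}[{\text{Po}}] = - \int _{\lambda - \delta }^{\lambda + \delta } K^*_{\text{Po}}(t) \, {\text{d}}t$$ with $$K^*_{\text{Po}}(t) = t \log (t / \lambda ) - t + \lambda$$, and the linear part $$\lambda - t$$ integrates to zero over the symmetric interval, only $$t \log (t/\lambda )$$ needs to be integrated. The sketch below assumes $$\delta = 3$$, the value used for the numerical examples in this section; under that assumption it reproduces the tabulated values to four decimal places. The function name is hypothetical.

```python
import math

def a_po_po(lam, delta=3.0):
    """A_Po[Po] = -int_{lam-delta}^{lam+delta} {t log(t/lam) - t + lam} dt, for 0 < delta < lam."""
    # Antiderivative of t log(t / lam); the term (lam - t) integrates to zero on the symmetric interval.
    antideriv = lambda t: 0.5 * t * t * math.log(t / lam) - 0.25 * t * t
    return -(antideriv(lam + delta) - antideriv(lam - delta))

if __name__ == "__main__":
    for lam in (4, 6, 8, 10, 12):
        print(lam, f"{a_po_po(lam):.4f}")
```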

Note that, on the right-hand sides of (13), (14), (16), and (17), the terms following the initial term sum to a nonnegative quantity; this follows from Theorem 3.1.

## 5 Action integral of the conjugate distribution

Suppose that $${\hat{\alpha }}_n(t)$$ is the saddlepoint of the distribution $$\mathrm{Pr} \{ {\hat{\theta }}_n \le x \}$$ of some statistic $${\hat{\theta }}_n$$ based on a sample from a distribution $$F \in {{\mathcal {P}}}$$, and let $$\beta = \beta (t)$$ be a continuous function. Then the difference of action integrals $$A_F[{\hat{\alpha }}_n + c \beta ] - A_F[{\hat{\alpha }}_n]$$ may express the robustness of this statistic. Using this idea, we can characterize the conjugate distribution, defined as $$F_c(x) = \frac{1}{M_F(c)} \int _{-\infty }^{x} e^{cu} \, {\text{d}}F(u),$$ where $$M_F$$ is the moment generating function of F. If $$c = \alpha _F(t)$$ (the saddlepoint of F at t), then $$\int _{-\infty }^{\infty } x \, {\text{d}}F_c(x) = t;$$ see, for example, Barndorff-Nielsen and Cox (1989). The conjugate distribution plays an essential role in the large deviation principle, the method of steepest descent, and the method of saddlepoint approximations.

Let $$G_{c \beta }$$ be a distribution whose saddlepoint is $$\alpha _F - c \beta$$, and let $$\mu _{c \beta }$$ be its mean. By the inversion formula (4), the cumulant generating function of $$G_{c \beta }$$ is given by
$$K_{{G_{{c\beta }} }} (\alpha ) = \left\{ {\int_{{\mu _{{c\beta }} }}^{t} ( \alpha _{F} (u) - c\beta (u)){\text{d}}u} \right\}^{*} (\alpha ).$$
(18)

### Proposition 5.1

If $$\beta = 1$$, then $$G_{c \beta }$$ coincides with the conjugate distribution $$F_c$$ of F.

### Proof

By Theorem 2.1 (v), $$(\alpha _F - c \beta )(\mu _c) = 0$$; thus, if $$\beta (t) \equiv 1$$, then
$$\alpha _F(\mu _c) - c = 0,$$
(19)
and by (18) and (5), we have
\begin{aligned} K_{G_c}(\alpha ) & = \left\{ \int _{\mu _F}^t \alpha _F(u) \, {\text{d}}u + \int _{\mu _c}^{\mu _F} \alpha _F(u) \, {\text{d}}u - c(t - \mu _c) \right\} ^* (\alpha ) \\ & = \sup _{t} \left\{ (\alpha + c) t - \int _{\mu _F}^t \alpha _F(u) \, {\text{d}}u \right\} - \int _{\mu _c}^{\mu _F} \alpha _F(u) \, {\text{d}}u - c \mu _c \\ & = \left( \int _{\mu _F}^t \alpha _F(u) \, {\text{d}}u \right) ^* (\alpha + c) + \int _{\mu _F}^{\mu _c} \alpha _F(u) \, {\text{d}}u - c \mu _c \\ & = K_F(\alpha + c) + K^*_F(\mu _c) - c \mu _c. \end{aligned}
Using $$K^*_F(t) = t \alpha _F(t) - K_F(\alpha _F(t))$$ in the proof of Theorem 2.1 (iii), we have $$K^*_F(\mu _c) = \mu _c \alpha _F(\mu _c) - K_F(\alpha _F(\mu _c)) = c \mu _c - K_F(c).$$ Thus $$K_{G_c}(\alpha ) = K_F(\alpha + c) - K_F(c)$$, and this implies
$$\int _{-\infty }^{\infty } e^{\alpha x} \, {\text{d}}G_{c}(x) = \int _{-\infty }^{\infty } e^{\alpha x} \, {\text{d}} \left( \frac{1}{M_F(c)} \int _{-\infty }^x e^{cu} \, {\text{d}}F(u) \right) .$$
Hence, we obtain the conclusion by the uniqueness of the moment generating function. $$\square$$

Proposition 5.1 asserts that, when the saddlepoint $$\alpha _F$$ is perturbed in a direction $$\beta$$, i.e., to $$\alpha _F - c \beta$$, the constant direction $$\beta = 1$$ is “the direction of the conjugate distribution”.
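Proposition 5.1 can be checked numerically on a distribution with finite support, for which the cumulant generating function and the exponential tilt are finite sums. The sketch below is only an illustration. The support points and probabilities are arbitrary choices, the saddlepoint is found by bisection on $$K'_F(s) = t$$ (using that $$K'_F$$ is increasing), and the function names are hypothetical. It checks that the conjugate distribution $$F_c$$ has saddlepoint $$\alpha _F - c$$, that $$K_{F_c}(\alpha ) = K_F(\alpha + c) - K_F(c)$$, and that $$F_{\alpha _F(t)}$$ has mean t.

```python
import math

def cgf(xs, ps, s):
    """K(s) = log sum_i p_i exp(s x_i), computed with a log-sum-exp shift."""
    m = max(s * x for x in xs)
    return m + math.log(sum(p * math.exp(s * x - m) for x, p in zip(xs, ps)))

def cgf_prime(xs, ps, s):
    """K'(s): the mean of the tilted distribution with weights p_i exp(s x_i)."""
    m = max(s * x for x in xs)
    w = [p * math.exp(s * x - m) for x, p in zip(xs, ps)]
    return sum(wi * x for wi, x in zip(w, xs)) / sum(w)

def saddlepoint(xs, ps, t, lo=-50.0, hi=50.0, iters=200):
    """Solve K'(s) = t by bisection; requires min(xs) < t < max(xs)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cgf_prime(xs, ps, mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conjugate(xs, ps, c):
    """Probabilities of the conjugate distribution F_c: p_i exp(c x_i) / M_F(c)."""
    w = [p * math.exp(c * x) for x, p in zip(xs, ps)]
    total = sum(w)
    return [wi / total for wi in w]

xs = [0.0, 1.0, 2.0, 5.0]  # illustrative support
ps = [0.1, 0.4, 0.3, 0.2]  # illustrative probabilities (mean 2.0)

if __name__ == "__main__":
    c, t = 0.4, 2.5
    pc = conjugate(xs, ps, c)
    print(saddlepoint(xs, pc, t), saddlepoint(xs, ps, t) - c)
```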

The action integral of the conjugate distribution is evaluated as in Proposition 5.2. We also use $$I_{\delta } = \{ t : |t - \mu | \le \delta \}$$.

### Proposition 5.2

For sufficiently small $$\delta > 0$$, we have
$$A_{F} [F_{c} ] - A_{F} [F]\sim \frac{{c^{2} }}{2}\int_{{t \in I_{\delta } }} {K^{\prime\prime}_{F} } (\alpha _{F} (t)){\text{d}}t\quad {\text{as}}\quad c \to 0.$$
(20)

### Proof

For $$t \in I_{\delta }$$, Taylor's theorem gives, for some $$\theta = \theta (t) \in (0, 1)$$,
\begin{aligned} K_F(\alpha _F(t) - c) & = K_F(\alpha _F(t)) - c K'_F(\alpha _F(t)) \\&\quad + \frac{1}{2} c^2 K''_F(\alpha _F(t)) - \frac{1}{6} c^3 K_F^{(3)}(\alpha _F(t) - \theta c). \end{aligned}
By condition (1), $$K_F \in C^{\infty }$$; thus, for sufficiently small $$|c| > 0$$ and $$\delta > 0$$, there is an $$L > 0$$ such that
$$\int _{t \in I_{\delta }} |K_F^{(3)}(\alpha _F(t) - \theta c)| \, {\text{d}}t < L,$$
for any $$0< \theta < 1$$. Furthermore, for sufficiently small $$\delta > 0$$, we have $$K''_F(\alpha _F(t)) > 0$$. Therefore,
\begin{aligned} A_F[F_c] & = - \int _{t \in I_{\delta }} \left\{ t (\alpha _F(t) - c) - K_F(\alpha _F(t) - c) \right\} \, {\text{d}}t \\ & = - \int _{t \in I_{\delta }} \{ t \alpha _F(t) - K_F(\alpha _F(t)) \} \, {\text{d}}t - \, c \int _{t \in I_{\delta }} \{ K_F'(\alpha _F(t)) - t \} \, {\text{d}}t \\&\quad + \frac{c^2}{2} \int _{t \in I_{\delta }} K_F''(\alpha _F(t)) \, {\text{d}}t - \frac{c^3}{6} \int _{t \in I_{\delta }} K_F^{(3)}(\alpha _F(t) - \theta c) \, {\text{d}}t \\ & = A_F[\alpha _F] + \frac{c^2}{2} \int _{t \in I_{\delta }} K_F''(\alpha _F(t)) \, {\text{d}}t + o(c^2), \quad \text{ as } \quad c \rightarrow 0, \end{aligned}
by the saddlepoint equation (2), $$K'_F(\alpha _F(t)) = t$$. $$\square$$
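Subtracting the two action integrals in the proof gives the exact identity $$A_F[F_c] - A_F[F] = \int _{t \in I_{\delta }} \{ c t + K_F(\alpha _F(t) - c) - K_F(\alpha _F(t)) \} \, {\text{d}}t$$. This can be integrated numerically and compared with the leading term (20). The sketch below assumes, for illustration, $$F = {\text{Po}}(\lambda )$$ with $$\lambda = 5$$ and $$\delta = 1$$. In that case $$K''_F(\alpha _F(t)) = t$$, and the exact value is $$(e^{-c} - 1 + c) \, 2 \lambda \delta$$. It also covers the normal case of Example 5.1, where the difference is exactly $$\delta \sigma ^2 c^2$$. The helper names are hypothetical.

```python
import math

def simpson(f, a, b, m=1000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def conj_gap(K, alpha, c, mu, delta):
    """A_F[F_c] - A_F[F] = int_{|t - mu| <= delta} {c t + K(alpha(t) - c) - K(alpha(t))} dt."""
    return simpson(lambda t: c * t + K(alpha(t) - c) - K(alpha(t)), mu - delta, mu + delta)

def leading_20(K2, alpha, c, mu, delta):
    """Right-hand side of (20): (c^2 / 2) int K''(alpha(t)) dt."""
    return 0.5 * c * c * simpson(lambda t: K2(alpha(t)), mu - delta, mu + delta)

# Poisson example (illustrative): K(s) = lam (e^s - 1), alpha(t) = log(t / lam), K''(s) = lam e^s
lam, delta = 5.0, 1.0
K_po = lambda s: lam * math.expm1(s)
K2_po = lambda s: lam * math.exp(s)
alpha_po = lambda t: math.log(t / lam)

# Normal example (illustrative): N(mu_n, sigma2) with K(s) = mu_n s + sigma2 s^2 / 2
mu_n, sigma2 = 1.0, 2.0
K_norm = lambda s: mu_n * s + 0.5 * sigma2 * s * s
alpha_norm = lambda t: (t - mu_n) / sigma2

if __name__ == "__main__":
    for c in (0.1, 0.01, 0.001):
        g = conj_gap(K_po, alpha_po, c, lam, delta)
        print(c, g, g / leading_20(K2_po, alpha_po, c, lam, delta))
```

In the Poisson case the ratio to the leading term is $$2(e^{-c} - 1 + c)/c^2 = 1 - c/3 + O(c^2)$$, so it approaches 1 as $$c \rightarrow 0$$, in line with (20).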

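By Proposition 5.2, the discrepancy between F and its conjugate distribution, measured by the action integral, is governed by $$K''_F(\alpha _F(t))$$; since $$K''_F(\alpha _F(\mu )) = K''_F(0) = \sigma ^2$$, it is approximately proportional to the variance for small $$\delta$$. The normal distribution provides an illustrative example.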

### Example 5.1

If F in (20) is $$N(\mu , \sigma ^2)$$, then $$K_F^{(3)} \equiv 0$$ and (20) holds exactly: for any $$\delta > 0$$ and $$c \in {{\mathbb {R}}}$$,
\begin{aligned} A_N[\alpha _N + c] - A_N[\alpha _N] = \delta \sigma ^2 c^2. \end{aligned}
The family of conjugate distributions $$\{ F_c \}_{c \in {{\mathbb {R}}}}$$ converges in law to F as $$c \rightarrow 0$$. It can be shown that the second-order Gâteaux derivative of $$A_F$$ at $$\alpha _F$$ in the direction $$\beta$$ satisfies
\begin{aligned} \sup _{\beta \in \{ \alpha _F - c \beta \in {{\mathcal {A}}} \}} \frac{{\text{d}}^2 A_F}{{\text{d}} \alpha ^2}[\alpha _F : \beta ] = \sup _{\beta \in \{ \alpha _F - c \beta \in {{\mathcal {A}}} \}} \int _{t \in I_{\delta }} K''_F(\alpha _F(t)) \{ \beta (t) \}^2 \, {\text{d}}t. \end{aligned}
This expression measures the sensitivity of the action integral when $$\alpha _F$$ changes to $$\alpha _F - c \beta$$. However, only those directions $$\beta$$ for which $$\alpha _F - c \beta$$ is the saddlepoint of some distribution in the class $${{\mathcal {P}}}$$ are relevant. By Proposition 5.2 (the conjugate distribution case), we have
\begin{aligned} \sup _{\beta \in \{ \alpha _F - c \beta \in {{\mathcal {A}}} \}} \frac{{\text{d}}^2 A_F}{{\text{d}} \alpha ^2}[\alpha _F : \beta ] \ge \int _{t \in I_{\delta }} K''_F(\alpha _F(t)) \, {\text{d}}t \sim 2 \delta \sigma ^2 \quad \text{ as } \quad \delta \rightarrow 0, \end{aligned}
where the inequality is obtained by taking $$\beta = 1$$. Finding a direction $$\beta$$ that maximizes, or minimizes, the second-order Gâteaux derivative would be helpful in a robustness study.

## 6 Action integral of the empirical distribution

The validity of nonparametric methods in statistics may be said to rest on the almost sure uniform convergence of the empirical distribution function (the Glivenko–Cantelli theorem). Thus, it is worthwhile to study the action integral of the empirical distribution. In this section, to evaluate this action integral, the corresponding saddlepoint is treated as a stochastic process. We assume that the probability distribution $$F \in {{\mathcal {P}}}$$ has compact support $$[-L, L]$$, and $$F_n$$ is its empirical distribution. Let $$M_n$$ be the empirical moment generating function and $$t_F(\alpha )$$ the inverse of the saddlepoint of F. The saddlepoint $$\alpha _n(t)$$ corresponding to $$F_n$$ is called the empirical saddlepoint of F, and $$t_n(\alpha )$$ denotes its inverse. As before, we use the closed interval $$I_{\delta } = \{ t : |t - \mu | \le \delta \}$$.

### Lemma 6.1

There exists some $$\eta > 0$$ such that, as $$n \rightarrow \infty$$,
\begin{aligned} \sup _{|\alpha | \le \eta } |t_n(\alpha ) - t_F(\alpha )| = O \left ( \sup _{|x| \le L} |F_n(x) - F(x)| \right ), \quad \text{ a.s. } \end{aligned}

### Proof

By Theorem 2.1 (iv), the inverse of a saddlepoint is the derivative of the corresponding cumulant generating function. Hence, by Lemmas 8.4 and 8.5, we have
\begin{aligned}&\sup _{|\alpha | \le \eta } |t_n(\alpha ) - t_F(\alpha )| \\&\quad \le \frac{\sup _{|\alpha | \le \eta } |M'_n(\alpha ) M(\alpha ) - M_n(\alpha ) M'(\alpha )|}{\inf _{|\alpha | \le \eta } |M_n(\alpha )| \inf _{|\alpha | \le \eta } |M(\alpha )|} \\&\quad \le \frac{1}{C (1 - \varepsilon )} \Bigl \{ \max _{|\alpha | \le \eta } M(\alpha ) \sup _{|\alpha | \le \eta } |M'_n(\alpha ) - M'(\alpha )| \\&\qquad + \max _{|\alpha | \le \eta } |M'(\alpha )| \sup _{|\alpha | \le \eta } |M_n(\alpha ) - M(\alpha )| \Bigr \} \\&\quad = O \left ( \sup _{|x| \le L} |F_n(x) - F(x)| \right ), \quad \text{ as } \ n \rightarrow \infty , \end{aligned}
with probability one. $$\square$$

When F has compact support, the sequence of empirical saddlepoints $$\{ \alpha _n \}_{n \in {{\mathbb {N}}}}$$ converges uniformly to $$\alpha _F$$ on $$I_{\delta }$$ with probability one.

### Theorem 6.1

We have the following:
\begin{aligned} \sup _{t \in I_{\delta }} |\alpha _n(t) - \alpha _F(t)|= & {} O \left ( \sup _{|x| \le L} |F_n(x) - F(x)| \right ) \quad \text{ a.s. }, \end{aligned}
(21)
\begin{aligned} A_F[F_n] - A_F[F]= & {} o \left ( \sup _{|x| \le L} |F_n(x) - F(x)| \right ) \quad \text{ a.s. }, \end{aligned}
(22)
for some $$\delta > 0$$.

### Proof

By Lemmas 8.3 and 8.2 (ii), for sufficiently large n we have
\begin{aligned} \sup _{t \in I_{\delta }} |\alpha _n(t) - \alpha _F(t)| \le \Delta (\varepsilon ) \le \varepsilon \max _{t \in I_{\delta '}} \alpha '_F(t), \end{aligned}
for some $$\delta > 0$$. Hence, Lemmas 8.3 and 6.1 imply (21). Let $$J_n(t)$$ be the integrand of the difference $$A_F[F_n] - A_F[F]$$, and define $$g_n(t) :=\{ K_F(\alpha _n(t)) - K_F(\alpha _F(t)) \} / \{ \alpha _n(t) - \alpha _F(t) \}.$$ Then $$J_n(t) = \{ g_n(t) - t \} \{ \alpha _n(t) - \alpha _F(t) \}.$$ By (2) and (21),
$$\lim _{n \rightarrow \infty } g_n(t) = \left. \frac{{\text{d}} K_F}{{\text{d}} \alpha }(\alpha ) \right| _{\alpha = \alpha _F(t)} = t \quad \text{ a.s. },$$
for each $$t \in I_{\delta }$$. For some $$0< \theta < 1$$, we have $$g_n(t) = K'_F\bigl ( \alpha _F(t) + \theta (\alpha _n(t) - \alpha _F(t)) \bigr )$$ and
\begin{aligned} \alpha _F(t) - \varepsilon< \alpha _F(t) + \theta (\alpha _n(t) - \alpha _F(t)) < \alpha _F(t) + \varepsilon , \end{aligned}
for sufficiently large n with probability one. As $$\alpha = \alpha _F(t)$$ is strictly increasing on $$I_{\delta }$$, there exists some $$\eta > 0$$ such that
\begin{aligned} \sup _{t \in I_\delta } |g_n(t)|\le & {} \sup _{{\mathop {|\alpha - \alpha _F(t)| \le \varepsilon }\limits ^{\scriptstyle t \in I_{\delta },}}} |K'_F(\alpha )| \\\le & {} \max _{\alpha _F(\mu - \delta ) - \varepsilon \le \alpha \le \alpha _F(\mu + \delta ) + \varepsilon } |K'_F(\alpha )| \le \max _{|\alpha | \le \eta } |K'_F(\alpha )|. \end{aligned}
Hence, by the bounded convergence theorem and (21), we have
\begin{aligned} |A_F[F_n] - A_F[F]| & = \left| \int _{t \in I_{\delta }} J_n(t) \, {\text{d}}t \right| \\\le & {} \sup _{t \in I_\delta } |\alpha _n(t) - \alpha _F(t)| \int _{t \in I_{\delta }} |g_n(t) - t| \, {\text{d}}t \\ & = o \left ( \sup _{|x| \le L} |F_n(x) - F(x)| \right ) \quad \text{ a.s. } \end{aligned}
as n tends to infinity. $$\square$$
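The rates in Theorem 6.1 can be explored by simulation. The sketch below assumes, purely for illustration, F = Uniform(-1, 1), so that L = 1, $$\mu = 0$$, $$K_F(s) = \log \{ \sinh (s)/s \}$$ and $$K'_F(s) = \coth (s) - 1/s$$. It takes $$\delta = 0.3$$ and computes the empirical saddlepoint by Newton's method on $$K_n'(s) = t$$. It reports three quantities: $$\sup _{t \in I_{\delta }} |\alpha _n(t) - \alpha _F(t)|$$ on a grid; the Kolmogorov distance $$\sup _x |F_n(x) - F(x)|$$; and a Simpson approximation of $$A_F[F_n] - A_F[F] = \int _{t \in I_{\delta }} \{ K_F(\alpha _n(t)) - K_F(\alpha _F(t)) - t(\alpha _n(t) - \alpha _F(t)) \} \, {\text{d}}t$$, whose integrand is nonnegative by the convexity of $$K_F$$. Function names, the seed, and the sample sizes are arbitrary choices.

```python
import math
import random

def kprime_uniform(s):
    """K_F'(s) = coth(s) - 1/s for F = Uniform(-1, 1), with a Taylor series near s = 0."""
    if abs(s) < 1e-4:
        return s / 3.0 - s**3 / 45.0
    return 1.0 / math.tanh(s) - 1.0 / s

def k_uniform(s):
    """K_F(s) = log(sinh(s) / s) for F = Uniform(-1, 1)."""
    if abs(s) < 1e-4:
        return s * s / 6.0 - s**4 / 180.0
    return math.log(math.sinh(s) / s)

def alpha_F(t):
    """Saddlepoint of F: solve K_F'(s) = t by bisection (K_F' is increasing)."""
    lo, hi = -30.0, 30.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kprime_uniform(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha_emp(xs, t, iters=50):
    """Empirical saddlepoint: solve K_n'(s) = t by Newton's method starting at s = 0."""
    s = 0.0
    for _ in range(iters):
        w = [math.exp(s * x) for x in xs]
        m0 = sum(w)
        m1 = sum(wi * x for wi, x in zip(w, xs)) / m0
        m2 = sum(wi * x * x for wi, x in zip(w, xs)) / m0
        step = (m1 - t) / (m2 - m1 * m1)  # (K_n'(s) - t) / K_n''(s)
        s -= step
        if abs(step) < 1e-13:
            break
    return s

def ks_distance(xs):
    """sup_x |F_n(x) - F(x)| with F(x) = (x + 1) / 2 on [-1, 1]."""
    ys = sorted(xs)
    n = len(ys)
    return max(max((i + 1) / n - (y + 1) / 2, (y + 1) / 2 - i / n) for i, y in enumerate(ys))

def compare(xs, delta=0.3, m=12):
    """Return (sup |alpha_n - alpha_F| on the grid, KS distance, A_F[F_n] - A_F[F])."""
    h = 2 * delta / m
    ts = [-delta + k * h for k in range(m + 1)]
    a_n = [alpha_emp(xs, t) for t in ts]
    a_f = [alpha_F(t) for t in ts]
    sup_err = max(abs(p - q) for p, q in zip(a_n, a_f))
    # Integrand K_F(alpha_n) - K_F(alpha_F) - t (alpha_n - alpha_F), nonnegative by convexity
    integrand = [k_uniform(p) - k_uniform(q) - t * (p - q) for t, p, q in zip(ts, a_n, a_f)]
    weights = [1 if k in (0, m) else (4 if k % 2 else 2) for k in range(m + 1)]
    gap = h / 3 * sum(wk * jk for wk, jk in zip(weights, integrand))
    return sup_err, ks_distance(xs), gap

if __name__ == "__main__":
    rng = random.Random(2024)
    for n in (100, 1000, 10000):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        err, ks, gap = compare(xs)
        print(n, err, ks, err / ks, gap / ks)
```

In runs of this kind, the ratio of the saddlepoint error to the Kolmogorov distance should stay bounded, as (21) suggests, while the action gap divided by the Kolmogorov distance should shrink, as (22) suggests.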
A quantity
$$\frac{A_F[F_n] - A_F[F]}{\sup _{|x| \le L} |F_n(x) - F(x)|},$$
may be regarded as a derivative of the action integral with respect to the empirical distribution, although by (22) it tends to zero almost surely.

## 7 Conclusions and comments

We proposed an approach that introduces “the principle of least action” into statistics nonparametrically. Probability measures, as well as saddlepoints, can be treated as paths minimizing the action integral in the calculus of variations. Moreover, because convergence in law is described by decreasing sequences of action integrals, results such as the central limit theorem and the de Moivre–Laplace theorem can be obtained from this viewpoint.

However, it has not been determined which functions are saddlepoints; this remains an open problem. Nevertheless, if the sets $${{\mathcal {P}}}$$ and $${{\mathcal {A}}}$$ are homeomorphic, the former can be characterized by studying the latter carefully. In this respect, Takeuchi (2015) showed that $${{\mathcal {A}}}$$ is generated by the saddlepoints corresponding to the normal distributions. This result could be extended to the multivariate case: the inverse mapping of a saddlepoint $${\varvec{t}} : {\varvec{\alpha }} \in {{\mathbb {R}}}^n \mapsto {\varvec{t}}({\varvec{\alpha }}) \in {{\mathbb {R}}}^n$$ may be treated as a Riemannian manifold whose tangent spaces have bases given by the saddlepoints corresponding to the multivariate normal distributions. Thus, the theory of saddlepoints and related methods is not only attractive but also fruitful.

## Notes

### Acknowledgements

The author would like to express his sincere thanks to the referees for their insightful comments.

## References

1. Amari, S. (1990). Differential-geometrical methods in statistics (2nd ed.). Berlin: Springer.
2. Bahadur, R. R. (1971). Some limit theorems in statistics. Regional conference series in applied mathematics (No. 4). Philadelphia: SIAM.
3. Barndorff-Nielsen, O. E., & Cox, D. R. (1989). Asymptotic techniques for use in statistics. London: Chapman and Hall.
4. Brazzale, A. R., Davison, A. C., & Reid, N. (2007). Applied asymptotics. Cambridge: Cambridge University Press.
5. Butler, R. W. (2007). Saddlepoint approximations with applications. Cambridge: Cambridge University Press.
6. Daniels, H. E. (1954). Saddlepoint approximations in statistics. Annals of Mathematical Statistics, 25, 631–650.
7. Daniels, H. E. (1980). Exact saddlepoint approximations. Biometrika, 67, 59–63.
8. DasGupta, A. (2008). Asymptotic theory of statistics and probability. New York: Springer.
9. de Bruijn, N. G. (1970). Asymptotic methods in analysis (3rd ed.). Amsterdam: North Holland.
10. Dembo, A., & Zeitouni, O. (1998). Large deviations techniques and applications (2nd ed.). New York: Springer.
11. Dupuis, P., & Ellis, R. S. (1997). A weak convergence approach to the theory of large deviations. New York: Wiley.
12. Ellis, R. S. (2006). Entropy, large deviations, and statistical mechanics, classics in mathematics. Berlin: Springer.
13. Field, C. A. (1985). Approach to normality of mean and M-estimators of location. Canadian Journal of Statistics, 13, 201–210.
14. Field, C. A., & Ronchetti, E. M. (1990). Small sample asymptotics. Hayward: Institute of Mathematical Statistics.
15. Gelfand, I. M. (1963). Calculus of variations. New Jersey: Prentice-Hall.
16. Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer.
17. Jensen, J. L. (1995). Saddlepoint approximations. Oxford: Clarendon Press.
18. Kolassa, J. E. (1997). Series approximation methods in statistics. Lecture notes in statistics (Vol. 88). New York: Springer.
19. Kotz, S., Johnson, N. L., & Read, C. B. (1983). Encyclopedia of statistical sciences (Vol. 3). New York: Wiley.
20. Laha, R. G., & Rohatgi, V. K. (1979). Probability theory. New York: Wiley.
21. Lukacs, E. (1970). Characteristic functions (2nd ed.). London: Charles Griffin.
22. Serfling, R. J. (1980). Approximation theorems of mathematical statistics. New York: Wiley.
23. Shiryaev, A. N. (1996). Probability (2nd ed.). New York: Springer.
24. Takeuchi, H. (2006). Tauberian property in saddlepoint approximations. Bulletin of Informatics and Cybernetics, 38, 59–69.
25. Takeuchi, H. (2013). Correspondence between saddlepoint and probability distribution. Journal of the Japan Statistical Society, 42(2), 185–208. (in Japanese).
26. Takeuchi, H. (2014). On a convexity of saddlepoint and its curvature. Journal of the Japan Statistical Society, 44(1), 1–17. (in Japanese).
27. Takeuchi, H. (2015). The sp-transform of probability distributions. Journal of the Japan Statistical Society, 45(1), 19–40. (in Japanese).
28. Takeuchi, H. (2016). On $$\gamma$$-decomposition of probability distributions. Journal of the Japan Statistical Society, 45(2), 231–245. (in Japanese).
29. Takeuchi, H. (2017). On a comparison between Lévy’s inversion formula and saddlepoint approximations. Journal of the Japan Statistical Society, 46(2), 113–135. (in Japanese).