# Probability distribution as a path and its action integral


## Abstract

To describe the convergence in law of a sequence of probability distributions, “the principle of least action” is introduced nonparametrically into statistics. A probability measure should be treated as a path (in some sense) to apply calculus of variations, and it is shown that saddlepoints, which appear in the method of saddlepoint approximations, play a crucial role. An action integral, i.e., a functional of the saddlepoint, is defined as a definite integral of entropy. As a saddlepoint equation naturally appears in the Gâteaux derivative of that integral, a unique saddlepoint may be found as an optimal path for this variations problem. Consequently, by virtue of the unique correspondence between probability measures and saddlepoints, the convergence in law is clearly described by a decreasing sequence of action integrals. Thereby, a new criterion for evaluating the convergence is introduced into statistics and a novel interpretation of saddlepoints is provided.

## Keywords

Action integral · Analytic characteristic function · Calculus of variations · Gâteaux derivative · Principle of least action · Saddlepoint

## Mathematics Subject Classification

62G20 · 62G30 · 62G99

## 1 Introduction

There are several criteria for the discrepancy between probability distributions, such as the Lévy–Prokhorov metric, the Kolmogorov distance, or the Kullback–Leibler divergence in statistics, probability, and information theory (Amari 1990; Hall 1992; Serfling 1980; Shiryaev 1996). The Edgeworth expansion may also be effective in capturing the difference between the distribution of a normalized sample mean and the standard normal. Nevertheless, these criteria appear somewhat arbitrary, and it is natural to ask whether they can be unified. In this study, instead of these distances, a concept of entropy is adopted as the criterion by introducing “the principle of least action” into statistics; that is, discrepancy will be measured by entropy. The principle of least action has been used widely and successfully in physics, control theory, and statistics. Although statistics already has the likelihood principle, the principle of least action will here be applied nonparametrically.

Through the *action integral*, which is defined as the definite integral of the Lagrangian with respect to the time parameter, the convergence in law \(F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F\) may be regarded as a process in which this integral decreases. Furthermore, the speed of convergence may be evaluated explicitly by the asymptotic expansion of that integral. The action integral is a functional, and a stationary point that minimizes it, if one exists, is called an *optimal path*.

To define the action integral for a sequence of probability distributions, the following should be considered. First, to apply the calculus of variations, it must be shown that there is a class of probability distributions whose members can be treated as paths. Second, the limit distribution *F* should be the unique stationary point of the integral. Finally, the action integral of *F* should be minimal compared with those of the \(F_n\), expressing that *F* is the most stable state.

In Sect. 2, the Legendre transform of the cumulant generating function plays a key role, which is called *entropy* in statistical mechanics (Ellis 2006). Then an action integral of a distribution is defined by the definite integral of this entropy, which is a functional of the saddlepoint. It is shown in Takeuchi (2013) that the saddlepoint is a strictly increasing function, which uniquely corresponds to the probability distribution through the behavior on a neighborhood of the expectation. Some of its properties resemble those of the analytic characteristic function of a distribution and they are obtained in Theorems 2.1 and 2.2.

In Sect. 3, the relationship between the Gâteaux derivative of an action integral and the saddlepoint equation is derived from the fundamental lemma of calculus of variations. A unique saddlepoint is obtained as an optimal solution, i.e., optimal path, of this problem. This ensures that the corresponding probability distribution can be regarded as a path. In Theorem 3.1, the main result of this paper, it is shown that the action integral is globally minimized by a unique saddlepoint.

In Sect. 4, it is shown that the local uniform convergence of the saddlepoint implies convergence of the action integral as well as convergence in law. The action integral of the distribution of a normalized sum of sample variables is evaluated for the central limit theorem. Similarly, the de Moivre–Laplace theorem, the normal approximation to the Poisson distribution, and Poisson’s law of small numbers are also obtained. The main concern in calculus of variations is to derive a stationary point, i.e., an optimal path, of an action integral, so valuable information carried by the integral itself may be discarded. However, as a scalar quantity, the integral can be used effectively to describe convergence in law. Asymptotic expansions and numerical examples demonstrate this fact.

A theory based on the principle of least action has several applications in statistics. For example, a new expression for the discrepancy of conjugate distributions is obtained in Sect. 5 by means of the Gâteaux derivative of the action integral with respect to the distribution.

In Sect. 6, the action integral of an empirical distribution is evaluated. Here, it is assumed that *F* has finite support. The lemmas required in this section and Sect. 4 are collected in Sect. 8.

On the complex plane, a saddlepoint is the point of intersection of the real axis and an integral path that localizes the integrand of Lévy’s inversion formula in the method of saddlepoint approximations (Daniels 1954; de Bruijn 1970; Field and Ronchetti 1990). Despite its importance in saddlepoint approximations, the steepest descent method, and the large deviation principle, this interesting function has not received proper attention. A saddlepoint looks like a “Kuroko”, a stage assistant attired and hooded in black at Japanese Kabuki, and will effectively replace probability distributions.

## 2 Action integral of probability distributions

Gibbs’ variational principle asserts that the probability distribution describing the equilibrium state is given by the solution to a calculus of variations problem. In this problem, the large deviation principle plays a crucial role, and the Legendre transform of the cumulant generating function, \(K^*_F(t) = \sup _{ \alpha \in {{\mathbb {R}}}} \{ t \alpha - K_F(\alpha ) \}\), is called *entropy*; see Ellis (2006). Bahadur (1971) used the Cramér-type large deviation theorem \(\lim _{n \rightarrow \infty } \frac{1}{n} \log \mathrm{Pr} \left\{ \frac{1}{n} \sum _{i = 1}^n X_i \in A \right\} = - \inf _{t \in A} I(t)\) for an asymptotic theory of statistics. The rate function *I*(*t*) is entropy (Ellis 2006) and is given by the Legendre transform of the cumulant generating function of the distribution *F*.
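
As a numerical illustration (our own sketch, not part of the paper; the function names are assumptions), the Legendre transform above can be evaluated by direct maximization. For \(N(\mu , \sigma ^2)\) we have \(K_F(\alpha ) = \mu \alpha + \sigma ^2 \alpha ^2 / 2\), so the closed form \(K^*_F(t) = (t - \mu )^2 / (2 \sigma ^2)\) is available for comparison:

```python
from scipy.optimize import minimize_scalar

def K_normal(alpha, mu=0.0, sigma2=1.0):
    """Cumulant generating function of N(mu, sigma2)."""
    return mu * alpha + 0.5 * sigma2 * alpha ** 2

def legendre_transform(t, K, lo=-20.0, hi=20.0):
    """Numerically evaluate K*(t) = sup_a {t*a - K(a)} and its maximizer."""
    res = minimize_scalar(lambda a: K(a) - t * a, bounds=(lo, hi), method="bounded")
    return t * res.x - K(res.x), res.x  # (entropy K*(t), saddlepoint alpha(t))

# For N(0, 1): K*(t) = t**2 / 2 and alpha(t) = t.
entropy, alpha_t = legendre_transform(0.7, K_normal)
print(entropy, alpha_t)
```

The maximizer returned alongside \(K^*_F(t)\) is exactly the saddlepoint discussed below.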

The function \(\alpha = \alpha (t)\) at which the integrand of Lévy’s inversion formula is localized (see Sect. 1) is called the *saddlepoint* of *F*. Under the following condition (1), the existence of a saddlepoint is shown in Theorem 2.1, which also provides some of its fundamental properties, and whose proof is significantly simpler than that in Takeuchi (2013). The equation

$$\begin{aligned} K'_F(\alpha ) = t \end{aligned}$$

(2)

is called the *saddlepoint equation*, and \(K^*_F(t)\) is called the *entropy of the probability distribution* *F*, as in Ellis (2006).

Field (1985) and Field and Ronchetti (1990) mentioned that the asymptotic normality of the *M*-estimator can be obtained using saddlepoints. Takeuchi (2014) showed that the curvature of a saddlepoint around the origin describes the asymptotic normality of a statistic. The central limit theorem can be characterized by a projection from the set of saddlepoints to a Riemannian manifold with the Fisher–Rao metric, and the projection does not depend on the coordinate system (Takeuchi 2015). For other properties and applications of saddlepoints, see Takeuchi (2006, 2016, 2017). For the methods of steepest descent and saddlepoint approximations, refer to Brazzale et al. (2007), de Bruijn (1970), Butler (2007), DasGupta (2008), Field and Ronchetti (1990), Jensen (1995) and Kolassa (1997).

The set \(\mathrm{dom}\, K_F = \{ \alpha \in {{\mathbb {R}}} : K_F(\alpha ) < \infty \}\) is called the *effective domain* of \(K_F\).

### Theorem 2.1

*Under condition* (1), *we have the following on a neighborhood of* \(\mu\), *that is, for* \(t \in \{ t : |t - \mu | < \delta \}\) *with sufficiently small* \(\delta > 0\), *where* \(\mu\) *and* \(\sigma ^2\) *are the mean and variance of the distribution* *F*, *respectively.*

- (i) *The saddlepoint equation* (2) *has a unique solution* \(\alpha = \alpha _t\).

- (ii) *The saddlepoint* \(\alpha (t)\) *exists and is equivalent to the solution* \(\alpha _t\).

- (iii) \((K^*_F)'(t) = \alpha (t)\).

- (iv) *The inverse of the saddlepoint* \(\alpha = \alpha (t)\) *is given by* \(t = K'_F(\alpha )\).

- (v) \(t = \mu\), \(\alpha (t) = 0\), *and* \(K^*_F(t) = 0\) *are equivalent.*

### *Proof*

An analytic characteristic function uniquely determines the distribution *F* through its behavior on a neighborhood of the origin in \(\mathrm{dom}\, K_F\) (Lukacs 1970).

- (i)
As (1) implies that \(K''_F(0) = \sigma ^2 > 0\), a continuous function \(t = K'_F(\alpha )\) is strictly increasing on a neighborhood of the origin. Hence, \(K'_F(0) = \mu\) implies that there exists some \(\delta > 0\) such that if \(|t - \mu | < \delta\) then the saddlepoint equation (2) has a unique solution \(\alpha _t = (K'_F)^{-1}(t)\).

- (ii) As \(g(\alpha ) := t \alpha - K_F(\alpha )\) satisfies \(g''(0) = - K''_F(0) = - \sigma ^2 < 0\), *g* is continuous and strictly concave on a neighborhood of the origin. We have \(g'(\alpha _t) = t - K'_F(\alpha _t) = 0\) from (i), and thus \(\alpha _t \in \mathrm{dom}\, K_F\) is the unique maximizer of \(g(\alpha )\). Hence, by the definition of the saddlepoint, we have \(\alpha (t) = \alpha _t\).

- (iii) By the proof of (ii), we have \(K^*_F(t) = \max _{ \alpha \in \mathrm{dom}\, K_F} \{ t \alpha - K_F(\alpha ) \} = t \alpha (t) - K_F(\alpha (t))\). As \(\alpha (t)\) is differentiable, the conclusion follows from (2).

- (iv)
It is obvious from the relation \(K'_F(\alpha _F(t)) = t\).

- (v) It follows immediately from the properties of \(g(\alpha )\) in (ii) above, the fact that \(K'_F(\alpha )\) is strictly increasing, the relation \(K^*_F(t) = t \alpha (t) - K_F(\alpha (t))\) in (iii), and (2). \(\square\)
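
Numerically, Theorem 2.1 (i) suggests computing the saddlepoint by bracketed root finding on \(K'_F(\alpha ) = t\). A hedged sketch (the gamma example with shape *k* and scale \(\theta\), for which \(K_F(\alpha ) = -k \log (1 - \theta \alpha )\) on \(\mathrm{dom}\, K_F = (-\infty , 1/\theta )\), is our own illustration):

```python
from scipy.optimize import brentq

def K_prime_gamma(alpha, k=2.0, theta=1.0):
    """Derivative of K(alpha) = -k*log(1 - theta*alpha) for the gamma law."""
    return k * theta / (1.0 - theta * alpha)

def saddlepoint_gamma(t, k=2.0, theta=1.0):
    # The bracket must stay inside dom K_F = (-inf, 1/theta).
    upper = 1.0 / theta - 1e-10
    return brentq(lambda a: K_prime_gamma(a, k, theta) - t, -1e3, upper)

# Closed form for comparison: alpha(t) = 1/theta - k/t.
alpha = saddlepoint_gamma(3.0)
print(alpha)
```

Because \(K'_F\) is strictly increasing on the effective domain, the bracketed root is unique, mirroring part (i) of the theorem.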

A probability distribution *F* has an analytic characteristic function if and only if there exist some \(a, b \in \overline{{\mathbb {R}}}_{+}\) such that \(\mathrm{dom}\, K_F = (-a, b)\) (Lukacs 1970). Then, by Takeuchi (2013), the corresponding saddlepoint \(\alpha _F\) is a real-valued function such that

The uniqueness of the correspondence between a probability distribution and its saddlepoint is given by Theorem 2.2, which was first proved in Takeuchi (2013). However, an improved proof will be provided here.

### Theorem 2.2

*We assume that condition* (1) *is satisfied. The behavior of the saddlepoint on a neighborhood of its zero point uniquely determines the corresponding probability distribution. Moreover, we have the following inversion formula:*

### *Proof*

By Theorem 2.1 (iii), \(\alpha _F(t) = (K^*_F)'(t)\) holds for *t* in a neighborhood of \(t = \mu\). This implies that there is a one-to-one correspondence between \(\alpha _F(t)\) and \(K^*_F(t)\) on this neighborhood. Furthermore, \(K^*_F(t)\) uniquely corresponds to \(K_F(\alpha )\) on a neighborhood of \(\alpha = 0\) by the uniqueness of the Legendre transform for strictly convex functions. By the property of analytic characteristic functions, (5) uniquely determines the distribution *F*. Hence, the first part of the theorem is proved. Finally, we have (4) by the Legendre transform of (5) using \(\mu = \alpha ^{-1}(0)\). \(\square\)

Field and Ronchetti (1990) mentioned the uniqueness and applied the saddlepoint to the central limit theorem.

Let \(t = t(\alpha )\) be the inverse function of the saddlepoint \(\alpha = \alpha (t)\). The inversion formula with respect to \(t(\alpha )\) and \(K^*(t)\) is given in Takeuchi (2013).

### Corollary 2.1

In what follows, we provide examples of the cumulant generating function, the Legendre transform, and the corresponding saddlepoint for the univariate normal, the inverse Gaussian, the gamma, the binomial, and the multivariate normal distributions. As each effective domain \(\mathrm{dom}\, K_F\) includes the origin \(\alpha = 0\) as an inner point, these distributions have analytic characteristic functions (Lukacs 1970). Hence, by Theorem 2.2, each distribution and its saddlepoint are in one-to-one correspondence. It should be noted that Daniels (1980) showed that the normal, the gamma, and the inverse Gaussian are the only distributions for which the saddlepoint approximations are exact up to normalization.
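
For orientation, the univariate normal case admits the following standard closed forms (stated here as a summary, not reproduced from the paper’s displays):

```latex
% Univariate normal N(\mu, \sigma^2): cumulant generating function,
% entropy (Legendre transform), and saddlepoint.
K_F(\alpha) = \mu \alpha + \tfrac{1}{2} \sigma^2 \alpha^2, \qquad
K^{*}_F(t) = \frac{(t - \mu)^2}{2 \sigma^2}, \qquad
\alpha_F(t) = \frac{t - \mu}{\sigma^2}, \qquad
\operatorname{dom} K_F = \mathbb{R}.
```

Theorem 2.1 (iii)–(iv) are immediate here: \((K^*_F)'(t) = (t - \mu )/\sigma ^2 = \alpha _F(t)\) and \(K'_F(\alpha ) = \mu + \sigma ^2 \alpha = t\).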

### Example 2.1

### Example 2.2

### Example 2.3

### Example 2.4

The binomial distribution *B*(*n*, *p*) \((p, q > 0, \, p + q = 1)\).

### Example 2.5

### Example 2.6

In saddlepoint approximations, a solution to Eq. (2) has been used as a saddlepoint \(\alpha (t)\) for fixed *t*. In this study, \(\alpha = \alpha (t)\) will be treated as a path that belongs to a function space. Consequently, probability distributions will be treated as paths, and convergence in law will be described by an action integral. To use calculus of variations, an action integral will be defined as follows:

### Definition 2.1

For a distribution \(F \in {{\mathcal {P}}}\) and a saddlepoint \(\alpha _{G} \in {{\mathcal {A}}}\), the *action integral of the distribution* *G* *with respect to* *F* is defined by

$$\begin{aligned} A_{F}[\alpha _{G}] = - \int _{\mu - \delta }^{\mu + \delta } \left\{ t \alpha _{G}(t) - K_{F}(\alpha _{G}(t)) \right\} {\text{d}}t, \end{aligned}$$

(7)

where \(\mu\) is the mean of *F* and \(\delta\) is a sufficiently small positive number. In particular, \(A_{F}[\alpha _{F}]\) is called the *action integral of* *F*.

The action integral (7) is a functional of saddlepoints in \({\mathcal {A}}\). By Theorem 2.2, the interval \((\mu - \delta , \mu + \delta )\) for the integral (7) is valid. As the action integral \(A_{F}[\alpha ]\) is minimized if and only if \(\alpha = \alpha _F(t)\) by Theorem 3.1, it is quite natural to use saddlepoints as an argument for the action integral.

By Theorem 3.1, the inequality \(A_F[\alpha _{F_n}] \ge A_F[\alpha _F]\) holds for any sequence of probability distributions \(\{ F_n \}_{n \in {{\mathbb {N}}}}\). This means that the sequence \(\{ A_F[F_n] \}_{n \in {{\mathbb {N}}}}\) may attain its minimum at the limit distribution *F*, and the minimum will be given by Proposition 2.1 (i) below. This is the reason why we need a minus sign for our definition of the action integral (7). This is essential for describing the convergence in law \(F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F\) by the action integral. Furthermore, some fundamental properties are obtained in Proposition 2.1. In what follows, the notation \(A_X[\alpha _X]\) implies \(A_F[\alpha _F]\) if a random variable *X* has distribution *F*.

### Proposition 2.1

*The action integral*\(A_F[\alpha _F]\)

*has the following properties.*

- (i)
\(A_{F}[\alpha _{F}] = - \int _{\mu -\delta }^{\mu + \delta } K^{*}_{F}(t) \, {\text{d}}t = - \int _{\mu -\delta }^{\mu + \delta } \left( \int _{\mu }^{t} \alpha _{F}(u) \, {\text{d}}u \right) {\text{d}}t.\)

- (ii)
\(-\infty = \inf _{F \in {{\mathcal {P}}}} A_F[\alpha _F]< A_F[\alpha _F] < 0\).

- (iii) \(A_{X+c}[\alpha _{X+c}] = A_X[\alpha _X]\) *for any fixed* \(c \in {{\mathbb {R}}}\).

- (iv) \(A_{cX}[\alpha _{cX}] > A_X[\alpha _X]\) *for* \(c > 1\).

### Proof

- (i)
The first relation follows from (7) and the proof of Theorem 2.1 (iii). The second is obvious from (5).

- (ii)
As \(K^*_F(t) = t \alpha _F(t) - K_F(\alpha _F(t))\) is a continuous function on a closed interval \([\mu - \delta , \mu + \delta ]\) for sufficiently small \(\delta > 0\), the integral \(\int_{{\mu - \delta }}^{{\mu + \delta }} {K_{F}^{*} } (t){\text{d}}t\) exists. Furthermore, Example 2.7 implies that \(A_N[\alpha _N] \searrow -\infty\) as \(\sigma \searrow 0\). Hence in general, \(\inf _{F} A_F[\alpha _F] = -\infty\). By the property of the Legendre transform with respect to convex functions, \(K^*_F(t) > 0\) for \(t \ne \mu\), whereas \(K^*_F(t) = 0\) for \(t = \mu\). Then (i) implies that \(A_F[\alpha _F] < 0\).

- (iii) The Legendre transform of the cumulant generating function for \(X + c \sim F(x - c)\) is given by \(K^*_{X + c}(t) = K^*_{F} (t - c)\). By (i), we have
$$\begin{aligned} A_{X + c}[\alpha _{X + c}] = - \int _{(\mu + c) - \delta }^{(\mu + c) + \delta } K^*_{F}(t - c) \, {\text{d}}t = - \int _{\mu - \delta }^{\mu + \delta } K^*_F(u) \, {\text{d}}u = A_{F}[\alpha _{F}]. \end{aligned}$$

- (iv) The Legendre transform of the cumulant generating function for \(cX \sim F(x / c)\) is \(K^*_{cX}(t) = K^*_F(t / c)\). By (ii), if \(\mu = 0\) and \(t \ne 0\), then \(K^*_{F}(t / c) < K^*_{F}(t)\) almost everywhere. Hence, (i) implies
$$\begin{aligned} A_X[\alpha _X] < - \int _{-\delta }^{\delta } K^*_F(t / c) \, {\text{d}}t = - \int _{-\delta }^{\delta } K^*_{cX}(t) \, {\text{d}}t = A_{cX}[\alpha _{cX}]. \end{aligned}$$
Finally, by (iii), we have
$$\begin{aligned} A_X[\alpha _X] = A_{X - \mu }[\alpha _{X - \mu }] < A_{c(X - \mu )}[\alpha _{c(X - \mu )}] = A_{cX}[\alpha _{cX}] \end{aligned}$$
for the case \(\mu \ne 0\). \(\square\)
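
As an illustrative check of Proposition 2.1 (i)–(ii) (our own sketch, not from the paper), for \(N(0, \sigma ^2)\) we have \(K^*_F(t) = t^2 / (2 \sigma ^2)\), giving the closed form \(A_F[\alpha _F] = - \delta ^3 / (3 \sigma ^2)\), which is negative and diverges to \(-\infty\) as \(\sigma \searrow 0\):

```python
from scipy.integrate import quad

def action_integral_normal(sigma2=1.0, delta=0.5):
    """A_F[alpha_F] = -int_{-delta}^{delta} K*_F(t) dt for N(0, sigma2),
    where K*_F(t) = t**2 / (2*sigma2)."""
    value, _ = quad(lambda t: t ** 2 / (2.0 * sigma2), -delta, delta)
    return -value

A = action_integral_normal(sigma2=2.0, delta=0.5)
print(A)   # closed form: -delta**3 / (3 * sigma2)
```

Shrinking `sigma2` toward zero makes the value arbitrarily negative, matching \(\inf _{F} A_F[\alpha _F] = -\infty\) in part (ii).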

Some examples of action integrals are now provided.

### Example 2.7

### Example 2.8

The binomial distribution *B*(*n*, *p*).

### Example 2.9

### Example 2.10

### Example 2.11

The *d*-dimensional multivariate normal distribution \(N(\varvec{\mu }, \varvec{\varSigma })\).

## 3 Probability distribution as a path

In this section, an optimal property of saddlepoints will be derived from the principle of least action. As the action integral \(A_F[\alpha ]\) of Definition 2.1 is a functional of the saddlepoint, \(\alpha = \alpha (t)\) is treated as a path belonging to the set \({{\mathcal {A}}}\). To apply calculus of variations to this functional, we assume the following two conditions, where \(I_{\delta }\) denotes the closed interval \(\{ t : |t - \mu | \le \delta \}\): (1) the real-valued function \(\beta (t)\) is continuous on \(I_{\delta }\); (2) there exists an integrable function *g* such that \(|K_F(\alpha (t) + c \beta (t)) - K_F(\alpha (t))| \le |c| g(t)\) for almost all \(t \in I_{\delta }\).

### Theorem 3.1

*Let* *F* *be a probability distribution in the class* \({{\mathcal {P}}}\). *Then the corresponding saddlepoint* \(\alpha _F \in {{\mathcal {A}}}\) *is the unique path that minimizes the action integral* \(A_F[\alpha ]\). *The minimum* \(\min _{\alpha \in {{\mathcal {A}}}} A_F[\alpha ]\) *is*

### Proof

As *F* is not a degenerate distribution, \(K''_F(\alpha )\) is positive on a neighborhood of the origin. If \(\delta > 0\) is sufficiently small, then \(K''_F(\alpha _F(t)) \approx \sigma ^2_F\) for \(t \in I_{\delta }\); we have

In Sect. 2, the action integral (7) of a probability distribution \(F \in {{\mathcal {P}}}\) was defined using entropy. In the proof of Theorem 3.1, the saddlepoint equation (2) naturally appeared in the Gâteaux derivative (9). Moreover, the saddlepoint as the solution to this variations problem is shown to be globally optimal by the principle of least action. Therefore, (7) is quite appropriate as a definition of the action integral for a probability distribution. It should be noted that the saddlepoint equation (2) corresponds to the Euler–Lagrange equation. For calculus of variations and related fields, the reader is referred to Gelfand (1963), for example.

The solution to equation (2) was defined to be a saddlepoint \(\alpha (t)\) for fixed *t*; however, by Theorem 3.1, it can be treated as a path \(\alpha _F\) in the class \({\mathcal {A}}\). This provides a completely new interpretation of saddlepoints.

## 4 Convergence in law by action integral

By Theorem 3.1, the convergence in law \(F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F\) may be regarded as \(F_n\) reducing its action integral and reaching the minimal state *F*. This resembles a marble released from the edge of a bowl, rolling along its surface, and finally coming to rest. The relation \(A_F[F_n] \ge A_F[F]\) expresses that the resting state is the lowest.

In this section, it will be shown that an action integral can clearly describe convergence in law. The local uniform convergence of a sequence of saddlepoints implies the convergence of the action integral as well as convergence in law. \(\mu\) and \(\sigma ^2\) denote the mean and the variance of *F*, respectively.

### Theorem 4.1

*Let* \(\{ F_n \}_{n \in {{\mathbb {N}}}}\) *be a sequence of probability distributions, and* \(\{ \alpha _n \}_{n \in {{\mathbb {N}}}}\) *the corresponding sequence of saddlepoints. We assume that there exists some* \(\delta > 0\) *such that*

*Then the following hold:*

- (i) \(F_n {\mathop {\rightarrow }\limits ^{{\mathcal {L}}}} F\).

- (ii) \(A_F[F_n]\) *converges to* \(A_F[F]\) *from above.*

### Proof

(i) The inverse functions of \(\alpha _n(t)\) and \(\alpha (t)\) are \(K'_n(\alpha )\) and \(K'_F(\alpha )\), respectively, by Theorem 2.1 (iv). By (11) and Corollary 8.1, for some \(\eta > 0\), we have \(\sup _{|\alpha | \le \eta } |K'_n(\alpha ) - K'_F(\alpha )| \rightarrow 0\) as \(n \rightarrow \infty\). Then \(K_n(\alpha ) \rightarrow K_F(\alpha )\) holds by the bounded convergence theorem for each \(\alpha\) in a neighborhood of the origin, and thus the conclusion follows from the continuity theorem for analytic characteristic functions (Kotz et al. 1983).

*n*. Hence, by the bounded convergence theorem

### Example 4.1

Although the condition (11) implies path convergence, the action integral (7) compresses the information of a distribution into a scalar quantity. Therefore, to capture the asymptotic normality of \(S_n\), the method in Example 4.1 could be improved by observing the action integral, that is, \(A_N[S_n] \searrow A_N[N]\). This is demonstrated in Propositions 4.1, 4.2, and 4.3.
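
The decrease \(A_N[S_n] \searrow A_N[N]\) can also be observed numerically. The sketch below is our own illustration; it assumes the explicit form \(A_F[\alpha ] = - \int _{\mu - \delta }^{\mu + \delta } \{ t \alpha (t) - K_F(\alpha (t)) \} \, {\text{d}}t\) (consistent with Proposition 2.1 (i)) and computes \(A_N[\alpha _{B_n}]\) for normalized Bernoulli(0.3) sums:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

p, q, delta = 0.3, 0.7, 0.5

def saddlepoint_Bn(t, n):
    """Saddlepoint of S_n = (sum of n Bernoulli(p) - n*p) / sqrt(n*p*q):
    solves K'(a) = t for K(a) = n*log(q + p*exp(a/s)) - n*p*a/s, s = sqrt(n*p*q)."""
    s = np.sqrt(n * p * q)
    def K_prime(a):
        e = np.exp(a / s)
        return (n * p / s) * (e / (q + p * e) - 1.0)
    return brentq(lambda a: K_prime(a) - t, -30.0, 30.0)

def A_N(alpha):
    """Action integral w.r.t. N(0,1): -int (t*alpha(t) - alpha(t)**2/2) dt."""
    val, _ = quad(lambda t: t * alpha(t) - alpha(t) ** 2 / 2.0, -delta, delta)
    return -val

A_limit = -delta ** 3 / 3.0            # A_N[alpha_N], since alpha_N(t) = t
A10 = A_N(lambda t: saddlepoint_Bn(t, 10))
A200 = A_N(lambda t: saddlepoint_Bn(t, 200))
print(A10, A200, A_limit)              # expect A10 > A200 > A_limit
```

The gap \(A_N[\alpha ] - A_N[\alpha _N] = \tfrac{1}{2} \int (\alpha (t) - t)^2 \, {\text{d}}t \ge 0\) under this form of the integral, so the computed values sit above the limit and shrink with *n*.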

Let \(F_n\) be the probability distribution of the statistic \(S_n\). Theorem 4.2 resembles the Edgeworth expansion but has a different interpretation. The expansion (13) shows that the sequence of paths \(\{ F_n \}_{n \in {{\mathbb {N}}}}\) reduces its action integral and reaches the least state *N*, expressed in terms of the sample size *n* and the *i*-th order cumulants \(\kappa _i\) (\(i = 1,\ldots ,5\)). That is, we can obtain the central limit theorem by observing the convergence \(F_n \in {{\mathcal {P}}} \rightarrow N \in {{\mathcal {P}}}\) from the perspective of the principle of least action.

### Theorem 4.2

*We have the following expansion:*

### Proof

It is well known that the rate of convergence in law of \(S_n\) is faster when the distribution *F* is symmetric. By setting \(\kappa _3 = 0\) in (13), this is also true for the action integral.

The de Moivre–Laplace theorem is obtained similarly.

### Proposition 4.1

*Let* \(B_n\) *be the distribution of a normalized sum of the sample from* *B*(1, *p*). *Then we have* \(A_N[B_n] \searrow A_N[N]\) *as* \(n \rightarrow \infty\), *and the following expansion:*

*where* \(p, q > 0\) *and* \(p + q = 1\).

### Proof

Applying the cumulants of *B*(1, *p*), i.e., \(\kappa _2 = pq, \ \kappa _3 = -pq(p - q), \ \kappa _4 = pq(6p^2 - 6p + 1), \ \kappa _5 = -pq(p - q)(12 p^2 - 12 p + 1)\), to (13), we obtain the conclusion. \(\square\)


The convergence in law of a normalized Poisson to the standard normal is obtained as follows.

### Proposition 4.2

*Let* \(P_{\lambda }\) *be the distribution of a normalized Poisson random variable. Then we have* \(A_{N}[P_{\lambda }] \searrow A_{N}[N]\) *and*

### Proof

If *X* follows the Poisson distribution \({\text{Po}}(\lambda )\), then the saddlepoint of its normalization is given by

*X*. With the cumulant generating function \(K_N(\alpha ) = \alpha ^2 / 2\), the action integral of a normalized Poisson random variable with respect to the standard normal distribution is given as follows:

In the non-normal limit case, we provide an action integral version of Poisson’s law of small numbers as follows.

### Proposition 4.3

*The action integral of the binomial with respect to the Poisson distribution,* \(A_{\text{Po}}[B]\), *converges to* \(A_{\text{Po}}[{\text{Po}}]\) *as* \(n \rightarrow \infty\) *with* \(\lambda = np\) *held constant. Furthermore,*

### Proof

The cumulant generating function of *B*(*n*, *p*) is given in Example 2.4. For \(\lambda = np\) constant, the action integral of *B* with respect to \({\text{Po}}\) is

Action integral of Poisson

| \(\lambda\) | 4 | 6 | 8 | 10 | 12 |
|---|---|---|---|---|---|
| \(A_{\text{Po}}[{\text{Po}}]\) | −2.4037 | −1.5405 | −1.1415 | −0.9083 | −0.7548 |

It should be noted that the sum on the right-hand side of (13), (14), (16) and (17) is not negative, except for the initial term. This follows from Theorem 3.1.

## 5 Action integral of the conjugate distribution

We assume that \({\hat{\alpha }}_n(t)\) is the saddlepoint of a distribution \(\mathrm{Pr} \{ {\hat{\theta }}_n \le x \}\) for some statistic \({\hat{\theta }}_n\) based on a sample from a distribution \(F \in {{\mathcal {P}}}\), where \(\beta = \beta (t)\) is a continuous function. Then the difference of the action integrals \(A_F[{\hat{\alpha }}_n + c \beta ] - A_F[{\hat{\alpha }}_n]\) may express the robustness of this statistic. Using this idea, we can characterize the conjugate distribution, which is defined as \(F_c(x) = \frac{1}{M_F(c)} \int _{-\infty }^{x} e^{cu} \, {\text{d}}F(u),\) where \(M_F\) is the moment generating function of *F*. If \(c = \alpha _F(t)\) (the saddlepoint of *F* at *t*), then \(\int _{-\infty }^{\infty } x \, {\text{d}}F_c(x) = t,\) see for example Barndorff-Nielsen and Cox (1989). The distribution plays an essential role in the large deviation principle, the method of steepest descent, and the method of saddlepoint approximations.
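
The mean-shifting property of the conjugate distribution, \(\int x \, {\text{d}}F_c(x) = t\) when \(c = \alpha _F(t)\), can be checked by Monte Carlo with self-normalized tilting weights. A sketch under assumed choices (normal *F*, sample size, and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, t = 1.0, 2.0, 2.5

# Saddlepoint of N(mu, sigma2): alpha_F(t) = (t - mu) / sigma2.
c = (t - mu) / sigma2

# dF_c(x) = e^{c x} dF(x) / M_F(c): estimate the tilted mean using
# self-normalized importance weights w_i proportional to exp(c * x_i).
x = rng.normal(mu, np.sqrt(sigma2), size=200_000)
w = np.exp(c * x)
w /= w.sum()
tilted_mean = float(w @ x)
print(tilted_mean)   # should be close to t = 2.5
```

For the normal case this is exact in distribution as well: tilting \(N(\mu , \sigma ^2)\) by *c* gives \(N(\mu + c \sigma ^2, \sigma ^2)\), whose mean is *t* when \(c = (t - \mu )/\sigma ^2\).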

### Proposition 5.1

*If* \(\beta = 1\), *then* \(G_{c \beta }\) *is equivalent to the conjugate distribution of* *F*.

### Proof

Proposition 5.1 asserts that if the saddlepoint \(\alpha _F\) moves in the \(\beta\)-direction, i.e., \(\alpha _F - c \beta\), then \(\beta = 1\) is “the direction of the conjugate distribution”.

The action integral of the conjugate distribution is evaluated as in Proposition 5.2. We also use \(I_{\delta } = \{ t : |t - \mu | \le \delta \}\).

### Proposition 5.2

*For sufficiently small* \(\delta > 0\), *we have*

### Proof

As shown in Proposition 5.2, the discrepancy between *F* and its conjugate distribution with respect to the action integral is proportional to the variance. The normal distribution provides an illustrative example.

### Example 5.1

If the distribution *F* in (20) is \(N(\mu , \sigma ^2)\), then for any \(\delta > 0\) and \(c \in {{\mathbb {R}}}\),

*F*as \(c \rightarrow 0\). It can be shown that its second-order Gâteaux derivative satisfies

## 6 Action integral of the empirical distribution

It may be claimed that the validity of nonparametric methods in statistics is ensured by the uniform strong convergence of the empirical distribution function. Thus, it is worthwhile to study the action integral of this distribution. In this section, to evaluate the action integral of the distribution, the corresponding saddlepoint will be treated as a stochastic process. We assume that the probability distribution \(F \in {{\mathcal {P}}}\) has a compact support \([-L, L]\), and \(F_n\) is its empirical distribution. Let \(M_n\) be the empirical moment generating function and \(t_F(\alpha )\) the inverse of the saddlepoint of *F*. The saddlepoint \(\alpha _n(t)\) corresponding to \(F_n\) is called the *empirical saddlepoint* of *F*. Furthermore, \(t_n(\alpha )\) is also defined as its inverse. As before, we will use the closed interval \(I_{\delta } = \{ t : |t - \mu | \le \delta \}\).
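
A hedged sketch of the empirical saddlepoint (our own illustration, assuming a Bernoulli(1/2) sample, for which the true saddlepoint is \(\alpha _F(t) = \log (t/(1-t))\)): replace \(K'_F\) by the derivative of the empirical cumulant generating function \(K_n(\alpha ) = \log M_n(\alpha )\) and solve \(K'_n(\alpha ) = t\):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n = 100_000
x = rng.integers(0, 2, size=n).astype(float)   # Bernoulli(1/2) sample

def Kn_prime(alpha):
    """Derivative of K_n(alpha) = log((1/n) * sum(exp(alpha * x_i)))."""
    w = np.exp(alpha * x)
    return float((w @ x) / w.sum())

def empirical_saddlepoint(t):
    return brentq(lambda a: Kn_prime(a) - t, -30.0, 30.0)

a_hat = empirical_saddlepoint(0.6)
# True saddlepoint of Bernoulli(1/2): alpha_F(t) = log(t / (1 - t)).
print(a_hat, np.log(0.6 / 0.4))
```

With a compactly supported *F*, as assumed in this section, \(M_n\) and its derivative are finite for all \(\alpha\), so the root-finding step is well defined for every *n*.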

### Lemma 6.1

*There exists some* \(\eta > 0\) *such that if* \(n \rightarrow \infty\), *then*

### Proof

When *F* has finite support, a sequence of empirical saddlepoints \(\{ \alpha _n \}_{n \in {{\mathbb {N}}}}\) converges uniformly with probability one.

### Theorem 6.1

*We have the following:*

*where*\(\delta\)

*is a positive number.*

### Proof

*n*we have

*n*with probability one. As \(\alpha = \alpha _F(t)\) is strictly increasing on \(I_{\delta }\), there exists some \(\eta > 0\) such that

*n*tends to infinity. \(\square\)

## 7 Conclusions and comments

We proposed an approach that nonparametrically introduces “the principle of least action” into statistics. Probability measures, as well as saddlepoints, can be treated as paths minimizing the action integral in the calculus of variations. Moreover, as convergence in law is well described, we can obtain, for instance, the central limit theorem and the de Moivre–Laplace theorem.

However, it has not been determined which functions can be saddlepoints; in fact, this is an open problem. Nevertheless, if the sets \({{\mathcal {P}}}\) and \({{\mathcal {A}}}\) are homeomorphic, then we can characterize the former by studying the latter carefully. In this respect, Takeuchi (2015) showed that \({{\mathcal {A}}}\) is generated by the saddlepoints corresponding to the normal distributions. This could be extended to the multivariate case. We may treat the inverse mapping of a saddlepoint \({\varvec{t}} : {\varvec{\alpha }} \in {{\mathbb {R}}}^n \mapsto {\varvec{t}}({\varvec{\alpha }}) \in {{\mathbb {R}}}^n\) as a Riemannian manifold whose tangent-space basis consists of the saddlepoints corresponding to the multivariate normal distributions. Thus, the theory of saddlepoints and related methods is not only attractive but also fruitful.

## Acknowledgements

The author would like to express his sincere thanks to the referees for their insightful comments.

## References

- Amari, S. (1990). *Differential-geometrical methods in statistics* (2nd ed.). Berlin: Springer.
- Bahadur, R. R. (1971). *Some limit theorems in statistics. No. 4 in Regional Conference Series in Applied Mathematics*. Philadelphia: SIAM.
- Barndorff-Nielsen, O. E., & Cox, D. R. (1989). *Asymptotic techniques for use in statistics*. London: Chapman and Hall.
- Brazzale, A. R., Davison, A. C., & Reid, N. (2007). *Applied asymptotics*. Cambridge: Cambridge University Press.
- Butler, R. W. (2007). *Saddlepoint approximations with applications*. Cambridge: Cambridge University Press.
- Daniels, H. E. (1954). Saddlepoint approximations in statistics. *Annals of Mathematical Statistics*, *25*, 631–650.
- Daniels, H. E. (1980). Exact saddlepoint approximations. *Biometrika*, *67*, 59–63.
- DasGupta, A. (2008). *Asymptotic theory of statistics and probability*. New York: Springer.
- de Bruijn, N. G. (1970). *Asymptotic methods in analysis* (3rd ed.). Amsterdam: North-Holland.
- Dembo, A., & Zeitouni, O. (1998). *Large deviations techniques and applications* (2nd ed.). New York: Springer.
- Dupuis, P., & Ellis, R. S. (1997). *A weak convergence approach to the theory of large deviations*. New York: Wiley.
- Ellis, R. S. (2006). *Entropy, large deviations, and statistical mechanics. Classics in Mathematics*. Berlin: Springer.
- Field, C. A. (1985). Approach to normality of mean and M-estimators of location. *Canadian Journal of Statistics*, *13*, 201–210.
- Field, C. A., & Ronchetti, E. M. (1990). *Small sample asymptotics*. Hayward: Institute of Mathematical Statistics.
- Gelfand, I. M. (1963). *Calculus of variations*. New Jersey: Prentice-Hall.
- Hall, P. (1992). *The bootstrap and Edgeworth expansion*. New York: Springer.
- Jensen, J. L. (1995). *Saddlepoint approximations*. Oxford: Clarendon Press.
- Kolassa, J. E. (1997). *Series approximation methods in statistics. Lecture Notes in Statistics* (Vol. 88). New York: Springer.
- Kotz, S., Johnson, N. L., & Read, C. B. (1983). *Encyclopedia of statistical sciences* (Vol. 3). New York: Wiley.
- Laha, R. G., & Rohatgi, V. K. (1979). *Probability theory*. New York: Wiley.
- Lukacs, E. (1970). *Characteristic functions* (2nd ed.). London: Charles Griffin.
- Serfling, R. J. (1980). *Approximation theorems of mathematical statistics*. New York: Wiley.
- Shiryaev, A. N. (1996). *Probability* (2nd ed.). New York: Springer.
- Takeuchi, H. (2006). Tauberian property in saddlepoint approximations. *Bulletin of Informatics and Cybernetics*, *38*, 59–69.
- Takeuchi, H. (2013). Correspondence between saddlepoint and probability distribution. *Journal of the Japan Statistical Society*, *42*(2), 185–208. (in Japanese).
- Takeuchi, H. (2014). On a convexity of saddlepoint and its curvature. *Journal of the Japan Statistical Society*, *44*(1), 1–17. (in Japanese).
- Takeuchi, H. (2015). The sp-transform of probability distributions. *Journal of the Japan Statistical Society*, *45*(1), 19–40. (in Japanese).
- Takeuchi, H. (2016). On \(\gamma\)-decomposition of probability distributions. *Journal of the Japan Statistical Society*, *45*(2), 231–245. (in Japanese).
- Takeuchi, H. (2017). On a comparison between Lévy’s inversion formula and saddlepoint approximations. *Journal of the Japan Statistical Society*, *46*(2), 113–135. (in Japanese).