Review of Probability and Statistics

An Introduction to Thermodynamics and Statistical Physics

Part of the book series: UNITEXT for Physics

Abstract

Probability is one of those familiar concepts that turn out to be difficult to define formally. The commonly accepted definition is the axiomatic one due to Kolmogorov, which provides the minimal set of properties that a probability must satisfy, but says nothing about what probability represents.


Notes

  1. The simplest way to be convinced of this fact is that the discrete walker cannot, at finite \(t\), reach infinity; thus \(\rho =0\) there. The principle extends to generic discrete Markov processes with local transitions.

Author information

Correspondence to Piero Olla.

Appendix

A.1 The Central Limit Theorem

We want to determine the form of the probability distribution for sums of independent, identically distributed (i.i.d.) random variables of the form

$$ X_N=\sum _{k=1}^Nx_k. $$

A fundamental question is the existence of limit forms at large \(N\) for the PDF \(\rho (X_N)\). Such limit distributions indeed exist, and their form depends solely on the behavior of the PDF \(\rho (x)\) at large values of its argument: the so-called tails of the distribution. In particular, the existence of the first moments of the distribution is crucial in determining the limit form of \(\rho (X_N)\).

We recall that the \(n\)th moment of a distribution exists,

$$ \langle x^n\rangle =\int \text {d}x\ x^n\rho (x)<\infty , $$

provided \(\rho (x)\) goes to zero at infinity faster than \(x^{-1-n}\).
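
As a numerical sanity check of this criterion (a sketch of our own, not from the text; the Pareto sampler and the value \(\alpha =3/2\) are illustrative choices), one can watch the sample moments of a variable with \(\rho (x)=\alpha x^{-1-\alpha }\) for \(x>1\): since \(1<\alpha <2\), the sample mean settles, while the sample second moment never stabilizes.

```python
# For a Pareto-tailed density rho(x) = alpha * x**(-1-alpha) on x > 1, with
# alpha = 1.5, the mean (n = 1 < alpha) exists, while the second moment
# (n = 2 > alpha) does not: its sample estimate grows with the sample size.
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.5
for n_samples in (10**3, 10**5, 10**7):
    x = rng.pareto(alpha, size=n_samples) + 1.0   # classical Pareto on (1, inf)
    print(f"N = {n_samples:>8}:  <x> ~ {x.mean():7.3f}   <x^2> ~ {np.mean(x**2):12.1f}")
```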

We can distinguish, substantially, three cases:

  • Both the variance \(\sigma _x^2\) and the mean \(\mu _x\) are finite; this is the situation assumed in Sect. 2.3 for a thermodynamic limit to occur. In this case, \(\rho (X_N)\) will be a Gaussian distribution, with mean \(\mu _{X_N}=N\mu _x\) and variance \(\sigma _{X_N}^2=N\sigma _x^2\).

  • The mean \(\mu _x\) is finite, but the variance \(\sigma _x^2\) is infinite, meaning that, for \(x\rightarrow \infty \), \(\rho (x)\sim x^{-1-\alpha }\) (up to logarithms), with \(1<\alpha <2\). In this case, again, \(\mu _{X_N}=N\mu _x\), but the deviations \(\hat{X}_N=X_N-\mu _{X_N}\) are distributed with a so-called Lévy law of index \(\alpha \), whose asymptotic behavior at large \(\hat{X}_N\) is \(\rho (\hat{X}_N)\sim \hat{X}_N^{-1-\alpha }\).

  • The mean \(\mu _x\) is also infinite. This is typically realized by an asymmetric distribution \(\rho (x)\) with asymptotic behavior at large \(x\), \(\rho (x)\sim x^{-1-\alpha }\), with \(0<\alpha <1\) (for \(\alpha \le 0\), the PDF could not be normalized). In this case, the limit distribution for \(X_N\) is a Lévy law of index \(\alpha \), whose asymptotic behavior at large \(X_N\) is \(\rho (X_N)\sim X_N^{-1-\alpha }\). A small Monte Carlo illustration of the three regimes is sketched below.
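
The three regimes can be seen in a small Monte Carlo experiment (our own sketch; samplers and parameters are illustrative). The log-log slope of the typical deviation of \(X_N\) against \(N\) estimates its growth exponent: \(1/2\) in the first case, \(1/\alpha \) in the other two.

```python
# Growth exponent of the typical deviation |X_N - N*mu_x| in the three cases:
# ~ N**(1/2) for finite variance, ~ N**(1/alpha) otherwise.
import numpy as np

rng = np.random.default_rng(1)
Ns, trials = np.array([10**2, 10**3, 10**4]), 500

def growth_exponent(sampler, mu):
    # typical deviation at each N, then log-log slope vs. N
    typical = [np.median(np.abs(sampler((trials, n)).sum(axis=1) - n * mu)) for n in Ns]
    return np.polyfit(np.log(Ns), np.log(typical), 1)[0]

pareto = lambda a: (lambda shape: rng.pareto(a, shape) + 1.0)   # density a*x**(-1-a), x > 1
print("finite mean and variance :", growth_exponent(lambda s: rng.exponential(size=s), 1.0))  # ~ 0.50
print("alpha = 1.5 (no variance):", growth_exponent(pareto(1.5), 3.0))                        # ~ 0.67
print("alpha = 0.5 (no mean)    :", growth_exponent(pareto(0.5), 0.0))                        # ~ 2.00
```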

The form of the limit distribution can be calculated by exploiting the important property that the characteristic function of a sum of random variables is the product of the characteristic functions of the addends. The PDF of a sum of independent random variables is in fact the convolution of the PDFs of the addends:

$$ \rho _{x+y}(z)=\int \text {d}y\ \rho _x(z-y)\rho _y(y), $$

and the characteristic function of \(x+y\), being the Fourier transform of a convolution, will be the product

$$ Z_{x+y}(j)=Z_x(j)Z_y(j). $$

Thus, in the case of a sum of i.i.d. random variables:

$$\begin{aligned} Z_{X_N}(j)=(Z_x(j))^N. \end{aligned}$$
(2.68)

This is the quantity on which we shall focus to determine the asymptotic behavior, for \(N\rightarrow \infty \), of the PDF \(\rho (X_N)\).
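
Equation (2.68) lends itself to a direct empirical check (our sketch; exponential increments are an arbitrary choice), estimating each characteristic function as a sample average of \(\exp (\text {i}jx)\):

```python
# Empirical check of Eq. (2.68): Z_{X_N}(j) = (Z_x(j))**N, with the
# characteristic functions estimated as sample averages of exp(i*j*x).
import numpy as np

rng = np.random.default_rng(2)
N, samples = 5, 200_000
x = rng.exponential(size=(samples, N))        # i.i.d. exponential increments
X = x.sum(axis=1)

for j in (0.3, 1.0, 2.0):
    Z_x = np.mean(np.exp(1j * j * x[:, 0]))   # single-addend estimate
    Z_X = np.mean(np.exp(1j * j * X))         # estimate for the sum
    print(f"j = {j}:  Z_x(j)**N = {Z_x**N:.4f}   Z_X(j) = {Z_X:.4f}")
```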

A.1.1 The Gaussian Case

If the mean and the variance of the random variable \(x\) are both finite, we know from the law of large numbers that \(\mu _{X_N}=N\mu _x\) and \(\sigma _{X_N}^2=N\sigma _x^2\). We scale out the dependence of the PDF \(\rho (X_N)\) on the parameters \(\mu _{X_N}\) and \(\sigma _{X_N}^2\) by considering a rescaled version of the deviation from the mean \(\hat{X}_N=X_N-\mu _{X_N}\): namely, \(Y_N=\hat{X}_N/\sigma _{X_N}\). We verify that the limit \(\rho =\lim _{N\rightarrow \infty }\rho _{Y_N}\) exists, and is indeed Gaussian.

We have for the characteristic function of the rescaled variable \(Y_N\):

$$\begin{aligned} Z_{Y_N}(j)&=\int \text {d}Y_N\ \rho _{Y_N}(Y_N)\exp (\text {i}jY_N) \nonumber \\&=\int \text {d}\hat{X}_N\ \rho _{\hat{X}_N}(\hat{X}_N)\exp (\text {i}j\hat{X}_N/\sigma _{X_N}) =Z_{\hat{X}_N}(j/\sigma _{X_N}), \end{aligned}$$
(2.69)

and, from Eq. (2.68):

$$\begin{aligned} Z_{Y_N}(j)=(Z_{\hat{x}}(j/\sigma _{X_N}))^N, \end{aligned}$$
(2.70)

where \(\hat{x}=x-\mu _x\). Taking the \(N\rightarrow \infty \) limit in \(\rho _{Y_N}\) corresponds to taking the limit \(j/\sigma _{X_N}\rightarrow 0\) in \(Z_{\hat{x}}(j/\sigma _{X_N})\). We can proceed by Taylor expansion. Since \(\langle \hat{x}\rangle =0\), the Taylor expansion of \(Z_{\hat{x}}\) does not contain a linear term:

$$\begin{aligned} Z_{\hat{x}}(j)=1-\frac{1}{2}\sigma _x^2j^2+o(j^2). \end{aligned}$$
(2.71)

Hence, substituting into Eq. (2.70), and using \(\sigma _{X_N}^2=N\sigma _x^2\):

$$\begin{aligned} Z(j)=\lim _{N\rightarrow \infty }Z_{Y_N}(j)=\lim _{N\rightarrow \infty }\Big (1-\frac{j^2}{2N}+o(N^{-1})\Big )^N =\exp (-j^2/2). \end{aligned}$$
(2.72)

Inverse Fourier transforming, we find that the limit distribution for \(Y_N\) is the Gaussian

$$\begin{aligned} \rho (Y_N)=\frac{1}{(2\pi )^{1/2}}\exp (-Y_N^2/2), \end{aligned}$$
(2.73)

as claimed.
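
A minimal Monte Carlo sketch of this convergence (our own; uniform increments, with \(\mu _x=1/2\) and \(\sigma _x^2=1/12\), are an arbitrary choice):

```python
# Rescaled sums of i.i.d. uniform increments approach a standard Gaussian;
# we check the first empirical moments against the Gaussian values 0, 1, 3.
import numpy as np

rng = np.random.default_rng(3)
N, samples = 500, 20_000
x = rng.uniform(size=(samples, N))                 # mu_x = 1/2, sigma_x**2 = 1/12
Y = (x.sum(axis=1) - N * 0.5) / np.sqrt(N / 12)    # Y_N = (X_N - mu_XN) / sigma_XN

print("mean          :", Y.mean())                 # -> 0
print("variance      :", Y.var())                  # -> 1
print("fourth moment :", np.mean(Y**4))            # -> 3 for a Gaussian
```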

The relevance of this result to the random walk dynamics described in Sect. 2.7.1 should be apparent. If we attach a time label \(t_k=k\Delta t\) to each random variable \(x_k\), and take \(\mu _x=0\), the sum \(X_N\) will be precisely the displacement of a random walker that, in the time \(t_N=N\Delta t\), has performed \(N\) independent steps \(x_k\). What Eq. (2.73) tells us is that the density profile of a population of walkers starting from a common initial position will have, after a sufficiently long time (provided the domain is infinite), a Gaussian profile. This is in fact the result in Eq. (2.57), where the \(N\rightarrow \infty \) limit was obtained implicitly with the continuous limit \(t/\Delta t\rightarrow \infty \).

We have proved that, for fixed \(Y\), the PDF \(\rho _{Y_N}(Y)\) has the limit \(\rho (Y)\) given in Eq. (2.73). The natural question then arises of the range of \(Y_N\) for which the result in Eq. (2.73) holds at large but finite \(N\). Certainly, \(\rho _{Y_N}(Y)\) will begin to be sensitive to the properties of the tails of \(\rho _x\) when \(X_N-\mu _{X_N}\sim N\sigma _x\), i.e. when \(Y_N\sim N^{1/2}\). For instance, if \(\rho (x)=0\) for \(|x-\mu _x|>\Delta x\) (as in the case of the jump distribution of a random walker), \(\rho _{Y_N}\) will surely vanish for \(|Y_N|\gtrsim N^{1/2}\). In other words, the central limit result of Gaussian statistics, for a large but finite sum of i.i.d. random variables with finite mean and variance, holds only far from the tails of the distribution.

We can provide a quantitative estimate of this effect in the case in which the first correction to \(Z_{\hat{x}}\) comes from a non-zero third moment \(\langle \hat{x}^3\rangle \ne 0\):

$$ Z_{\hat{x}}(j)=1-\frac{1}{2}\sigma _x^2j^2-\frac{\text {i}}{6}\langle \hat{x}^3\rangle j^3+o(j^3). $$

Substituting into Eq. (2.70), we are able to include the leading large-\(N\) correction to Eq. (2.72):

$$ Z_{Y_N}(j)=\Big (1+\text {i}\alpha s_3N^{-1/2}j^3+o(N^{-1/2})\Big )\exp (-j^2/2), $$

where \(\alpha \) is a numerical coefficient, and \(s_3=\langle \hat{x}^3\rangle /\sigma _x^{3}\) is the normalized third moment (the so-called skewness) of the distribution \(\rho (\hat{x})\). Inverse Fourier transforming, we find the correction to Eq. (2.73):

$$\begin{aligned} \frac{\rho _{Y_N}-\rho }{\rho }\sim s_3N^{-1/2}Y_N^3. \end{aligned}$$
(2.74)

For the central limit result to hold, we need \(|Y_N|\ll s_3^{-1/3}N^{1/6}\), i.e. \(|X_N-\mu _{X_N}|\ll s_3^{-1/3}N^{2/3}\sigma _x\).
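
The \(N^{-1/2}\) decay of the correction can be checked exactly (again a sketch of ours) by choosing increments for which \(\rho _{X_N}\) is known in closed form: a sum of \(N\) unit exponentials is Gamma distributed, with \(\mu _x=\sigma _x=1\) and \(s_3=2\), so \(\sqrt{N}\) times the relative deviation at fixed \(Y_N\) should approach a constant.

```python
# Exact check of Eq. (2.74): for exponential(1) increments, X_N ~ Gamma(N),
# so rho_{Y_N} is known exactly; the relative deviation from the Gaussian
# at fixed Y shrinks like N**(-1/2), i.e. sqrt(N)*deviation ~ const.
import numpy as np
from scipy.stats import gamma, norm

Y = 2.0                                       # a fixed point out in the tail
for N in (100, 400, 1600):
    # change of variables: rho_{Y_N}(Y) = sigma_XN * rho_{X_N}(N + sqrt(N)*Y)
    rho_YN = np.sqrt(N) * gamma.pdf(N + np.sqrt(N) * Y, a=N)
    rel = (rho_YN - norm.pdf(Y)) / norm.pdf(Y)
    print(f"N = {N:4d}:  rel. deviation = {rel:+.4f}   sqrt(N)*rel = {np.sqrt(N) * rel:+.3f}")
```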

A.1.2 Lévy Distributions

We now consider the case in which the PDF \(\rho (x)\) does not have a first or second moment. We have seen that this corresponds to a power-law behavior in the tails of the distribution, \(\rho (x)\sim x^{-1-\alpha }\), with \(0<\alpha <1\) in the first case and \(1<\alpha <2\) in the second. The presence of a power law in the tails of the distribution, \(\rho (x)\sim x^{-1-\alpha }\), will be associated, in general, with the non-existence of the moments \(\langle x^n\rangle \) with \(n>\alpha \). Thus, the Taylor expansion of the characteristic function \(Z_x\) around \(j=0\) will stop at order \(\mathrm{int}(\alpha )\), and it is possible to show that the remainder of the Taylor expansion takes the form \(c|j|^\alpha \), with \(c\) a numerical constant. If \(\alpha > 2\), this non-analyticity does not modify the limiting form of \(Z_{\hat{x}}\) given in Eq. (2.71). If, on the other hand, either \(0<\alpha <1\), or \(1<\alpha <2\) and \(\mu _x=0\), that form must be replaced by

$$\begin{aligned} Z_x(j)\simeq 1-c|j|^\alpha . \end{aligned}$$
(2.75)

We thus see that, contrary to the case of finite \(\mu _x\) and \(\sigma _x^2\), the behavior of the tails of the PDF \(\rho _x\) is reflected in the behavior of the characteristic function \(Z_x\) for \(j\rightarrow 0\).
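
This non-analyticity can be observed numerically (our sketch; the symmetrized Pareto variable with \(\alpha =3/2\) is an illustrative choice): the log-log slope of \(1-\mathrm{Re}\,Z_x(j)\) against \(j\) recovers \(\alpha \), instead of the slope 2 that a finite-variance variable would give.

```python
# For a symmetric variable with tail rho(x) ~ |x|**(-1-alpha), alpha = 1.5,
# the empirical characteristic function obeys 1 - Re Z(j) ~ c*|j|**alpha
# as j -> 0; the log-log slope estimates alpha.
import numpy as np

rng = np.random.default_rng(4)
alpha, samples = 1.5, 2_000_000
x = (rng.pareto(alpha, samples) + 1.0) * rng.choice([-1.0, 1.0], samples)  # symmetric: mu_x = 0

js = np.array([0.05, 0.1, 0.2, 0.4])
one_minus_Z = np.array([1.0 - np.mean(np.cos(j * x)) for j in js])
slope = np.polyfit(np.log(js), np.log(one_minus_Z), 1)[0]
print("estimated alpha:", slope)    # -> roughly 1.5
```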

To determine the PDF \(\rho _{X_N}\), we resort again to the method of characteristic functions. We find immediately the result \(Z_{X_N}(j)=(1-c|j|^\alpha )^N\), which suggests that, in order to obtain a limit distribution, it is necessary to rescale \(X_N\). From Eq. (2.69), we see that the proper rescaling is

$$ Y_N=N^{-1/\alpha }X_N. $$

This leads to the asymptotic form of the characteristic function

$$\begin{aligned} Z(j)=\lim _{N\rightarrow \infty }Z_{Y_N}(j)=\lim _{N\rightarrow \infty }\Big (1-\frac{c|j|^\alpha }{N}+o(N^{-1})\Big )^N =\exp (-c|j|^\alpha ). \end{aligned}$$
(2.76)

The limit distribution \(\rho (Y_N)\), obtained by inverse Fourier transforming Eq. (2.76), is called a Lévy distribution (or stable distribution) of order \(\alpha \). Its most important property, revealed by the non-analyticity of \(Z\) at \(j=0\), is the power-law behavior in the tails of \(\rho (Y_N)\): \(\rho (Y_N)\sim Y_N^{-1-\alpha }\), the same as for the individual variable \(x\). The fact that \(Y_N\), which is a sum of infinite-mean i.i.d. random variables, is still an infinite-mean random variable is not surprising. The fact that the scaling behavior in the tails of \(\rho _x\) and \(\rho =\lim _{N\rightarrow \infty }\rho _{Y_N}\) is identical reflects the different origin of the limit behavior of \(\rho _{Y_N}\) in the Gaussian and in the Lévy case. While in the Gaussian case the form of the limit distribution does not probe the tails of \(\rho _x\) in any way other than the existence of the first moments of the distribution, in the Lévy case it is precisely the scaling in the tails that determines the form of the limit distribution. In fact, we can interpret the law of large numbers result, \(\langle x\rangle _N\rightarrow \mu _x\), as a manifestation of the fact that each \(x_k\) contributes to \(X_N\) a term of the same order \(\sim \mu _x\). This property must apparently be lost in the \(\mu _x\rightarrow \infty \) case. What happens is that the value of \(X_N\) is typically determined by the largest \(x_k\) in the sequence \(\{x_k,k=1,\ldots ,N\}\).
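
A Monte Carlo sketch (ours; Pareto increments with \(\alpha =1/2\) are an illustrative choice) of the fact that the rescaled sum inherits the tail exponent of the single increment:

```python
# The survival function of Y_N = X_N / N**(1/alpha) should decay like
# u**(-alpha); the log-log slope between two tail quantiles estimates alpha.
import numpy as np

rng = np.random.default_rng(5)
alpha, N, trials = 0.5, 1000, 10_000
x = rng.pareto(alpha, size=(trials, N)) + 1.0
Y = x.sum(axis=1) / N**(1 / alpha)

levels = np.geomspace(*np.quantile(Y, [0.90, 0.99]), 8)    # probe the far tail
surv = np.array([(Y > u).mean() for u in levels])          # P(Y_N > u)
slope = np.polyfit(np.log(levels), np.log(surv), 1)[0]
print("estimated tail exponent:", -slope)                  # -> roughly alpha = 0.5
```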

We can get a quantitative feeling for this phenomenon from the observation that the values of \(x_k\) in a typical sequence \(\{x_1,x_2,\ldots ,x_N\}\) will be distributed with the PDF \(\rho _x\). In order for a certain large value \(\bar{x}\) (or larger) to be observed in a typical sequence of \(N\), it is necessary that \(NP(x>\bar{x})\gtrsim 1\), i.e.:

$$ \int \limits _{\bar{x}}^\infty \text {d}x\ \rho _x(x)\sim \bar{x}^{-\alpha }\gtrsim 1/N. $$

Thus, the largest value of \(x\) observed in a typical sequence \(\{x_1,x_2,\ldots ,x_N\}\) will be:

$$ x^\textsc {max}_N\sim N^{1/\alpha }. $$

All the smaller \(x_k\)’s in the typical sequence will be distributed with \(\rho _x\). Hence, we can estimate

$$ X_N\sim N\int \limits _0^{x^\textsc {max}_N}\text {d}x\ x\rho _x(x)\sim N(N^{1/\alpha })^{1-\alpha }=N^{1/\alpha }. $$

The sum \(X_N\) scales with the largest typical contribution \(x^\textsc {max}_N\), meaning that it is the largest contribution in the sequence \(\{x_k,k=1,\ldots ,N\}\) that dominates \(X_N\).
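
The dominance of the largest term can be checked directly (our sketch, again with \(\alpha =1/2\)): the ratio \(x^\textsc {max}_N/X_N\) remains of order one as \(N\) grows, instead of decaying like \(1/N\) as it would for finite-mean increments.

```python
# Ratio between the largest increment and the whole sum, for alpha = 0.5:
# it remains O(1) as N grows, so a single term dominates X_N.
import numpy as np

rng = np.random.default_rng(6)
alpha, trials = 0.5, 1000
for N in (10**2, 10**3, 10**4):
    x = rng.pareto(alpha, size=(trials, N)) + 1.0
    ratio = x.max(axis=1) / x.sum(axis=1)
    print(f"N = {N:>6}:  mean(x_max / X_N) = {ratio.mean():.3f}")
```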

As we have done in the Gaussian case, we can map the problem of summing i.i.d. random variables to a random walk, by attaching a time label \(t_k=k\Delta t\) to each increment \(x_k\). To have an unbiased random walk we need \(\mu _x=0\), so the interesting dynamics is the one originating from the \(1<\alpha <2\) regime of infinite-variance, zero-mean increments. The result is an infinite-variance displacement \(X_N\) in the time \(t_N=N\Delta t\). The resulting process goes under the name of Lévy flight, and can be used to describe the migration of individuals that have at their disposal means of locomotion of very diverse nature (think of human beings who, in a single day, may move by a few meters or embark on an intercontinental flight). The resulting evolution equation in the continuous limit is an example of the non-local master equation (2.54), in which the propagation kernel is precisely the single-increment distribution: \(w(x\rightarrow y)\propto \rho (x-y)\sim |x-y|^{-1-\alpha }\).
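
A minimal Lévy-flight sketch (our construction; symmetrized Pareto steps with \(\alpha =3/2\) provide the zero-mean, infinite-variance increments required above):

```python
# One realization of a 1D Levy flight: most steps are O(1), but the
# displacement is dominated by a few very long jumps of order N**(1/alpha).
import numpy as np

rng = np.random.default_rng(7)
alpha, steps = 1.5, 10_000
jumps = (rng.pareto(alpha, steps) + 1.0) * rng.choice([-1.0, 1.0], steps)  # symmetric: mu_x = 0
trajectory = np.cumsum(jumps)           # walker position at times t_k = k * dt

print("largest single |jump| :", np.abs(jumps).max())
print("final displacement    :", trajectory[-1])
print("N**(1/alpha) scale    :", steps**(1 / alpha))
```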

Additional discussion of these issues can be found, e.g., in J.-P. Bouchaud and A. Georges, Anomalous diffusion in disordered media: statistical mechanisms, models and physical applications, Phys. Rep. 195, 127 (1990), and in W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1 (Wiley, 1968).
