# Introduction to Probability Theory

Part of the Die Grundlehren der mathematischen Wissenschaften book series (GL, volume 202)

## Abstract

We have already mentioned in the introduction that the axioms of mathematical probability¹ are to be so chosen that they reflect empirical situations when given an appropriate interpretation. We have seen that a characterization of mass phenomena can be given in a certain sense by the empirical probabilities of the events occurring. It is thus desirable to choose the notion of mathematical probability in such a way that the theorems of the mathematical theory yield empirically verifiable facts if the mathematical probability is replaced by the empirical. We then speak briefly of the frequency interpretation of the mathematical theory. The simplest calculation rules of empirical probability are expressed by 1. and 2. (p. 23). These serve as a model for the axioms of mathematical probability. In this chapter, we will discuss the most important facts of probability theory. However, we should point out at once that our program is not a complete construction of the theory. Since the main emphasis in this book is on the application of probability in mathematical statistics, many of the important theorems in this chapter will be given without proof.
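The frequency interpretation can be illustrated with a small simulation. The rules 1. and 2. of p. 23 are not reproduced on this page; the sketch below assumes they are the usual ones, namely that a relative frequency lies between 0 and 1 and that the relative frequencies of disjoint events add. The die experiment, the seed, and the helper name `relative_frequency` are illustrative choices, not taken from the book.

```python
import random


def relative_frequency(outcomes, event):
    """Return the fraction of observed outcomes that belong to `event`."""
    return sum(1 for o in outcomes if o in event) / len(outcomes)


# Assumed experiment: 60,000 rolls of a fair six-sided die, fixed seed.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(60_000)]

low = {1, 2}    # event "the die shows 1 or 2"
high = {5, 6}   # event "the die shows 5 or 6", disjoint from `low`

h_low = relative_frequency(rolls, low)
h_high = relative_frequency(rolls, high)
h_union = relative_frequency(rolls, low | high)

# Assumed rule 1: a relative frequency lies in [0, 1].
print(0.0 <= h_low <= 1.0)
# Assumed rule 2: frequencies of disjoint events add (up to rounding).
print(abs(h_union - (h_low + h_high)) < 1e-12)
# The empirical value is close to the mathematical probability 1/3.
print(h_low)
```

Replacing the mathematical probability 1/3 by the empirical value `h_low` is exactly the substitution the frequency interpretation refers to; the agreement is only approximate and depends on the number of rolls.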

## Keywords

Characteristic Function, Probability Theory, Conditional Probability, Pairwise Disjoint, Independent Random Variable

## References

1. We give here a summary of the most important texts on probability theory, by means of which the reader can fill in any gaps left here and deepen his knowledge. Bauer, Heinz: Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie.
2. All sets here belong to S even when this is not explicitly stated.
3. Th. Bayes, Philos. Trans. Roy. Soc. 53, 376–398 (1763) and 54, 298–310 (1764).
4.
5. Frequently, $$P(-\infty < \xi < x)$$ is defined to be the distribution function of ξ, e.g. by Kolmogorov, l.c. Intro. 6.
6. This is precisely the case when F is absolutely continuous, i.e., to each ε > 0 there corresponds a δ > 0 such that, for each finite or countably infinite set of pairwise disjoint intervals $$(x_i, y_i)$$, $$\sum |F(y_i) - F(x_i)| < \varepsilon$$ if $$\sum |y_i - x_i| < \delta$$.
7. Properly, it should be referred to as “a” density; however, when no misunderstanding is likely, here and in similar cases, we apply the definite article.
8. Briefly, we usually write R.-N.-density.
9. In this case, trivial changes in notation have to be introduced.
10. Again, this is precisely the case when F is absolutely continuous. Here f is the Radon-Nikodym density relative to n-dimensional Lebesgue measure.
11. Naturally, L(B) denotes the Lebesgue measure of B.
12. It is convenient to agree also to write g(ξ) or $$g(\xi_1, \ldots, \xi_n)$$.
13. This definition can easily be extended to infinitely many random variables. Cf. the remark following (2.1).
14. A better terminology would be marginal distribution of $$(\xi_1, \ldots, \xi_n)$$ relative to $$(\xi_1, \ldots, \xi_k)$$, but the expression employed here has established itself in the literature.
15.
16. We will also refer to $$(\sigma_{ij})_{i,j=1,\ldots,n}$$ as the covariance matrix of $$P_\xi$$.
17. Obviously, each moment of odd order $$E[(\xi - a)^{2n+1}]$$, $$n \geq 0$$, of a distribution which is symmetric with respect to a vanishes whenever it exists.
18. We also say: all versions of P(A ∣ G) differ from each other only on P-null sets.
19. See for this and related problems D. H. Blackwell, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1954–1955, Vol. II, pp. 1–6, University of California Press, Berkeley and Los Angeles (1956) and D. H. Blackwell and C. Ryll-Nardzewski, Ann. Math. Statist. 34, 223–225 (1963).
20. In place of PR1(Ay ξ)(z) we also write PR1(Ay ξ = z). See p. 57 and p. 60.
21. For more general investigations see M. Jiřina, Czechosl. Math. J. 4 (79), 372–380 (1954) and Czechosl. Math. J. 9 (84), 445–451 (1959).
22. This concept is equivalent to that of the conditional probability given a σ-algebra, as is easy to see.
23. This condition can be dispensed with. See E. L. Lehmann, l.c. Not. 11, 37–38.
24.
25. Of course, one cannot manage with the form of Theorem 6.1 given here, but requires a generalization of this result to set functions which are not necessarily non-negative. However, this generalization can easily be obtained from Theorem 6.1.
26. More precisely: If ξ and η are r.v.’s and $$E(\xi^2)$$ and $$E(\eta^2)$$ exist, then (in the notation of 20) $$(E(|\xi\eta| \mid ))^2 \le E(\xi^2 \mid )\,E(\eta^2 \mid )$$ P-a.e.
27. Thus, in somewhat more general formulation, Theorem 21.1 states that if ξ is a r.v. and $$E(\xi^2)$$ exists, then $$E(\xi \mid )$$ is the orthogonal projection of ξ onto the set of S-measurable functions.
28. This theorem is due to P. Lévy: P. Lévy, Calcul des Probabilités, Gauthier-Villars et Cie., Paris, 1925, 166 ff.
29. P. Lévy, l.c. 28, 195 ff.
30. Moreover, it can always be assumed that this function is right-continuous.
31. From Lemma 23.2 and Lemma 23.1 one can easily infer Theorem 23.4.
32. These exist for all i, j by Theorem 17.3.
33. Another example is given e.g. on p. 81.
34. T. J. Stieltjes, Nouv. Ann. Math., ser. 3, 9, 479–480 (1890).
35. Not all the coefficients in these linear combinations should be zero.
36. F. R. Helmert, Zeitschrift für Math. und Physik 21, 192–219 (1876). K. Pearson, Philos. Mag. 50, Ser. 5, 157–175 (1900).
37. “Student”, Biometrika 6, 1–25 (1908) (Student is a pseudonym for W. S. Gosset). R. A. Fisher, Biometrika 10, 507–521 (1915).
38. This distribution is also named for Snedecor.
39. R. A. Fisher, Metron 1, 1–32 (1921).
40. K. Pearson, Philos. Trans. Roy. Soc. London, Ser. A 185, 71–110 (1894).
41. We will no longer state the intervals over which the densities vanish. The constant C is always to be chosen in such a way that (6.3) holds in each case.
42. We can also show this without any calculations: Let p1 > p.
43. This formula already appears in A. Meyer, Vorlesungen über Wahrscheinlichkeitsrechnung, B. G. Teubner, Leipzig 1879.
44. For a thorough treatment of limit theorems see B. V. Gnedenko and A. Kolmogoroff, Limit Distributions for Sums of Independent Random Variables, Cambridge, Mass., 1954.
45. If $$r - l > M^1 - M$$, then $$\binom{M^1 - M}{r - l} = 0$$.
46. A. J. Hincin, C. R. Acad. Sci., Paris 189, 477–479 (1929).
47. A systematic treatment of the properties of this notion and of related concepts can be found in E. Lukacs, Stochastic Convergence, Math. Monographs, D. C. Heath, Lexington, Mass. 1968.
48. This short proof is due to Borges. Theorem 38.3 as well as Theorem 38.4 remain correct if the stochastic convergence to a real number (or to a k-tuple of real numbers) in the assumption and claim is replaced by stochastic convergence to a r.v. See K. Krickeberg, l.c. Not. 7. We will make no use of this fact here.
49. See loc. cit. 44 and P. Lévy, Théorie de l’addition des variables aléatoires, Gauthier-Villars, 2nd ed., Paris 1954.
50. P. Lévy, loc. cit. 28, 233 ff. It is not hard to show that the convergence of the sequence $$(F_n)$$ is even uniform in $$-\infty < x < \infty$$.
51. That is: For every ε > 0 there exists δ > 0 such that $$|\phi_j''(0) - \phi_j''(t)| < \varepsilon$$ if $$|t| < \delta$$, uniformly for j = 1, 2, ....
52. We refer to the fundamental paper of C. G. Esseen, Acta Math. 77, 1–125 (1944) and to generalizations by E. Hlawka, Monatsh. Math. 55, 105–137 (1951).
53. A good outline of this problem area is in Ju. V. Linnik, Proc. Fourth Berkeley Sympos. Math. Statist. and Prob., Vol. II, pp. 289–306. Univ. California Press, Berkeley, Calif. (1960).
54. See B. L. van der Waerden, Nieuw Arch. Wisk. 18, 40–45 (1936).
55. In a quite analogous way one shows that the multinomial distribution (see 37) can be approximated by a (k − 1)-dimensional normal distribution. We mention for the sake of completeness that one can arrive at a multi-dimensional Poisson distribution by means of another passage to the limit. (See p. 92.) More precise and general results of this type can be found in M. Fisz, Studia Math. 14, 272–275 (1954).
56. There are many subtle investigations of this question. We mention only W. Feller, Ann. Math. Statist. 16, 319–329 (1945) and Ann. Math. Statist. 21, 301 (1950). See also I. A. Ibragimov and Ju. V. Linnik, l.c. 52.
57. This is an extension of Formula (39.11). See, for example, Lösch-Schoblik, Die Fakultät und verwandte Funktionen, Teubner, Leipzig 1951, 30.
58. For this and the following theorems, see H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, Princeton 1946. See also E. Lukacs, loc. cit. 47.
59. F possesses only countably many discontinuities (see p. 33), whence follows the possibility of such a choice for infinitely many ε > 0 with ε → 0.
60. K. Krickeberg, Metrika 10, 179–181 (1966), has pointed out that one can prove the following theorem.
61. A detailed presentation is J. A. Shohat and J. D. Tamarkin, The Problem of Moments (Mathematical Surveys, Vol. I), Amer. Math. Soc., New York: 1943 and 1950.
62. H. Hamburger, Math. Z. 4, 186–222 (1919), Math. Ann. 81, 31–45, 235–319 (1920); 82, 120–164, 168–187 (1921).
63. F. Hausdorff, Math. Z. 9, 74–109 (1921). Also see S. Karlin and L. S. Shapley, Geometry of Moment Spaces, Mem. Amer. Math. Soc. No. 12, Providence 1953.