Introduction to Probability Theory

Part of the Die Grundlehren der mathematischen Wissenschaften book series (GL, volume 202)


We have already mentioned in the introduction that the axioms of mathematical probability¹ are to be so chosen that they reflect empirical situations when given an appropriate interpretation. We have seen that a characterization of mass phenomena can be given in a certain sense by the empirical probabilities of the events occurring. It is thus desirable to choose the notion of mathematical probability in such a way that the theorems of the mathematical theory yield empirically verifiable facts if the mathematical probability is replaced by the empirical one. We then speak briefly of the frequency interpretation of the mathematical theory. The simplest calculation rules of empirical probability are expressed by 1. and 2. (p. 23). These serve as a model for the axioms of mathematical probability. In this chapter, we will discuss the most important facts of probability theory. However, we should point out at once that our program is not a complete construction of the theory. Since the main emphasis in this book is on the application of probability in mathematical statistics, many of the important theorems in this chapter will be given without proof.
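The frequency interpretation can be illustrated with a small simulation. The sketch below is not taken from the book: rules 1. and 2. of p. 23 are not reproduced in this excerpt, so it assumes they are the usual ones (a relative frequency lies between 0 and 1, and relative frequencies of disjoint events add). The fair die, the two events, and the name `relative_frequencies` are hypothetical choices made only for illustration.

```python
import random


def relative_frequencies(n_trials, seed=0):
    """Roll a fair die n_trials times (a hypothetical experiment).

    Returns the relative frequencies of the disjoint events
    A = {1, 2} and B = {6}, and of their union.
    """
    rng = random.Random(seed)
    count_a = count_b = count_union = 0
    for _ in range(n_trials):
        face = rng.randint(1, 6)  # uniform on 1..6, both ends included
        in_a = face in (1, 2)
        in_b = face == 6
        count_a += in_a
        count_b += in_b
        count_union += in_a or in_b
    return count_a / n_trials, count_b / n_trials, count_union / n_trials


h_a, h_b, h_union = relative_frequencies(10_000)
print(h_a, h_b, h_union)
```

Because A and B are disjoint, the frequency of the union equals the sum of the two frequencies up to rounding, and for a large number of trials each frequency is typically close to the corresponding mathematical probability (1/3 and 1/6 here).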






  1. We give here a summary of the most important texts on probability theory by means of which the reader can fill in any gaps left here and deepen his knowledge. Bauer, Heinz: Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie.
  2. All sets here belong to S even when this is not explicitly stated.
  3. Th. Bayes, Philos. Trans. Roy. Soc. 53, 376–398 (1763) and 54, 298–310 (1764).
  4.
  5. Frequently, P(-∞ < ξ < x) is defined to be the distribution function of ξ, e.g. by Kolmogorov, l.c. Intro.6.
  6. This is precisely the case when F is absolutely continuous, i.e., to each ε > 0 there corresponds a δ > 0 such that for each finite or countably infinite set of pairwise disjoint intervals \( (x_i, y_i) \), \( \sum_i |F(y_i) - F(x_i)| < \varepsilon \) if \( \sum_i |y_i - x_i| < \delta \).
  7. Properly, it should be referred to as “a” density; however, when no misunderstanding is likely, here and in similar cases, we apply the definite article.
  8. Briefly, we usually write R.-N.-density.
  9. In this case, trivial changes in notation have to be introduced.
  10. Again this is precisely the case when F is absolutely continuous. f is the Radon-Nikodym density relative to n-dimensional Lebesgue measure.
  11. Naturally, L(B) denotes the Lebesgue measure of B.
  12. It is convenient to agree also to write g(ξ) or g(ξ1,...,ξn)
  13. This definition can easily be extended to infinitely many random variables. Cf. the remark following (2.1).
  14. A better terminology would be marginal distribution of (ξ1,...,ξn) relative to (ξ1,...,ξk), but the expression employed here has established itself in the literature.
  15. See also Theorem 17.7.
  16. We will also refer to \( (\sigma_{ij})_{i,j = 1, \dots, n} \) as the covariance matrix of \( P_\xi \).
  17. Obviously, each moment of odd order \( E[(\xi - a)^{2n+1}] \), \( n \ge 0 \), of a distribution which is symmetric with respect to a vanishes whenever it exists.
  18. We also say: all versions of P(A ∣ G) differ from each other only on P-null sets.
  19. See for this and related problems D. H. Blackwell, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1954–1955, Vol. II, pp. 1–6, University of California Press, Berkeley and Los Angeles (1956) and D. H. Blackwell and C. Ryll-Nardzewski, Ann. Math. Statist. 34, 223–225 (1963).
  20. In place of PR1(Ay ∣ ξ)(z) we also write PR1(Ay ∣ ξ = z). See p. 57 and p. 60.
  21. For more general investigations see M. Jiřina, Czechosl. Math. J. 4 (79), 372–380 (1954) and Czechosl. Math. J. 9 (84), 445–451 (1959).
  22. This concept is equivalent to that of the conditional probability given a σ-algebra, as is easy to see.
  23. This condition can be dispensed with. See E. L. Lehmann, l.c. Not.11, 37–38.
  24. See II. 12.
  25. Of course, one cannot manage with the form of Theorem 6.1 given here, but requires a generalization of this result to set functions which are not necessarily non-negative. However, this generalization can easily be obtained from Theorem 6.1.
  26. More precisely: If ξ and η are r.v.’s and E(ξ²) and E(η²) exist, then (in the notation of 20) \( (E(|\xi\eta| \mid \cdot))^2 \le E(\xi^2 \mid \cdot)\, E(\eta^2 \mid \cdot) \) P-a.e.
  27. Thus, in somewhat more general formulation, Theorem 21.1 states that if ξ is a r.v. and E(ξ²) exists, then \( E(\xi \mid \cdot) \) is the orthogonal projection of ξ onto the set of S-measurable functions.
  28. This theorem is due to P. Lévy: P. Lévy, Calcul des Probabilités, Gauthier-Villars et Cie., Paris, 1925, 166 ff.
  29. P. Lévy, l.c.28, 195 ff.
  30. Moreover, it can always be assumed that this function is right-continuous.
  31. From Lemma 23.2 and Lemma 23.1 one can easily infer Theorem 23.4.
  32. These exist for all i, j by Theorem 17.3.
  33. Another example is given e.g. on p. 81.
  34. T. J. Stieltjes, Nouv. Ann. Math., ser. 3, 9, 479–480 (1890).
  35. Not all the coefficients in these linear combinations should be zero.
  36. F. R. Helmert, Zeitschrift für Math. und Physik 21, 192–219 (1876). K. Pearson, Philos. Mag. 50, Ser. 5, 157–175 (1900).
  37. “Student”, Biometrika 6, 1–25 (1908) (Student is a pseudonym for W. S. Gosset). R. A. Fisher, Biometrika 10, 507–521 (1915).
  38. This distribution is also named for Snedecor.
  39. R. A. Fisher, Metron 1, 1–32 (1921).
  40. K. Pearson, Philos. Trans. Roy. Soc. London, Ser. A 185, 71–110 (1894).
  41. We will no longer state the intervals over which the densities vanish. The constant C is always to be chosen in such a way that (6.3) holds in each case.
  42. We can also show this without any calculations: Let p1 > p.
  43. This formula already appears in A. Meyer, Vorlesungen über Wahrscheinlichkeitsrechnung, B. G. Teubner, Leipzig 1879.
  44. For a thorough treatment of limit theorems see B. V. Gnedenko and A. Kolmogoroff, Limit Distributions for Sums of Independent Random Variables, Cambridge, Mass., 1954.
  45. If \( r - l > M^1 - M \), then \( \binom{M^1 - M}{r - l} = 0 \).
  46. A. J. Hincin, C. R. Acad. Sci., Paris 189, 477–479 (1929).
  47. A systematic treatment of the properties of this notion and of related concepts can be found in E. Lukacs, Stochastic Convergence, Math. Monographs, D. C. Heath, Lexington, Mass. 1968.
  48. This short proof is due to Borges. Theorem 38.3 as well as Theorem 38.4 remain correct if the stochastic convergence to a real number (or to a k-tuple of real numbers) in the assumption and claim is replaced by stochastic convergence to a r.v. See K. Krickeberg, l.c. Not.7. We will make no use of this fact here.
  49. See loc. cit.44 and P. Lévy, Théorie de l’addition des variables aléatoires, Gauthier-Villars, 2nd ed., Paris 1954.
  50. P. Lévy, loc. cit.28, 233 ff. It is not hard to show that the convergence of the sequence (Fn) is even uniform in -∞ < x < ∞.
  51. That is: For every ε > 0 there exists δ > 0 such that \( |\phi_j''(0) - \phi_j''(t)| < \varepsilon \) if \( |t| < \delta \), uniformly for j = 1, 2, ....
  52. We refer to the fundamental paper of C. G. Esseen, Acta Math. 77, 1–125 (1944) and to generalizations by E. Hlawka, Monatsh. Math. 55, 105–137 (1951).
  53. A good outline of this problem area is in Ju. V. Linnik, Proc. Fourth Berkeley Sympos. Math. Statist. and Prob., Vol. II, pp. 289–306, Univ. California Press, Berkeley, Calif. (1960).
  54. See B. L. van der Waerden, Nieuw Arch. Wisk. 18, 40–45 (1936).
  55. In a quite analogous way one shows that the multinomial distribution (see 37) can be approximated by a (k - 1)-dimensional normal distribution. We mention for the sake of completeness that one can arrive at a multi-dimensional Poisson distribution by means of another passage to the limit. (See p. 92.) More precise and general results of this type can be found in M. Fisz, Studia Math. 14, 272–275 (1954).
  56. There are many subtle investigations of this question. We mention only W. Feller, Ann. Math. Statist. 16, 319–329 (1945) and Ann. Math. Statist. 21, 301 (1950). See also I. A. Ibragimov and Ju. V. Linnik, l.c. 52.
  57. This is an extension of Formula (39.11). See, for example, Lösch-Schoblik, Die Fakultät und verwandte Funktionen, Teubner, Leipzig 1951, 30.
  58. For this and the following theorems, see H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, Princeton 1946. See also E. Lukacs, loc. cit. 47.
  59. F possesses only countably many discontinuities (see p. 33), whence follows the possibility of such a choice for infinitely many ε > 0 with ε → 0.
  60. K. Krickeberg, Metrika 10, 179–181 (1966), has pointed out that one can prove the following theorem.
  61. A detailed presentation is J. A. Shohat and J. D. Tamarkin, The Problem of Moments (Mathematical Surveys, Vol. I), Amer. Math. Soc., New York 1943 and 1950.
  62. H. Hamburger, Math. Z. 4, 186–222 (1919); Math. Ann. 81, 31–45, 235–319 (1920); 82, 120–164, 168–187 (1921).
  63. F. Hausdorff, Math. Z. 9, 74–109 (1921). Also see S. Karlin and L. S. Shapley, Geometry of Moment Spaces, Mem. Amer. Math. Soc. No. 12, Providence 1953.

Copyright information

© Springer-Verlag Berlin · Heidelberg 1974

Authors and Affiliations

  1. University of Vienna, Austria
