Abstract
We have already mentioned in the introduction that the axioms of mathematical probability¹ are to be so chosen that they reflect empirical situations when given an appropriate interpretation. We have seen that a characterization of mass phenomena can be given in a certain sense by the empirical probabilities of the events occurring. It is thus desirable to choose the notion of mathematical probability in such a way that the theorems of the mathematical theory yield empirically verifiable facts if the mathematical probability is replaced by the empirical. We then speak briefly of the frequency interpretation of the mathematical theory. The simplest calculation rules of empirical probability are expressed by 1. and 2. (p. 23). These serve as a model for the axioms of mathematical probability. In this chapter, we will discuss the most important facts of probability theory. However, we should point out at once that our program is not a complete construction of the theory. Since the main emphasis in this book is on the application of probability in mathematical statistics, many of the important theorems in this chapter will be given without proof.
References
We give here a summary of the most important texts on probability theory by means of which the reader can fill in any gaps left here and deepen his knowledge. Bauer, Heinz: Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie.
All sets here belong to S even when this is not explicitly stated.
Th. Bayes, Philos. Trans. Roy. Soc. 53, 376–398 (1763) and 54, 298–310 (1764).
See p. 5.
Frequently, P(-∞< ξ <x) is defined to be the distribution function of ξ, e.g. by Kolmogorov, l.c. Intro.6
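The two conventions for the distribution function, P(ξ < x) and P(ξ ≤ x), differ only at points that carry positive probability. The following is a minimal sketch of that fact; the fair-die distribution and the function names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

# Hypothetical example: a fair die, values 1..6, each with probability 1/6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf_strict(x):
    """P(xi < x): the convention mentioned in the note (left-continuous)."""
    return sum(p for v, p in pmf.items() if v < x)

def cdf_weak(x):
    """P(xi <= x): the other common convention (right-continuous)."""
    return sum(p for v, p in pmf.items() if v <= x)

# At an atom the two conventions differ by exactly the mass of the atom.
jump_at_3 = cdf_weak(3) - cdf_strict(3)

# Away from the atoms they agree.
same_between_atoms = cdf_strict(3.5) == cdf_weak(3.5)
```

Either function determines the other, so the choice is a matter of convention.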
This is precisely the case when F is absolutely continuous, i.e., to each ε > 0 there corresponds a δ > 0 such that for each finite or countably infinite set of pairwise disjoint intervals \( (x_i, y_i) \), \( \sum_i (F(y_i) - F(x_i)) < \varepsilon \) if \( \sum_i (y_i - x_i) < \delta \).
Properly, it should be referred to as “a” density; however, when no misunderstanding is likely—here and in similar cases—we apply the definite article.
Briefly, we usually write R.-N.-density.
In this case, trivial changes in notation have to be introduced.
Again this is precisely the case when F is absolutely continuous. f is the Radon-Nikodym density relative to n-dimensional Lebesgue measure.
Naturally, L(B) denotes the Lebesgue measure of B.
It is convenient to agree also to write g(ξ) or g(ξ1,...,ξn)
This definition can easily be extended to infinitely many random variables. Cf. the remark following (2.1).
A better terminology would be marginal distribution of (ξ1,...,ξn) relative to (ξ1,...,ξk), but the expression employed here has established itself in the literature.
See also Theorem 17.7.
We will also refer to \( (\sigma_{ij})_{i,j=1,\dots,n} \) as the covariance matrix of \( P_\xi \).
Obviously, each moment of odd order \( E[(\xi - a)^{2n+1}] \), \( n \ge 0 \), of a distribution which is symmetric with respect to a vanishes whenever it exists.
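The vanishing of the odd central moments can be checked on a small discrete example. This is only a sketch: the five-point distribution symmetric about a = 2 and the helper name `central_moment` are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical distribution, symmetric with respect to a = 2:
# the points 2 - d and 2 + d always carry the same probability.
a = 2
pmf = {
    0: Fraction(1, 8),
    1: Fraction(1, 4),
    2: Fraction(1, 4),
    3: Fraction(1, 4),
    4: Fraction(1, 8),
}

def central_moment(order):
    """E[(xi - a)^order] for the discrete distribution above."""
    return sum(p * (v - a) ** order for v, p in pmf.items())

# Terms for 2 - d and 2 + d cancel in pairs when the order is odd.
odd_moments = [central_moment(2 * n + 1) for n in range(4)]

# Even-order moments are not affected by the symmetry argument.
second_moment = central_moment(2)
```

Exact rational arithmetic is used so that the cancellation is visible without rounding error.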
We also say: all versions of P(A ∣ G) differ from each other only on P-null sets.
See for this and related problems D. H. Blackwell, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1954–1955 Vol. II, pp. 1–6, University of California Press, Berkeley and Los Angeles (1956) and D.H. Blackwell and C. Ryll-Nardzewski, Ann. Math. Statist. 34, 223–225 (1963).
In place of PR1(Ay ∣ ξ)(z) we also write PR1(Ay ∣ ξ = z). See p. 57 and p. 60.
For more general investigations see M. Jiřina, Czechosl. Math. J. 4 (79), 372–380 (1954) and Czechosl. Math. J. 9 (84), 445–451 (1959).
This concept is equivalent to that of the conditional probability given a σ-algebra, as is easy to see.
This condition can be dispensed with. See E.L. Lehmann, l.c. Not. 11, 37–38.
See II. 12.
Of course, one cannot manage with the form of Theorem 6.1 given here, but requires a generalization of this result to set functions which are not necessarily non-negative. However, this generalization can easily be obtained from Theorem 6.1.
More precisely: If ξ and η are r.v.’s and \( E(\xi^2) \) and \( E(\eta^2) \) exist, then (in the notation of 20) \( (E(|\xi\eta| \mid \cdot))^2 \le E(\xi^2 \mid \cdot)\, E(\eta^2 \mid \cdot) \) P-a.e.
Thus, in somewhat more general formulation, Theorem 21.1 states that if ξ is a r.v. and \( E(\xi^2) \) exists, then \( E(\xi \mid \cdot) \) is the orthogonal projection of ξ onto the set of S-measurable functions.
This theorem is due to P. Lévy: P. Lévy, Calcul des Probabilités, Gauthier-Villars et Cie., Paris, 1925, 166 ff.
P. Lévy, l.c. 28, 195 ff.
Moreover, it can always be assumed that this function is right-continuous.
From Lemma 23.2 and Lemma 23.1 one can easily infer Theorem 23.4.
These exist for all i, j by Theorem 17.3.
Another example is given e.g. on p. 81.
Stieltjes, T.J. Nouv. Ann. Math., ser. 3, 9, 479–480 (1890).
Not all the coefficients in these linear combinations should be zero.
F. R. Helmert, Zeitschrift für Math. und Physik 21, 192–219 (1876). K. Pearson, Philos. Mag. 50, Ser. 5, 157–175 (1900).
“Student”, Biometrika 6, 1–25 (1908), (Student is a pseudonym for W.S. Gosset). R. A. Fisher, Biometrika 10, 507–521 (1915).
This distribution is also named for Snedecor.
R.A. Fisher, Metron 1, 1–32 (1921).
K. Pearson, Philos. Trans. Roy. Soc. London, Ser. A 185, 71–110 (1894).
We will no longer state the intervals over which the densities vanish. The constant C is always to be chosen in such a way that (6.3) holds in each case.
We can also show this without any calculations: Let p1 >p.
This formula already appears in A. Meyer, Vorlesungen über Wahrscheinlichkeitsrechnung, B.G. Teubner, Leipzig 1879.
For a thorough treatment of limit theorems see B. V. Gnedenko and A. Kolmogoroff, Limit Distributions for Sums of Independent Random Variables, Cambridge, Mass., 1954.
If \( r - l > M^1 - M \), then \( \binom{M^1 - M}{r - l} = 0 \).
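The convention that a binomial coefficient vanishes when the lower index exceeds the upper one is also what Python's `math.comb` implements (Python 3.8 or later). The sketch below uses made-up values for M, M^1 and r. The Vandermonde check is an illustration added here, not a formula quoted from the text; it shows why the convention is convenient: a sum over l can run over the full range without case distinctions.

```python
from math import comb

# Hypothetical values, chosen only for illustration.
M, M1, r = 3, 5, 4

# With l = 1 we have r - l = 3 > M1 - M = 2, so the coefficient is zero.
l = 1
vanishing = comb(M1 - M, r - l)

# Vandermonde's identity: the out-of-range terms contribute zero,
# so the sum may be taken over all l = 0, ..., r.
lhs = sum(comb(M, l) * comb(M1 - M, r - l) for l in range(r + 1))
rhs = comb(M1, r)
```

Here `lhs` and `rhs` agree because the terms with r - l > M1 - M or l > M drop out automatically.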
A. J. Hincin, C. R. Acad. Sci., Paris 189, 477–479 (1929).
A systematic treatment of the properties of this notion and of related concepts can be found in E. Lukacs, Stochastic Convergence, Math. Monographs, D. C. Heath, Lexington, Mass. 1968.
This short proof is due to Borges. Theorem 38.3 as well as Theorem 38.4 remain correct if the stochastic convergence to a real number (or to a k-tuple of real numbers) in the assumption and claim is replaced by stochastic convergence to a r.v. See K. Krickeberg, l.c. Not. 7. We will make no use of this fact here.
See loc. cit. 44 and P. Lévy, Théorie de l’addition des variables aléatoires. Gauthier-Villars, 2nd Ed., Paris 1954.
P. Lévy, loc. cit. 28, 233 ff. It is not hard to show that the convergence of the sequence \( (F_n) \) is even uniform in −∞ < x < ∞.
That is: For every ε > 0 there exists δ > 0 such that \( |\phi_j''(0) - \phi_j''(t)| < \varepsilon \) if \( |t| < \delta \), uniformly for j = 1, 2, ....
We refer to the fundamental paper of C.G. Esseen, Acta Math. 77, 1–125 (1944) and to generalizations by E. Hlawka, Monatsh. Math. 55, 105–137 (1951).
A good outline of this problem area is in Ju. V. Linnik, Proc. Fourth Berkeley Sympos. Math. Statist. and Prob., Vol. II, pp. 289–306. Univ. California Press, Berkeley, Calif. (1960).
See B.L. van der Waerden, Nieuw Arch. Wisk. 18, 40–45 (1936).
In a quite analogous way one shows that the multinomial distribution (see 37) can be approximated by a (k − 1)-dimensional normal distribution. We mention for the sake of completeness that one can arrive at a multi-dimensional Poisson distribution by means of another passage to the limit. (See p. 92.) More precise and general results of this type can be found in M. Fisz, Studia Math. 14, 272–275 (1954).
There are many subtle investigations of this question. We mention only W. Feller, Ann. Math. Statist. 16, 319–329 (1945) and Ann. Math. Statist. 21, 301 (1950). See also Ibragimov, I. A. and Linnik, Ju. V., l.c. 52.
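One way to see why the approximating normal distribution has dimension k − 1 rather than k: the k counts of a multinomial distribution always add up to the number of trials, so their covariance matrix is singular. The sketch below checks this for made-up parameters; the values of `n` and `p` and the helper name are assumptions, and the covariance formula n(δ_ij p_i − p_i p_j) is the standard one for the multinomial distribution, not quoted from this text.

```python
from fractions import Fraction

# Hypothetical multinomial parameters: n trials, k = 3 cells.
n = 10
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
k = len(p)

def multinomial_cov(n, p):
    """Covariance matrix with entries n * (delta_ij * p_i - p_i * p_j)."""
    size = len(p)
    return [
        [n * ((p[i] if i == j else 0) - p[i] * p[j]) for j in range(size)]
        for i in range(size)
    ]

cov = multinomial_cov(n, p)

# Every row sums to zero, so the vector (1, ..., 1) lies in the kernel:
# the k-dimensional distribution is concentrated on a hyperplane.
row_sums = [sum(row) for row in cov]
```

Dropping any one of the k coordinates removes this linear dependence, which is why a (k − 1)-dimensional normal law suffices.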
This is an extension of Formula (39.11). See, for example, Lösch-Schoblik, Die Fakultät und verwandte Funktionen, Teubner, Leipzig 1951, 30.
For this and the following theorems, see H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, Princeton 1946. See also E. Lukacs, loc. cit. 47.
F possesses only countably many discontinuities (see p. 33), whence follows the possibility of such a choice for infinitely many ε > 0 with ε → 0.
K. Krickeberg, Metrika 10, 179–181 (1966), has pointed out that one can prove the following theorem.
A detailed presentation is J.A. Shohat and J.D. Tamarkin, The Problem of Moments (Mathematical Surveys, Vol. I), Amer. Math. Soc., New York, 1943 and 1950.
H. Hamburger, Math. Z. 4, 186–222 (1919), Math. Ann. 81, 31–45, 235–319 (1920); 82, 120–164, 168–187 (1921).
F. Hausdorff, Math. Z. 9, 74–109 (1921). Also see S. Karlin and L.S. Shapley, Geometry of Moment Spaces, Mem. Amer. Math. Soc. No. 12, Providence 1953.
© 1974 Springer-Verlag Berlin · Heidelberg
Cite this chapter
Schmetterer, L. (1974). Introduction to Probability Theory. In: Introduction to Mathematical Statistics. Die Grundlehren der mathematischen Wissenschaften, vol 202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-65542-5_3
DOI: https://doi.org/10.1007/978-3-642-65542-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-65544-9
Online ISBN: 978-3-642-65542-5
eBook Packages: Springer Book Archive