# Theory of Regression and the Sampling Theory of Multidimensional Normal Distributions

Part of the Die Grundlehren der mathematischen Wissenschaften book series (GL, volume 202)

## Abstract

Let $$\xi_{p+1\,1}, \ldots, \xi_{p+1\,n}$$, $$p \geq 1$$, $$n \geq p+2$$, be r.v.'s possessing the following properties: $$E(\xi_{p+1\,i})$$ exists for $$1 \leq i \leq n$$ and
$$E(\xi_{p+1\,i}) = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi}.$$
(1.1)
Further let the covariance matrix of $$(\xi_{p+1\,1}, \ldots, \xi_{p+1\,n})$$ exist and be denoted by $$U = (u_{ij})_{1n}^{1n}$$. Here, the $$x_{ji}$$, $$1 \leq j \leq p$$, $$1 \leq i \leq n$$, are given real numbers and the $$\beta_i$$, $$0 \leq i \leq p$$, as well as the $$u_{ij}$$, $$1 \leq i, j \leq n$$, are real parameters. The $$\beta_i$$ satisfy $$-\infty < \beta_i < \infty$$ and the $$u_{ij}$$ need only satisfy the trivial restriction that $$U$$ be positive semi-definite. To be more precise, we should denote the expectation in (1.1) by $$E(\xi_{p+1\,i}; \beta_0, \ldots, \beta_p)$$ or even by $$E(\xi_{p+1\,i}; \beta_0, \ldots, \beta_p, u_{ij}, 1 \leq i, j \leq n)$$, but the abbreviated notation should cause no misunderstanding. Our task is to construct unbiased estimates for each $$\beta_i$$, $$0 \leq i \leq p$$. In order to bring this problem into the general framework of V.1, we let the sample space be given by $$({R_n},{\mathfrak{B}_n})$$ and the set of joint distributions of $$(\xi_{p+1\,1}, \ldots, \xi_{p+1\,n})$$ be so restricted by the parameters $$\beta_0, \ldots, \beta_p$$, $$u_{ij}$$, $$1 \leq i, j \leq n$$, that (1.1) holds and $$(u_{ij})_{1n}^{1n}$$ is positive semi-definite. To obtain the estimates we make use of Gauss' method of least squares, which is closely connected with the MLP.
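
As a rough numerical illustration of the least-squares estimates for model (1.1), the following Python sketch solves the normal equations for a small made-up design and then averages the estimates over repeated simulated samples. The function names `solve_linear_system` and `least_squares`, the regressor values, the parameter values, and the choice of independent Gaussian errors with unit variance (one special case of the covariance matrix U) are all assumptions made for illustration; none of them come from the text.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(xs, ys):
    """Least-squares estimates of (beta_0, ..., beta_p).

    xs[j][i] plays the role of x_{j+1, i}; ys[i] is the i-th observation.
    """
    n = len(ys)
    # Design rows (1, x_1i, ..., x_pi); the leading 1 carries beta_0.
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(k)] for u in range(k)]
    xty = [sum(r[u] * y for r, y in zip(rows, ys)) for u in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical example with p = 2 regressors and n = 6 observations.
beta_true = [1.0, 2.0, -0.5]
xs = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
      [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]]
n = len(xs[0])
# Expectations of the observations as prescribed by (1.1).
means = [beta_true[0] + sum(beta_true[j + 1] * xs[j][i] for j in range(len(xs)))
         for i in range(n)]

# Without noise the estimates reproduce the parameters up to rounding.
beta_exact = least_squares(xs, means)

# With noise, the average of the estimates over many samples stays close
# to the true parameters, which is what unbiasedness suggests.
rng = random.Random(0)
replications = 2000
totals = [0.0] * len(beta_true)
for _ in range(replications):
    ys = [mu + rng.gauss(0.0, 1.0) for mu in means]
    for j, b in enumerate(least_squares(xs, ys)):
        totals[j] += b
beta_mean = [t / replications for t in totals]

print(beta_exact)
print(beta_mean)
```

The simulation uses uncorrelated errors only for convenience. The expectation of the ordinary least-squares estimate depends on the observations only through (1.1), so the same averaging behaviour would be expected for other admissible covariance matrices, although the variances of the estimates would change.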

## Keywords

Covariance Matrix, Discriminant Function, Joint Distribution, Conditional Distribution, Unbiased Estimate

## References

1. The bj, provided they exist, will in general also depend on the xq, xp. Since, however, we view these n-tuples here as given, we suppress this dependence.
2. From the extensive literature on the method of least squares we indicate only: A. C. Aitken, Proc. Roy. Soc. Edinburgh Sect. A, 55, 42–48 (1935), B.J. van Ijzeren, Statistica Rijswijk 8, 21–45 (1954), O. Kempthorne.
3. See B.J. van Ijzeren, l.c. 2.
4. See A. N. Kolmogorov, Uspehi Mat. Nauk 1, 57–70 (1946).
5. A. Markov, Wahrscheinlichkeitsrechnung, 2nd ed., Leipzig-Berlin 1912. Also see F.N. David and J. Neyman, l.c. V.1 and L. Schmetterer, l.c. V.5, second paper listed.
6. It should cause no difficulty when the symbol 0 denotes both the (p + 1)-dimensional null-vector and zero itself.
7. See H. Scheffé, l.c. III.72.
8.
9. The r.v. M (p+1) as well as M(p+1) and M(p+1)/M1 (p+1) are undefined on a set of probability zero, i.e., on the set where the denominator in the definition vanishes.
10. This distribution was first found by J. Wishart, Biometrika 20A, 32–52 (1928). The induction proof here is due essentially to P. L. Hsu, Proc. Cambridge Philos. Soc. 35, 336–338 (1939).
11. Note that here, as was not the case in 1 and 2, the symbols xi and ξi denote k-dimensional vectors.
12. C.R. Rao, Sankhya 9, 343–366 (1949).
13. The matrix $$({W_{ij}})_{1k}^{1k}$$ exists iff $$|{w_{ij}}|_{1k}^{1k} \ne 0$$. The matrix $$({w_{ij}})_{1k}^{1k}$$ is positive semi-definite.
14. This distribution was first discovered by H. Hotelling, Ann. Math. Statist. 2, 360–378 (1931). Also see H. Hotelling, Proceedings of 2nd Berkeley Symposium on Mathematical Statistics and Probability 1951, pp. 23–41, University of California Press, Berkeley and Los Angeles.
15. A comparison with the F-distribution shows that $$\frac{n - k}{(n - 1)k}{T^2}$$ possesses an F-distribution with (k, n − k) degrees of freedom.
16. $$({d_{ij}})_{1k}^{1k}$$ denotes the inverse of the covariance matrix $$D^{-1}$$.
17. R.A. Fisher, Ann. Eugenics 7, 179–188 (1936); P.C. Mahalanobis, Proc. Nat. Inst. Sci. India 2, 49–55 (1936).
18. We refer only to P. C. Mahalanobis, Sankhya 9, 237–239 (1949) and papers by C. R. Rao, such as Biometrika 35, 58–79 (1948); Sankhya, l.c. 12; Sankhya 10, 257–268 (1950).
19.
20. The notation here is so chosen that Xj is a k-dimensional vector but xj is the real number $$\frac{1}{n_1}\sum\limits_{i = 1}^{n_1} x_{ji}$$ and similarly for the yji.
21.
22. That is, the regression function of ξp+1 w.r.t. (ξ1,...,ξp), say.
23. G.U. Yule, Proc. Roy. Soc. London Ser. A, 79, 182–193 (1907).
24. It would perhaps be more consistent in the sense of the notation used in 1 and 2 to write bp+11.2...p instead of bp+11.2...p.
25. This terminology clearly points out the fact that one can calculate these quantities from a sample.
26. See M. S. Bartlett, Proc. Roy. Soc. Edinburgh Sect. A, 53, 260–283 (1932–1933). For an extension of this method see R.A. Wijsman, Ann. Math. Statist. 28, 415–422 (1957) and A.M. Kshirsagar, Ann. Math. Statist. 30, 239–241 (1959).
27. The symbol (ui.k...1) stands for the (p−k+1)-dimensional r.v. (up+1.k...1,...,uk+1.k...1).
28. First found by V. Romanovskij, Bull. Acad. Sci. Leningrad (6) 20, 643–648 (1926) and K. Pearson, Proc. Roy. Soc. London Ser. A, 112, 1–14 (1926).
29. It is not entirely correct to use the symbol K2 p+1 here, but this should cause no confusion.
30. This distribution was first found by R.A. Fisher, Proc. Roy. Soc. London Ser. A, 121, 654–673 (1928).
31. R.A. Fisher pointed this out in this and other connections. See R.A. Fisher, Metron 3, 90–104 (1925).
32. R.A. Fisher, Biometrika 10, 507–521 (1915).
33. A.T. James, Ann. Math. Statist. 25, 40–75 (1954); A.G. Constantine and A.T. James, Ann. Math. Statist. 29, 1146–1166 (1958); A.T. James, Ann. Math. Statist. 31, 151–158 (1960) and A.T. James, Ann. Math. Statist. 32, 874–882 (1961), and others.