Abstract
Multivariate statistical analysis is concerned with analysing and understanding data in high dimensions. We suppose that we are given a set \(\{x_{i}\}^{n}_{i=1}\) of \(n\) observations of a variable vector \(X\) in \(\mathbb {R}^{p}\). That is, we suppose that each observation \(x_{i}\) has \(p\) dimensions:
\[x_{i} = (x_{i1}, x_{i2}, \ldots, x_{ip}),\]
and that it is an observed value of a variable vector \(X \in \mathbb {R}^{p}\). Therefore, \(X\) is composed of \(p\) random variables:
\[X = (X_{1}, X_{2}, \ldots, X_{p}),\]
where \(X_{j}\), for \(j=1,\ldots,p\), is a one-dimensional random variable. How do we begin to analyse this kind of data? Before we ask what inferences can be drawn from the data, we should think about how to look at the data. This involves descriptive techniques. Questions that descriptive techniques can help answer include:
- Are there components of \(X\) that are more spread out than others?
- Are there some elements of \(X\) that indicate sub-groups of the data?
- Are there outliers in the components of \(X\)?
- How “normal” is the distribution of the data?
- Are there “low-dimensional” linear combinations of \(X\) that show “non-normal” behaviour?
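Two of the questions above, on spread and on outliers, can be explored one component at a time with a few summary statistics. The following is a minimal sketch in Python, not a method taken from the chapter: the data are synthetic, the helper names `five_number_summary` and `flag_outliers` are hypothetical, and the 1.5 × interquartile-range fence is a common rule of thumb assumed here for illustration.

```python
import random
import statistics


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max) of a sample."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    The factor k = 1.5 is an assumed rule of thumb, not a fixed standard.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# Synthetic data (an assumption for illustration): n = 200 observations
# x_i in R^p with p = 3, stored as a list of rows.
rng = random.Random(0)
n, p = 200, 3
scales = [1.0, 5.0, 1.0]  # the second component is deliberately more spread out
data = [[rng.gauss(0.0, s) for s in scales] for _ in range(n)]
data[0][2] = 25.0  # plant one artificial outlier in the third component

# Summarise each component X_j separately.
for j in range(p):
    column = [row[j] for row in data]
    summary = five_number_summary(column)
    iqr = summary[3] - summary[1]
    print(f"X_{j + 1}: IQR = {iqr:.2f}, flagged = {len(flag_outliers(column))}")
```

Comparing the interquartile ranges across components addresses the question of spread, and the flagged values are candidates for outliers. Component-wise summaries of this kind say nothing about sub-groups or about linear combinations of \(X\), which require genuinely multivariate displays.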
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Härdle, W.K., Simar, L. (2012). Comparison of Batches. In: Applied Multivariate Statistical Analysis. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17229-8_1
DOI: https://doi.org/10.1007/978-3-642-17229-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17228-1
Online ISBN: 978-3-642-17229-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)