Abstract

Multivariate statistical analysis is concerned with analysing and understanding data in high dimensions. We suppose that we are given a set \(\{x_{i}\}^{n}_{i=1}\) of \(n\) observations of a variable vector \(X\) in \(\mathbb {R}^{p}\). That is, we suppose that each observation \(x_{i}\) has \(p\) dimensions:

$$x_i = (x_{i1}, x_{i2}, \ldots , x_{ip}),$$

and that it is an observed value of a variable vector \(X \in \mathbb {R}^{p}\). Therefore, \(X\) is composed of \(p\) random variables:

$$X = (X_{1}, X_{2}, \ldots , X_{p})$$

where \(X_{j}\), for \(j = 1, \ldots , p\), is a one-dimensional random variable. How do we begin to analyse this kind of data? Before we investigate questions on what inferences we can reach from the data, we should think about how to look at the data. This involves descriptive techniques. Questions that we could answer by descriptive techniques are:

  • Are there components of X that are more spread out than others?

  • Are there some elements of X that indicate sub-groups of the data?

  • Are there outliers in the components of X?

  • How “normal” is the distribution of the data?

  • Are there “low-dimensional” linear combinations of X that show “non-normal” behaviour?
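
As a rough illustration of the first and third questions above, the following sketch treats the observations as an \(n \times p\) array and computes a spread measure and an outlier count for each component \(X_{j}\). The function name `describe_components`, the simulated data, and the choice of the 1.5 × interquartile-range rule for flagging outliers are assumptions made for this example, not something prescribed by the text.

```python
import numpy as np


def describe_components(x):
    """Summarise spread and flag outliers per component.

    x is an (n, p) array: row i is the observation x_i,
    column j holds the observed values of the component X_j.
    The 1.5 * IQR fences are an assumed, conventional outlier rule.
    """
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = (x < lower) | (x > upper)
    return {
        "std": x.std(axis=0, ddof=1),      # sample standard deviation per component
        "iqr": iqr,                        # interquartile range per component
        "n_outliers": outliers.sum(axis=0) # observations outside the fences
    }


# Simulated data (hypothetical): n = 200 observations, p = 3 components,
# with the second component scaled to be more spread out.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([1.0, 5.0, 1.0])
data[0, 2] = 25.0  # plant one obvious outlier in the third component

summary = describe_components(data)
print(summary["std"])
print(summary["n_outliers"])
```

Comparing `summary["std"]` or `summary["iqr"]` across columns gives a first answer to which components are more spread out, while `summary["n_outliers"]` points to the components that contain unusual observations.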

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Härdle, W.K., Simar, L. (2012). Comparison of Batches. In: Applied Multivariate Statistical Analysis. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17229-8_1
