Comparison of Batches

Härdle, Wolfgang Karl; Simar, Léopold

doi:10.1007/978-3-642-17229-8_1



Abstract

Multivariate statistical analysis is concerned with analysing and understanding data in high dimensions. We suppose that we are given a set $\{x_{i}\}^{n}_{i=1}$ of $n$ observations of a variable vector $X$ in $\mathbb{R}^{p}$. That is, we suppose that each observation $x_i$ has $p$ dimensions:

$$x_i = (x_{i1}, x_{i2}, \ldots , x_{ip}),$$

and that it is an observed value of a variable vector $X \in \mathbb{R}^{p}$. Therefore, $X$ is composed of $p$ random variables:

$$X = (X_{1}, X_{2}, \ldots , X_{p})$$

where $X_j$, for $j = 1, \ldots, p$, is a one-dimensional random variable. How do we begin to analyse this kind of data? Before we ask which inferences we can draw from the data, we should think about how to look at the data. This involves descriptive techniques. Questions that descriptive techniques can help answer include:

Are there components of $X$ that are more spread out than others?
Are there some elements of $X$ that indicate sub-groups of the data?
Are there outliers in the components of $X$?
How “normal” is the distribution of the data?
Are there “low-dimensional” linear combinations of $X$ that show “non-normal” behaviour?
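As a rough illustration of the first and third questions (spread of the components and outliers), the sketch below stores the observations as a list of $n$ rows $x_i$, each with $p$ entries, so that column $j$ holds the observed values of $X_j$. For each component it reports the median, the interquartile range (IQR) and the standard deviation, and flags values outside the fences $Q_1 - 1.5\,\mathrm{IQR}$ and $Q_3 + 1.5\,\mathrm{IQR}$. The 1.5 × IQR rule, the helper names `quartiles` and `describe_columns`, and the use of `statistics.quantiles(..., method="inclusive")` are assumptions made for this sketch, not definitions taken from the chapter.

```python
from statistics import median, quantiles, stdev


def quartiles(values):
    """Return (Q1, Q3) using the 'inclusive' interpolation of statistics.quantiles.

    Assumption: other quartile conventions (e.g. Tukey's fourths) can give
    slightly different values for some sample sizes.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def describe_columns(data):
    """Summarise each component X_j of an n x p data set.

    `data` is a list of n rows (the observations x_i), each a list of p numbers.
    Needs at least two observations (stdev and quantiles require n >= 2).
    Returns one dict per component with median, IQR, standard deviation and
    the values lying outside the 1.5 * IQR fences (a hypothetical outlier rule).
    """
    p = len(data[0])
    summaries = []
    for j in range(p):
        column = [row[j] for row in data]  # observed values of X_j
        q1, q3 = quartiles(column)
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        summaries.append({
            "median": median(column),
            "iqr": iqr,
            "sd": stdev(column),
            "outliers": [v for v in column if v < lower or v > upper],
        })
    return summaries
```

For example, `describe_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, 50.0]])` flags `100.0` as an outlier in the first component and nothing in the second, while the second component has the larger IQR (20.0 versus 2.0). Comparing the IQR and the standard deviation side by side also hints at how much a single extreme value inflates the latter.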



Author information

Authors and Affiliations

L.v.Bortkiewicz Chair of Statistics, C.A.S.E. Centre f. Appl. Stat. & Econ., School of Business and Economics, Humboldt-Universität zu Berlin, Berlin, Germany
Wolfgang Karl Härdle
Inst. Statistics, Center of Operations Research & Econometrics (CORE), Katholieke Universiteit Leuven, Leuven, Belgium
Léopold Simar



About this chapter

Cite this chapter

Härdle, W.K., Simar, L. (2012). Comparison of Batches. In: Applied Multivariate Statistical Analysis. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17229-8_1


DOI: https://doi.org/10.1007/978-3-642-17229-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17228-1
Online ISBN: 978-3-642-17229-8
eBook Packages: Mathematics and Statistics (R0)
