Abstract
The most common techniques for graphically presenting a multivariate dataset involve projection onto a one- or two-dimensional subspace. Interpretation of such plots is not always straightforward because projections are smoothing operations: structure can be obscured by projection but never enhanced. In this paper an alternative procedure for finding interesting features is proposed, based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed. Emphasis is placed on identifying non-linear effects, such as clustering; to this end the data are first sphered to remove all location, scale and correlational structure. A set of simulated bivariate data and a dataset on the artistic qualities of painters are used as examples.
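As an illustration of the sphering step mentioned above, the following is a minimal sketch rather than the paper's algorithm. It assumes sphering means centring the data and transforming by the inverse square root of the sample covariance matrix, and it assumes that the directions underlying the hyperspherical density are obtained by scaling each sphered observation to unit length. The function names `sphere` and `directions` are hypothetical, and the mode-finding step itself is not shown.

```python
import numpy as np


def sphere(X):
    """Centre X and transform it so the sample covariance is the identity."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # Symmetric inverse square root of the covariance via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(cov)
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return centered @ inv_sqrt


def directions(Z):
    """Scale each sphered observation to unit length (assumed construction)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / norms


# Simulated correlated bivariate data, used here purely for illustration.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([1.0, -2.0], [[2.0, 1.2], [1.2, 1.0]], size=500)

Z = sphere(X)       # location, scale and correlation removed
U = directions(Z)   # points on the unit circle
```

After sphering, any remaining structure in `Z` is non-linear by construction, which is why clustering would show up as concentrations of the directions in `U` rather than as differences in mean or variance.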
FOSTER, P. Exploring multivariate data using directions of high density. Statistics and Computing 8, 347–355 (1998). https://doi.org/10.1023/A:1008828723097
DOI: https://doi.org/10.1023/A:1008828723097