Abstract
In recent years, our increased ability to collect and store data has produced enormous datasets. These datasets may consist of billions of observations and millions of variables. Some of the classical methods of statistical inference, in which a parametric model is studied, are neither feasible nor relevant for the analysis of these datasets. Instead, the objective is to identify interesting structures in the data, such as clusters of observations or relationships among the variables. Sometimes the structures allow a reduction in the dimensionality of the data. Many of the classical methods of multivariate analysis, such as principal components analysis, factor analysis, canonical correlations analysis, and multidimensional scaling, are useful in identifying interesting structures. These methods generally attempt to combine variables in such a way as to preserve information yet reduce the dimension of the dataset. Dimension reduction generally carries a loss of some information, and whether the lost information is important is the major concern in dimension reduction. Another set of methods for reducing the complexity of a dataset attempts to group observations together, in effect combining observations rather than variables.
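As a small illustration of the trade-off between reducing dimension and losing information, the following sketch applies principal components analysis to two variables and reports the fraction of total variance kept by the first component. It is not taken from the chapter: the function names and the toy data are assumptions made for illustration, and the closed-form eigenvalue formula used here applies only to the 2 x 2 case.

```python
import math


def covariance_2d(xs, ys):
    """Return the sample variances and covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, syy, sxy


def principal_variances_2d(sxx, syy, sxy):
    """Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]], largest first."""
    center = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    return center + half_gap, center - half_gap


def retained_fraction(xs, ys):
    """Fraction of total variance kept when only the first component is retained."""
    lam1, lam2 = principal_variances_2d(*covariance_2d(xs, ys))
    return lam1 / (lam1 + lam2)


# Hypothetical toy data: two strongly related variables.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]
frac = retained_fraction(xs, ys)
print(f"variance retained by the first component: {frac:.4f}")
```

When the two variables are nearly collinear, the retained fraction is close to 1 and little is lost by keeping one combined variable. When they are uncorrelated with equal variances, the fraction falls to 0.5 and half of the variance is discarded.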
Copyright information
© 2009 Springer-Verlag New York
About this chapter
Cite this chapter
Gentle, J.E. (2009). Tools for Identification of Structure in Data. In: Computational Statistics. Statistics and Computing. Springer, New York, NY. https://doi.org/10.1007/978-0-387-98144-4_9
DOI: https://doi.org/10.1007/978-0-387-98144-4_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-98143-7
Online ISBN: 978-0-387-98144-4
eBook Packages: Mathematics and Statistics (R0)