Data Exploration

Part of the Use R! book series (USE R)

Abstract

In this chapter, we assume that the scientific question of interest has been clearly defined, the study has been designed, and the data have been collected from randomly selected members of the population. Our objective is then to obtain a high-level understanding of the data through summary statistics and data visualization techniques. The focus of this chapter is on exploring one variable at a time, setting aside any possible relationships among the variables. (Exploring the relationships among variables is discussed in the next chapter.) We discuss different variable types and, more specifically, divide variables into categorical and numerical. Distinguishing these two types is important because the summary statistics and data visualization techniques appropriate for a variable usually depend on its type. This chapter also discusses data preprocessing.
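To make the categorical/numerical distinction concrete, here is a minimal sketch in Python, not the chapter's own code. The helper `summarize` and the two small data vectors are hypothetical and made up purely for illustration. It summarizes a categorical variable with counts and proportions and a numerical variable with the mean, median, and standard deviation.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize(values):
    """Hypothetical helper: choose summary statistics by variable type."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        # Numerical variable: location and spread.
        return {
            "type": "numerical",
            "mean": mean(values),
            "median": median(values),
            "sd": stdev(values),  # sample standard deviation
        }
    # Categorical variable: frequency of each category.
    counts = Counter(values)
    n = len(values)
    return {
        "type": "categorical",
        "counts": dict(counts),
        "proportions": {k: c / n for k, c in counts.items()},
    }


# Made-up example data (assumption, not from the chapter).
smoker = ["no", "yes", "no", "no", "yes"]
age = [23, 35, 41, 29, 52]

print(summarize(smoker))
print(summarize(age))
```

Each variable is examined on its own, in line with the one-variable-at-a-time focus described above.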

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Statistics, University of California, Irvine, Irvine, USA
