Biostatistics with R

Part of the series Use R! pp 17-59

Data Exploration

  • Babak Shahbaba, Department of Statistics, University of California, Irvine

In this chapter, we assume that the scientific question of interest has been clearly defined, the study has been designed, and the data have been collected from randomly selected members of the population. Our objective is then to obtain a high-level understanding of the data through summary statistics and data visualization techniques. The focus is on exploring one variable at a time, without regard to possible relationships among the variables. (Exploring the relationships among variables is discussed in the next chapter.) We distinguish two types of variables: categorical and numerical. This distinction matters because the summary statistics and data visualization techniques appropriate for a variable usually depend on its type. The chapter also includes some discussion of data preprocessing.
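
As a rough illustration of why the variable type matters, the following R sketch summarizes one variable at a time and picks the summary according to type. The data frame `patients`, its columns, and the helper `summarize_variable` are hypothetical names invented for this example; they are not taken from the chapter.

```r
# Hypothetical toy data; the values are made up for illustration only.
patients <- data.frame(
  smoker = factor(c("yes", "no", "no", "yes", "no")),  # categorical variable
  bmi    = c(27.1, 22.4, 31.0, 24.8, 26.3)             # numerical variable
)

# Summarize a single variable, choosing the summary by variable type.
# (Illustrative helper, not a function from any package.)
summarize_variable <- function(x) {
  if (is.factor(x) || is.character(x)) {
    table(x)      # frequency of each category
  } else if (is.numeric(x)) {
    summary(x)    # minimum, quartiles, median, mean, maximum
  } else {
    stop("unsupported variable type")
  }
}

smoker_summary <- summarize_variable(patients$smoker)
bmi_summary    <- summarize_variable(patients$bmi)

print(smoker_summary)
print(bmi_summary)
```

A frequency table suits the categorical variable, while location and spread summaries suit the numerical one; applying `summary()` style statistics to category labels, or a frequency table to a continuous measurement, would usually be uninformative.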