Measuring and Visualizing Associations
One goal of statistical studies is to highlight associations between pairs of variables. This is particularly useful when one wants to get a clear picture of a multi-dimensional data set and motivate a specific policy intervention (Sect. 4.1). Yet the choice of method is not straightforward, because it depends on the types of the variables involved. Testing for correlation is the relevant approach for investigating a linear association between two numerical variables (Sect. 4.2). The chi-square test is an inferential test that uses sample data to draw conclusions about the relationship between two categorical variables (Sect. 4.3). When one variable is numerical and the other is categorical, the usual approach is to test for differences between means or to carry out an analysis of variance (Sect. 4.4). When faced with more than two variables, it is also possible to provide a multidimensional representation of the problem using methods such as principal component analysis (Sect. 4.5) and multiple correspondence analysis (Sect. 4.6). The idea is to reduce the dimensionality of a data set by plotting all the observations on 2D graphs that show how observations cluster with respect to various characteristics. These clusters can, for instance, serve to identify the beneficiaries of a particular intervention. Several examples using R-CRAN are included in this chapter to illustrate the different methods.
Keywords: Association; Correlation; Chi-square test; ANOVA; Principal component analysis; Factor analysis
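The pairing of variable types with test statistics described in the abstract can be illustrated with a minimal sketch. The chapter itself works in R; the version below is written in Python with the standard library only, and it is not taken from the chapter. The function names (`pearson_r`, `chi_square_stat`, `anova_f`) are hypothetical helpers introduced here. They compute only the test statistics, not p-values, so an actual test would still require comparing each statistic with the appropriate reference distribution.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two numerical variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def chi_square_stat(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table (list of rows) of two categorical variables."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def anova_f(groups):
    """One-way ANOVA F statistic: a numerical variable split into
    groups by the levels of a categorical variable."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
    print(chi_square_stat([[10, 20], [20, 10]]))
    print(anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Each helper corresponds to one case in the abstract: numerical with numerical (correlation), categorical with categorical (chi-square), and numerical with categorical (analysis of variance).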