Measuring and Visualizing Associations

  • Jean-Michel Josselin
  • Benoît Le Maux


One goal of statistical studies is to highlight associations between pairs of variables. This is particularly useful when one wants to get a clear picture of a multi-dimensional data set and motivate a specific policy intervention (Sect. 4.1). Yet the choice of method is not straightforward, because it depends on the type of the variables involved. Testing for correlation is the relevant approach to investigate a linear association between two numerical variables (Sect. 4.2). The chi-square test is an inferential test that uses data from a sample to draw conclusions about the relationship between two categorical variables (Sect. 4.3). When one variable is numerical and the other is categorical, the usual approach is to test for differences between means or to implement an analysis of variance (Sect. 4.4). When faced with more than two variables, it is also possible to provide a multidimensional representation of the problem using methods such as principal component analysis (Sect. 4.5) and multiple correspondence analysis (Sect. 4.6). The idea is to reduce the dimensionality of a data set by plotting all the observations on 2D graphs that show how observations cluster with respect to various characteristics. These clusters can, for instance, serve to identify the beneficiaries of a particular intervention. Several examples using R-CRAN are included in this chapter to illustrate the different methods.
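As a rough illustration of the first two approaches, the sketch below computes a Pearson correlation coefficient for two numerical variables and a chi-square statistic for a contingency table of two categorical variables. It is not taken from the chapter, whose examples use R; it is written in plain Python, the function names `pearson_r` and `chi_square_statistic` are hypothetical, and the data are invented for the example. It returns only the statistics, not the p-values that a full test would report.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel out).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented data, for illustration only.
income = [1.0, 2.0, 3.0, 4.0, 5.0]
spending = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(income, spending)

counts = [[30, 10], [20, 40]]  # rows: group A/B, columns: outcome yes/no
stat, dof = chi_square_statistic(counts)
print(f"r = {r:.3f}, chi-square = {stat:.3f} with {dof} degree(s) of freedom")
```

A value of r close to 1 or -1 suggests a strong linear association, while a large chi-square statistic relative to its degrees of freedom suggests that the two categorical variables are not independent.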


Keywords: Association, Correlation, Chi-square test, ANOVA, Principal component analysis, Factor analysis



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jean-Michel Josselin (1)
  • Benoît Le Maux (1)
  1. Faculty of Economics, University of Rennes 1, Rennes, France
