Counterfactual Causal Analysis and Nonlinear Probability Models

  • Richard Breen
  • Kristian Bernt Karlson
Part of the Handbooks of Sociology and Social Research book series (HSSR)


Nonlinear probability models, such as logits and probits for binary dependent variables, the ordered logit and ordered probit for ordinal dependent variables, and the multinomial logit, together with log-linear models for contingency tables, have become widely used by social scientists in the past 30 years. In this chapter, we show that identifying and estimating causal effects with these models presents severe challenges, over and above those usually encountered in identifying causal effects in a linear setting. These challenges stem from the fact that the mean and the variance are not separately identified in these models. We show their impact in experimental and observational studies, and we investigate the problems that arise when standard approaches to the causal analysis of nonexperimental data, such as propensity scores, instrumental variables, and control functions, are used. Naive use of these approaches with nonlinear probability models will yield biased estimates of causal effects, though the estimates will be a lower bound on the true causal effect and will have the correct sign. We show that the technique of Y-standardization places the parameters of nonlinear probability models on a scale that we can meaningfully interpret but cannot measure. Other techniques, such as average partial effects, can yield causal effects on the probability scale, but, in this case, the linear probability model provides a simple and effective alternative.
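The attenuation described in the abstract can be illustrated with a small numerical sketch. It is not taken from the chapter. It assumes a latent logit model with a randomized binary treatment T and a single independent covariate Z ~ N(0, 1); the names `beta`, `gamma`, and the helper functions are hypothetical. The treatment is independent of Z, so there is no confounding, yet the log odds ratio obtained when Z is omitted is smaller in magnitude than the conditional coefficient `beta`, and it keeps the same sign.

```python
import math

def expit(v):
    """Logistic CDF, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def marginal_prob(t, beta, gamma, alpha=0.0, lo=-8.0, hi=8.0, steps=4000):
    """P(y = 1 | T = t) with Z ~ N(0, 1) integrated out (trapezoid rule).

    Assumed model (illustrative only):
        P(y = 1 | T, Z) = expit(alpha + beta * T + gamma * Z)
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += w * expit(alpha + beta * t + gamma * z) * normal_pdf(z)
    return total * h

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))

def marginal_log_odds_ratio(beta, gamma):
    """Log odds ratio for T when the covariate Z is left out of the model."""
    p1 = marginal_prob(1, beta, gamma)
    p0 = marginal_prob(0, beta, gamma)
    return logit(p1) - logit(p0)

beta = 1.0  # conditional (latent-scale) treatment coefficient
for gamma in (0.0, 1.0, 2.0):
    # A larger gamma means more unobserved heterogeneity on the latent scale.
    print(gamma, round(marginal_log_odds_ratio(beta, gamma), 3))
```

With `gamma = 0` the marginal and conditional log odds ratios coincide. As `gamma` grows, the marginal value shrinks toward zero while staying positive, which is the pattern one would expect if the coefficient is identified only relative to the residual scale.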


Keywords: Propensity score; Causal effect; Potential outcome; Multinomial logit model; Latent variable model
These keywords were added by machine and not by the authors.



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Department of Sociology, Yale University, New Haven, USA
  2. Department of Sociology, SFI – The Danish National Centre for Social Research, Copenhagen, Denmark
  3. Department of Education, Aarhus University, Aarhus, Denmark
