Counterfactual Causal Analysis and Nonlinear Probability Models

Handbook of Causal Analysis for Social Research

Part of the book series: Handbooks of Sociology and Social Research (HSSR)

Abstract

Nonlinear probability models, such as logits and probits for binary dependent variables, the ordered logit and ordered probit for ordinal dependent variables and the multinomial logit, together with log-linear models for contingency tables, have become widely used by social scientists in the past 30 years. In this chapter, we show that the identification and estimation of causal effects using these models present severe challenges, over and above those usually encountered in identifying causal effects in a linear setting. These challenges stem from the lack of separate identification of the mean and variance in these models. We show their impact in experimental and observational studies, and we investigate the problems that arise in the use of standard approaches to the causal analysis of nonexperimental data, such as propensity scores, instrumental variables, and control functions. Naive use of these approaches with nonlinear probability models will yield biased estimates of causal effects, though the estimates will be a lower bound of the true causal effect and will have the correct sign. We show that the technique of Y-standardization puts the parameters of nonlinear probability models on a scale that we can meaningfully interpret but cannot measure. Other techniques, such as average partial effects, can yield causal effects on the probability scale, but, in this case, the linear probability model provides a simple and effective alternative.
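
The attenuation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: the data-generating process, the parameter values (`b = 1`, `g = 2`), the sample size, and all function names are illustrative assumptions. It randomizes a binary treatment `x`, draws a covariate `z` independent of `x`, and compares the logit coefficient on `x` with and without `z` in the model. It also computes the difference in mean outcomes, which is what a linear probability model with `x` as the only regressor estimates, and compares it with the average effect on the probability scale implied by the assumed process.

```python
import math
import random

def sigmoid(t):
    """Logistic CDF, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logit(rows, y, iters=10):
    """Logit coefficients by Newton-Raphson; each row starts with a constant 1."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(rows, y):
            mu = sigmoid(sum(bj * v for bj, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        beta = [bj + s for bj, s in zip(beta, solve(info, grad))]
    return beta

def simulate(n=20000, b=1.0, g=2.0, seed=1):
    """Assumed process: y = 1 if b*x + g*z + e > 0, with e standard logistic."""
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        xi = 1.0 if rng.random() < 0.5 else 0.0      # randomized treatment
        zi = rng.gauss(0.0, 1.0)                      # covariate independent of x
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        e = math.log(u / (1.0 - u))                   # standard logistic error
        x.append(xi)
        z.append(zi)
        y.append(1.0 if b * xi + g * zi + e > 0 else 0.0)
    b_full = fit_logit([[1.0, xi, zi] for xi, zi in zip(x, z)], y)[1]
    b_short = fit_logit([[1.0, xi] for xi in x], y)[1]
    # With one binary regressor, the LPM slope is the difference in means.
    n1 = sum(x)
    mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1.0) / n1
    mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0.0) / (n - n1)
    # Average effect on the probability scale under the assumed process.
    true_ape = sum(sigmoid(b + g * zi) - sigmoid(g * zi) for zi in z) / n
    return b_full, b_short, mean1 - mean0, true_ape

b_full, b_short, lpm_effect, true_ape = simulate()
print(f"logit coefficient on x, z included: {b_full:.3f}")
print(f"logit coefficient on x, z omitted:  {b_short:.3f}")
print(f"difference in means (LPM slope):    {lpm_effect:.3f}")
print(f"average effect implied by the DGP:  {true_ape:.3f}")
```

Under these assumptions, the coefficient from the model that omits `z` should be smaller than the one from the full model while keeping the same sign, even though `z` is independent of the treatment. The difference in means should sit close to the average effect on the probability scale.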

Notes

  1. In general, we do not use a subscript to indicate individual observations except where its omission might lead to confusion.

  2. Indeed, when we apply these models, we also assume that the latent error has a given distribution (e.g., logistic), and we cannot know whether this is an accurate assumption either. But, in general, it seems that these models are more robust (at least when we are concerned about comparisons of parameter values across models or samples) to violations of the assumption about the distributional form of the error than they are to violations of the assumptions about the standard deviation of that distribution (Cramer 2007).

  3. We can write the standard deviation of the error in this way even though, given that we assumed \( e \) in Eq. (10.1) had a logistic distribution, \( \nu \) will almost certainly not have a logistic distribution.

  4. But Robinson and Jewell (1991: 239) point out that “to test the null hypothesis of no treatment effect in a randomized study, it is always as or more efficient to adjust for the covariate [Z in our example] … when logistic models are used” (bracketed text added by the authors).

  5. Or, equally, (YX), if we collapse the three-way table over the Z margin.

  6. A third approach we do not discuss here is the use of average effects on the predicted probability. Wooldridge (2002) and Cramer (2007) show that average partial effects (APEs) are unaffected by the attenuation bias created by omitted covariates orthogonal to the treatment variable. See also the concluding section where we discuss the use of the linear probability model.

  7. Had we used the probit model for estimating \( c_1 \), then \( h=\sqrt{c_1^2\operatorname{var}(X)+1} \), reflecting the assumption placed on the latent error term which, for the probit, differs from that of the logit.

  8. As noted by Karlson et al. (2012), this can only hold under the assumption that the latent error distribution of both models, (10.6b) and (10.13), is logistic, and we know this cannot be true. However, as noted in footnote 2, violating this assumption appears not to be very consequential for the model’s ability to recover the parameters of interest (see also Cramer 2007).

  9. The probit is used here because the error terms, \( e_3 \) through \( e_8 \), and all the variables are normally distributed.

  10. To simplify exposition, in the following, we assume that the causal effect is constant across individuals in the population. Under this assumption, the IV identifies the average causal effect. Whenever that assumption does not hold, an additional assumption, monotonicity, is required in order for the IV to recover the average causal effect for a subset of the population that is affected or moved by the instrument (see Imbens and Angrist 1994; Blundell et al. 2005). However, the problem we sketch in the following also pertains to the situation in which we recover a local average treatment effect.

  11. In what follows, we once again assume that causal effects are constant across individuals in the population. Under the assumption of heterogeneous effects, the interpretation of the recovered estimate changes somewhat, but this is of less concern here (see Vytlacil 2002; Blundell et al. 2005).
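
The expression for \( h \) in note 7 can be turned into a short calculation. The sketch below rests on an assumption: we read \( h \) as the model-implied standard deviation of the latent outcome, since the equation that defines it is not reproduced on this page. Under that reading, the logit counterpart replaces the probit error variance of 1 with the standard logistic variance \( \pi^2/3 \). The function names and the input values are hypothetical.

```python
import math

# Error variance fixed by each model (the logit value is the standard
# logistic variance; treating it as the counterpart to note 7 is our assumption).
LATENT_ERROR_VAR = {"logit": math.pi ** 2 / 3.0, "probit": 1.0}

def latent_sd(c1, var_x, link):
    """h = sqrt(c1^2 * var(X) + error variance) for the chosen link."""
    return math.sqrt(c1 ** 2 * var_x + LATENT_ERROR_VAR[link])

def y_standardized(c1, var_x, link):
    """Coefficient expressed in standard deviations of the latent outcome."""
    return c1 / latent_sd(c1, var_x, link)

# Hypothetical inputs: a coefficient of 1.0 on a predictor with unit variance.
for link in ("probit", "logit"):
    h = latent_sd(1.0, 1.0, link)
    print(link, round(h, 4), round(y_standardized(1.0, 1.0, link), 4))
```

The same raw coefficient maps to different Y-standardized values under the two links because each model fixes the error variance at a different constant, which is why raw logit and probit coefficients are not directly comparable.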

References

  • Achen, C. H. (1977). Measuring representation: Perils of the correlation coefficient. American Journal of Political Science, 21, 805–821.

  • Allison, P. D. (1999). Comparing logit and probit coefficients across groups. Sociological Methods & Research, 28, 186–208.

  • Amemiya, T. (1975). Qualitative response models. Annals of Economic and Social Measurement, 4, 363–388.

  • Angrist, J. D., & Pischke, J.-S. (2008). Mostly harmless econometrics: An empiricist’s companion. Princeton: Princeton University Press.

  • Blalock, H. M. (1967a). Path coefficients versus regression coefficients. The American Journal of Sociology, 72, 675–676.

  • Blalock, H. M. (1967b). Causal inference, closed populations, and measures of association. American Political Science Review, 61, 130–136.

  • Blundell, R., Dearden, L., & Sianesi, B. (2005). Evaluating the effect of education on earnings: Models, methods and results from the National Child Development Survey. Journal of the Royal Statistical Society, Series A, 168, 473–512.

  • Breen, R., Karlson, K. B., & Holm, A. (2012). Correlations and non-linear probability models. Unpublished paper.

  • Cameron, S. V., & Heckman, J. J. (1998). Life cycle schooling and dynamic selection bias: Models and evidence for five cohorts of American males. Journal of Political Economy, 106, 262–333.

  • Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic.

  • Cox, D. R. (1958). Planning of experiments. New York: Wiley.

  • Cramer, J. S. (2007). Robustness of logit analysis: Unobserved heterogeneity and mis-specified disturbances. Oxford Bulletin of Economics and Statistics, 69, 545–555.

  • Fienberg, S. E. (1977). The analysis of cross-classified categorical data. Cambridge, MA: MIT Press.

  • Fisher, R. A. (1932). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

  • Gail, M. H. (1986). Adjusting for covariates that have the same distribution in exposed and unexposed cohorts. In S. H. Moolgavkar & R. L. Prentice (Eds.), Modern statistical methods in chronic disease epidemiology (pp. 3–18). New York: Wiley.

  • Gail, M. H., Wieand, S., & Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika, 71, 431–444.

  • Gangl, M. (2010). Causal inference in sociological research. Annual Review of Sociology, 36, 21–48.

  • Hauck, W. W., Neuhaus, J. M., Kalbfleisch, J. D., & Anderson, S. (1991). A consequence of omitted covariates when estimating odds ratios. Journal of Clinical Epidemiology, 44, 77–81.

  • Heckman, J. J. (1979). Sample selection bias as specification error. Econometrica, 47, 153–161.

  • Heckman, J. J., Ichimura, H., Smith, J., & Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica, 66, 1017–1098.

  • Imbens, G. W., & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467–475.

  • Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47, 5–86.

  • Karlson, K. B., Holm, A., & Breen, R. (2012). Comparing regression coefficients between same sample nested models using logit and probit: A new method. Sociological Methodology, 42(1), 286–313.

  • Kim, J.-O., & Mueller, C. W. (1976). Standardized and unstandardized coefficients in causal analysis: An expository note. Sociological Methods & Research, 4, 423–438.

  • Mare, R. D. (2006). Response: Statistical models of educational stratification – Hauser and Andrew’s models for school transitions. Sociological Methodology, 36, 27–37.

  • McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic.

  • McKelvey, R. D., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4, 103–120.

  • Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26, 67–82.

  • Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. New York: Cambridge University Press.

  • Olsen, R. J. (1982). Independence from irrelevant alternatives and attrition bias: Their relation to one another in the evaluation of experimental programs. Southern Economic Journal, 49, 521–535.

  • Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–710.

  • Pearl, J. (2006). Causality: Models, reasoning and inference. Cambridge: Cambridge University Press.

  • Robins, J. M. (1999). Association, causation, and marginal structural models. Synthese, 121, 151–179.

  • Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550–560.

  • Robinson, L. D., & Jewell, N. P. (1991). Some surprising results about covariate adjustment in logistic regression models. International Statistical Review, 59, 227–240.

  • Swait, J., & Louviere, J. (1993). The role of the scale parameter in the estimation and comparison of multinomial logit models. Journal of Marketing Research, 30, 305–314.

  • Train, K. (2009). Discrete choice methods with simulation. Cambridge: Cambridge University Press.

  • Vytlacil, E. (2002). Independence, monotonicity, and latent index models: An equivalence result. Econometrica, 70, 331–341.

  • Winship, C., & Mare, R. D. (1984). Regression models with ordinal variables. American Sociological Review, 49, 512–525.

  • Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.

  • Xie, Y. (2011). Values and limitations of statistical models. Research in Social Stratification and Mobility, 29, 343–349.

  • Yatchew, A., & Griliches, Z. (1985). Specification error in probit models. The Review of Economics and Statistics, 67, 134–139.

Author information

Corresponding author

Correspondence to Richard Breen.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Breen, R., Karlson, K.B. (2013). Counterfactual Causal Analysis and Nonlinear Probability Models. In: Morgan, S. (ed.) Handbook of Causal Analysis for Social Research. Handbooks of Sociology and Social Research. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6094-3_10
