Counterfactual Causal Analysis and Nonlinear Probability Models

Part of the Handbooks of Sociology and Social Research book series (HSSR)

Abstract

Nonlinear probability models, such as logits and probits for binary dependent variables, the ordered logit and ordered probit for ordinal dependent variables and the multinomial logit, together with log-linear models for contingency tables, have become widely used by social scientists in the past 30 years. In this chapter, we show that the identification and estimation of causal effects using these models present severe challenges, over and above those usually encountered in identifying causal effects in a linear setting. These challenges stem from the lack of separate identification of the mean and variance in these models. We show their impact in experimental and observational studies, and we investigate the problems that arise in the use of standard approaches to the causal analysis of nonexperimental data, such as propensity scores, instrumental variables, and control functions. Naive use of these approaches with nonlinear probability models will yield biased estimates of causal effects, though the estimates will be a lower bound of the true causal effect and will have the correct sign. We show that the technique of Y-standardization puts the parameters of nonlinear probability models on a scale that we can meaningfully interpret but cannot measure. Other techniques, such as average partial effects, can yield causal effects on the probability scale, but, in this case, the linear probability model provides a simple and effective alternative.
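
The attenuation described in the abstract can be illustrated with a small numerical sketch. It is not taken from the chapter: the function names, the parameter values, and the choice of a standard normal covariate Z that is independent of a binary, randomized treatment X are all assumptions made for illustration. Because X is binary, a logit of Y on X alone is saturated, so its population coefficient is the marginal log-odds ratio computed below.

```python
import math

def logistic(t):
    """Logistic CDF, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def expect_over_z(f, lim=8.0, grid=4001):
    """E[f(Z)] for Z ~ N(0, 1), by the trapezoid rule on [-lim, lim]."""
    step = 2.0 * lim / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -lim + i * step
        weight = 0.5 if i in (0, grid - 1) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * f(z) * density
    return total * step

def log_odds(p):
    return math.log(p / (1.0 - p))

# Assumed data-generating model (hypothetical): logit P(Y=1 | X, Z) = a + b*X + c*Z.

def marginal_log_odds_ratio(a, b, c):
    """Log-odds ratio for X once the covariate Z is left out of the model."""
    p1 = expect_over_z(lambda z: logistic(a + b + c * z))
    p0 = expect_over_z(lambda z: logistic(a + c * z))
    return log_odds(p1) - log_odds(p0)

def marginal_risk_difference(a, b, c):
    """P(Y=1 | X=1) - P(Y=1 | X=0), the slope of a linear probability model on X."""
    p1 = expect_over_z(lambda z: logistic(a + b + c * z))
    p0 = expect_over_z(lambda z: logistic(a + c * z))
    return p1 - p0

def average_partial_effect(a, b, c):
    """Mean over Z of the conditional change in P(Y=1) when X goes from 0 to 1."""
    return expect_over_z(lambda z: logistic(a + b + c * z) - logistic(a + c * z))

if __name__ == "__main__":
    a, b, c = 0.0, 1.0, 2.0  # illustrative values only
    print("conditional log-odds effect:", b)
    print("marginal log-odds ratio    :", marginal_log_odds_ratio(a, b, c))
    print("average partial effect     :", average_partial_effect(a, b, c))
    print("marginal risk difference   :", marginal_risk_difference(a, b, c))
```

Under these assumptions the marginal log-odds ratio lies between zero and b, which matches the "lower bound with the correct sign" statement, while the average partial effect and the marginal risk difference coincide on the probability scale.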

Keywords

  • Propensity Score
  • Causal Effect
  • Potential Outcome
  • Multinomial Logit Model
  • Latent Variable Model

DOI: 10.1007/978-94-007-6094-3_10
Chapter length: 21 pages
eBook ISBN: 978-94-007-6094-3

Notes

  1. In general, we do not use a subscript to indicate individual observations except where its omission might lead to confusion.

  2. Indeed, when we apply these models, we also assume that the latent error has a given distribution (e.g., logistic), and we cannot know whether this is an accurate assumption either. But, in general, it seems that these models are more robust (at least when we are concerned about comparisons of parameter values across models or samples) to violations of the assumption about the distributional form of the error than they are to violations of the assumptions about the standard deviation of that distribution (Cramer 2007).

  3. We can write the standard deviation of the error in this way even though, given that we assumed \( e \) in Eq. (10.1) had a logistic distribution, \( \nu \) will almost certainly not have a logistic distribution.

  4. But Robinson and Jewell (1991: 239) point out that “to test the null hypothesis of no treatment effect in a randomized study, it is always as or more efficient to adjust for the covariate [Z in our example] … when logistic models are used” (parentheses added by authors).

  5. Or, equally, (YX), if we collapse the three-way table over the Z margin.

  6. A third approach we do not discuss here is the use of average effects on the predicted probability. Wooldridge (2002) and Cramer (2007) show that average partial effects (APEs) are unaffected by the attenuation bias created by omitted covariates orthogonal to the treatment variable. See also the concluding section where we discuss the use of the linear probability model.

  7. Had we used the probit model for estimating \( c_1 \), then \( h=\sqrt{c_1^2\operatorname{var}(X)+1} \), reflecting the assumption placed on the latent error term which, for the probit, differs from that of the logit.

  8. As noted by Karlson et al. (2012), this can only hold under the assumption that the latent error distribution of both models, (10.6b) and (10.13), is logistic, and we know this cannot be true. However, as noted in footnote 2, violating this assumption appears not to be very consequential for the model’s ability to recover the parameters of interest (see also Cramer 2007).

  9. The probit is used here because the error terms, \( e_3 \) through \( e_8 \), and all the variables are normally distributed.

  10. To simplify exposition, in the following, we assume that the causal effect is constant across individuals in the population. Under this assumption, the IV identifies the average causal effect. Whenever that assumption does not hold, an additional assumption, monotonicity, is required in order for the IV to recover the average causal effect for the subset of the population that is affected or moved by the instrument (see Imbens and Angrist 1994; Blundell et al. 2005). However, the problem we sketch in the following also pertains to the situation in which we recover a local average treatment effect.

  11. In what follows, we once again assume that causal effects are constant across individuals in the population. Under the assumption of heterogeneous effects, the interpretation of the recovered estimate changes somewhat, but this is of less concern here (see Vytlacil 2002; Blundell et al. 2005).
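
The scaling factor in note 7 can be sketched as a small helper. This is a hedged illustration for a single-predictor model, not the chapter's own code: the function names are hypothetical, and the logit entry assumes that the logit counterpart of the probit expression replaces the error variance 1 with the variance of the standard logistic distribution, \( \pi^2/3 \).

```python
import math

# Latent error variance fixed by each model's identifying assumption.
# The probit value (1) follows note 7; the logit value is the variance of
# the standard logistic distribution and is an assumption of this sketch.
ERROR_VARIANCE = {"logit": math.pi ** 2 / 3.0, "probit": 1.0}

def latent_sd(c1, var_x, model="probit"):
    """h = sqrt(c1^2 * var(X) + error variance): the SD of the latent outcome."""
    return math.sqrt(c1 ** 2 * var_x + ERROR_VARIANCE[model])

def y_standardized(c1, var_x, model="probit"):
    """Coefficient expressed in standard deviations of the latent outcome."""
    return c1 / latent_sd(c1, var_x, model)

if __name__ == "__main__":
    # Illustrative values only: c1 = 1 and var(X) = 3 give h = 2 for the probit.
    print(latent_sd(1.0, 3.0, "probit"))
    print(y_standardized(1.0, 3.0, "probit"))
```

Dividing by h removes the arbitrary scale that the error-variance normalization imposes on the raw coefficient, which is why the standardized value can be compared across models even though the latent outcome itself is never observed.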

References

  • Achen, C. H. (1977). Measuring representation: Perils of the correlation coefficient. American Journal of Political Science, 21, 805–821.

  • Allison, P. D. (1999). Comparing logit and probit coefficients across groups. Sociological Methods & Research, 28, 186–208.

  • Amemiya, T. (1975). Qualitative response models. Annals of Economic and Social Measurement, 4, 363–388.

  • Angrist, J. D., & Pischke, J.-S. (2008). Mostly harmless econometrics: An empiricist’s companion. Princeton: Princeton University Press.

  • Blalock, H. M. (1967a). Path coefficients versus regression coefficients. The American Journal of Sociology, 72, 675–676.

  • Blalock, H. M. (1967b). Causal inference, closed populations, and measures of association. American Political Science Review, 61, 130–136.

  • Blundell, R., Dearden, L., & Sianesi, B. (2005). Evaluating the effect of education on earnings: Models, methods and results from the National Child Development Survey. Journal of the Royal Statistical Society, Series A, 168, 473–512.

  • Breen, R., Karlson, K. B., & Holm, A. (2012). Correlations and non-linear probability models. Unpublished paper.

  • Cameron, S. V., & Heckman, J. J. (1998). Life cycle schooling and dynamic selection bias: Models and evidence for five cohorts of American males. Journal of Political Economy, 106, 262–333.

  • Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic.

  • Cox, D. R. (1958). Planning of experiments. New York: Wiley.

  • Cramer, J. S. (2007). Robustness of logit analysis: Unobserved heterogeneity and mis-specified disturbances. Oxford Bulletin of Economics and Statistics, 69, 545–555.

  • Fienberg, S. E. (1977). The analysis of cross-classified categorical data. Cambridge, MA: MIT Press.

  • Fisher, R. A. (1932). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

  • Gail, M. H. (1986). Adjusting for covariates that have the same distribution in exposed and unexposed cohorts. In S. H. Moolgavkar & R. L. Prentice (Eds.), Modern statistical methods in chronic disease epidemiology (pp. 3–18). New York: Wiley.

  • Gail, M. H., Wieand, S., & Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika, 71, 431–444.

  • Gangl, M. (2010). Causal inference in sociological research. Annual Review of Sociology, 36, 21–48.

  • Hauck, W. W., Neuhaus, J. M., Kalbfleisch, J. D., & Anderson, S. (1991). A consequence of omitted covariates when estimating odds ratios. Journal of Clinical Epidemiology, 44, 77–81.

  • Heckman, J. J. (1979). Sample selection bias as specification error. Econometrica, 47, 153–161.

  • Heckman, J. J., Ichimura, H., Smith, J., & Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica, 66, 1017–1098.

  • Imbens, G. W., & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467–475.

  • Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47, 5–86.

  • Karlson, K. B., Holm, A., & Breen, R. (2012). Comparing regression coefficients between same sample nested models using logit and probit: A new method. Sociological Methodology, 42(1), 286–313.

  • Kim, J.-O., & Mueller, C. W. (1976). Standardized and unstandardized coefficients in causal analysis: An expository note. Sociological Methods & Research, 4, 423–438.

  • Mare, R. D. (2006). Response: Statistical models of educational stratification – Hauser and Andrew’s models for school transitions. Sociological Methodology, 36, 27–37.

  • McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic.

  • McKelvey, R. D., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4, 103–120.

  • Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26, 67–82.

  • Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. New York: Cambridge University Press.

  • Olsen, R. J. (1982). Independence from irrelevant alternatives and attrition bias: Their relation to one another in the evaluation of experimental programs. Southern Economic Journal, 49, 521–535.

  • Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–710.

  • Pearl, J. (2006). Causality: Models, reasoning and inference. Cambridge: Cambridge University Press.

  • Robins, J. M. (1999). Association, causation, and marginal structural models. Synthese, 121, 151–179.

  • Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550–560.

  • Robinson, L. D., & Jewell, N. P. (1991). Some surprising results about covariate adjustment in logistic regression models. International Statistical Review, 59, 227–240.

  • Swait, J., & Louviere, J. (1993). The role of the scale parameter in the estimation and comparison of multinomial logit models. Journal of Marketing Research, 30, 305–314.

  • Train, K. (2009). Discrete choice methods with simulation. Cambridge: Cambridge University Press.

  • Vytlacil, E. (2002). Independence, monotonicity, and latent index models: An equivalence result. Econometrica, 70, 331–341.

  • Winship, C., & Mare, R. D. (1984). Regression models with ordinal variables. American Sociological Review, 49, 512–525.

  • Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.

  • Xie, Y. (2011). Values and limitations of statistical models. Research in Social Stratification and Mobility, 29, 343–349.

  • Yatchew, A., & Griliches, Z. (1985). Specification error in probit models. The Review of Economics and Statistics, 67, 134–139.

Author information

Corresponding author

Correspondence to Richard Breen.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Breen, R., Karlson, K.B. (2013). Counterfactual Causal Analysis and Nonlinear Probability Models. In: Morgan, S. (ed.) Handbook of Causal Analysis for Social Research. Handbooks of Sociology and Social Research. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6094-3_10
