Evaluating Decision Makers over Selectively Labelled Data: A Causal Modelling Approach

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12323)


We present a Bayesian approach to evaluating AI decision systems using data from past decisions. Our approach addresses two challenges that typically arise in such settings and prevent a direct evaluation. First, the data may not include all factors that affected past decisions. Second, past decisions may have led to unobserved outcomes. This is the case, for example, when a bank decides whether a customer should be granted a loan, and the outcome of interest is whether the customer repays it: the data then includes the outcome (whether the loan was repaid) only for customers who were granted the loan, not for those who were denied. To address these challenges, we formalize the decision-making process with a causal model that also accounts for unobserved features. Based on this model, we compute counterfactuals to impute the missing outcomes, which in turn allows us to produce accurate evaluations. As we demonstrate on real and synthetic data, our approach estimates the quality of decisions more accurately and robustly than previous methods.
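The selective-labels bias described above can be made concrete with a small synthetic example. The sketch below is ours, not the paper's: variable names, the logistic outcome model, and the median-threshold decision rules are all illustrative assumptions. It shows why a naive evaluation that uses only labelled cases is optimistic when the past decision maker screened on a feature the data does not record.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Observed feature x and unobserved feature z both affect the outcome.
x = rng.normal(size=n)
z = rng.normal(size=n)

# True probability of a bad outcome (e.g. loan default) given x and z.
p_bad = 1 / (1 + np.exp(-(x + z)))
y = rng.binomial(1, p_bad)            # y = 1 marks a default

# The past decision maker saw both x and z, and granted the loan
# (t = 1) to the lower-risk half of the applicants.
risk = x + z
t = (risk < np.median(risk)).astype(int)

# Selective labels: y is observed only where t == 1.
y_obs = np.where(t == 1, y, -1)       # -1 marks a missing outcome

# Evaluate a new rule that accepts on x alone, using only labelled
# cases. This is biased: the labelled cases were screened on z too.
accept_new = (x < np.median(x)).astype(int)
labelled = t == 1
naive_fail = y_obs[labelled & (accept_new == 1)].mean()

# Oracle failure rate of the new rule, using all outcomes.
true_fail = y[accept_new == 1].mean()

print(f"naive estimate: {naive_fail:.3f}, true rate: {true_fail:.3f}")
```

On this data the naive estimate comes out well below the true failure rate, because the labelled subpopulation was pre-selected to have favourable values of the unobserved z. The paper's contribution is to close this gap by modelling z explicitly and imputing the missing outcomes counterfactually.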


Keywords: Selective labels · Selection bias · Causal modelling · Bayesian inference · Model evaluation



The authors acknowledge the computing resources provided by the Finnish Grid and Cloud Infrastructure (urn:nbn:fi:research-infras-2016072533). RL was supported by HICT; AH by Academy of Finland grants 295673 and 316771 and by HIIT; and MM by the Research Funds of the University of Helsinki.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Helsinki, Helsinki, Finland
  2. HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
