Evaluating Decision Makers over Selectively Labelled Data: A Causal Modelling Approach

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12323)


Abstract

We present a Bayesian approach to evaluate AI decision systems using data from past decisions. Our approach addresses two challenges that are typically encountered in such settings and prevent a direct evaluation. First, the data may not include all factors that affected past decisions. Second, past decisions may have led to unobserved outcomes. This is the case, for example, when a bank decides whether a customer should be granted a loan, and the outcome of interest is whether the customer will repay the loan. In this case, the data include the outcome (whether the loan was repaid or not) only for customers who were granted the loan, but not for those who were not. To address these challenges, we formalize the decision-making process with a causal model that also accounts for unobserved features. Based on this model, we compute counterfactuals to impute missing outcomes, which in turn allows us to produce accurate evaluations. As we demonstrate over real and synthetic data, our approach estimates the quality of decisions more accurately and robustly than previous methods.
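To make the selective labels problem concrete, the following Python sketch simulates the loan example under purely hypothetical assumptions: one recorded feature `x`, one feature `z` that influences decisions but is not recorded, and logistic models with made-up coefficients for both the decision and the outcome. Because the outcome is recorded only for granted loans, the repayment rate measured on the labelled subset differs from the rate that would hold if every applicant were granted a loan. This only illustrates the bias; it is not the authors' model or data.

```python
import math
import random


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def simulate_loans(n: int, seed: int = 0):
    """Return (x, t, y_true) triples for a hypothetical lending scenario."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # feature recorded in the data
        z = rng.gauss(0.0, 1.0)  # feature seen by the decision maker, not recorded
        # Decision: T=1 (grant) is more likely when x + z is large.
        t = 1 if rng.random() < sigmoid(1.5 * (x + z)) else 0
        # Outcome the applicant would have if granted: Y=1 means "repaid".
        y_true = 1 if rng.random() < sigmoid(x + z) else 0
        rows.append((x, t, y_true))
    return rows


rows = simulate_loans(20_000)
# Selective labels: only rows with t == 1 carry an observed outcome.
labelled = [y for _, t, y in rows if t == 1]
labelled_rate = sum(labelled) / len(labelled)
# Known here only because the data are simulated.
true_rate = sum(y for _, _, y in rows) / len(rows)
print(f"repayment rate, labelled subset:        {labelled_rate:.3f}")
print(f"repayment rate, if all were granted:    {true_rate:.3f}")
```

Since `z` pushes both the decision and the outcome in the same direction, the labelled subset over-represents likely repayers, so a naive evaluation on labelled data alone is optimistic.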


Keywords

  • Selective labels
  • Selection bias
  • Causal modelling
  • Bayesian inference
  • Model evaluation

  • DOI: 10.1007/978-3-030-61527-7_1
  • Chapter length: 16 pages


References


  1. Austin, P.C.: An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar. Behav. Res. 46(3), 399–424 (2011)

  2. Brennan, T., Dieterich, W., Ehret, B.: Evaluating the predictive validity of the COMPAS risk and needs assessment system. Crim. Justice Behav. 36(1), 21–40 (2009)

  3. Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the ACM SIGKDD (2017)

  4. Coston, A., Mishler, A., Kennedy, E.H., Chouldechova, A.: Counterfactual risk assessments, evaluation, and fairness. In: Proceedings of the FAT, pp. 582–593 (2020)

  5. De-Arteaga, M., Dubrawski, A., Chouldechova, A.: Learning under selective labels in the presence of expert consistency. arXiv preprint arXiv:1807.00905 (2018)

  6. Hernán, M.A., Hernández-Díaz, S., Robins, J.M.: A structural approach to selection bias. Epidemiology 15(5), 615–625 (2004)

  7. Jung, J., Concannon, C., Shroff, R., Goel, S., Goldstein, D.G.: Simple rules to guide expert classifications. J. Roy. Stat. Soc.: Ser. A 183, 771–800 (2020)

  8. Jung, J., Shroff, R., Feller, A., Goel, S.: Bayesian sensitivity analysis for offline policy evaluation. In: Proceedings of the AIES (2020)

  9. Kallus, N., Zhou, A.: Confounding-robust policy improvement. In: Advances in Neural Information Processing Systems, pp. 9269–9279 (2018)

  10. Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., Mullainathan, S.: Human decisions and machine predictions. Q. J. Econ. 133(1), 237–293 (2018)

  11. Kusner, M.J., Russell, C., Loftus, J.R., Silva, R.: Making decisions that reduce discriminatory impacts. In: Proceedings of the ICML (2019)

  12. Lakkaraju, H., Kleinberg, J., Leskovec, J., Ludwig, J., Mullainathan, S.: The selective labels problem: evaluating algorithmic predictions in the presence of unobservables. In: Proceedings of the ACM SIGKDD (2017)

  13. Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data, vol. 793. Wiley, Hoboken (2019)

  14. Madras, D., Creager, E., Pitassi, T., Zemel, R.: Fairness through causal awareness: learning causal latent-variable models for biased data. In: Proceedings of the FAT (2019)

  15. McCandless, L.C., Gustafson, P.: A comparison of Bayesian and Monte Carlo sensitivity analysis for unmeasured confounding. Stat. Med. 36(18), 2887–2901 (2017)

  16. McCandless, L.C., Gustafson, P., Levy, A.: Bayesian sensitivity analysis for unmeasured confounding in observational studies. Stat. Med. 26(11), 2331–2347 (2007)

  17. Pearl, J.: An introduction to causal inference. Int. J. Biostat. 6(2) (2010)

  18. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)

  19. Thomas, P.S., Brunskill, E.: Data-efficient off-policy policy evaluation for reinforcement learning. In: Proceedings of the ICML (2016)

  20. Tolan, S., Miron, M., Gómez, E., Castillo, C.: Why machine learning may lead to unfairness: evidence from risk assessment for Juvenile justice in Catalonia. In: Proceedings of the Artificial Intelligence and Law (2019)

  21. Zhang, J., Bareinboim, E.: Fairness in decision-making - the causal explanation formula. In: Proceedings of the AAAI (2018)


The authors acknowledge the computer capacity from the Finnish Grid and Cloud Infrastructure (urn:nbn:fi:research-infras-2016072533). RL was supported by HICT; AH by Academy of Finland grants 295673, 316771 and by HIIT; and MM by Research Funds of the University of Helsinki.

Author information


Corresponding author

Correspondence to Riku Laine.


Appendix 1 Counterfactual Inference

Here we derive Eq. 4 via Pearl’s counterfactual inference protocol, which involves three steps: abduction, action, and prediction [17]. Our model can be represented by the following structural equations over the graph structure in Fig. 2:

$$\begin{aligned} \mathsf {J}&:= \epsilon _{\mathsf {J}}, \quad \mathsf {Z}:= \epsilon _\mathsf {Z}, \quad \mathsf {X}:= \epsilon _\mathsf {X}, \quad \mathsf {T}:= g(\mathsf {J},\mathsf {X},\mathsf {Z},\epsilon _{\mathsf {T}}), \quad \mathsf {Y}:= f(\mathsf {T},\mathsf {X},\mathsf {Z},\epsilon _\mathsf {Y}). \end{aligned}$$
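The structural equations can be read as a forward sampler. The Python sketch below is a hypothetical instantiation, not the paper's fitted model: it assumes standard normal X and Z, a small set of decision makers J (labels `j1`, `j2` are made up) with intercepts `alpha_j`, logistic forms for g and f driven by Uniform(0, 1) disturbances, and a "special form" of f in which a negative decision (T = 0) always yields Y = 1. All parameter values are illustrative.

```python
import math
import random
from dataclasses import dataclass


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


@dataclass
class Params:
    alpha: dict      # hypothetical intercept alpha_j for each decision maker j
    gamma_x: float   # effect of X on the decision T
    gamma_z: float   # effect of Z on the decision T
    alpha_y: float   # outcome intercept
    beta_x: float    # effect of X on the outcome Y
    beta_z: float    # effect of Z on the outcome Y


def g(j, x, z, eps_t, p: Params) -> int:
    # T := g(J, X, Z, eps_T), with eps_T ~ Uniform(0, 1) (assumed logistic form).
    return 1 if eps_t < sigmoid(p.alpha[j] + p.gamma_x * x + p.gamma_z * z) else 0


def f(t, x, z, eps_y, p: Params) -> int:
    # Y := f(T, X, Z, eps_Y). Assumed special form: T=0 fixes Y=1,
    # so eps_Y carries no information when T=0.
    if t == 0:
        return 1
    return 1 if eps_y < sigmoid(p.alpha_y + p.beta_x * x + p.beta_z * z) else 0


def sample_unit(p: Params, rng: random.Random):
    j = rng.choice(sorted(p.alpha))  # J := eps_J
    z = rng.gauss(0.0, 1.0)          # Z := eps_Z
    x = rng.gauss(0.0, 1.0)          # X := eps_X
    t = g(j, x, z, rng.random(), p)
    y = f(t, x, z, rng.random(), p)
    return j, x, z, t, y


params = Params(alpha={"j1": -0.5, "j2": 0.5}, gamma_x=1.0, gamma_z=1.0,
                alpha_y=0.0, beta_x=1.0, beta_z=1.0)
rng = random.Random(1)
units = [sample_unit(params, rng) for _ in range(1000)]
```

Writing each disturbance as an explicit argument mirrors the structural form, which is what makes the abduction step (inferring disturbances from evidence) well defined.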

For every case with \(\mathsf {T} =0\) in the data, we calculate the counterfactual value of \(\mathsf {Y} \) had the decision been \(\mathsf {T} =1\). We assume here that all parameters, functions and distributions are known. In the abduction step we determine \(\mathbf {P}(\epsilon _\mathsf {J}, \epsilon _\mathsf {Z}, \epsilon _\mathsf {X}, \epsilon _{\mathsf {T}},\epsilon _\mathsf {Y} |j,x,\mathsf {T} =0)\), the distribution of the stochastic disturbance terms updated to take into account the observed evidence on the decision maker, the observed features and the decision. We directly know \(\epsilon _\mathsf {X} =x \) and \(\epsilon _\mathsf {J} =j \). Due to the special form of f, the observed evidence is independent of \(\epsilon _\mathsf {Y} \) when \(\mathsf {T} = 0\), so we only need to determine \(\mathbf {P}(\epsilon _\mathsf {Z},\epsilon _{\mathsf {T}}|j,x,\mathsf {T} =0)\). Next, the action step intervenes on \(\mathsf {T} \) and sets \(\mathsf {T} =1\). Finally, in the prediction step we estimate \(\mathsf {Y} \):

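$$\begin{aligned} \mathbf {E}(\mathsf {Y}_{\mathsf {T} =1} \mid j,x,\mathsf {T} =0) &= \int f(1,x,\epsilon _\mathsf {Z},\epsilon _\mathsf {Y})\, \mathbf {P}(\epsilon _\mathsf {Z},\epsilon _{\mathsf {T}} \mid j,x,\mathsf {T} =0)\, \mathbf {P}(\epsilon _\mathsf {Y})\, d\epsilon _\mathsf {Z}\, d\epsilon _{\mathsf {T}}\, d\epsilon _\mathsf {Y} \\ &= \int \mathbf {P}(\mathsf {Y} =1 \mid \mathsf {T} =1,x,z)\, \mathbf {P}(z \mid j,x,\mathsf {T} =0)\, dz \end{aligned}$$

where we used \(\epsilon _\mathsf {Z} =z \) and integrated out \(\epsilon _\mathsf {T} \) and \(\epsilon _\mathsf {Y} \). This gives the counterfactual expectation of \(\mathsf {Y} \) for a single subject.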
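Under a hypothetical logistic instantiation (standard normal prior on Z, \(\mathbf {P}(\mathsf {T}=1 \mid j,x,z) = \sigma(\alpha _j + \gamma _\mathsf {X} x + \gamma _\mathsf {Z} z)\) and \(\mathbf {P}(\mathsf {Y}=1 \mid \mathsf {T}=1,x,z) = \sigma(\alpha _\mathsf {Y} + \beta _\mathsf {X} x + \beta _\mathsf {Z} z)\)), the counterfactual expectation described above reduces to a one-dimensional integral over z. The Python sketch below evaluates it on a uniform grid: abduction reweights the prior of z by \(\mathbf {P}(\mathsf {T}=0 \mid j,x,z)\), and action plus prediction average \(\mathbf {P}(\mathsf {Y}=1 \mid \mathsf {T}=1,x,z)\) under that posterior. The function name, grid bounds and step count are arbitrary choices, not taken from the paper.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def normal_pdf(z: float) -> float:
    # Standard normal density: the assumed prior for Z := eps_Z.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def counterfactual_outcome(x, alpha_j, gamma_x, gamma_z, alpha_y, beta_x, beta_z,
                           z_min=-8.0, z_max=8.0, steps=4001):
    """Approximate E(Y_{T=1} | j, x, T=0) on a uniform grid over z."""
    dz = (z_max - z_min) / (steps - 1)
    num = 0.0
    den = 0.0
    for i in range(steps):
        z = z_min + i * dz
        # Abduction: P(z | j, x, T=0) is proportional to P(T=0 | j, x, z) p(z).
        weight = (1.0 - sigmoid(alpha_j + gamma_x * x + gamma_z * z)) * normal_pdf(z)
        # Action and prediction: set T=1, then take P(Y=1 | T=1, x, z).
        num += weight * sigmoid(alpha_y + beta_x * x + beta_z * z)
        den += weight
    # The grid spacing dz cancels in the ratio, so it is never multiplied in.
    return num / den


# Hypothetical subject: x = 0, unit coefficients, zero intercepts.
cf = counterfactual_outcome(x=0.0, alpha_j=0.0, gamma_x=1.0, gamma_z=1.0,
                            alpha_y=0.0, beta_x=1.0, beta_z=1.0)
# Same subject if the decision carried no information about z (gamma_z = 0).
prior = counterfactual_outcome(x=0.0, alpha_j=0.0, gamma_x=1.0, gamma_z=0.0,
                               alpha_y=0.0, beta_x=1.0, beta_z=1.0)
print(round(cf, 3), round(prior, 3))
```

With \(\gamma _\mathsf {Z} > 0\), observing \(\mathsf {T}=0\) shifts belief towards low values of z, so `cf` comes out below `prior`; with \(\gamma _\mathsf {Z} = 0\) the decision is uninformative and the posterior equals the prior.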

Appendix 2 On the Priors of the Bayesian Model

The priors for \(\gamma _\mathsf {X},~\beta _\mathsf {X},~\gamma _\mathsf {Z} \) and \(\beta _\mathsf {Z} \) were defined using the gamma-mixture representation of Student’s t-distribution with \(\nu =6\) degrees of freedom. The gamma mixture is obtained by first sampling a precision parameter from \(\varGamma (\nu /2, \nu /2)\) and then drawing the coefficient from a zero-mean Gaussian with that precision. This procedure was applied with the precision parameters \(\eta _\mathsf {Z},~\eta _{\beta _\mathsf {X}}\) and \(\eta _{\gamma _\mathsf {X}}\), as shown below. For vector-valued \(\mathsf {X}\), the components of \(\gamma _\mathsf {X} \) (\(\beta _\mathsf {X} \)) were sampled independently with a joint precision parameter \(\eta _{\gamma _\mathsf {X}}\) (\(\eta _{\beta _\mathsf {X}}\)). The coefficients for the unobserved confounder \(\mathsf {Z}\) were restricted to positive values to ensure identifiability.

$$\begin{aligned} \eta _\mathsf {Z}, \eta _{\beta _\mathsf {X}}, \eta _{\gamma _\mathsf {X}} \sim \varGamma (3, 3), \; \gamma _\mathsf {Z},\beta _\mathsf {Z} \sim N_+(0, \eta _\mathsf {Z} ^{-1}),\; \gamma _\mathsf {X} \sim N(0, \eta _{\gamma _\mathsf {X}}^{-1}),\; \beta _\mathsf {X} \sim N(0, \eta _{\beta _\mathsf {X}}^{-1}) \end{aligned}$$
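The gamma-mixture construction can be checked numerically. The Python sketch below reads \(\varGamma (3, 3)\) as shape 3 and rate 3 (which is what \(\nu = 6\) implies) and uses only the standard library; the helper names are hypothetical. Note that `random.gammavariate` takes a scale rather than a rate, and `random.gauss` takes a standard deviation rather than a precision. The sample variance of the mixture draws should be close to the Student-t variance \(\nu /(\nu -2) = 1.5\).

```python
import math
import random


def student_t_via_gamma_mixture(nu: float, rng: random.Random) -> float:
    # Precision eta ~ Gamma(shape=nu/2, rate=nu/2); gammavariate expects a
    # scale parameter, so rate nu/2 becomes scale 2/nu.
    eta = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Coefficient ~ N(0, 1/eta); gauss expects a standard deviation.
    return rng.gauss(0.0, 1.0 / math.sqrt(eta))


def half_normal(precision: float, rng: random.Random) -> float:
    # N_+(0, 1/precision): fold a zero-mean Gaussian onto the positive axis.
    return abs(rng.gauss(0.0, 1.0 / math.sqrt(precision)))


rng = random.Random(0)
nu = 6.0
draws = [student_t_via_gamma_mixture(nu, rng) for _ in range(100_000)]
# The mean is known to be zero, so the variance is the mean of squares.
sample_var = sum(d * d for d in draws) / len(draws)
print(f"sample variance {sample_var:.3f}, theoretical {nu / (nu - 2):.3f}")

# Positive confounder coefficients sharing one precision eta_Z ~ Gamma(3, 3).
eta_z = rng.gammavariate(3.0, 1.0 / 3.0)
gamma_z = half_normal(eta_z, rng)
beta_z = half_normal(eta_z, rng)
```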

The intercepts \(\alpha _j\) of the decision makers and the intercept \(\alpha _\mathsf {Y}\) of the outcome \(\mathsf {Y}\) had hierarchical Gaussian priors with variances \(\sigma _\mathsf {T} ^2\) and \(\sigma _\mathsf {Y} ^2\), respectively. All decision makers shared the variance parameter \(\sigma _\mathsf {T} ^2\).

$$\begin{aligned} \sigma _\mathsf {T} ^2,~\sigma _\mathsf {Y} ^2 \sim N_+(0, \tau ^2),\quad \alpha _j \sim N(0, \sigma _\mathsf {T} ^2),\quad \alpha _\mathsf {Y} \sim N(0, \sigma _\mathsf {Y} ^2) \end{aligned}$$

The parameters \(\sigma _\mathsf {T} ^2\) and \(\sigma _\mathsf {Y} ^2\) were drawn independently from Gaussian distributions with mean 0 and variance \(\tau ^2=1\), and restricted to the positive real axis.
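The hierarchical intercept prior can be sampled top-down. The Python sketch below uses \(\tau = 1\) as stated; the decision-maker labels and the function name are hypothetical, and it only draws from the prior (it fits nothing). `random.gauss` takes a standard deviation, so the sampled variances are square-rooted before use.

```python
import math
import random


def sample_intercepts(decision_makers, rng: random.Random, tau: float = 1.0):
    # sigma_T^2, sigma_Y^2 ~ N_+(0, tau^2): half-normal draws of the variances.
    var_t = abs(rng.gauss(0.0, tau))
    var_y = abs(rng.gauss(0.0, tau))
    # All decision-maker intercepts alpha_j share the variance sigma_T^2.
    alpha = {j: rng.gauss(0.0, math.sqrt(var_t)) for j in decision_makers}
    # Outcome intercept alpha_Y ~ N(0, sigma_Y^2).
    alpha_y = rng.gauss(0.0, math.sqrt(var_y))
    return var_t, var_y, alpha, alpha_y


rng = random.Random(42)
var_t, var_y, alpha, alpha_y = sample_intercepts(["j1", "j2", "j3"], rng)
```

Sharing one variance across decision makers lets the data on many decision makers inform how much their intercepts typically differ, which is the usual motivation for this kind of partial pooling.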

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Laine, R., Hyttinen, A., Mathioudakis, M. (2020). Evaluating Decision Makers over Selectively Labelled Data: A Causal Modelling Approach. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds) Discovery Science. DS 2020. Lecture Notes in Computer Science, vol. 12323. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61526-0

  • Online ISBN: 978-3-030-61527-7

  • eBook Packages: Computer Science, Computer Science (R0)