# Propensity Score Modeling and Evaluation

## Abstract

In causal inference for binary treatments, the propensity score is defined as the probability of receiving the treatment given covariates. Under the ignorability assumption, causal treatment effects can be estimated by conditioning on or adjusting for the propensity score. However, in observational studies, propensity scores are unknown and need to be estimated from the observed data, and consistent estimation of the propensity scores is essential for reliable causal inference. In this chapter, we first briefly discuss the modeling of propensity scores for a binary treatment; we then focus on the estimation of generalized propensity scores for categorical treatment variables with more than two levels and for continuous treatment variables. We review both parametric and nonparametric approaches for estimating the generalized propensity scores. Finally, we discuss how to evaluate the performance of different propensity score models and how to choose an optimal one among several candidate models.

## 1 Propensity Score Modeling for a Binary Treatment

The potential outcomes framework [23] has been a popular framework for estimating causal treatment effects. An important quantity facilitating causal inference is the propensity score [22], defined as the probability of receiving the treatment given a set of measured covariates. In observational studies, propensity scores are unknown and need to be estimated from the observed data, and consistent estimation of the propensity scores is essential for reliable causal inference. In this section, we briefly review the modeling of propensity scores for a binary treatment variable.

Let *Y* denote the response of interest, *T* be the treatment variable, and *X* be a *p*-dimensional vector of baseline covariates. The data can be represented as (*Y*_{i}, *T*_{i}, *X*_{i}), *i* = 1, *…*, *n*, a random sample from (*Y*, *T*, *X*). In addition to the observed quantities, we further define *Y*_{i}(*t*) as the potential outcome if subject *i* were assigned to treatment level *t*. Here, *T* is a random variable and *t* is a specific level of *T*. In the case of a binary treatment, let *T* = 1 if treated and *T* = 0 if untreated. The propensity score is then defined as *r*(*X*) ≡ *P*(*T* = 1 | *X*). The quantities we are interested in estimating are usually the average treatment effect (ATE) and the average treatment effect on the treated (ATT):

$$\displaystyle{\tau _{\mathrm{ATE}} = E[Y (1) - Y (0)]\quad \mathrm{and}\quad \tau _{\mathrm{ATT}} = E[Y (1) - Y (0)\vert T = 1].}$$

### 1.1 Parametric Approaches

In the causal inference literature, the propensity score for a binary treatment is usually estimated by logistic regression, which is easy to implement in R. However, logistic regression is not without drawbacks. First of all, a parametric form of *r*(*X*) needs to be specified, and consistent estimation of the ATE and ATT relies on the logistic regression model being correctly specified. In most cases, including only main effects in the model is not adequate, but it is also hard to determine which interaction terms should be included, especially when the vector of covariates is high-dimensional. In addition, logistic regression is not resistant to outliers [11, 18]. In particular, Kang and Schafer [11] showed that when the logistic regression model is even mildly misspecified, propensity score-based approaches can lead to large bias and variance in the estimated treatment effects.
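As a concrete illustration (the chapter's tooling is in R, but the same logic ports directly), the following numpy sketch fits the logistic propensity score model by Newton-Raphson on hypothetical simulated data and forms the inverse probability weighting (IPW) estimate of the ATE:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- simulated observational study (hypothetical data; true ATE = 2) ---
n = 5000
X = rng.normal(size=(n, 2))
true_ps = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
T = rng.binomial(1, true_ps)
Y = 2.0 * T + X[:, 0] + X[:, 1] + rng.normal(size=n)

# --- logistic regression r(X) = P(T = 1 | X) fitted by Newton-Raphson ---
Xd = np.column_stack([np.ones(n), X])            # add intercept
beta = np.zeros(Xd.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-Xd @ beta))
    grad = Xd.T @ (T - p)                        # score function
    H = Xd.T @ (Xd * (p * (1 - p))[:, None])     # observed information
    beta += np.linalg.solve(H, grad)

ps = 1 / (1 + np.exp(-Xd @ beta))

# --- inverse probability weighting estimate of the ATE ---
ate_ipw = np.mean(T * Y / ps) - np.mean((1 - T) * Y / (1 - ps))
```

With the model correctly specified, as here, the IPW estimate recovers the true ATE; the drawbacks discussed above arise when the specified form of *r*(*X*) is wrong.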

Other parametric approaches for estimating propensity scores include probit regression and linear discriminant analysis, both of which assume normality. Through a simulation study, however, Zhu et al. [31] found that these parametric models give treatment effect estimates very similar to those from logistic regression.

### 1.2 Machine Learning Techniques

Due to the above-mentioned drawbacks of parametric approaches for modeling propensity scores, more recent literature advocates using machine learning algorithms to model propensity scores [13, 24]. Since propensity scores are auxiliary in causal inference, in the sense that one usually is not interested in interpreting or making inference about the propensity score model itself, nonparametric black-box algorithms can be used directly to estimate the propensity scores. Examples include classification and regression trees (CART, [2]) and their various extensions, such as pruned CART, bagged CART, random forests (RF, [1]), and boosting [16]. Other classification methods that can indirectly yield class probability estimates include support vector machines (SVM) and K-nearest neighbors (KNN). R packages are readily available, such as *rpart* for CART, *randomForest* for RF, *twang* or *gbm* for boosting models, and *e1071* for SVM. A detailed review of each approach for estimating propensity scores can be found in [31]. In a simulation study, Zhu et al. [31] found that there is a trade-off between bias and variance among parametric and nonparametric approaches; more specifically, parametric methods tend to yield lower bias but higher variance than nonparametric methods for estimating the ATE and ATT.

### 1.3 Propensity Score Modeling via Balancing Covariates

Imai and Ratkovic [8] proposed the covariate balancing propensity score (CBPS). A parametric model *r*_{β}(*X*) (e.g., a logistic regression model with parameter *β*) is still posited for the propensity score, but *β* is solved by satisfying the following condition:

$$\displaystyle{ \frac{1} {n}\sum _{i=1}^{n}\left [\frac{T_{i}\widetilde{X}_{i}} {r_{\beta }(X_{i})} -\frac{(1 - T_{i})\widetilde{X}_{i}} {1 - r_{\beta }(X_{i})} \right ] = 0,\quad (6.2)}$$

where \(\widetilde{X} = f(X)\) is a function of the covariates *X* specified by the researcher. If setting \(\widetilde{X} = \frac{dr_{\beta }(X)} {d\beta }\), one solves the maximum likelihood estimator (MLE) of *β*, because Eq. (6.2) is then the score function for the MLE. However, if setting \(\widetilde{X} = X\), one aims to achieve optimal balance in the first moment of the covariates, because this balancing condition implies the weighted mean value of each covariate is the same between the treatment and the control group. If letting \(\widetilde{X} = \frac{dr_{\beta }(X)} {d\beta }\) and \(\widetilde{X} = X\) at the same time, there are more equations than unknown parameters, and the generalized method of moments [5] is employed for estimation. The above balancing condition is for the estimation of ATE. For estimating ATT, the balancing condition becomes

$$\displaystyle{ \frac{1} {n}\sum _{i=1}^{n}\left [T_{i}\widetilde{X}_{i} -\frac{r_{\beta }(X_{i})(1 - T_{i})\widetilde{X}_{i}} {1 - r_{\beta }(X_{i})} \right ] = 0.}$$

A related issue is whether we should achieve balance in all the measured covariates in a study or only in a subset of the available covariates. This is a variable selection issue. Zhu et al. [32] have shown through a simulation study that one should aim to achieve balance in the real confounders, i.e., covariates related to both the treatment variable and the outcome variable, as well as in the covariates related only to the outcome variable. Adding balancing conditions for covariates that are related only to the treatment variable may increase the bias and variance of the estimated treatment effects.
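The ATE balancing condition with \(\widetilde{X} = X\) can be solved as a system of moment equations. A minimal sketch on hypothetical toy data, using scipy's generic root-finder `fsolve` (the CBPS software in R uses its own optimization machinery):

```python
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(1)

# --- hypothetical observational data ---
n = 4000
X = rng.normal(size=(n, 2))
T = rng.binomial(1, 1 / (1 + np.exp(-(0.4 * X[:, 0] + 0.4 * X[:, 1]))))
Xt = np.column_stack([np.ones(n), X])   # tilde{X} = (1, X): balance first moments

def balance_moments(beta):
    """Sample version of E[T*Xt/r_beta(X) - (1-T)*Xt/(1 - r_beta(X))] = 0."""
    r = 1 / (1 + np.exp(-Xt @ beta))
    return Xt.T @ (T / r - (1 - T) / (1 - r)) / n

beta_hat = fsolve(balance_moments, np.zeros(Xt.shape[1]))
r_hat = 1 / (1 + np.exp(-Xt @ beta_hat))

# with these scores, the weighted covariate means coincide across groups
m_treat = (T / r_hat) @ X / np.sum(T / r_hat)
m_ctrl = ((1 - T) / (1 - r_hat)) @ X / np.sum((1 - T) / (1 - r_hat))
```

The final two lines verify the interpretation given above: solving the balancing condition with \(\widetilde{X} = X\) forces the inverse-probability-weighted mean of each covariate to agree between the treated and control groups.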

## 2 Propensity Score Modeling for a Multi-level Treatment

In most of the causal inference literature based on the potential outcomes framework, researchers have focused on binary treatments. Imbens [10] extended this framework to the more general case by defining the generalized propensity score, the conditional probability of being assigned to a particular treatment group given the observed covariates. In the past decade, a few studies (e.g., [9, 12, 28]) have extended propensity score-based approaches to multi-level treatments. Compared with binary treatments, there are two important issues specific to causal inference with multi-level treatments. The first issue is to define the parameters of interest and to determine whether they are identifiable. As discussed by Imbens [10] and Tchernis et al. [28], for a multi-level treatment, the following parameters may be of interest: (1) the average causal effect of treatment *t* relative to *k*, i.e., *E*[*Y* (*t*) − *Y* (*k*)]; (2) the average causal effect of treatment *t* relative to *k* among those who receive treatment *t*, i.e., *E*[*Y* (*t*) − *Y* (*k*) | *T* = *t*]; or (3) the average causal effect of treatment *t* relative to all other treatments among those who receive treatment *t*, i.e., \(E[Y (t) - Y (\bar{t})\vert T = t]\), where \(\bar{t}\) refers to all treatment groups other than group *t*. In any of the three definitions, the multi-level treatment variable is dichotomized; in this sense, causal inference with multiple treatments is essentially an extension of the binary case. Therefore, matching, stratification, or inverse probability weighting methods can be employed to estimate the targeted causal effects in a similar way as for binary treatments. The second issue is that in many studies the treatments are correlated: the odds ratio of receiving one treatment against another is affected by whether a third treatment is taken into consideration. Tchernis et al. [28] pointed out in a simulation study that if the treatments are correlated, ignoring the correlations while estimating propensity scores leads to biased estimation of the causal effects. The commonly used multinomial logistic regression model does not account for such correlation; therefore, the nested logit model or the multinomial probit model has been suggested for modeling propensity scores, as both allow specification of a correlation structure among treatments. Thanks to developments in machine learning, nonparametric algorithms such as random forests or boosting can also be easily implemented to estimate propensity scores for multiple treatments.

We define some additional notation here. Let *T*_{i} be the treatment status of the *i*th subject, so *T*_{i} = *t* if subject *i* was observed under treatment *t* ∈ { 1, *…*, *M*}, where *M* is the total number of treatment groups. We further define an indicator of membership in a particular treatment group *t* as *A*_{i}(*t*) = *I*(*T*_{i} = *t*), *t* ∈ { 1, *…*, *M*}. Following Imai and Van Dyk [9], the generalized propensity score is defined as *r*(*t* | *X*) ≡ *Pr*(*T* = *t* | *X*), for *t* = 1, *…*, *M*.

### 2.1 Parametric Approaches

- 1. We assume the following multinomial logistic regression (MLR) model for the generalized propensity scores:

$$\displaystyle{r(t\vert X)_{\mathrm{MLR}} = \frac{1} {1 +\sum _{ s=2}^{M}e^{\beta '_{s}X}}\quad \mathrm{for}\quad t = 1}$$

and

$$\displaystyle{r(t\vert X)_{\mathrm{MLR}} = \frac{e^{\beta '_{t}X}} {1 +\sum _{ s=2}^{M}e^{\beta '_{s}X}}\quad \mathrm{for}\quad t = 2,\ldots,M.}$$

- 2. We maximize the multinomial likelihood function with respect to all the *β*’s:

$$\displaystyle{L(\beta ) =\prod _{ i=1}^{n}\prod _{ t=1}^{M}r_{ i}(t\vert X)^{A_{i}(t)},}$$

where *r*_{i}(*t* | *X*) follows the model defined in Step 1. Equivalently, we maximize the log-likelihood function:

$$\displaystyle{l(\beta ) =\sum _{ i=1}^{n}\sum _{ t=1}^{M}A_{ i}(t)\log (r_{i}(t\vert X)).}$$

- 3. The solution \(\hat{\beta }_{s}\) for *s* = 2, *…*, *M* is substituted into the model to obtain the estimates of the generalized propensity scores.

Note that under the MLR model, for any two treatment levels *t* ≠ *s*, we have

$$\displaystyle{\frac{r(t\vert X)_{\mathrm{MLR}}} {r(s\vert X)_{\mathrm{MLR}}} = e^{(\beta _{t}-\beta _{s})'X},}$$

so the relative odds between any two treatment levels do not depend on the remaining treatments, which reflects why the MLR model cannot account for correlated treatments.

In R, to fit an MLR model, we can use the package *nnet* [29].
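As a sketch of Steps 1-3 above (the chapter recommends *nnet* in R; here the multinomial log-likelihood is maximized by plain gradient ascent in numpy, on hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(2)

# --- hypothetical data with M = 3 treatment levels ---
n, M = 3000, 3
X = rng.normal(size=(n, 2))
Xd = np.column_stack([np.ones(n), X])                    # intercept + covariates
logits = np.column_stack([np.zeros(n),                   # beta_1 = 0 (baseline)
                          0.9 * X[:, 0],
                          -0.9 * X[:, 0] + 0.9 * X[:, 1]])
P_true = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
T = (rng.random(n)[:, None] > np.cumsum(P_true, axis=1)).sum(1)   # labels 0..M-1
Y1h = np.eye(M)[T]                                       # one-hot A_i(t)

# --- maximize the multinomial log-likelihood l(beta) by gradient ascent ---
B = np.zeros((Xd.shape[1], M))
for _ in range(3000):
    Z = Xd @ B
    Z -= Z.max(axis=1, keepdims=True)                    # numerical stability
    P_hat = np.exp(Z)
    P_hat /= P_hat.sum(axis=1, keepdims=True)
    B += 0.5 * Xd.T @ (Y1h - P_hat) / n                  # score of l(beta)

Z = Xd @ B
Z -= Z.max(axis=1, keepdims=True)
P_hat = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)  # \hat r(t|X)_MLR
```

By construction the fitted scores sum to one over the *M* treatment levels for every subject, in contrast to the one-vs-rest machine learning strategy discussed next.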

### 2.2 Machine Learning Techniques

In this section, we are going to introduce two machine learning approaches for the modeling of generalized propensity scores: generalized boosted model (GBM) and random forests (RF).

GBM uses an iterative procedure that adds together many simple regression trees to approximate the propensity score function. A regression tree algorithm divides the dataset into two non-overlapping regions based on one of the covariates. Then, it recursively divides each of those regions into two smaller regions, where each split is based on one of the covariates [2]. Note that each split may occur on a different covariate or on the same covariate as before. The splits are chosen so that the prediction error is minimized. After the allowed number of splits has occurred, the estimated response value in each region of the dataset equals the average response value of the data points within that region.

For a binary treatment, GBM models the log-odds *g*(*X*) = log[*r*(*X*)∕(1 − *r*(*X*))], and the log-likelihood function can be rewritten as

$$\displaystyle{l(g) =\sum _{ i=1}^{n}\left \{T_{i}\,g(X_{i}) -\log \left [1 + e^{g(X_{i})}\right ]\right \}.\quad (6.4)}$$

To maximize *l*(*g*) in (6.4), *g*(*X*) is updated at each iteration with *g*(*X*) + *h*(*X*), where *h*(*X*) is the fitted value from a regression tree modeling the residuals *γ*_{i} = *T*_{i} − 1∕{1 + exp[−*g*(*X*_{i})]}, the direction of the largest increase in (6.4). To avoid overfitting, a shrinkage parameter *α* is introduced so the update is *g*(*X*) + *α h*(*X*), where *α* is usually a small value, such as 0.0001. This iterative estimation procedure can be tuned to yield propensity scores that achieve optimal balance in the covariate distributions between the treatment and control groups. The key is to stop the algorithm at the number of trees for which a certain balance statistic (e.g., the average standardized absolute mean difference in the covariates) is minimized. Interactions are automatically included when multi-level splits are allowed in the regression trees, and since splits are determined by the algorithm based on a criterion, variable selection is done automatically [16].
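The balance-based stopping rule can be sketched with scikit-learn's `GradientBoostingClassifier` standing in for the R *twang* implementation (hypothetical data; `staged_predict_proba` returns the fitted scores after each boosting iteration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(10)

# --- hypothetical binary-treatment data ---
n = 2000
X = rng.normal(size=(n, 2))
ps_true = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
T = rng.binomial(1, ps_true)

gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                 max_depth=2, random_state=0)
gbm.fit(X, T)

def mean_asmd(ps):
    """Average standardized absolute mean difference under ATE-type IPW weights."""
    w = np.where(T == 1, 1 / ps, 1 / (1 - ps))
    vals = []
    for j in range(X.shape[1]):
        m1 = np.average(X[T == 1, j], weights=w[T == 1])
        m0 = np.average(X[T == 0, j], weights=w[T == 0])
        s = np.sqrt((X[T == 1, j].var() + X[T == 0, j].var()) / 2)
        vals.append(abs(m1 - m0) / s)
    return float(np.mean(vals))

# evaluate balance after each boosting iteration; stop where balance is best
asmds = [mean_asmd(p[:, 1]) for p in gbm.staged_predict_proba(X)]
best_iter = int(np.argmin(asmds)) + 1
```

Note that the iteration count is tuned against covariate balance rather than against classification accuracy, which is the distinctive feature of propensity score boosting.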

McCaffrey et al. [17] extended this algorithm to the multi-level treatment case. We first note that while estimating the generalized propensity score for a particular treatment level *t*, we are interested in the probability that each subject is assigned to a particular treatment *t* as opposed to any other treatment. So essentially we have two groups: those assigned to treatment *t* (equivalent to the treatment group in the binary case), and those that were not assigned to treatment *t* (equivalent to the control group in the binary case). Then we can fit a GBM that balances the covariates between the treatment *t* group and the entire sample [17]. We do this for each of the *M* treatments to obtain the generalized propensity scores \(\hat{r}(t\vert X)\). The estimation of the generalized propensity scores for multi-level treatment can be realized in the R package *twang* [19].

The downside to this method is that, by fitting separate GBMs for the *M* treatment groups, the estimated generalized propensity scores are not guaranteed to add up to 1 for each subject. McCaffrey et al. [17] argued that estimating the ATE only requires the propensity scores for the particular treatment groups involved, so as long as the estimated generalized propensity scores are unbiased, they do not need to add up to 1.
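The one-vs-rest strategy can be sketched with scikit-learn's `GradientBoostingClassifier` as a stand-in for the R *twang*/*gbm* machinery (hypothetical data; the balance-based stopping rule is omitted for brevity). The final line exhibits the point just made: the separately fitted scores need not sum to 1:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)

# --- hypothetical data with M = 3 treatments ---
n, M = 1500, 3
X = rng.normal(size=(n, 2))
logits = np.column_stack([np.zeros(n), X[:, 0], X[:, 1]])
P = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
T = (rng.random(n)[:, None] > np.cumsum(P, axis=1)).sum(1)

# one GBM per treatment level: "t versus the rest"
gps = np.zeros((n, M))
for t in range(M):
    gbm = GradientBoostingClassifier(n_estimators=100, max_depth=2, random_state=0)
    gbm.fit(X, (T == t).astype(int))
    gps[:, t] = gbm.predict_proba(X)[:, 1]   # \hat r(t|X)

row_sums = gps.sum(axis=1)   # generally not exactly 1 across the M separate fits
```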

For random forests, the estimated generalized propensity score for treatment level *t* is the fraction of votes for *t* from the collection of random forest classification trees. The specific random forest algorithm for estimating the generalized propensity score is

- 1. Draw a random sample with replacement of size *n* (the size of the dataset), called a bootstrap sample, from the dataset.
- 2. Fit a classification tree to the bootstrap sample (at each split, only a random subset of the covariates is considered).
- 3. Repeat Steps 1 and 2 a large number, *B*, of times to obtain a collection of *B* classification trees (usually, *B* = 500).
- 4. For a given vector of covariates *X*, predict the class label from each fitted tree. The estimated generalized propensity score is then

$$\displaystyle{\hat{r}(t\vert X)_{\mathrm{RF}} = \frac{\mbox{ number of trees that voted for class}\ t} {B}.}$$
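These four steps are exactly what off-the-shelf random forest implementations perform. A sketch using scikit-learn's `RandomForestClassifier` on hypothetical data (a caveat: its `predict_proba` averages per-tree class proportions, which coincides with the vote fraction when trees are grown until their leaves are pure, the default):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

# --- hypothetical data with three treatment levels ---
n = 2000
X = rng.normal(size=(n, 2))
logits = np.column_stack([np.zeros(n), X[:, 0], -X[:, 0]])
P = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
T = (rng.random(n)[:, None] > np.cumsum(P, axis=1)).sum(1)

rf = RandomForestClassifier(n_estimators=500, random_state=0)   # B = 500 trees
rf.fit(X, T)
gps_rf = rf.predict_proba(X)   # column t holds \hat r(t|X)_RF
```

Unlike the one-vs-rest boosting strategy, a single multi-class forest yields scores that sum to one over the treatment levels; however, vote fractions can reach exactly 0 or 1, which causes the problem discussed next.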

A potential problem with RF is that the estimated propensity score can be exactly 0 or 1, in which case the positivity assumption, 0 < *r*(*t* | *X*) < 1 for all *X* and *t*, is violated. In addition, since inverse probability weighting and doubly robust estimation involve the reciprocal of the estimated propensity score or of one minus the estimated propensity score, an estimated score close to 1 or 0 may result in extreme weights. This issue has been frequently discussed in the literature (e.g., [11, 14, 31]). One way to deal with it is to trim extreme weights at a percentile; for example, inverse probability weights higher than the 95th percentile are set to the 95th percentile. Lee et al. [14] showed that trimming extreme weights gains little benefit in terms of bias, standard error, and 95% confidence interval coverage, and that trimming beyond the optimal level increases bias. Another way to deal with extreme weights is to use a weighted average between a parametric model (such as an MLR model) and RF as the generalized propensity score estimator [31]. This so-called data-adaptive matching score is

$$\displaystyle{\hat{r}(t\vert X)_{\mathrm{DA}} =\lambda \,\hat{r}(t\vert X)_{\mathrm{MLR}} + (1-\lambda )\,\hat{r}(t\vert X)_{\mathrm{RF}},\quad 0 \leq \lambda \leq 1.\quad (6.6)}$$

As explained by Zhu et al. [31], the intuition of this approach comes from the fact that there is a trade-off in bias and variance between parametric and nonparametric approaches. By combining the two, both the bias and the variance of the estimated causal effects can be reduced. The choice of *λ* in (6.6) gives more weight to the estimate that is closer to the observed value of *A*(*t*), so extreme weights are trimmed to more reasonable values without ad hoc adjustment. In addition, the combined score cannot attain 0 or 1 as a possible value, due to the MLR component.
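A sketch of the combination idea. The exact selection rule for *λ* in [31] is not reproduced here; as an illustrative stand-in, *λ* is chosen on a grid to minimize the mean squared distance between the combined score and the observed treatment indicator:

```python
import numpy as np

def data_adaptive_score(p_mlr, p_rf, A, grid=np.linspace(0.0, 1.0, 101)):
    """Combine two score estimates as lambda*p_mlr + (1 - lambda)*p_rf.
    Illustrative rule: pick lambda minimizing the mean squared distance
    between the combined score and the observed indicator A(t)."""
    lam = min(grid, key=lambda g: np.mean((A - (g * p_mlr + (1 - g) * p_rf)) ** 2))
    return lam, lam * p_mlr + (1 - lam) * p_rf

# toy inputs: a smooth parametric score and a sometimes-extreme RF score
rng = np.random.default_rng(5)
A = rng.binomial(1, 0.5, size=200)
p_mlr = np.clip(0.5 + 0.1 * rng.normal(size=200), 0.05, 0.95)
p_rf = np.clip(A + 0.3 * rng.normal(size=200), 0.0, 1.0)   # can hit exactly 0 or 1

lam, p_da = data_adaptive_score(p_mlr, p_rf, A)
```

Whenever *λ* > 0, the MLR component keeps the combined score strictly inside (0, 1) even where the RF score is exactly 0 or 1.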

## 3 Propensity Score Estimation for a Continuous Treatment

Finally, we are going to focus on the case when the treatment variable is continuous. In this case, we are interested in estimating the so-called dose–response function: *μ*(*t*) = *E*[*Y*_{i}(*t*)]. We assume *Y*_{i}(*t*) is well defined for *t* ∈ *τ*, where *τ* = [*t*_{0}, *t*_{1}].

For a continuous treatment, we assume weak ignorability:

$$\displaystyle{f_{t\vert Y (t),X}(t\vert Y (t),X) = f_{t\vert X}(t\vert X)\quad \mathrm{for\ all}\ t \in \tau,}$$

where *f*(*t* | ⋅ ) refers to the conditional density. In other words, we assume the vector of covariates *X* includes all the real confounders that may jointly affect the treatment and the potential outcomes. The generalized propensity score is defined as *r*(*t* | *X*) ≡ *f*_{t | X}(*t* | *X*), which is the conditional density of the treatment level *t* conditioning on the covariates [10]. The ignorability assumption also implies that the dose–response function *μ*(*t*) can be estimated by inverse probability weighting [20, 21], where the (stabilized) weight for subject *i* is

$$\displaystyle{w_{i} = \frac{r(T_{i})} {r(T_{i}\vert X_{i})},\quad (6.7)}$$

with *r*(*T*_{i}) denoting the marginal density of the treatment evaluated at *T*_{i}.

However, the estimation of the conditional density (the generalized propensity score) in the denominator is a non-trivial problem, because when *X* is high-dimensional, traditional nonparametric approaches for estimating conditional densities (e.g., [4]) suffer from the curse of dimensionality.

### 3.1 Parametric Approaches

We first consider a parametric approach for estimating *r*(*T*_{i} | *X*_{i}). The treatment variable *T* is assumed to follow a normal linear model:

$$\displaystyle{T_{i} =\beta 'X_{i} +\epsilon _{i},\quad \epsilon _{i} \sim N(0,\sigma ^{2}).\quad (6.8)}$$

In practice, we fit a linear regression of *T*_{i} on *X*_{i}, *i* = 1, *…*, *n*, and obtain \(\hat{T }_{i}\) and \(\hat{\sigma }\). Then, the residuals \(\hat{\epsilon _{i}} = T_{i} -\hat{T } _{i}\), *i* = 1, *…*, *n*, are calculated, and *r*(*T*_{i} | *X*_{i}) can be approximated by the normal density

$$\displaystyle{\hat{r}(T_{i}\vert X_{i}) = \frac{1} {\sqrt{2\pi }\,\hat{\sigma }}\exp \left (-\frac{\hat{\epsilon }_{i}^{2}} {2\hat{\sigma }^{2}}\right ).}$$

If *T* does not follow a normal distribution (which can be checked from the data), we can instead employ nonparametric density estimation approaches, such as kernel density estimation, to estimate *r*(*T*_{i} | *X*_{i}) using the residuals \(\hat{\epsilon _{i}},i = 1,\ldots,n\).
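The two-step recipe above (regress *T* on *X*, then plug the residuals into a normal density) can be sketched in a few lines of numpy on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(6)

# --- hypothetical data where the normal linear model holds ---
n = 2000
X = rng.normal(size=(n, 2))
T = 1.0 + 0.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# step 1: linear regression of T on X
Xd = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xd, T, rcond=None)[0]
resid = T - Xd @ beta                      # residuals \hat epsilon_i
sigma = np.sqrt(np.mean(resid ** 2))       # \hat sigma

# step 2: plug the residuals into the N(0, sigma^2) density
gps = np.exp(-resid ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
```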

### 3.2 Machine Learning Techniques

When *X* is high-dimensional, the parametric model (6.8) may not hold. A more general approach is to assume

$$\displaystyle{T_{i} = m(X_{i}) +\epsilon _{i},\quad \epsilon _{i} \sim N(0,\sigma ^{2}),\quad (6.9)}$$

where *m*(*X*) = *E*(*T* | *X*), and to employ a nonparametric approach to estimate the mean function. In [30], we propose using boosting to estimate *m*(*X*). The boosting model for a continuous response variable can be represented as

$$\displaystyle{\hat{m}(X) =\sum _{ m=1}^{M}\sum _{ j=1}^{K_{m} }c_{mj}\,I(X \in R_{mj}),\quad (6.10)}$$

where *M* is the total number of trees, *K*_{m} is the number of terminal nodes of the *m*th tree, *R*_{mj} is the *j*th rectangular region in the feature space spanned by *X*, and *c*_{mj} is the predicted constant in region *R*_{mj}. *K*_{m} and *R*_{mj} are determined by optimizing some nonparametric information criterion, such as the entropy, the misclassification rate, or the Gini index, and *c*_{mj} is simply the average value of the *T*_{i} in the training data that fall in region *R*_{mj}. Details about how to construct a classification/regression tree can be found in [2]. Given \(\hat{m}(X)\), the generalized propensity score is estimated as in the parametric case by applying the normal density to the residuals:

$$\displaystyle{\hat{r}(T_{i}\vert X_{i}) = \frac{1} {\sqrt{2\pi }\,\hat{\sigma }}\exp \left (-\frac{(T_{i} -\hat{m}(X_{i}))^{2}} {2\hat{\sigma }^{2}}\right ).\quad (6.11)}$$

*M* is a tuning parameter: if *M* is too large, the model tends to overfit, resulting in a large variance, and if *M* is too small, bias will occur. In [30], we propose an innovative criterion to determine the value of *M*. Notice that in the inverse probability weighting approach, if subject *i* receives a weight *w*_{i} as in (6.7), the subject is in effect replicated *w*_{i} − 1 additional times in the weighted pseudo-sample. In the weighted sample, if the propensity scores are correctly estimated, the treatment assignment and the covariates are supposed to be unconfounded under the ignorability assumption [21]. Therefore, a reasonable criterion is to stop the algorithm at the number of trees such that the treatment assignment and the covariates are independent (unconfounded) in the weighted sample. Based on this idea, we propose in [30] the following procedure to determine the optimal number of trees:

- 1. Calculate \(\hat{r}(T_{i}\vert X_{i})\) using boosting with *M* trees. Then, calculate

$$\displaystyle{w_{i} = \frac{\hat{r}(T_{i})} {\hat{r}(T_{i}\vert X_{i})}\quad \mathrm{for}\quad i = 1,\ldots,n,}$$

where \(\hat{r}(T_{i})\) is estimated by a normal density.
- 2. For the *j*th covariate, denoted as *X*^{j}, calculate the weighted correlation coefficient between *T* and *X*^{j} using the weights *w*_{i}, *i* = 1, *…*, *n*, obtained in the first step, and denote it as \(\bar{d}_{j}\).
- 3. Average the absolute values of \(\bar{d}_{j}\) over all the covariates to obtain the average absolute correlation coefficient (*AACC*).

For each value of *M* = 1, 2, *…*, 20,000, calculate the *AACC* and find the optimal number of trees that leads to the smallest *AACC* value. In Step 2, we employ a bootstrapping approach to obtain the weighted correlation coefficient. We also advocate the distance correlation coefficient [26, 27] over other correlation metrics, because the distance correlation takes values between zero and one and equals zero if and only if *T* and *X*^{j} are independent, regardless of the type of *X*^{j}. The R code for calculating the *AACC* is displayed in the Appendix of [30]. After the value of *M* is determined, the generalized propensity score is estimated by (6.11). More details of this approach can be found in [30].
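A sketch of the sample distance correlation (V-statistic version) and of the averaging step; for brevity this version omits the weighting and bootstrapping used in [30]:

```python
import numpy as np

def dist_corr(x, y):
    """Sample distance correlation of Szekely et al. [26, 27] (V-statistic form)."""
    x = np.asarray(x, float)[:, None]
    y = np.asarray(y, float)[:, None]
    a = np.abs(x - x.T)                    # pairwise distance matrices
    b = np.abs(y - y.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)       # guard against tiny negative round-off
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(7)
n = 300
X = rng.normal(size=(n, 3))
T = X[:, 0] + 0.5 * rng.normal(size=n)     # T depends only on the first covariate

# average absolute correlation across covariates (unweighted AACC sketch)
aacc = np.mean([dist_corr(T, X[:, j]) for j in range(X.shape[1])])
```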

## 4 Propensity Score Evaluation

Given the buffet of methods available to researchers, it is important to select the best among all the candidate propensity score models; at the same time, it is commonly accepted that no procedure is uniformly best across all datasets. In this section, we briefly discuss how to evaluate a propensity score model and how to choose an optimal one among several candidate models, focusing on the binary treatment case. One way to evaluate the performance of different propensity score models is to see, via simulation, how close the estimates are to the true propensity scores. However, Hirano et al. [7] and Lunceford and Davidian [15] showed that conditioning on the estimated propensity score rather than the true propensity score can yield a smaller variance of the estimated causal effects. That is, even when the propensity score is estimated more accurately, it does not necessarily yield better estimates of the causal effects.

### 4.1 Evaluation by Checking Balance

For a covariate *X*, the weighted absolute standardized mean difference (ASMD) is defined as

$$\displaystyle{\mathrm{ASMD} = \frac{\left \vert \bar{X}_{\mathrm{treated}}^{w} -\bar{X}_{\mathrm{control}}^{w}\right \vert } {s},}$$

where \(\bar{X}_{\mathrm{treated}}^{w}\) is the weighted average of *X* in the treatment group, \(\bar{X}_{\mathrm{control}}^{w}\) is the weighted average of *X* in the control (untreated) group, *s*_{treated} is the standard deviation of *X* in the treatment group, and *s*_{control} is the standard deviation of *X* in the control group. When estimating ATE, the denominator *s* is usually taken as the pooled standard deviation \(\sqrt{(s_{\mathrm{treated} }^{2 } + s_{\mathrm{control} }^{2 })/2}\); when estimating ATT, *s* = *s*_{treated}. We then look at the mean/median/maximum value of the ASMD among the covariates, and the propensity score model that leads to the smallest value is usually claimed as the best model.

Other criteria to evaluate the balance in the covariates include the Kolmogorov–Smirnov statistic [17], the *t*-test statistic [6], and the c-statistic. Recently, an innovative prognostic score-based balance measure was proposed by Stuart et al. [25], which accounts for information in the outcome variable when checking balance. The approach works as follows: first, a model of the outcome on the covariates is fitted, and the predicted outcome under no treatment, termed the prognostic score, is calculated for each subject in the study. Then, the weighted ASMD of the prognostic score is calculated as a measure of balance. The authors show in a comprehensive simulation study that this measure outperforms the other balance measures, such as the mean/median/maximum ASMD and the KS statistic, in the sense that it is highly correlated with the bias in the estimated causal treatment effect.
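The weighted ASMD calculation itself is short. A minimal numpy sketch on toy data, comparing balance before and after inverse probability weighting (here with the true score, so weighting should remove essentially all imbalance):

```python
import numpy as np

def weighted_asmd(x, t, w):
    """Weighted ASMD for one covariate (pooled-SD denominator, ATE version)."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    s = np.sqrt((x[t == 1].var(ddof=1) + x[t == 0].var(ddof=1)) / 2)
    return abs(m1 - m0) / s

rng = np.random.default_rng(8)
n = 4000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))                    # treatment depends on x: imbalance
t = rng.binomial(1, ps)
w_ipw = np.where(t == 1, 1 / ps, 1 / (1 - ps))

asmd_before = weighted_asmd(x, t, np.ones(n))   # unweighted comparison
asmd_after = weighted_asmd(x, t, w_ipw)         # after IPW with the true score
```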

### 4.2 Evaluation Based on a Two-Stage Procedure

Brookhart and van der Laan [3] proposed a two-stage model selection procedure that can be used to compare propensity score models. Denote the causal treatment effect as *ψ*, which is the parameter of interest, and the propensity score as *η*, which is the nuisance parameter. Assuming we have *K* different candidate models for estimating *η*, we aim to choose the optimal one in terms of estimating *ψ*. Denote the resulting estimates of *ψ* from the *K* candidate models as \(\hat{\psi }_{1}(X)\),…,\(\hat{\psi }_{K}(X)\), and assume there exists an approximately unbiased but highly variable estimate of *ψ*, denoted as \(\hat{\psi }_{0}(X)\). The model used to estimate *η* in \(\hat{\psi }_{0}(X)\) is regarded as the reference model. To account for the fact that there is a trade-off between bias and variance while estimating *ψ*, the authors proposed a cross-validation criterion for selecting the optimal estimator of the nuisance parameter among the *K* candidate models. Let *X*_{v}^{0} be the training sample and *X*_{v}^{1} be the testing sample in the *v*th iteration of the Monte-Carlo cross-validation; the criterion function for candidate model *k* is defined as follows:

$$\displaystyle{C_{v}(k) = \left [\hat{\psi }_{k}(X_{v}^{0}) -\hat{\psi }_{ 0}(X_{v}^{1})\right ]^{2}.}$$

The optimal model is the one that minimizes the average of *C*_{v} among the *K* models. Brookhart and van der Laan [3] proved that the model selected by this Monte-Carlo cross-validation criterion leads to the smallest mean squared error of the parameter of interest. This approach has been adopted to compare different propensity score models in [33], in which an over-fitted logistic regression model using all the available covariates is treated as the reference propensity score model to obtain \(\hat{\psi }_{0}(X)\).
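The Monte-Carlo cross-validation loop can be sketched on toy data. The two "candidate models" below are deliberately simple stand-ins (the true propensity score versus a misspecified constant score), not the models used by the authors:

```python
import numpy as np

rng = np.random.default_rng(9)

# --- toy data; true ATE = 2 ---
n = 2000
X = rng.normal(size=n)
ps_true = 1 / (1 + np.exp(-X))
T = rng.binomial(1, ps_true)
Y = 2.0 * T + X + rng.normal(size=n)

def ipw_ate(idx, ps):
    return np.mean(T[idx] * Y[idx] / ps) - np.mean((1 - T[idx]) * Y[idx] / (1 - ps))

def psi_hat(idx, k):
    """Candidate k = 1: true score (stand-in for a good model);
    k = 2: constant score (a deliberately misspecified model)."""
    ps = ps_true[idx] if k == 1 else np.full(idx.size, T.mean())
    return ipw_ate(idx, ps)

V, C = 50, {1: 0.0, 2: 0.0}
for _ in range(V):                        # Monte-Carlo cross-validation
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2:]
    ref = psi_hat(test, 1)                # reference estimate on the test split
    for k in (1, 2):
        C[k] += (psi_hat(train, k) - ref) ** 2 / V

best = min(C, key=C.get)                  # model with the smallest criterion
```

The misspecified candidate incurs a large squared distance from the reference estimate on held-out data, so the criterion selects the well-specified model, matching the theory in [3].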

### References

- 1. Breiman, L.: Random forests. Mach. Learn. **45**(1), 5–32 (2001)
- 2. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton, FL (1984)
- 3. Brookhart, M.A., van der Laan, M.J.: A semiparametric model selection criterion with applications to the marginal structural model. Comput. Stat. Data Anal. **50**(2), 475–498 (2006)
- 4. Hall, P., Wolff, R.C.L., Yao, Q.: Methods for estimating a conditional distribution function. J. Am. Stat. Assoc. **94**(445), 154–163 (1999)
- 5. Hansen, L.P.: Large sample properties of generalized method of moments estimators. Econometrica **50**(4), 1029–1054 (1982)
- 6. Hirano, K., Imbens, G.W.: Estimation of causal effects using propensity score weighting: an application to data on right heart catheterization. Health Serv. Outcome Res. Methodol. **2**(3), 259–278 (2001)
- 7. Hirano, K., Imbens, G.W., Ridder, G.: Efficient estimation of average treatment effects using the estimated propensity score. Econometrica **71**(4), 1161–1189 (2003)
- 8. Imai, K., Ratkovic, M.: Covariate balancing propensity score. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **76**(1), 243–263 (2014)
- 9. Imai, K., Van Dyk, D.A.: Causal inference with general treatment regimes. J. Am. Stat. Assoc. **99**(467), 854–866 (2004)
- 10. Imbens, G.W.: The role of the propensity score in estimating dose-response functions. Biometrika **87**(3), 706–710 (2000)
- 11. Kang, J.D.Y., Schafer, J.L.: Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat. Sci. **22**(4), 523–539 (2007)
- 12. Lechner, M.: Program heterogeneity and propensity score matching: an application to the evaluation of active labor market policies. Rev. Econ. Stat. **84**(2), 205–220 (2002)
- 13. Lee, B.K., Lessler, J., Stuart, E.A.: Improving propensity score weighting using machine learning. Stat. Med. **29**(3), 337–346 (2010)
- 14. Lee, B.K., Lessler, J., Stuart, E.A.: Weight trimming and propensity score weighting. PLoS ONE **6**(3), e18174 (2011)
- 15. Lunceford, J.K., Davidian, M.: Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat. Med. **23**(19), 2937–2960 (2004)
- 16. McCaffrey, D.F., Ridgeway, G., Morral, A.R.: Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol. Methods **9**(4), 403–425 (2004)
- 17. McCaffrey, D.F., Griffin, B.A., Almirall, D., Slaughter, M.E., Ramchand, R., Burgette, L.F.: A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Stat. Med. **32**(19), 3388–3414 (2013)
- 18. Pregibon, D.: Resistant fits for some commonly used logistic models with medical applications. Biometrics **38**(2), 485–498 (1982)
- 19. Ridgeway, G., McCaffrey, D., Morral, A., Burgette, L., Griffin, B.A.: Toolkit for weighting and analysis of nonequivalent groups: a tutorial for the twang package. R vignette. RAND (2015)
- 20. Robins, J.M.: Association, causation, and marginal structural models. Synthese **121**(1), 151–179 (1999)
- 21. Robins, J.M., Hernán, M.Á., Brumback, B.: Marginal structural models and causal inference in epidemiology. Epidemiology **11**(5), 550–560 (2000)
- 22. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika **70**(1), 41–55 (1983)
- 23. Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. **66**(5), 688–701 (1974)
- 24. Setoguchi, S., Schneeweiss, S., Brookhart, M.A., Glynn, R.J., Cook, E.F.: Evaluating uses of data mining techniques in propensity score estimation: a simulation study. Pharmacoepidemiol. Drug Saf. **17**(6), 546–555 (2008)
- 25. Stuart, E.A., Lee, B.K., Leacy, F.P.: Prognostic score-based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research. J. Clin. Epidemiol. **66**(8), S84–S90 (2013)
- 26. Székely, G.J., Rizzo, M.L.: Brownian distance covariance. Ann. Appl. Stat. **3**(4), 1236–1265 (2009)
- 27. Székely, G.J., Rizzo, M.L., Bakirov, N.K.: Measuring and testing dependence by correlation of distances. Ann. Stat. **35**(6), 2769–2794 (2007)
- 28. Tchernis, R., Horvitz-Lennon, M., Normand, S.L.T.: On the use of discrete choice models for causal inference. Stat. Med. **24**(14), 2197–2212 (2005)
- 29. Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S, 4th edn. Springer, New York (2002)
- 30. Zhu, Y., Coffman, D.L., Ghosh, D.: A boosting algorithm for estimating generalized propensity scores with continuous treatments. J. Causal Inference **3**(1), 25–40 (2015)
- 31. Zhu, Y., Ghosh, D., Mitra, N., Mukherjee, B.: A data-adaptive strategy for inverse weighted estimation of causal effect. Health Serv. Outcome Res. Methodol. **14**(3), 69–91 (2014)
- 32. Zhu, Y., Schonbach, M., Coffman, D.L., Williams, J.S.: Variable selection for propensity score estimation via balancing covariates. Epidemiology **26**(2), e14–e15 (2015)
- 33. Zhu, Y., Ghosh, D., Coffman, D.L., Savage, J.S.: Estimating controlled direct effects of restrictive feeding practices in the ‘early dieting in girls’ study. J. R. Stat. Soc. Ser. C (Appl. Stat.) **65**(1), 115–130 (2016)