
Racial treatment disparities after machine learning surgical risk-adjustment


Abstract

Black patients are less likely to receive certain surgical interventions. To test whether a health risk disparity, and thus differential appropriateness for surgery, explains a treatment disparity, researchers must adjust observed rates for patient-level health differences using valid contextual regression controls to increase patient comparability. As an alternative to the standard health adjustment with predetermined diagnosis groups, I propose a machine learning-based method that better captures clinical practice by adjusting for the important predictors of invasive surgery, applied here to acute myocardial infarction (AMI). With data from the Nationwide Inpatient Sample, this method decreases the standard adjusted AMI surgery disparity by 45–55%. Nonetheless, a significant surgery disparity of 5.9 percentage points with hospital fixed effects and 4.5 percentage points with physician fixed effects remains after adjusting for predictive controls. The smaller yet persistent disparity provides evidence of differential AMI treatment beyond that explained by health risk differences.


Notes

  1. I follow the Office of Management and Budget and the Institute of Medicine and define Black race as “a person having origins in any of the Black racial groups of Africa” but also recognize the considerable criticism of racial classifications and the arguable lack of biological and genetic bases beyond their social construction (Jones 2001; Kaufman and Cooper 2001; Williams and Sternthal 2010; The Office of Management and Budget 2016; Sen and Wasow 2016; Yudell et al. 2016). I also follow the Institute of Medicine and the literature and use the terms “Black” and “African American” interchangeably to refer to patient race as reported in the Nationwide Inpatient Sample, while also noting the difficulty of recording information on race in the clinical record, discussed further in Sect. 3 (Nelson and Smedley 2003; The Office of Management and Budget 2016).

  2. In this paper, I use “risk” in terms of the underlying chance of a given treatment and not risk of mortality or complication.

  3. Community hospitals, as defined by the American Hospital Association, include non-federal, short-term, general, and other specialty hospitals and exclude federal hospitals, long-term hospitals, psychiatric hospitals, alcohol/chemical dependency treatment facilities, and hospital units within institutions such as prisons (Health Forum LLC 2018).

  4. 394,857 observations were dropped due to missing data, with the race variable, an important component, assumed to be missing at random for the purposes of this study. See Houchens (2015) for a more thorough discussion of race and missingness in the NIS.

  5. States that do not supply physician identifiers are AK, CA, CT, HI, IL, IN, LA, MA, MS, NC, OK, UT, VT, WI.

  6. The NIS contains up to 25 diagnosis codes per patient (15 prior to the 2009 NIS), but the maximum number of diagnoses varies by state. The mean number of diagnoses per patient in the AMI sample is 10.

  7. A base surgery rate of 36%, near the middle of the distribution, suggests that a linear approximation yields the same conclusions as a conditional logistic regression. I ran logistic regressions on subsets of the data, which confirmed this.

  8. HCUP excludes cardiac arrhythmia from the comorbidity software following concerns about its reliability as a comorbidity (Thompson et al. 2015).

  9. Full code list available in Appendix.


Acknowledgements

I would like to thank Indiana University professors Seth Freedman, Kosali Simon, Coady Wing, Predrag Radivojac, Brad Heim, Sergio Fernandez, Alex Hollingsworth, Justin Ross, Evan Ringquist, and Dan Sacks, as well as participants at the APPAM Fall Research Conference, the Midwestern Economics Association Annual Meeting, and the ASHEcon Biennial Conference. A special thank you to Sean Mooney and the Mooney Lab, Anirban Basu, and participants of the PHEnOM series at the University of Washington. Thank you to the National Library of Medicine research training grant (T15-LM007442-19) and PI Peter Tarczy-Hornoch. I also thank the Agency for Healthcare Research and Quality and its HCUP data partners (https://www.hcup-us.ahrq.gov/db/hcupdatapartners.jsp). All errors are my own.

Author information


Corresponding author

Correspondence to Noah Hammarlund.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Cross-validation

Cross-validation repeatedly builds and evaluates a given model on mutually exclusive samples of the data, called folds, to estimate the model’s out-of-sample predictive accuracy (Ruppert 2004). In K-fold cross-validation, K denotes the number of folds and thus the number of rounds of the process. Figure 3 illustrates a 3-fold cross-validation, where the entire dataset is cut into three folds. In round 1, folds 2 and 3 build the prediction model, and the predictions are then tested for accuracy on fold 1. The process repeats for three total rounds so that each fold serves once as the hold-out testing sample. The average predictive accuracy across rounds estimates the overall accuracy. To evaluate my predictions, I use the standard 10-fold cross-validation: ten folds of the data and ten total rounds of prediction accuracy evaluation.

Fig. 3 Cross-validation in AMI Sample
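For concreteness, the following minimal R sketch implements the 10-fold procedure on a hypothetical data frame `ami` with a binary `surgery` outcome; the predictor names are illustrative assumptions, not the paper's actual code.

```r
# Minimal 10-fold cross-validation sketch; `ami`, `surgery`, and the
# predictors are hypothetical stand-ins for the NIS AMI sample.
library(pROC)  # computes AUC on each held-out fold

set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(ami)))  # assign each row to one fold

fold_auc <- sapply(1:k, function(f) {
  train <- ami[folds != f, ]  # k - 1 folds build the prediction model
  test  <- ami[folds == f, ]  # the remaining fold is the hold-out test sample
  fit <- glm(surgery ~ age + female + n_diagnoses,
             data = train, family = binomial)
  preds <- predict(fit, newdata = test, type = "response")
  as.numeric(auc(test$surgery, preds))  # out-of-sample accuracy for this round
})

mean(fold_auc)  # the average across rounds estimates overall accuracy
```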

Appendix 2: Post-double selection

As before, to find important predictors of surgery, I first regress invasive surgery on I, a vector of all potential candidate variables to be selected for risk-adjustment.

$$\begin{aligned} Surgery_{ith}=\alpha ^S+ \sum _{j=1}^{J} I_j \beta _j^S +\delta _{th}^S +\epsilon _{ith} ^S \end{aligned}$$
(A1)

This step models important confounding variables in my model of interest (Eq. 1) to keep the residual variance small.

I next regress my independent variable of interest, an indicator for Black race of the patient, on the same vector of potential candidate variables, I. This step models variables that are strongly related to Black race and thus potentially important confounding factors. Finding potential confounding variables for both the dependent and independent variables of interest lowers the chances of omitting important explanatory variables.

$$\begin{aligned} Black_{ith}= \alpha ^B+ \sum _{k=1}^{K} I_k \beta _k^B +\delta _{th}^B +\epsilon _{ith} ^B \end{aligned}$$
(A2)

I then estimate my model of interest as a linear probability model (Eq. A3), including the variables found important in either the invasive surgery prediction step (\(R_{S}\)) or the Black indicator prediction step (\(R_{B}\)), which combine to \(R_S \cup R_B\).

$$\begin{aligned} Surgery_{ith}= \alpha ^{P}+ \beta ^{P} Black_i+ (R_S \cup R_B)\,\gamma ^{P} +\delta _{th}^{P} +\epsilon _{ith}^{P} \end{aligned}$$
(A3)

The inclusion of predictors of both race and invasive treatment in the linear regression model is an application of the post-double-selection method. The method allows for valid inference from my standard model of interest by controlling for additional potential confounders of race (Belloni et al. 2013).
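For concreteness, a minimal R sketch of the three selection steps using the glmnet lasso follows; the object names (`X` for the candidate matrix, `surgery`, `black`, and `hosp_year` for a hospital-by-year fixed-effect identifier) are hypothetical placeholders, not the paper's actual code.

```r
# Post-double-selection sketch (Belloni et al. 2013); all object names
# are illustrative assumptions.
library(glmnet)

# Step 1 (Eq. A1): lasso of the surgery outcome on all candidate adjusters
fit_s <- cv.glmnet(X, surgery, family = "binomial", alpha = 1)
sel_s <- which(as.vector(coef(fit_s, s = "lambda.min"))[-1] != 0)

# Step 2 (Eq. A2): lasso of the Black indicator on the same candidates
fit_b <- cv.glmnet(X, black, family = "binomial", alpha = 1)
sel_b <- which(as.vector(coef(fit_b, s = "lambda.min"))[-1] != 0)

# Step 3 (Eq. A3): linear probability model with the union of selections
controls <- X[, union(sel_s, sel_b), drop = FALSE]
lpm <- lm(surgery ~ black + controls + factor(hosp_year))
coef(summary(lpm))["black", ]  # adjusted disparity estimate and its SE
```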

Appendix 3: Important predictors of Black race

Table 9 presents variables chosen for the prediction of Black race. Black patients are more likely to be diagnosed with sarcoidosis, cocaine abuse, and sickle-cell disease. Some of these variables may be mechanisms of discrimination rather than surgery confounders. For example, are Black patients much more likely to be cocaine abusers, or simply more likely to be coded as such? Similar concerns about potential bias in the recording of these variables make including the predictors of Black race as controls in the main specification problematic. The lasso model, though, achieves a high AUC of 0.98 on an independent holdout sample while selecting 98 variables.

Table 9 Lasso-selected predictors of the Black indicator

Appendix 4: Boosted trees

Boosted versions of tree-based models build many trees, each with a small number of selected variables, in a stage-wise fashion and aggregate them into a final “boosted” model that makes more accurate predictions than any of its parts (Friedman 2001). Boosted trees minimize a prediction error function called a loss function. For this paper, I implement boosted trees using the XGBoost gradient boosted tree algorithm from the XGBoost package in R (Chen and Guestrin 2016). XGBoost has a few main benefits for dealing with clinical data. For missing clinical data, XGBoost automatically learns the best surgery/non-surgery split based on this loss function; in other words, it finds the best default direction for missing values based on the reduction in predictive loss during the model building phase. Therefore, if there is a prediction signal in the distribution of missing data, the model learns and fits it. The algorithm also averages the predictive gain from each variable in each iteration to give an overall predictive importance of each variable in the final model. These feature rankings can be used to select features, but the method does not explicitly exclude variables as lasso does (Ruppert 2004). Gradient Boosted Feature Selection changes the penalty function to explicitly drop unimportant variables, as lasso does, and thus more directly meet the sparsity assumption required by the post-double-selection method (Belloni et al. 2013; Xu et al. 2014). However, variable selection through an importance threshold leads to sparsity and very similar chosen variables in this case.
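A minimal sketch of how such a model might be fit with the XGBoost R package appears below; the data objects and tuning values are illustrative assumptions rather than the paper's settings.

```r
# Gradient boosted trees sketch; `X` (candidate matrix, NAs permitted) and
# `surgery` (binary outcome) are hypothetical stand-ins for the AMI sample.
library(xgboost)

dtrain <- xgb.DMatrix(data = X, label = surgery)  # missing values are handled
                                                  # natively: each split learns
                                                  # a default direction for NAs
bst <- xgb.train(params = list(objective = "binary:logistic",
                               eta = 0.1, max_depth = 4),
                 data = dtrain, nrounds = 200)

# The average predictive gain per variable across all trees gives an overall
# importance ranking, which can then be thresholded to select adjusters
imp <- xgb.importance(model = bst)
head(imp[, c("Feature", "Gain")], 10)
```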

Boosted tree variable selection gives consistent results, in terms of variables selected, when compared to the main lasso method. All selection algorithms, lasso and the tree-based methods, choose many of the same important predictors, with the tree-based models again choosing a subset of the variables selected by the logistic lasso model, showing the consistency of machine learning variable selection. The order of variable importance changes, but 95 of the top 100 variables remain the same.

Appendix 5: Disparity variation through time

Table 10 presents the main hospital fixed effects specification with lasso-chosen adjusters and either a linear time variable interacted with the Black race indicator (column 1) or the Black race indicator interacted with an indicator for each year of the data (column 2). For column 2, the base year is 2003. The results suggest that the disparity decreases through time, since the positive coefficient on Black\(*\)Linear moves the disparity estimate closer to zero. When the analysis is broken down by year interacted with the Black race indicator, the findings in column 2 indicate that the only individually significant year is 2010: in that year, the disparity was significantly smaller, as the positive coefficient moves the disparity estimate toward zero.
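In R formula terms, the two specifications might look like the following sketch, where `adjusters` (the matrix of lasso-chosen controls) and `hospital` (the fixed-effect identifier) are hypothetical names, not the paper's actual code.

```r
# Column 1: Black indicator interacted with a linear time trend (base year 2003)
m1 <- lm(surgery ~ black * I(year - 2003) + adjusters + factor(hospital))

# Column 2: Black indicator interacted with an indicator for each year
m2 <- lm(surgery ~ black * factor(year) + adjusters + factor(hospital))
```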

Table 10 AMI surgery with time interactions

Appendix 6: All chosen predictors of surgery

Figures a and b present the full list of chosen predictors of surgery.


About this article


Cite this article

Hammarlund, N. Racial treatment disparities after machine learning surgical risk-adjustment. Health Serv Outcomes Res Method 21, 248–286 (2021). https://doi.org/10.1007/s10742-020-00231-7

