A Hierarchical Ensemble of α-Trees for Predicting Expensive Hospital Visits

  • Yubin Park
  • Joydeep Ghosh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8609)


Hospital charges are determined by numerous factors. Even the cost of the same procedure can vary greatly depending on a patient’s conditions, complications, and the type of facility. With the advent of Obamacare, estimating hospital charges has become an increasingly important problem in healthcare informatics. We propose a hierarchical ensemble of α-Trees to address this challenging problem. In the proposed approach, multiple α-Trees are built to capture different aspects of hospital charges, and these classifiers are then combined with a separate weighting for each hospital. Each hospital is characterized by its own weight vector, which reflects subtle differences in hospital specialties and patient groups. Experimental results on the 2006 Texas inpatient discharge data show that our approach effectively captures the variability of hospital charges across hospitals and also provides a useful characterization of the hospitals in the process.
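The per-hospital combination step can be illustrated with a minimal sketch. It is not the authors' fitting procedure, which the abstract does not specify. It assumes the base classifiers (for example, α-Trees) have already produced scores in [0, 1], and it swaps in a ridge-regularized least-squares fit, shrunk toward a uniform average, to obtain one weight vector per hospital. The function names `fit_hospital_weights` and `predict_with_hospital_weights` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def fit_hospital_weights(tree_scores, labels, hospital_ids, ridge=1.0):
    """Fit one weight vector per hospital (illustrative assumption, not the paper's method).

    tree_scores  : (n_samples, n_trees) outputs of the base classifiers, in [0, 1]
    labels       : (n_samples,) binary targets (1 = expensive visit)
    hospital_ids : (n_samples,) hospital identifier for each record
    ridge        : strength of the shrinkage toward a uniform average of the trees
    """
    tree_scores = np.asarray(tree_scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    hospital_ids = np.asarray(hospital_ids)
    n_trees = tree_scores.shape[1]
    uniform = np.full(n_trees, 1.0 / n_trees)

    weights = {}
    for h in np.unique(hospital_ids):
        mask = hospital_ids == h
        X, y = tree_scores[mask], labels[mask]
        # Minimize ||X w - y||^2 + ridge * ||w - uniform||^2 in closed form.
        A = X.T @ X + ridge * np.eye(n_trees)
        b = X.T @ y + ridge * uniform
        weights[h] = np.linalg.solve(A, b)
    return weights


def predict_with_hospital_weights(tree_scores, hospital_ids, weights):
    """Combine base-classifier scores using each record's hospital-specific weights."""
    tree_scores = np.asarray(tree_scores, dtype=float)
    return np.array(
        [tree_scores[i] @ weights[h] for i, h in enumerate(hospital_ids)]
    )
```

Under these assumptions, the fitted weight vectors double as a rough hospital profile: a hospital whose charges track one base classifier closely receives a larger weight on that classifier.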


Keywords: decision tree, α-divergence, ensemble classifiers, healthcare





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yubin Park (1)
  • Joydeep Ghosh (1)
  1. The University of Texas at Austin, Austin, USA
