Abstract
Rapidly rising healthcare costs are one of the major problems facing the healthcare system. Data from the Arizona Health Care Cost Containment System, Arizona’s Medicaid program, provide a unique opportunity to apply state-of-the-art machine learning and data mining algorithms and derive actionable findings that can aid cost containment. Our work addresses a specific challenge in this real-life healthcare application: data imbalance in building predictive risk models for forecasting high-cost patients. We survey the literature and propose novel data mining approaches customized for this application, with a specific focus on non-random sampling. Our empirical study indicates that the proposed approach is highly effective and can benefit further research on cost containment in the healthcare industry.
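As a rough illustration of what non-random sampling can mean for imbalanced training data, the sketch below under-samples the majority class so that the rare class (for example, high-cost patients) reaches a chosen share of the training set. It is a minimal sketch and not the procedure used in the paper, which the abstract does not spell out. The function name `resample_to_fraction`, the 0/1 label encoding, and the `minority_fraction` parameter are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def resample_to_fraction(
    rows: Sequence[dict],
    labels: Sequence[int],
    minority_fraction: float,
    seed: int = 0,
) -> Tuple[List[dict], List[int]]:
    """Under-sample the majority class (label 0) so that the minority
    class (label 1, assumed here to mark high-cost patients) makes up
    roughly `minority_fraction` of the returned training set.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < minority_fraction < 1.0:
        raise ValueError("minority_fraction must be strictly between 0 and 1")

    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if not minority:
        raise ValueError("no minority-class rows to sample from")

    # Majority rows needed so that minority / total == minority_fraction.
    wanted = round(len(minority) * (1.0 - minority_fraction) / minority_fraction)
    # Keep every minority row; draw the majority rows without replacement.
    kept_majority = rng.sample(majority, min(wanted, len(majority)))

    chosen = minority + kept_majority
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]
```

Only the training set would be resampled this way. A held-out test set should keep the original class distribution so that measured performance reflects the real population.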
Keywords
- predictive risk modeling
- health care expenditures
- Medicaid
- future high-cost patients
- data mining
- non-random sampling
- risk adjustment
- skewed data
- imbalanced data classification
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moturu, S.T., Liu, H., Johnson, W.G. (2008). Understanding the Effects of Sampling on Healthcare Risk Modeling for the Prediction of Future High-Cost Patients. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2008. Communications in Computer and Information Science, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92219-3_37
DOI: https://doi.org/10.1007/978-3-540-92219-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92218-6
Online ISBN: 978-3-540-92219-3
eBook Packages: Computer Science (R0)