
Linear and Ensembling Regression Based Health Cost Insurance Prediction Using Machine Learning

  • Conference paper
  • First Online:
Smart Computing Techniques and Applications

Abstract

Health insurance is an important safeguard when emergencies arise, such as accidents and disease pandemics. Many people are hit hard financially and struggle to bear the expenses of treatment. The need for health insurance changes from youth to old age depending on lifestyle and genetics, and because lifestyles and diseases are changing, every individual needs health insurance. A medical emergency can strike anyone at any time and affect the person both emotionally and financially. Against this background, this paper attempts to predict health insurance cost from the available parameters (age, sex, region, smoking status, body mass index (BMI), and number of children), with the following contributions. First, the Health Cost Insurance dataset is extracted from the UCI Machine Learning Repository, and the data are preprocessed and examined through exploratory data analysis. Second, an ANOVA test is applied to identify the features whose probability of the F-statistic, PR(>F), is below 0.05, i.e., the features that significantly influence the target. Third, the raw dataset and the feature-scaled dataset are applied to each of the linear regression models, and their performance is analyzed. Fourth, the raw and feature-scaled datasets are applied to each of the ensemble regression models, and performance is analyzed through the intercept, mean absolute error (MAE), mean squared error (MSE), R2 score, and explained variance score (EVS). The ANOVA results show that the variable ‘region’ does not influence the target, as its F-statistic value is 0.14. Experimental results show that polynomial regression achieves an R2 score of 88% and random forest regression an R2 score of 86%, both before and after feature scaling.
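The ANOVA screening and the evaluation metrics named in the abstract can be illustrated with a small, dependency-free sketch. It assumes a one-way ANOVA per categorical feature (the abstract does not say which ANOVA variant or library the authors used; the column name PR(>F) matches the output of statsmodels' `anova_lm`). The sketch computes only the F-statistic from between-group and within-group sums of squares. Turning F into PR(>F) requires the F distribution's survival function (for example `scipy.stats.f.sf(F, k - 1, n - k)`), which is omitted here to keep the code standard-library only. The metric function follows the standard definitions of MAE, MSE, R2 score, and EVS. The function names and toy numbers are hypothetical, not taken from the paper or its dataset.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F-statistic of a one-way ANOVA (hypothetical helper).

    groups: list of lists, one list of target values per category
    (e.g. insurance charges split by 'smoker' or by 'region').
    """
    all_values = [y for g in groups for y in g]
    n, k = len(all_values), len(groups)
    grand_mean = fmean(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    ms_between = ss_between / (k - 1)  # df_between = k - 1
    ms_within = ss_within / (n - k)    # df_within  = n - k
    return ms_between / ms_within


def regression_metrics(y_true, y_pred):
    """MAE, MSE, R2 score and explained variance score (hypothetical helper)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = fmean(y_true)
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    mean_err = fmean(errors)
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    r2 = 1 - mse / var_true       # 1 - SS_res / SS_tot
    evs = 1 - var_err / var_true  # 1 - Var(y - y_hat) / Var(y)
    return {"MAE": mae, "MSE": mse, "R2": r2, "EVS": evs}


if __name__ == "__main__":
    # Toy charges for two groups of a categorical feature (made-up numbers)
    print(one_way_anova_f([[30, 32, 34], [10, 12, 14]]))  # 150.0
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

A large F (with a small PR(>F)) indicates that the group means differ far more than the noise within groups, which is the criterion the abstract uses to keep a feature. A small F, such as the 0.14 reported for ‘region’, means the category barely separates the target values. R2 and EVS coincide whenever the residuals have zero mean, so a gap between them signals a biased model.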



Author information


Corresponding author

Correspondence to M. Shyamala Devi.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Shyamala Devi, M. et al. (2021). Linear and Ensembling Regression Based Health Cost Insurance Prediction Using Machine Learning. In: Satapathy, S.C., Bhateja, V., Favorskaya, M.N., Adilakshmi, T. (eds) Smart Computing Techniques and Applications. Smart Innovation, Systems and Technologies, vol 224. Springer, Singapore. https://doi.org/10.1007/978-981-16-1502-3_49
