
Part of the book series: Contributions to Economics (CE)


Abstract

Non-life insurance pricing is based on two components: claim severity and claim frequency. These components are used to estimate the expected pure premium for the next policy period. Generalized linear models (GLMs) are widely preferred for estimating claim frequency and claim severity because they are easy to interpret and implement. Since GLMs carry restrictions such as the exponential-family distribution assumption, more flexible machine learning (ML) methods have been applied to insurance data in recent years. ML methods, which sit at the intersection of computer science and statistics, use learning algorithms to establish the relationship between the response and the predictor variables. Because of insurance policy modifications such as deductibles and no-claim discount systems, excess zeros are usually observed in claim frequency data. In the presence of excess zeros, predicting the claim probability can be a good alternative to predicting the number of claims, since positive counts are rarely observed in the portfolio. Excess zeros create an imbalance problem in the data. When the data are highly imbalanced, predictions are biased toward the majority class because of the class priors, and the predicted probabilities may be uncalibrated. In this study, we are interested in the claim occurrence probability in the presence of excess zeros. A highly imbalanced Turkish motor insurance dataset is used for the case study. Ensemble methods, which are popular ML approaches, are used for probability prediction as an alternative to logistic regression. Calibration methods are applied to the predicted probabilities, and the results are compared.
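
To make the calibration idea concrete, the following is a minimal sketch and not the chapter's actual procedure. The abstract does not say which calibration methods or evaluation metrics are used, so two common choices are assumed here: Platt scaling (a logistic map fitted to the classifier's scores) and the Brier score. The data are synthetic, the "uncalibrated classifier" is hypothetical, and all function and variable names are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def fit_platt(scores, y_true, lr=1.0, n_iter=500):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, y_true):
            err = sigmoid(a * s + b) - y  # derivative of the log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Synthetic, highly imbalanced portfolio (assumption: NOT the Turkish motor data).
# Each policy has one risk factor x; most policies report no claim.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if rng.random() < sigmoid(-3.0 + xi) else 0 for xi in x]

# Hypothetical uncalibrated classifier: it ranks policies correctly, but its
# logit-scale scores are shifted upward, so raw probabilities are far too high.
scores = x

# Fit the calibration map on one half, evaluate on the other half.
cal_scores, cal_y = scores[:1000], y[:1000]
eval_scores, eval_y = scores[1000:], y[1000:]

a, b = fit_platt(cal_scores, cal_y)
p_raw = [sigmoid(s) for s in eval_scores]
p_cal = [sigmoid(a * s + b) for s in eval_scores]

brier_raw = brier_score(eval_y, p_raw)
brier_cal = brier_score(eval_y, p_cal)
claim_rate = sum(eval_y) / len(eval_y)
mean_p_raw = sum(p_raw) / len(p_raw)
mean_p_cal = sum(p_cal) / len(p_cal)

print(f"observed claim rate: {claim_rate:.3f}")
print(f"mean raw probability: {mean_p_raw:.3f}, Brier: {brier_raw:.3f}")
print(f"mean calibrated probability: {mean_p_cal:.3f}, Brier: {brier_cal:.3f}")
```

In this toy setting the calibrated probabilities should average close to the observed claim rate, while the raw ones overstate it. Fitting the calibration map on data held out from the evaluation set, as done here, keeps the comparison from being optimistic.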




Author information


Corresponding author

Correspondence to Aslıhan Şentürk Acar.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Acar, A. (2022). Prediction of Claim Probability with Excess Zeros. In: Terzioğlu, M.K. (eds) Advances in Econometrics, Operational Research, Data Science and Actuarial Studies. Contributions to Economics. Springer, Cham. https://doi.org/10.1007/978-3-030-85254-2_32
