Prediction of Claim Probability with Excess Zeros

Acar, Aslıhan Şentürk

doi:10.1007/978-3-030-85254-2_32

Part of the book series: Contributions to Economics (CE)

Abstract

Non-life insurance pricing rests on two components: claim severity and claim frequency. Together they are used to estimate the expected pure premium for the next policy period. Generalized linear models (GLMs) are widely preferred for estimating claim frequency and claim severity because they are easy to interpret and implement. Because GLMs carry restrictions, such as the assumption that the response follows an exponential family distribution, more flexible machine learning (ML) methods have been applied to insurance data in recent years. ML methods, which lie at the intersection of computer science and statistics, use learning algorithms to establish a relationship between the response and the predictor variables. Because of insurance policy modifications such as deductibles and no-claim discount systems, excess zeros are usually observed in claim frequency data. When excess zeros are present, predicting the probability of a claim can be a good alternative to predicting the number of claims, since positive counts are rarely observed in the portfolio. Excess zeros also create a class imbalance problem in the data. When the data are highly imbalanced, predictions are biased toward the majority class because of the class priors, and the predicted probabilities may be poorly calibrated. This study addresses the probability of claim occurrence in the presence of excess zeros. The case study uses a highly imbalanced Turkish motor insurance dataset. Ensemble methods, a popular family of ML approaches, are used for probability prediction as an alternative to logistic regression. Calibration methods are then applied to the predicted probabilities, and the results are compared.
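The abstract's point about class priors and miscalibration can be illustrated with a small numerical sketch. The abstract does not say which resampling or calibration methods the chapter uses, so everything below is an assumption made for illustration: the portfolio is synthetic (5% claim rate), the "model" is a constant predictor fitted after randomly undersampling the no-claim class, and the correction is the standard prior-shift formula for undersampling, not necessarily the author's method. The names `brier_score`, `undo_undersampling`, and `beta` are hypothetical. The Brier score is used only as a simple measure of probability accuracy.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def undo_undersampling(p_s, beta):
    """Map a probability learned on undersampled data back to the original scale.

    beta is the fraction of majority-class (no-claim) rows that were kept.
    With beta = 1 (no undersampling) the probability is returned unchanged.
    """
    return beta * p_s / (beta * p_s - p_s + 1.0)


# Synthetic, highly imbalanced portfolio (assumed for illustration):
# 500 policies with a claim and 9,500 without, i.e. a 5% claim rate.
outcomes = [1] * 500 + [0] * 9500
n_pos = sum(outcomes)
n_neg = len(outcomes) - n_pos

# Keep 10% of the no-claim policies. A model with no predictors fitted on
# the undersampled data would predict the claim share of that sample.
beta = 0.1
p_sampled = n_pos / (n_pos + beta * n_neg)

# Correct the inflated probability back to the original class prior.
p_corrected = undo_undersampling(p_sampled, beta)

# Score both constant predictions against the full portfolio.
brier_raw = brier_score([p_sampled] * len(outcomes), outcomes)
brier_corrected = brier_score([p_corrected] * len(outcomes), outcomes)

print(f"undersampled prediction: {p_sampled:.4f}")
print(f"corrected prediction:    {p_corrected:.4f}")
print(f"Brier score raw:         {brier_raw:.4f}")
print(f"Brier score corrected:   {brier_corrected:.4f}")
```

In this toy setting the uncorrected prediction overstates the claim probability because the undersampled data carry a different class prior, and the corrected prediction recovers the portfolio's claim rate and scores better. A real model with predictors would need the same kind of adjustment, or a fitted calibration step, applied to each individual prediction.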

Author information

Authors and Affiliations

Department of Actuarial Sciences, Hacettepe University, Ankara, Turkey
Aslıhan Şentürk Acar

Corresponding author

Correspondence to Aslıhan Şentürk Acar.

Editor information

Editors and Affiliations

Faculty of Economics and Admin. Sciences, Trakya University, Edirne, Turkey
M. Kenan Terzioğlu

About this chapter

Cite this chapter

Acar, A. (2022). Prediction of Claim Probability with Excess Zeros. In: Terzioğlu, M.K. (eds) Advances in Econometrics, Operational Research, Data Science and Actuarial Studies. Contributions to Economics. Springer, Cham. https://doi.org/10.1007/978-3-030-85254-2_32

DOI: https://doi.org/10.1007/978-3-030-85254-2_32
Published: 17 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85253-5
Online ISBN: 978-3-030-85254-2
eBook Packages: Economics and Finance, Economics and Finance (R0)