On the cost of risk misspecification in insurance pricing

In the non-life insurance industry, pricing is often done relative to individual criteria of policyholders. Various classification algorithms are in use to categorize policyholders into risk classes defined by the insurer, but classification errors may result from this process. In the light of recent automatic classification practices, it becomes important to assess the risks caused by such errors. In this paper we examine the impact of risk class misspecifications for a simple situation with two risk types. We provide a mean-variance framework for quantitatively studying the insurer’s optimization problem of specifying premiums and we analyze the tradeoff of costs and benefits when classification error probabilities are known.


Introduction
Risk classification is a classical tool in actuarial practice. Indeed, the distribution of individual risks will often differ substantially, depending on personal characteristics, different exposure, environmental conditions etc. If an insurer applies the same premium across all such categories, this may lead to adverse selection, in the sense that individuals who face a lower premium than appropriate for their true risk will massively enter the contract, raising the price and squeezing out rational individuals with lower risk (see e.g. [4] for a general discussion). Classification is commonly used in insurance during the underwriting process and the tarification, cf. [32,42] for health and life insurance and [43] for property lines. Different types of variables, such as quantitative or categorical ones, can be used in various algorithms for actuarial classification purposes [38]. For an overview of classification methods we refer to [21]. In the literature, numerous authors discuss risk classification in view of adverse selection and efficiency in the Pareto sense, see [13] for a survey on this topic. In [11,12,22], the authors study the efficiency of imperfect categorization in a utility setup. In [14,34], the authors consider the effect of bans on classification on market efficiency. [39] deals with the customers' perspective on adverse selection and efficiency of risk classification. The contributions [11,34] also consider costs of categorization. Indeed, risk classification may be costly both computationally (in terms of resource-consuming algorithms in the presence of huge datasets) and in monetary terms (e.g. involving data acquisition from official statistics or from competitors).
The insurer hence needs to select appropriate criteria for rating variables, which may take different forms and are not always motivated by actuarial drivers. For instance, [17] distinguishes actuarial, operational, social and legal criteria. In some cases, classification may be inaccurate when the main criterion is unobservable or there are legal constraints on its use, e.g. for social or political reasons. For instance, gender-based discrimination is nowadays forbidden in the European Union (see e.g. [36]), even if gender is widely considered a relevant characteristic from the statistical point of view. In [37], the authors discuss the perceptions of gender as a pricing criterion by the customer and explain the complex interplay between anti-discrimination laws and actuarial principles within the insurance industry; see [27] for suggestions to deal with this issue. We refer to the recent book [9] for a rich source of information and ideas on this topic.
Although criteria to distinguish risk profiles will exist, they may often be unknown to the insurer due to the asymmetry of information, see e.g. [1,3]. At the same time, recent years have seen a dramatic increase in both the amount of risk information available and the ability to analyse it statistically. In many situations, risks can be analysed on an almost individual basis, but in any case as elements of much smaller rating pools, i.e. pools of risks that share characteristics such that they can be assumed to have the same loss distribution. Offered insurance cover may also take into account the current risk situation, i.e. environmental variables (e.g. time, place) or even behavioural variables. The small size of rating pools leads to larger errors in estimating the expected cost of insurance cover. In the presence of competing insurance providers and given the transparency of the prices of their insurance offers, customers may tend to choose the cheapest offer, i.e. an offer that is too low compared to the true but unknown production costs. From the perspective of the insurance company, this phenomenon is known as the "winner's curse". In order to avoid the negative economic consequences of the winner's curse, insurance companies like to apply tailor-made surcharges to the offered premium. It is likely that these surcharges, in addition to the higher cost of more granular risk assessment, will lead to higher overall costs for the entire insured portfolio of risks. Hence, the total welfare of the community is reduced. In addition, the overall coverage ratio may be reduced as some risk owners may find the increased cost of insurance too high. An overly granular approach may therefore be counterproductive and there may be an optimal level of rating granularity, together with legal boundaries, see for instance [17], [2] and [40]. Even when the resulting loss distributions are assumed to be known, each classification method will inherently contain classification errors, which may lead to unprofitable decision making. Note that error rates tend to be higher when one class is less represented in the population, cf. [24,44]. For a classical reference to empirical evidence on concrete values for error probabilities arising from frequently used statistical methods, see e.g. [6]. One can also apply empirical methods for the estimation of the misclassification probabilities. For instance, an empirical estimate arises from historical records, when after the loss occurrences, one obtains more information on the risk type of the policyholders. The insured individual can be reclassified using, for example, maximum likelihood techniques, and thus proportions of misclassified individuals can be identified.
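The empirical estimate described above can be sketched in a few lines. The record format, the function name and the sample history below are hypothetical; in particular, the sketch treats the class attributed after the losses were observed as the true type, which is itself an assumption.

```python
from collections import Counter

def misclassification_rates(records):
    """Estimate (p_H_given_L, p_L_given_H) from (assigned, revised) label pairs.

    `assigned` is the class used at underwriting; `revised` is the class
    attributed after claims were observed and is treated as the true type
    (an assumption of this sketch).
    """
    counts = Counter(records)
    n_true_L = counts[("L", "L")] + counts[("H", "L")]
    n_true_H = counts[("H", "H")] + counts[("L", "H")]
    p_H_given_L = counts[("H", "L")] / n_true_L  # true L that were priced as H
    p_L_given_H = counts[("L", "H")] / n_true_H  # true H that were priced as L
    return p_H_given_L, p_L_given_H

# Hypothetical history: 90 true L (9 of them priced as H), 10 true H (2 priced as L).
history = ([("L", "L")] * 81 + [("H", "L")] * 9
           + [("H", "H")] * 8 + [("L", "H")] * 2)
p_HL, p_LH = misclassification_rates(history)
```

With so few high-risk records, the estimate of the second rate rests on only ten observations, which illustrates why error rates are harder to pin down for the less represented class.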
In this paper, we would like to quantitatively study the effects of such classification errors in the context of a simple model with only two possible risk classes. In that case one faces two types of errors: assigning a risk of the first class as one of the second and vice versa (a false positive and false negative in statistical terms, or also sensitivity vs. specificity, see e.g. [24,42]). We compare three scenarios, one where the insurer does not classify the heterogeneous risks and two others where the risks are differentiated, but once with perfect knowledge and once with some probability of misspecification of policyholder types. While the introduction of the classification mechanism allows the insurer to price the insurance risks more efficiently, it entails certain fixed costs, and the potential misspecification can impact the insurer's profit further. In [8], the authors study a similar framework with incomplete information from the insurer's side, whereas the customer knows their true risk type. During the underwriting process, the customer may be required to take a test and the result will reveal for what type of coverage they are eligible. In contrast to our present work, the authors consider a plurality of insurers and customers choosing their coverages, while the present paper deals with a one-insurer setting and customers being provided with one single offer. In a similar spirit to [18], we develop a simple framework to assess the respective trade-off between costs of classification and profits. We consider linear and sigmoid-type demand functions of the premium for the probability that a customer accepts an offered policy (rather than deterministic demand functions as in [18]). Our main optimization target is the expected profit for the insurer (see e.g. [25]), and we consider several risk measures to assess the risk part in the analysis.
The remainder of the paper is structured as follows: In Section 2, we start with an insurer's expected profit approach and a piece-wise linear demand setting. We then consider different scenarios for this setup in Section 3. In Section 4, we then include the variance in our considerations, and establish mean-variance frontiers for the profit. We illustrate the results and their main drivers in Section 5. In order to assess the sensitivity of the results, Section 6 then develops a number of extensions, namely a sigmoid-type rather than piece-wise linear demand function, a lower semivariance and a value-at-risk concept for replacing the variance in the risk assessment of the strategies, as well as a utility function approach to unite the consideration of profitability and variability in one function, together with numerical illustrations for each of these cases. Finally, Section 7 concludes.

Model Setting
Assume that there is only one insurer present in the market, so there is no competition. Let us further assume a population of $n$ individuals, who are all willing to contract insurance, and independently from each other choose whether they enter the insurance contract for a given premium or not. The individuals fall into two types: low risk type with loss random variable $L$ and high risk type with loss random variable $H$, with underlying cumulative loss distribution functions $F_L(x)$ and $F_H(x)$, $x \ge 0$, respectively (both $L$ and $H$ will typically have atoms at 0, signifying the case of no claim in the considered time period). Define the respective means and variances by
$$\mu_L = E(L), \quad \mu_H = E(H), \quad \sigma_L^2 = \operatorname{Var}(L), \quad \sigma_H^2 = \operatorname{Var}(H),$$
which are all assumed to be finite. All risks are assumed to be independent and identically distributed within each type. Let $\{p_L, p_H \ge 0\}$ be the actual proportions of the low- and high-risk type among the policyholders ($p_L + p_H = 1$).
Define an acceptance function $f_i$ which for any proposed premium $P$ gives the probability for an individual to enter the contract; the form of this function differs for each risk type $i$. Each individual is assumed to take the decision about entering independently of all the others. If $m$ individuals are offered a premium $P$ and all use the same acceptance function $f_i$, we then expect $m \cdot f_i(P)$ individuals to enter the contract. Let us first assume that $f_i$ is piece-wise linear,
$$f_i(P) = \begin{cases} 1 - P/P_i^{\max}, & 0 \le P \le P_i^{\max}, \\ 0, & P > P_i^{\max}, \end{cases} \tag{3}$$
where $P_i^{\max}$ is the so-called reservation price, cf. Figure 1. Clearly, $f_i$ is nonincreasing in $P$. We set the condition $\mu_i < P_i^{\max}$ to ensure the possibility of positive expected profits for the company. Our approach represents a stochastic setup for the acceptance of an offered contract, in contrast to [18] who work with (3) as a deterministic demand function for a given price. While for expected profits as dealt with in Section 3 this difference does not matter, it will be important for the risk considerations in the subsequent sections. Despite the non-differentiability of the piece-wise linear $f_i$ at $P_i^{\max}$, this form will lead to simple local solutions. An extension of the results to a more complex, but analytically more tractable sigmoid form will be considered in Section 6.
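As a small illustration of the stochastic acceptance setup, the following sketch assumes the piece-wise linear form $f_i(P) = 1 - P/P_i^{\max}$ on $[0, P_i^{\max}]$ and zero above. Since individuals decide independently, the number of entrants among $m$ contacted individuals is binomial, so its mean is $m f_i(P)$ and its variance is $m f_i(P)(1 - f_i(P))$. The function names are illustrative only.

```python
def acceptance_linear(premium, p_max):
    """Probability of accepting `premium`; zero at and above the reservation price."""
    if premium <= 0:
        return 1.0
    return max(0.0, 1.0 - premium / p_max)

def entrants_mean_var(m, premium, p_max):
    """Mean and variance of the number of entrants among m independent individuals."""
    f = acceptance_linear(premium, p_max)
    return m * f, m * f * (1.0 - f)

# 1000 individuals, premium 2.5, reservation price 4 (illustrative values)
mean_n, var_n = entrants_mean_var(1000, 2.5, 4.0)
```

A deterministic demand function would only reproduce the first of the two returned numbers; the second one is what enters the risk considerations later on.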
Remark 2.1. In the classical work of Rothschild and Stiglitz [35] as well as further models based on it, in a comparable setting of identifying contracts to sell, the authors find optimal solutions in terms of premiums and levels of coverage. For the present purpose and simplicity we, however, prefer to use the concept of the acceptance function, with individuals facing a binary choice of entering the contract or not. That is, we do not allow partial coverage or deductibles here.
In the following sections, we introduce three different scenarios that the insurer may face. We start with the case where the insurer can observe the risk type of each individual, which we refer to as the full information case. We then consider the situation where the insurer cannot distinguish between the two types ex-ante at all. In that case the only possible method of pricing is to not differentiate individuals, and the profitability of the insurance business will then depend on the empirical fraction of each risk type in the population. Finally, the possibility to observe and measure a certain characteristic, which can be discrete or continuous, allows us to classify an observation, but with a certain error probability. This probability depends on the true class of the observation. The introduction of the classification mechanism has certain costs, but allows for better pricing according to the true risk class. We are interested in quantitatively assessing the respective trade-off in this simple model setup.

Expected profit in three scenarios
In this section, we focus on the expected profit only. Let us introduce three different scenarios that the insurer may face, starting with the case where the insurer can actually observe the risk type of each individual.

Full information
The benchmark for our analysis is the situation with no asymmetry of information.
Here the insurer can observe the risk type of each individual and therefore price according to the true type. Recall that we know the actual proportion $\{p_L, p_H\}$ of the population in each class. Therefore, we can maximize profit by differentiating between groups. If, for a given individual, the price is higher than the true risk premium, the willingness to accept the contract decreases. Let us denote by $X_j^{(L)}$ the $j$th loss random variable of the low risk type (independent copies of $L$) and by $X_j^{(H)}$ the $j$th high-risk loss variable (independent copies of $H$). Then, the (random) profit is given by
$$\Pi = \sum_{j=1}^{n p_L} I_j^L \left(P_L - X_j^{(L)}\right) + \sum_{j=1}^{n p_H} I_j^H \left(P_H - X_j^{(H)}\right), \tag{4}$$
where $I_j^L$ and $I_j^H$ are independent Bernoulli random variables with probabilities $f_L(P_L)$ and $f_H(P_H)$, respectively. In this case, an adaptation of [18] establishes that the optimal premiums are independent of $n$, $p_L$ and $p_H$, and they are simply the average of the mean claim size and the maximum premium that the policyholders are willing to pay.
Theorem 3.1. In the full information case, the expected profit of the insurer is maximized by the premium choice
$$P_L = \frac{\mu_L + P_L^{\max}}{2} \tag{5}$$
and
$$P_H = \frac{\mu_H + P_H^{\max}}{2} \tag{6}$$
for the two risk classes.
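Proof. The optimal premium choice is the solution of the following optimization problem:
$$\max_{x, y} E(\Pi) = \max_{x, y} \; n \left[ p_L (x - \mu_L) f_L(x) + p_H (y - \mu_H) f_H(y) \right]. \tag{7}$$
We notice that $E(\Pi)$ is a continuous function of $\{x, y\}$ and that $E(\Pi) = 0$ for $\{x = \mu_L, y = \mu_H\}$. Also, $\lim_{x \to +\infty, y \to +\infty} E(\Pi) = 0$, which means that $E(\Pi)$ admits a strictly positive maximum for some $\{x > \mu_L, y > \mu_H\}$ (and the optimization in $x$ and $y$ can in fact be separated). We can characterize this point by the following equations:
$$\frac{\partial E(\Pi)}{\partial x} = n p_L \left[ f_L(x) + (x - \mu_L) f_L'(x) \right] \overset{!}{=} 0, \tag{8}$$
$$\frac{\partial E(\Pi)}{\partial y} = n p_H \left[ f_H(y) + (y - \mu_H) f_H'(y) \right] \overset{!}{=} 0, \tag{9}$$
where the $\overset{!}{=}$ operator denotes a necessary condition. From (8) we have
$$P_L = \mu_L - \frac{f_L(P_L)}{f_L'(P_L)} = \mu_L + P_L^{\max} - P_L,$$
which gives (5).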
Since the optimal solution $P_L$ respects the condition $P_L \le P_L^{\max}$, as $\mu_L < P_L^{\max}$, it is not necessary to distinguish cases of the piece-wise function. To prove it is indeed a local and global maximum, we can easily prove that the second derivative is negative for a linear $f$:
$$\frac{\partial^2 E(\Pi)}{\partial x^2} = n p_L \left[ 2 f_L'(x) + (x - \mu_L) f_L''(x) \right] = -\frac{2 n p_L}{P_L^{\max}} < 0.$$
The same reasoning holds for the first and second derivative w.r.t. $y$.
Remark 3.2. It is easy to check that if
$$P_L^{\max} < 2 \mu_H - \mu_L \tag{10}$$
then $P_L < \mu_H$, in which case charging a low-risk type premium to a high-risk type customer results in an expected loss on the individual level. □
In terms of sensitivities, we simply see from (5) and (6) that
$$\frac{\partial P_i}{\partial \mu_i} = \frac{1}{2}$$
for both risk types $i \in \{L, H\}$. That is, the reactivity of the optimal premium is constant for variation in the mean loss. Consequently, in case of increasing losses, the increase in premium will only cover half of the increase in losses, thus decreasing profits by the double effect of smaller margins and smaller acceptance rate. Similarly, shifting the endpoint $P_i^{\max}$ of the acceptance function (for invariant $\mu_i$) also increases the optimal chargeable premium linearly with slope 1/2.
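The midpoint rule for the optimal premium and its sensitivity can be checked numerically. The sketch below assumes the piece-wise linear acceptance function $f(P) = 1 - P/P^{\max}$ and uses $\mu_L = 1$, $P_L^{\max} = 4$ as in the numerical illustrations; the grid search is only a sanity check and not part of the argument.

```python
def acceptance_linear(P, P_max):
    return max(0.0, 1.0 - P / P_max)

def profit_per_head(P, mu, P_max):
    """Expected profit per contacted individual of one risk type."""
    return (P - mu) * acceptance_linear(P, P_max)

def optimal_premium(mu, P_max):
    """Midpoint rule: average of mean claim size and reservation price."""
    return 0.5 * (mu + P_max)

mu_L, P_max_L = 1.0, 4.0
grid = [i / 1000 for i in range(4001)]          # candidate premiums 0.000 .. 4.000
P_grid = max(grid, key=lambda P: profit_per_head(P, mu_L, P_max_L))
P_closed = optimal_premium(mu_L, P_max_L)

# Raising the mean loss by one unit raises the optimal premium by one half.
slope = optimal_premium(mu_L + 1.0, P_max_L) - P_closed
```

Both the grid maximizer and the closed form give a premium of 2.5 here, and the slope of 1/2 shows that only half of an increase in expected losses is passed on to the premium.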

No differentiation
Next, let us consider the situation where the insurer has no possibility to distinguish between risk types on the individual level, but still has an estimate for the fractions $\{p_L, p_H\}$ of the population in each class (e.g. through some historical figures). So we assume these numbers to be known ($p_L + p_H = 1$). In this scenario, the insurer proposes an identical premium $P$ to every individual. This has the advantage that one saves the cost of identification of risk types, and provides another benchmark for the sequel. The profit in this case is
$$\Pi = \sum_{j=1}^{n p_L} I_j^L \left(P - X_j^{(L)}\right) + \sum_{j=1}^{n p_H} I_j^H \left(P - X_j^{(H)}\right), \tag{11}$$
where $I_j^L$ and $I_j^H$ are Bernoulli random variables with probabilities $f_L(P)$ and $f_H(P)$ respectively, and the optimal premium then amounts to
$$P = \arg\max_{x} \; n \left[ p_L (x - \mu_L) f_L(x) + p_H (x - \mu_H) f_H(x) \right]. \tag{12}$$
Comparing the resulting optimization problem with (7), we see from $P_L \ne P_H$ (which itself is due to $\mu_H > \mu_L$) there that the optimal solution $P$ to (12) will now yield a smaller profit (this is intuitive, since we have less information available than in the setup of Section 3.1).
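Theorem 3.3. In the no differentiation case, the expected profit of the insurer is maximized by the premium choice
$$P = a^* P_L + (1 - a^*) P_H, \tag{13}$$
where $P_L$ and $P_H$ are the optimal premiums of the full information case given in (5) and (6) and
$$a^* = \frac{p_L P_H^{\max}}{p_L P_H^{\max} + p_H P_L^{\max}}.$$
Proof. Problem (12) can be solved using the first order condition
$$p_L \left[ f_L(P) + (P - \mu_L) f_L'(P) \right] + p_H \left[ f_H(P) + (P - \mu_H) f_H'(P) \right] \overset{!}{=} 0,$$
which for the piece-wise linear form (3) reduces to $p_L (P_L - P)/P_L^{\max} + p_H (P_H - P)/P_H^{\max} = 0$, and finally (13).
The second order condition yields a strictly negative result, thus confirming the global maximum.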
Expression (13) shows that $P$ is the average of the optimal premiums under full information, weighted by the proportions in the population and the maximum affordable premiums. Under the assumption (10), this also establishes $P_L \le P \le P_H$.
Remark 3.4. One should be careful to check whether $P > P_L^{\max}$: in that case, L type customers do not enter the contract. This happens if $a^* < \frac{P_H - P_L^{\max}}{P_H - P_L}$. Consequently, the optimal premium is that for higher risk types only, meaning $P = P_H$. If the expected profit for $P = P_H$ is greater than the one found above, then the optimal premium will be $P_H$ and only H types will enter the contract. □
The change of the expected profit when compared to the case of full information can now also be expressed as
$$E(\Pi)\big|_{\text{no diff.}} - E(\Pi)\big|_{\text{full info.}} = n p_L \big(f_L(P) - f_L(P_L)\big)(P_L - \mu_L) + n p_L f_L(P)(P - P_L) + n p_H \big(f_H(P) - f_H(P_H)\big)(P_H - \mu_H) + n p_H f_H(P)(P - P_H), \tag{14}$$
where the third term is the gain on more H entering the contract. In particular, for low risk types the proposed premium $P$ is higher than their appropriate optimal premium $P_L$ under full information. Thus, with the decreasing shape of the acceptance function, on average the insurer loses low-risk type customers and the associated expected profit (negative first term in (14)). At the same time, those who remain bring higher profits (the second term in (14)). Correspondingly, since the premium $P$ is lower than the appropriate premium $P_H$, more high-risk type customers join (positive third term in (14)), but they pay less premium now (negative fourth term).
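The uniform premium can be computed along the following lines. This is a sketch under stated assumptions: the weight $a^* = p_L P_H^{\max} / (p_L P_H^{\max} + p_H P_L^{\max})$ is the one obtained from the first order condition with the piece-wise linear acceptance function, and the proportion $p_L = 0.9$ is an assumed value chosen for illustration, not one stated in the surviving text. The fallback to $P_H$ follows the logic of Remark 3.4.

```python
def f_lin(P, P_max):
    return max(0.0, 1.0 - P / P_max)

def pooled_profit(P, p_L, mu_L, mu_H, Pm_L, Pm_H):
    """Expected profit per contacted individual under a uniform premium P."""
    return (p_L * (P - mu_L) * f_lin(P, Pm_L)
            + (1.0 - p_L) * (P - mu_H) * f_lin(P, Pm_H))

def pooled_premium(p_L, mu_L, mu_H, Pm_L, Pm_H):
    """Return (a_star, optimal uniform premium)."""
    P_L, P_H = 0.5 * (mu_L + Pm_L), 0.5 * (mu_H + Pm_H)
    a = p_L * Pm_H / (p_L * Pm_H + (1.0 - p_L) * Pm_L)   # weight on P_L
    P = a * P_L + (1.0 - a) * P_H
    if P > Pm_L:
        # L types would not enter at all: price for H types only
        return a, P_H
    # interior candidate; still compare it with selling to H types only
    best = max((P, P_H),
               key=lambda x: pooled_profit(x, p_L, mu_L, mu_H, Pm_L, Pm_H))
    return a, best

# mu_L = 1, mu_H = 5, P_max_L = 4, P_max_H = 20; p_L = 0.9 is assumed.
a_star, P_star = pooled_premium(0.9, 1.0, 5.0, 4.0, 20.0)
profit_pooled = pooled_profit(P_star, 0.9, 1.0, 5.0, 4.0, 20.0)
profit_H_only = pooled_profit(12.5, 0.9, 1.0, 5.0, 4.0, 20.0)
```

Under these assumed inputs the uniform premium lies between $\mu_L$ and $\mu_H$, so every high-risk customer who enters is written at an expected loss that the low-risk majority has to compensate.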

Differentiation in two classes
Assume now that the insurer does not know the individuals' risk type, but has access to a mechanism that can assign (classify) the risk types correctly with a certain probability. Assume that the probability of misclassification is the same for each policyholder of the same type and given by
$$p_{H|L} := P(i \text{ is classified as } H \mid i \text{ is of type } L), \qquad p_{L|H} := P(i \text{ is classified as } L \mid i \text{ is of type } H).$$
Remark 3.5. If $p_{H|L} = p_{L|H} = 0$, we get back to the full information setting, as there is no classification error. If $p_{H|L} = p_{L|H} = 1$, all the true H end up in the L group and all the true L are classified in the H group (which would also result in knowing the true type of each one, but having to switch the categories). In the cases when $p_{H|L} = 1$ and $p_{L|H} = 0$ or $p_{H|L} = 0$ and $p_{L|H} = 1$, all individuals are classified in the same group. Typically, there is a tradeoff between the two error types: in an attempt to classify one risk more accurately, the precision on the other one will go down. For instance, in order to minimize $p_{L|H}$, we could simply attribute all observations to group H, which indeed gives $p_{L|H} = 0$, but $p_{H|L}$ would increase drastically as all L observations are then erroneously identified as H.
The cost $c(n)$ of applying the classification algorithm will increase with population size $n$: the computational cost of different algorithms is increasing in the sample size (take for instance the simplest Bayesian classifier [45] with linear complexity), the human time invested in analysing data and making decisions increases, and more powerful machines may be needed to run the algorithms, just to name a few reasons. At the same time, the marginal cost is likely to decrease in $n$ (fixed costs in the process can be divided onto more policyholders, the insurer gains experience and recognizes patterns etc.). Hence, we define $c(n)$ as an increasing and concave function of $n$. A mathematically simple candidate for such a function is
$$c(n) = \gamma + c_0 \log n,$$
where $\gamma$ offsets for the minimal cost amount and $c_0$ scales for the intensity of the effect of the population size.
The insurer will propose premiums, $P_L^*$ and $P_H^*$, different from the ones in Section 3.1 under full information, and some customers receive 'wrong' offers, leading to a different customer behaviour with respect to accepting the contract. Figure 2 visualizes the pricing process. An initial population of $n$ customers is subdivided into groups by their true risk type, rather than their identified risk type, and finally the insurer loses some customers because of the entailed acceptance patterns of policies.
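In this situation, the profit is given by
$$\Pi = \sum_{j=1}^{n p_L (1 - p_{H|L})} I_j^{LL} \left(P_L^* - X_j^{(L)}\right) + \sum_{j=1}^{n p_L p_{H|L}} I_j^{LH} \left(P_H^* - X_j^{(L)}\right) + \sum_{j=1}^{n p_H (1 - p_{L|H})} I_j^{HH} \left(P_H^* - X_j^{(H)}\right) + \sum_{j=1}^{n p_H p_{L|H}} I_j^{HL} \left(P_L^* - X_j^{(H)}\right) - c(n), \tag{15}$$
where the first superscript of each indicator denotes the true type and the second the assigned class, and $I_j^{LL}$, $I_j^{LH}$, $I_j^{HH}$ and $I_j^{HL}$ are independent Bernoulli random variables with probabilities $f_L(P_L^*)$, $f_L(P_H^*)$, $f_H(P_H^*)$ and $f_H(P_L^*)$, respectively.
Figure 2: Visualisation of the pricing process (starting from the contacted population).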
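Theorem 3.6. In the differentiation case, the expected profit of the insurer is maximized by the premium choice
$$P_L^* = b^* P_L + (1 - b^*) P_H \tag{16}$$
and
$$P_H^* = c^* P_L + (1 - c^*) P_H \tag{17}$$
for the two classified risk classes, where $P_L$ and $P_H$ are the optimal premiums (5) and (6), and
$$b^* = \frac{p_L (1 - p_{H|L}) P_H^{\max}}{p_L (1 - p_{H|L}) P_H^{\max} + p_H p_{L|H} P_L^{\max}}, \qquad c^* = \frac{p_L p_{H|L} P_H^{\max}}{p_L p_{H|L} P_H^{\max} + p_H (1 - p_{L|H}) P_L^{\max}}.$$
Proof. We make use of the following first order conditions from (15) to determine the optimal solution:
$$\frac{\partial E(\Pi)}{\partial v} = n \left[ p_L (1 - p_{H|L}) \big( f_L(v) + (v - \mu_L) f_L'(v) \big) + p_H p_{L|H} \big( f_H(v) + (v - \mu_H) f_H'(v) \big) \right] \overset{!}{=} 0, \tag{18}$$
$$\frac{\partial E(\Pi)}{\partial w} = n \left[ p_L p_{H|L} \big( f_L(w) + (w - \mu_L) f_L'(w) \big) + p_H (1 - p_{L|H}) \big( f_H(w) + (w - \mu_H) f_H'(w) \big) \right] \overset{!}{=} 0. \tag{19}$$
Equation (18) yields
$$p_L (1 - p_{H|L}) \frac{P_L - v}{P_L^{\max}} + p_H p_{L|H} \frac{P_H - v}{P_H^{\max}} = 0, \tag{20}$$
leading to (16). Formula (17) is obtained in a completely analogous way from (19).
The second order conditions are strictly negative and thus confirm the maximum.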
As in the no differentiation case, the optimal premiums can again be expressed simply as a weighted average of the optimal premiums from the full information case, and the weights now involve the error probabilities.
Remark 3.7. If the premiums resulting from (16) and (17) do not exceed $P_L^{\max}$, then the optimal solutions are given by Equations (16) and (17). If $P_L^* > P_L^{\max}$, which could happen with a high proportion of misclassified H individuals, then the optimal solution would be to offer the contract only to H types by setting $P_L^* = P_H$ and $P_H^* = P_H$. It is worthwhile to notice that the second-order mixed partial derivatives satisfy $\frac{\partial^2 E(\Pi)}{\partial v \partial w} = \frac{\partial^2 E(\Pi)}{\partial w \partial v} = 0$, and therefore the optimal price for the low risk types does not depend on the optimal price for the high risk types and vice versa. □
What is of particular interest is the situation where H individuals are wrongly classified as L. Indeed, since $P_i > \mu_i$, this is the only situation where the insurer makes losses, so it is important to maintain control over this group. The loss (presented here as a negative gain) compared to the benchmark of the situation of full information can be decomposed into four contributions:
• True L: loss on L not entering the contract and gain on those who remain.
• False H: loss on L not entering the contract and gain on those who remain.
• True H: gain on extra H entering the contract and loss on them underpriced.
• False L: gain on extra H entering the contract and loss on them underpriced.
Recall that $P$ was the optimal uniform premium for the case without differentiation. Differentiation of risk types only makes sense if the resulting premiums $P_L^*$, $P_H^*$ satisfy $P_L^* \le P \le P_H^*$ (cf. Figure 3). From Equations (13), (16) and (17), this amounts to the condition $c^* \le a^* \le b^*$, which can easily be translated to the following condition on the error probabilities:
$$p_{H|L} + p_{L|H} \le 1.$$
This will always be fulfilled in practically relevant situations.
Figure 3: Illustration of premiums in different scenarios.
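The ordering of the three premiums can be explored with a short sketch. The weights below are the ones that follow from the first order conditions when the acceptance function is piece-wise linear, $f_i(P) = 1 - P/P_i^{\max}$; they are derived here for illustration and should be checked against the paper's displays (13), (16) and (17). The proportion $p_L = 0.9$ and both pairs of error probabilities are hypothetical. The closed forms are interior solutions, so they describe actual optimal prices only while the resulting premiums stay below $P_L^{\max}$.

```python
def mix(weight, P_L, P_H):
    """Weighted average of the two full-information premiums."""
    return weight * P_L + (1.0 - weight) * P_H

def premiums_with_errors(p_L, p_HL, p_LH, mu_L, mu_H, Pm_L, Pm_H):
    """Return (P_star_L, P_uniform, P_star_H) from the interior first order conditions."""
    p_H = 1.0 - p_L
    P_L, P_H = 0.5 * (mu_L + Pm_L), 0.5 * (mu_H + Pm_H)
    A, B = p_L * Pm_H, p_H * Pm_L
    a = A / (A + B)                                        # no differentiation
    b = A * (1.0 - p_HL) / (A * (1.0 - p_HL) + B * p_LH)   # class "L"
    c = A * p_HL / (A * p_HL + B * (1.0 - p_LH))           # class "H"
    return mix(b, P_L, P_H), mix(a, P_L, P_H), mix(c, P_L, P_H)

def ordered(triple):
    low, uniform, high = triple
    return low <= uniform <= high

args = (1.0, 5.0, 4.0, 20.0)            # mu_L, mu_H, P_max_L, P_max_H
good = premiums_with_errors(0.9, 0.1, 0.2, *args)   # p_HL + p_LH <= 1
bad = premiums_with_errors(0.9, 0.7, 0.6, *args)    # p_HL + p_LH > 1
```

In the second case the classifier is worse than a coin flip in aggregate, and the premium for the class labelled L ends up above the uniform premium, which is the situation excluded by the condition on the sum of the two error probabilities.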

A mean-variance analysis
Proposing a unique premium $P$ to both categories of risks attracts a higher relative proportion of H than the differentiating strategies. This heterogeneity in the portfolio composition generates a higher level of risk, which one should consider in the underwriting process. As the expected profit considered in Section 3 does not capture this aspect of the problem, we introduce the variance of the profit as a simple indicator that can be easily implemented in practical settings, as one only needs estimates for the first two moments of the underlying claim distributions for the analysis.
Define by N L , N H the (random) number of insured persons of risk type L and H, respectively, entering the contract.Their first two moments are summarized in Table 1.
Table 1: Expected value and variance of the number of insured for each scenario (columns: Full information, No differentiation, Differentiation).
For each risk type $i$, $N_i = \sum_{j=1}^{n p_i} I_j^i$, where $I_j^i$ are independent Bernoulli random variables with probability $f_i(P_i)$ when the premium is $P_i$. From (4), we then get
$$\operatorname{Var}(\Pi) = \sum_{i \in \{L, H\}} \left[ E(N_i) \, \sigma_i^2 + \operatorname{Var}(N_i) \, (P_i - \mu_i)^2 \right].$$
With this ingredient, we can now derive the variance of the profit in our three scenarios introduced in the previous section.
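• Full information:
$$\operatorname{Var}(\Pi) = n \sum_{i \in \{L, H\}} p_i \left[ f_i(P_i) \, \sigma_i^2 + f_i(P_i) \big(1 - f_i(P_i)\big) (P_i - \mu_i)^2 \right].$$
• No differentiation:
$$\operatorname{Var}(\Pi) = n \sum_{i \in \{L, H\}} p_i \left[ f_i(P) \, \sigma_i^2 + f_i(P) \big(1 - f_i(P)\big) (P - \mu_i)^2 \right].$$
• Differentiation: the analogous expression, with one pair of terms for each of the four groups (true L, false H, true H, false L) and the premiums $P_L^*$ and $P_H^*$.
Remark 4.1. Note that in all of the above expressions a term containing the variance due to the randomness of claim sizes is followed by one with the variance due to the randomness of underwriting, i.e. the customer's probability to enter the contract or not. In Section 5, we will illustrate this decomposition with the help of a numerical example.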
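The split into a claims part and an underwriting part can be verified for a single contacted individual. The sketch below assumes that the individual profit is $I \cdot (P - X)$ with $I$ a Bernoulli variable with success probability $f$, independent of the claim $X$; the two-point claim distribution is hypothetical. The variance is computed once from the first two moments of the profit and once as $f \sigma^2 + f(1 - f)(P - \mu)^2$.

```python
def variance_decomposition(P, f, claim_dist):
    """Variance of I*(P - X) for I ~ Bernoulli(f) independent of X.

    claim_dist is a list of (claim value, probability) pairs.
    Returns (exact variance, claims part, underwriting part).
    """
    mu = sum(x * p for x, p in claim_dist)
    sigma2 = sum((x - mu) ** 2 * p for x, p in claim_dist)
    mean = f * (P - mu)                                  # E[I (P - X)]
    second = f * sum((P - x) ** 2 * p for x, p in claim_dist)  # E[I (P - X)^2]
    exact = second - mean ** 2
    claims_part = f * sigma2                             # randomness of claim sizes
    underwriting_part = f * (1.0 - f) * (P - mu) ** 2    # randomness of acceptance
    return exact, claims_part, underwriting_part

# Hypothetical claim: no claim with probability 0.9, claim of size 10 otherwise.
exact, claims_part, underwriting_part = variance_decomposition(
    P=2.5, f=0.375, claim_dist=[(0.0, 0.9), (10.0, 0.1)])
```

Summing such individual variances over independent individuals gives portfolio-level expressions that are linear in $n$.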
For both the insurance company itself and the regulator it will be natural to also include a risk constraint in the problem of optimizing profits. Consequently, we introduce a variance constraint in the optimization problem, modifying the problem from Section 3 to
$$\max E(\Pi) \quad \text{s.t.} \quad \operatorname{Var}(\Pi) \le \sigma^2. \tag{22}$$
Varying the value of $\sigma$ will lead to a mean-variance efficient frontier in the spirit of Markowitz [29]. Introduce a Lagrange multiplier $\lambda$ for the variance constraint in any of the optimization programs (7), (11) and (15). The optimal premiums $P_i$ are then obtained by the first order conditions
$$\frac{\partial}{\partial P_i} \left[ E(\Pi) - \lambda \operatorname{Var}(\Pi) \right] = 0.$$
In order to construct the efficient frontier, we maximize the expected profit subject to the constraint of the variance being smaller than a certain level $\sigma^2$, using the Lagrange multiplier method. Thus, we obtain one point of the frontier, defined by the coordinates $\mu = E(\Pi)$ and $\sigma^2$. To obtain more points and draw the frontier, we increase $\sigma^2$ and redo the analysis each time. We give here the corresponding equations for the full information case; the other cases follow in an analogous way. Equation (22) translates into the Lagrangian
$$\mathcal{L}(P_L, P_H, \lambda) = E(\Pi) - \lambda \left( \operatorname{Var}(\Pi) - \sigma^2 \right).$$
The first order conditions are given by
$$\frac{\partial \mathcal{L}}{\partial P_L} = 0, \qquad \frac{\partial \mathcal{L}}{\partial P_H} = 0, \qquad \operatorname{Var}(\Pi) = \sigma^2.$$
This results in a system of three equations for the three unknowns $P_L$, $P_H$ and $\lambda$ which can be solved numerically for every choice of involved parameters.
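A frontier of this kind can be approximated without solving the Lagrange system, by a brute-force search over a premium grid. This is a different technique from the one described in the text and is meant only as a cross-check. The sketch treats the full information case with the piece-wise linear acceptance function; the proportions and claim variances in `TYPES` are assumed values, and all quantities are per contacted individual (that is, divided by $n$).

```python
def f_lin(P, P_max):
    return max(0.0, 1.0 - P / P_max)

# (proportion, mean claim, claim variance, reservation price) per type;
# proportions and variances are assumed values for illustration.
TYPES = {"L": (0.9, 1.0, 1.0, 4.0), "H": (0.1, 5.0, 10.0, 20.0)}

def moments(P_L, P_H):
    """Per-head mean and variance of the profit under full information."""
    mean = var = 0.0
    for key, P in (("L", P_L), ("H", P_H)):
        p, mu, s2, P_max = TYPES[key]
        f = f_lin(P, P_max)
        mean += p * f * (P - mu)
        var += p * (f * s2 + f * (1.0 - f) * (P - mu) ** 2)
    return mean, var

def frontier_point(var_cap, steps_per_unit=10):
    """Best (mean, P_L, P_H) on the premium grid subject to var <= var_cap."""
    best = (float("-inf"), None, None)
    for i in range(4 * steps_per_unit + 1):        # P_L in [0, 4]
        P_L = i / steps_per_unit
        for j in range(20 * steps_per_unit + 1):   # P_H in [0, 20]
            P_H = j / steps_per_unit
            mean, var = moments(P_L, P_H)
            if var <= var_cap and mean > best[0]:
                best = (mean, P_L, P_H)
    return best

caps = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
frontier = [frontier_point(cap) for cap in caps]
```

Once the variance cap exceeds the variance of the unconstrained optimum, the constraint is no longer binding and the frontier flattens at the maximal expected profit.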

Numerical illustrations
Let us now consider concrete numerical illustrations of the results of the previous sections.The following parametrization will be used throughout this section unless otherwise stated: For the shape of the cost function we assume c(n) = 20 log n in the plots, but note that any other choice would be feasible as well.

Expected profit
Let f i (P ) have the form (3) with P max L = 4µ L = 4 and P max H = 4µ H = 20.Then we get from the respective formulas of Section 3: • Full information: P L = 2.5, P H = 12.5, E (Π) = 0.788 n.
Note that µ L < P < µ H .In this case, the insurer targets the low-risk L type customers because their proportion in the population is large enough to compensate for the losses on the H types.
• Differentiation in two classes: Note that P * H > P max L .
Applying the classification is hence only an advantage if Conversely, the maximum cost which the insurer will be willing to pay for the classification, given a population of size n, is It is instructive to look into the sensitivity of the results.Let us first explore the variability of the profit under different error probabilities, which can be a helpful decision tool in case of limited investment resources.For each level of error probabilities, we recompute the optimal premiums.Figure 4 features the sensitivity of the expected profit with respect to both error probabilities.The classification cost still needs to be deducted here from the expected profit.If the insurer is given a choice of different classification algorithms or investment possibilities for improvement of precision with known resulting error probabilities, one can verify whether that investment is worthwhile.This figure may help the decision makers to judge whether with given error probabilities, a refinement in the classification may be of added value to the company.
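The break-even reasoning can be made concrete as follows. This is a sketch under stated assumptions: the acceptance functions are piece-wise linear with the reservation prices of this section, the proportion $p_L = 0.9$ is an assumed value, the error probabilities passed to the functions are hypothetical, and `log` in the cost function is taken to be the natural logarithm. Premiums are optimized on a grid rather than in closed form, so that premiums above $P_L^{\max}$ (where low-risk types no longer enter) are handled automatically.

```python
import math

p_L, mu_L, mu_H, Pm_L, Pm_H = 0.9, 1.0, 5.0, 4.0, 20.0   # p_L is an assumed value
GRID = [i / 100 for i in range(2001)]                     # premiums 0.00 .. 20.00

def f_lin(P, P_max):
    return max(0.0, 1.0 - P / P_max)

def g_L(P):
    """Per-head expected profit on a true L offered premium P."""
    return (P - mu_L) * f_lin(P, Pm_L)

def g_H(P):
    """Per-head expected profit on a true H offered premium P."""
    return (P - mu_H) * f_lin(P, Pm_H)

def profit_no_diff():
    return max(p_L * g_L(P) + (1 - p_L) * g_H(P) for P in GRID)

def profit_diff(p_HL, p_LH):
    """Per-head expected profit before classification cost (premiums separate)."""
    low = max(p_L * (1 - p_HL) * g_L(v) + (1 - p_L) * p_LH * g_H(v) for v in GRID)
    high = max(p_L * p_HL * g_L(w) + (1 - p_L) * (1 - p_LH) * g_H(w) for w in GRID)
    return low + high

def max_cost(n, p_HL, p_LH):
    """Largest total classification cost that still pays off for population n."""
    return n * (profit_diff(p_HL, p_LH) - profit_no_diff())

def worthwhile(n, p_HL, p_LH, c0=20.0):
    return max_cost(n, p_HL, p_LH) >= c0 * math.log(n)   # c(n) = 20 log n
```

Evaluating `max_cost` over a range of error probabilities gives the kind of sensitivity surface discussed in connection with Figure 4, and comparing it with the quoted cost of a candidate classifier indicates whether the investment pays off.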

Variance
Numerically, with the parameters defined in (23), we obtain the following results for the three cases of the linear acceptance function:
• Full information: Var (Π) = 2.505469n.
• No differentiation: Var (Π) = 1.79213n.
• Differentiation:
As the total variance is an increasing function of the number of policyholders, it will naturally be higher under a differentiation strategy, as the insurer gets more market share. But the structure of the variance will be different. Without differentiation, the variance inside the group is much higher than the average of internal group variances from the differentiation case, that difference being larger when the two distributions are further apart.
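The two reported variance figures can be reproduced with a short computation. The sketch rests on assumptions that should be read with care: the parameter values $p_L = 0.9$, $\sigma_L^2 = 1$ and $\sigma_H^2 = 10$ are not taken from the parameter list of this section but were chosen here because, together with $\mu_L = 1$, $\mu_H = 5$, $P_L^{\max} = 4$ and $P_H^{\max} = 20$, they match the reported numbers. The variance per type is taken as $p_i [f_i \sigma_i^2 + f_i (1 - f_i)(P - \mu_i)^2]$ per contacted individual, and the uniform premium is the weighted average obtained from the first order condition.

```python
def f_lin(P, P_max):
    return max(0.0, 1.0 - P / P_max)

# type: (proportion, mean claim, claim variance, reservation price); assumed values
PARAMS = {"L": (0.9, 1.0, 1.0, 4.0), "H": (0.1, 5.0, 10.0, 20.0)}

def per_head_moments(premiums):
    """Mean and variance of the profit per contacted individual.

    premiums maps each type to the premium offered to that type.
    """
    mean = var = 0.0
    for key, (p, mu, s2, P_max) in PARAMS.items():
        P = premiums[key]
        f = f_lin(P, P_max)
        mean += p * f * (P - mu)
        var += p * (f * s2 + f * (1.0 - f) * (P - mu) ** 2)
    return mean, var

# Full information: midpoint premiums for each type.
mean_full, var_full = per_head_moments({"L": 2.5, "H": 12.5})

# No differentiation: weighted average of the two midpoint premiums.
a_star = 0.9 * 20.0 / (0.9 * 20.0 + 0.1 * 4.0)
P_uniform = a_star * 2.5 + (1.0 - a_star) * 12.5
mean_pooled, var_pooled = per_head_moments({"L": P_uniform, "H": P_uniform})
```

Multiplying the per-head values by $n$ gives the portfolio figures; the lower variance without differentiation goes hand in hand with a clearly lower expected profit.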
In Figure 5, we plot the variances in the three scenarios to illustrate their forms as a function of chosen premium.We split the variances according to the part stemming from the variability of claims (in red) and from the one of acceptance of contracts (in green).The humps indicate the region where the increase of variance due to the increasing deviation from the mean is compensated by the decrease in the number of underwritten policies.In Figures 5b and 5c, we can observe two humps, appearing because of the mixture of two risk types.With the help of this decomposition, we can clearly see that the humps in the plots come from the acceptance behaviour.In Figure 5a, we observe that if the premium becomes too high, the total variance decreases as the population does not enter the contract any more.
In Figures 6-8, we show the variances when varying one parameter at a time.In Figure 6, one can observe the variance shapes for a small range acceptance function.In this case, the variance is mostly defined by the claims behaviour, since the acceptance rate remains low and the humps are less pronounced.For an acceptance function ranging up to high premiums as shown in Figure 7, the acceptance variance dominates.The humps are more pronounced as policyholders exist in a

Mean-variance efficient frontier
To complete the numerical part, we now address the illustration of the mean-variance frontier as defined in Section 4 for a population size of n = 10,000. We see in Figure 9 that up to a certain variance level, the non-differentiation strategy dominates differentiation in terms of expected profit. This breaking point depends on the cost function c(n) and the error probabilities. One may also want to consider limiting constraints in practice such as regulatory constraints or the demands of stakeholders.
The kinks in the frontier arise from the fact that for different variance limitations, a different portfolio composition becomes optimal. In other words, the optimal strategy switches at the kinks by letting more of a lower or higher risk type enter the contract. The mean-variance approach assumes variations to both sides as equally weighted since the variance is a symmetric risk measure. This framework can be extended to other risk measures, such as the lower semi-variance to take into account only one-sided deviations from the mean or the value-at-risk to consider minimal profit requirements. These adaptations are developed in Section 6, where we also present an alternative approach for the risk assessment based on utility functions.

A sigmoid-type acceptance function
While the piece-wise linear acceptance functions used in this paper allow for intuitive and transparent results, one may want to challenge this simplistic assumption. In this section we would like to extend the previous analysis to a possibly more realistic shape that still allows for an explicit treatment. Concretely, assume that $f$ belongs to the class of sigmoid functions, namely the logistic functions, which are smooth and monotone, thus suitable for our situation [26]. This form of function appears when applying a logit lapsing model with different risk factors, see e.g. [15,19]. An example of a model using premiums as risk factors can be found in [7,20] and particularly in [16]. Consider the following concrete shape of the acceptance function $f_i$ of an individual of risk type $i$:
$$f_i(P) = \frac{1}{1 + e^{a_i (P - b_i)}}, \tag{24}$$
where the parameters $a_i > 0$ and $b_i$ need to be calibrated. We can suppose $b_i > \mu_i$, so that the function reaches value 1/2 for premiums that are higher than the actuarially fair premium, cf. Figure 10. As $a$ grows, the curve becomes steeper around the pivotal position determined by the parameter $b$ (note that the choice of $b$ also determines the value of $f$ for $P = 0$ which will typically be smaller than 1). From an analytical point of view, the form (24) is more attractive than the piece-wise linear shape considered in the previous sections, as it is differentiable everywhere. Clearly, $f_i$ is strictly decreasing in $P$:
$$f_i'(P) = -a_i f_i(P) \big(1 - f_i(P)\big) < 0.$$
Define further the price elasticity of a risk type as the change of the number of customers entering the contract with respect to the price variation. This measure illustrates the reactivity of the portfolio size to the variation of premium, cf. for instance [41, Ch. 15].

Theoretical results
We first derive the analogues of the results in Sections 3 and 4 under the sigmoid acceptance function. The full information case still leads to an explicit formula:
Theorem 6.1. In the full information case, the expected profit of the insurer is maximized for the premium choice and where $W(z)$ denotes the (principal branch of the) Lambert W function, which is the inverse function of $g(x) = x e^{x}$ (cf. [10]).
Proof. The optimal premium choice is the solution of the optimization problem We can characterize the maxima by the equations From (27) we have leading to (25). Equation (26) for the high-risk individuals is then obtained in a completely analogous way. To see that the extremal point is indeed a local maximum, one needs to verify Using $\{P_L > \mu_L, P_H > \mu_H\}$ and we can rearrange the previous condition as Since $-2f'_L(x)$ is always positive, $f''_L(x) < 0$ for all $x < b_L$ and $x = P_L > \mu_L$. The same conclusion holds for the second derivative w.r.t. $y$.
Remark 6.2. Note that we can provide a necessary condition for $P_L$ to be smaller than $\mu_H$: This condition is of interest for analysing the case when the low-risk type premium yields losses in absolute terms if sold to a high-risk type. Also, one can easily obtain sensitivities of the premium with respect to the parameters by means of first derivatives.
Concerning the sign of the last term, under $b_i > \mu_i$ the first term is positive and the second is negative. The overall difference is negative for small values of $a_i$ but positive for larger $a_i$, and this effect sets in sooner when the difference $b_i - \mu_i$ is larger.
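The role of the Lambert W function in Theorem 6.1 can be illustrated with a short sketch. It assumes the logistic form $f_i(P) = 1/(1 + e^{a_i(P - b_i)})$ and a per-type objective $(P - \mu_i) f_i(P)$; under these assumptions the first-order condition rearranges to $u e^{u} = e^{a_i(b_i - \mu_i) - 1}$ with $u = a_i(P - \mu_i) - 1$, so that $P = \mu_i + (1 + W(e^{a_i(b_i - \mu_i) - 1}))/a_i$. The helper names and the parameter values are hypothetical.

```python
import math

def lambert_w0(z, tol=1e-12):
    """Principal branch of Lambert W for z >= 0, via Newton steps on w * exp(w) = z."""
    w = math.log1p(z)  # starts at or to the right of the root for z >= 0
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def expected_margin(P, mu, a, b):
    """Assumed per-capita objective (P - mu) * f(P) with logistic f."""
    return (P - mu) / (1.0 + math.exp(a * (P - b)))

def optimal_premium(mu, a, b):
    """Closed-form maximiser of expected_margin under the assumptions above."""
    return mu + (1.0 + lambert_w0(math.exp(a * (b - mu) - 1.0))) / a

# Hypothetical parameters: fair premium mu, steepness a, pivotal point b > mu.
mu, a, b = 1.0, 2.0, 1.5
P_opt = optimal_premium(mu, a, b)
print(P_opt, expected_margin(P_opt, mu, a, b))
```

For these values the argument of $W$ equals 1, so the optimal loading is $(1 + W(1))/a$ with $W(1) \approx 0.5671$.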
In the case of no differentiation, we proceed as before by taking first-order conditions of the expected profit defined in Equation (11): Plugging in the sigmoid function $f$ yields which can be solved numerically. We establish that $P_L \le P \le P_H$, following the assumption in (29).
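A numerical solution of the single-premium problem can be sketched as follows. The sketch assumes a logistic acceptance function and a per-capita expected profit of the form $p_L f_L(P)(P - \mu_L) + p_H f_H(P)(P - \mu_H)$; both this form and all parameter values are assumptions made for illustration, not the specification of Equation (11).

```python
import math

def acceptance(P, a, b):
    # Assumed logistic acceptance probability.
    return 1.0 / (1.0 + math.exp(a * (P - b)))

def pooled_profit(P, p_L, mu_L, a_L, b_L, mu_H, a_H, b_H):
    # Assumed per-capita expected profit when both types are offered the same premium P.
    p_H = 1.0 - p_L
    return (p_L * acceptance(P, a_L, b_L) * (P - mu_L)
            + p_H * acceptance(P, a_H, b_H) * (P - mu_H))

def maximise(func, lo, hi, steps=2000, iters=100):
    # Coarse grid first, since the pooled profit may have more than one local maximum.
    width = (hi - lo) / steps
    best = max(range(steps + 1), key=lambda k: func(lo + k * width))
    left = max(lo, lo + (best - 1) * width)
    right = min(hi, lo + (best + 1) * width)
    # Ternary search inside the bracket around the best grid point.
    for _ in range(iters):
        m1 = left + (right - left) / 3.0
        m2 = right - (right - left) / 3.0
        if func(m1) < func(m2):
            left = m1
        else:
            right = m2
    return 0.5 * (left + right)

# Hypothetical parameters, for illustration only.
params = dict(p_L=0.8, mu_L=1.0, a_L=2.0, b_L=1.5, mu_H=2.0, a_H=2.0, b_H=2.5)
P_star = maximise(lambda P: pooled_profit(P, **params), 0.0, 10.0)
print(P_star, pooled_profit(P_star, **params))
```

The grid step guards against stopping at a local maximum associated with only one of the two risk types, which a purely local root search on the first-order condition could do.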
For the differentiation case, we have to solve the problem defined in Equation (15). Once again, as the first-order conditions are symmetric, we detail only one of them.
Similarly, we also get These conditions characterize the optimum, which is then determined numerically.

Numerical illustrations
Let us look into the case of a sigmoid acceptance function (24) with parameters All other parameters remaining identical to those from Section 5, we obtain the following results:
• Full information: $E(\Pi) \approx 0.8030561n$.
• No differentiation: Note that $P > \mu_H > \mu_L$. In this case, the insurer targets the high-risk type because, even though this group is smaller, this form of the acceptance function allows higher margins on it, which compensates for its smaller size.
Consequently, differentiation is only preferable here if the population size satisfies $0.6993114n - c(n) \ge 0.2998947n$, that is $n/\ln(\gamma n) \ge 2.503651\,c_0$, in our case $n \ge 282.617$. Conversely, the maximum cost the insurer is willing to pay for a given population size satisfies $c(n) < 0.3994167n$.
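The break-even population size can be reproduced with a few lines of code. The logarithmic cost form $c(n) = c_0 \ln(\gamma n)$ is read off from the inequality above; the values $c_0 = 20$ and $\gamma = 1$ are assumptions of this sketch (the calibration of Section 5 is not restated here), chosen because they reproduce the quoted threshold. The function name is hypothetical.

```python
import math

def break_even_size(gain_per_capita, c0, gamma, lo=3.0, hi=1e7):
    """Smallest n in [lo, hi] with gain_per_capita * n >= c0 * ln(gamma * n).

    Assumes the surplus is negative at lo and crosses zero once on [lo, hi].
    """
    def surplus(n):
        return gain_per_capita * n - c0 * math.log(gamma * n)

    for _ in range(200):  # plain bisection
        mid = 0.5 * (lo + hi)
        if surplus(mid) >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi

# Per-capita gain of differentiation before classification costs (from the text).
gain = 0.6993114 - 0.2998947
# Assumed cost parameters, see the lead-in.
n_star = break_even_size(gain, c0=20.0, gamma=1.0)
print(gain, 1.0 / gain, n_star)
```

The reciprocal of the per-capita gain gives the constant 2.503651 appearing in the inequality, and the bisection returns the population size above which the gain exceeds the assumed classification cost.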
For the variances, the results are as follows:
• Full information:
• No differentiation: $\mathrm{Var}(\Pi) = 1.089133n$.
• Differentiation:
In Figure 12, we observe that under small price elasticity, with more customers entering the contract, the hump behaviour disappears, since in the limit there is no gap between the behaviour of the different risk types. The total variance is now mostly due to the underwriting process via the acceptance function, and the claim size variance has little effect on it. In contrast, a high price elasticity pushes the different risk types to concentrate around the pivotal point of their respective acceptance function, accepting contracts only below this point, see Figure 13. All the variance of the profit can then be explained by the claim size variance. Finally, Figure 15 gives the mean-variance frontier for this sigmoid acceptance function.
We observe that the no-differentiation strategy changes depending on the form of the acceptance function used in the analysis. This is particularly relevant when a company conducts a study using a simplified linear form instead of a more realistic logistic approximation.

Sensitivities
Let us now investigate the sensitivity of the expected profit with respect to each parameter. The change in expected profits allows us to compute the variation in the maximal cost for which the differentiation policy is still advantageous.
Parameter $a$: In the sigmoid curve of the acceptance function $f$, $a$ represents the steepness, i.e. the speed at which the function changes around its central point (see Figure 10). We choose $a_L = a_H = a$, so that we vary it as a single parameter. We observe in Figure 16 that for higher levels of $a$ the steepness of the twist increases, meaning that the transition of the acceptance function concentrates more closely around the points $b_L$ and $b_H$. Thus, the price can be set closer to the twisting point, allowing a higher proportion of individuals to enter the contract. Since $b_i > \mu_i$ in our initial parametrization, we gain a strictly positive profit when pricing around $b_i$. In the case of no differentiation, we cannot fully benefit from this feature, as the twisting point of one of the types will end up far from the single price $P$. Therefore, the maximum cost the insurer is willing to invest in the classification method is increasing in the parameter $a$. We can determine the limit: which in our case gives $0.63n$.
Parameters $b_L$ and $b_H$ (cf. Figure 17): Naturally, the higher $b_i$, the higher the overall profit, as customers accept premiums up to higher thresholds. Therefore, it becomes more and more attractive to differentiate customers in order to actually realize this profit. Conversely, if $b_H$ grows ceteris paribus, the profit increase from differentiation becomes smaller, as the proportion of high-risk types is too low to strongly influence the non-differentiation premium.

Other risk measures
We give a short comparative analysis for two other risk measures replacing the variance criterion (see Pflug and Römisch [33, Ch. 5] for a more extensive list of possible alternatives in the context of efficient frontier studies in decision making).

Lower semi-variance
The main drawback of a variance risk constraint is that positive deviations from the mean are also penalized. In [28], Markowitz suggests the concept of the lower semi-variance (LSV) of the profit $\Pi$ to account for the asymmetry between positive and negative deviations from the profit target. In this case, analytical formulas are no longer available, but one can obtain similar results by Monte Carlo simulation, using 1000 simulation runs. For that purpose, rather than only specifying two moments, we need to make an assumption on the entire distribution of claim sizes. The left plot in Figure 18 depicts the resulting efficient frontiers for the three scenarios under the assumption of Gamma distributions for the individual claim sizes with an additional atom at 0 with probability 0.25 (parameters consistent with their first two moments from (23)) and all other parameters chosen as in (23). The right plot in Figure 18 shows the results for $H$ being log-normally distributed (again matching the first two moments). For instance, the intersection between the no-differentiation and differentiation scenarios takes place at a much higher threshold.
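The Monte Carlo estimation of the LSV can be sketched as follows. The sketch takes the sample mean as the target and, for brevity, simulates the profit of a single policy (premium minus claim) rather than the portfolio profit $\Pi$ of the model; the premium and the Gamma parameters are hypothetical, while the atom at 0 with probability 0.25 follows the description above.

```python
import random

def lower_semivariance(profits):
    """Mean squared shortfall below the sample mean (upside deviations count as 0)."""
    mean = sum(profits) / len(profits)
    return sum(min(x - mean, 0.0) ** 2 for x in profits) / len(profits)

def claim_with_atom(rng, shape, scale, p_zero=0.25):
    """Claim size equal to 0 with probability p_zero, Gamma(shape, scale) otherwise."""
    return 0.0 if rng.random() < p_zero else rng.gammavariate(shape, scale)

rng = random.Random(0)          # fixed seed for reproducibility
premium = 1.5                   # hypothetical premium
profits = [premium - claim_with_atom(rng, shape=2.0, scale=0.5)
           for _ in range(1000)]  # 1000 runs, as in the text
print(lower_semivariance(profits))
```

By construction the LSV never exceeds the variance, since it discards the squared deviations above the mean.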

Value-at-risk
Let us now instead consider the value-at-risk for some level $0 < \alpha < 1$. For small values of $\alpha$, this measure focuses on the tail of the loss (negative profit). As for the LSV, we depict Monte Carlo results for the case of Gamma-distributed $H$ and log-normal $H$ (Figure 19) risk types, where $\alpha = 0.025$. That is, the profit can be lower than the value on the abscissa in Figure 19 only with probability $\alpha = 0.025$, so that the further left on the abscissa one gets, the more risk-averse the strategy is. One observes that high values of $\mathrm{VaR}_{0.025}(\Pi)$ can only be obtained in the no-differentiation case. In regions where that VaR value can be attained by all strategies, the differentiation strategy always dominates the one without differentiation. Note that for this level of $\alpha$, one observes virtually no difference between the cases of light-tailed and heavy-tailed losses, which is also due to the size of the portfolio.
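For simulated profits, the value-at-risk in the sense used here (the level below which the profit falls only with probability $\alpha$) can be estimated by an empirical quantile. The order-statistic convention below is one common choice and an assumption of this sketch; the function name is hypothetical.

```python
import math

def value_at_risk(profits, alpha):
    """Empirical alpha-quantile of a profit sample (order-statistic estimator)."""
    ordered = sorted(profits)
    # Index of the smallest observation with at least a fraction alpha at or below it.
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]

# Toy sample: profits 1, 2, ..., 100.
sample = list(range(1, 101))
print(value_at_risk(sample, 0.025))
```

With only 1000 simulation runs and $\alpha = 0.025$, the estimate rests on roughly 25 observations in the tail, so some sampling noise in the resulting frontier is to be expected.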

Utility functions
Utility theory is a classical tool to combine risk and profitability of an insurance undertaking in one function (see e.g. [35, 33]), so in this subsection we briefly look at the problem posed in this paper from the utility point of view. Note that in this case knowledge of the full loss distribution is needed, and not only of the first two moments as in Section 4. Assume that the insurer bases decisions on a risk-averse (i.e., increasing and concave) utility function $u(x)$. The insurer's optimization problem is then modified as follows: where the profit $\Pi$ is given by (4), which we can also write as Firstly, the moment-generating function of each $\Pi^L_j$ is Analogously, $M_{\Pi^H_j}(t) = e^{tP_H} M_H(-t)$. By independence and classical collective risk theory calculations (cf. [23]), we can then determine the moment-generating function of $\Pi$: $M_\Pi(t) = M_{N_L}(\log M_{\Pi^L_j}(t)) \cdot M_{N_H}(\log M_{\Pi^H_j}(t))$. The same reasoning applies to the non-differentiation case by setting $P_L = P_H = P$. Finally, for differentiated pricing, an analogous derivation gives In each of the cases, $M_\Pi(t)$ can be inverted to obtain the c.d.f. $F_\Pi(x)$ of the profit, and the expected utility is then given by $E(u(\Pi)) = \int u(x)\,dF_\Pi(x)$.
For a numerical illustration, assume now that $L \sim \mathrm{Exp}(\alpha_L)$ and $H \sim \Gamma(\alpha_H, \lambda_H)$. To be consistent with (23), we choose $\alpha_L = \mu_L = 1$, $\alpha_H = \mu_H^2/\sigma_H^2$, $\lambda_H = \mu_H/\sigma_H^2$. Since an explicit calculation of $E(u(\Pi))$ is not feasible, we report numerical results from a Monte Carlo simulation, estimating its value for each choice of $P_L$, $P_H$ (across a discrete grid of mesh size 0.05) using 1000 runs. For the sake of comparison, we use three popular utility functions:
• linear utility $u(x) = x$ (leading simply to the expected value of the profit);
• exponential utility $u(x) = -e^{-Ax}$ for some risk aversion coefficient $A > 0$;
• quadratic utility $u(x) = x - Bx^2$, $x \le \frac{1}{2B}$.
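The moment matching and the three utility functions can be sketched as follows. The moments, the premium and the coefficient $B$ used below are hypothetical, and the simulated quantity is again the profit of a single high-risk policy rather than the portfolio profit $\Pi$; only $A = 0.0005$ is taken from the caption of Figure 21.

```python
import math
import random

def gamma_shape_rate(mu, sigma2):
    """Shape mu^2/sigma^2 and rate mu/sigma^2 of a Gamma law with mean mu, variance sigma2."""
    return mu * mu / sigma2, mu / sigma2

def u_linear(x):
    return x

def u_exponential(x, A):
    return -math.exp(-A * x)

def u_quadratic(x, B):
    return x - B * x * x  # intended for x <= 1 / (2 * B)

def expected_utility(samples, u):
    """Monte Carlo estimate of E[u(X)] from a list of simulated values."""
    return sum(u(x) for x in samples) / len(samples)

rng = random.Random(1)
alpha_H, lambda_H = gamma_shape_rate(mu=2.0, sigma2=4.0)  # hypothetical moments
premium = 3.0                                             # hypothetical premium
# random.gammavariate expects (shape, scale), so the scale is 1 / rate.
profits = [premium - rng.gammavariate(alpha_H, 1.0 / lambda_H) for _ in range(1000)]

print(expected_utility(profits, u_linear))
print(expected_utility(profits, lambda x: u_exponential(x, A=0.0005)))
print(expected_utility(profits, lambda x: u_quadratic(x, B=0.01)))
```

Repeating the last three lines over a grid of premiums yields a surface of the kind the expected-utility figures describe; with 1000 runs per grid point the estimates remain noisy, which should be kept in mind when comparing nearby premium choices.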
The results in Figures 20, 21 and 22 show the expected utility for each of the available premium combinations and each strategy for these three utility functions. Figure 20 serves as a reference point since it represents the simple expected profit as before.
One observes that the optimal solution clearly depends on the chosen utility function.
With the chosen parametrization of the exponential utility function, the difference in expected utility between the differentiation and no-differentiation cases is less prominent than under the quadratic utility, as the marginal utility of the quadratic function is greater in this region.

Conclusion
In this paper, we investigated the problem of risk categorization under the possibility of classification errors for an insurance company. We highlighted the impact of misspecification of risk classes on the company's profit, which is a relevant topic due to the growing use of black box techniques in classification. Resulting pricing errors may lead to adverse selection via a modified acceptance behaviour of individuals to enter a contract, potentially leading to extra costs due to lost market shares and loss of premium inflow. In a simple model with two risk types and a piece-wise linear acceptance function, we distinguished three pricing scenarios: full information, undifferentiated pricing and costly price differentiation under error assumptions. In this framework, we studied the optimal solution for simply maximizing expected profit and, more generally, within a mean-variance framework, establishing efficient frontiers for the premium choices. The cost of the risk categorization as a function of population size then eventually determines the optimal choice of premiums, and to what extent risk classification is profitable.
The simplicity of the introduced model allowed us to quantify the effects and consequences of misspecification on the insurer's profit. Clearly, it will be of interest in future research to generalize the model assumptions in various directions. Beyond the extensions to more general acceptance functions and risk measures that we already address to a first extent in Section 6 of the paper, it will be of interest to extend the study to more than two risk categories. Another important direction will be to introduce market competition into this model (cf. [16, 31]), as well as the lapse behavior of policyholders between the different market players (see e.g. [5, 30]). Also, while our probabilistic acceptance model already covers a certain degree of randomness in the choice of insurance policies, it could be interesting to include bounded rationality as well as other elements of inertia of policyholders and the markets more explicitly in the modelling framework.
Statement: On behalf of all authors, the corresponding author states that there is no conflict of interest.

Figure 1: Acceptance probability $f_i(P)$ as a function of the offered premium $P$.

(6) of the full information case, $b^* = \frac{p_L(1-p_{H|L})/P^{\max}_L}{p_L(1-p_{H|L})/P^{\max}_L + p_H\,p_{L|H}/P^{\max}_H}$ and $c^* = \frac{p_L\,p_{H|L}/P^{\max}_L}{p_H(1-p_{L|H})/P^{\max}_H + p_L\,p_{H|L}/P^{\max}_L}$.

Remark 3.7. One needs to verify the limiting case of $P^*_H = P^{\max}_L$ and $P^*_L = P^{\max}_L$ to obtain the true maximum, since misclassified L individuals may not enter the contract beyond the limiting premium. This is the case when $c^* < \frac{P_H - P^{\max}_L}{P_H - P_L}$ and $b^* < \frac{P_H - P^{\max}_L}{P_H - P_L}$. One can distinguish three cases. Firstly, if both $P^*_H < P^{\max}_L$ and $P^*_L < P^{\max}_L$

Figure 4: Expected profit as function of error probabilities.
(a) Perfect information case. Left-hand side: $\mathrm{Var}(\Pi)$ as a function of $P_L$. Right-hand side: $\mathrm{Var}(\Pi)$ as a function of $P_H$. (b) No differentiation case. $\mathrm{Var}(\Pi)$ as a function of $P$. (c) Differentiation case. Left-hand side: $\mathrm{Var}(\Pi)$ as a function of $P^*_L$. Right-hand side: $\mathrm{Var}(\Pi)$ as a function of $P^*_H$.

Figure 5: Decomposition of the variance.

Figure 6: Decomposition of the variance, parameter P max i

Figure 7: Decomposition of the variance, parameter P max i

Figure 9: Mean-variance frontier with linear demand function.

Figure 10: Acceptance function f for small (left) and large (right) parameter a.

Figure 11 depicts the form of the variance as a function of the proposed premiums in the different scenarios and its decomposition into the two parts (the variance arising from the acceptance function and the one from the claim size variability), showing again a hump pattern. Figures 12-14 illustrate the sensitivity of the variances of the profit with and without differentiation, when varying one of the parameters.
(a) Perfect information case. Left-hand side: $\mathrm{Var}(\Pi)$ as a function of $P_L$. Right-hand side: $\mathrm{Var}(\Pi)$ as a function of $P_H$. (b) No differentiation case. $\mathrm{Var}(\Pi)$ as a function of $P$. (c) Differentiation case. Left-hand side: $\mathrm{Var}(\Pi)$ as a function of $P^*_L$. Right-hand side: $\mathrm{Var}(\Pi)$ as a function of $P^*_H$.

Figure 11: Decomposition of the variance.

Figure 12: Parameter $a = 0.1$. Under small price-elasticity of demand, we observe higher levels of underwriting for both risk types, hence higher and smoother variance.

Figure 13: Parameter $a = 10$. Under high price-elasticity of demand, demand concentrates around the pivotal points: the different risk types only enter the contract up to their pivotal premium, so that steps are noticeable.

Figure 14: Parameter µ H = 10.With a bigger expected claim size difference between risk types, the relationship between the humps and the risk types becomes clearer.

Figure 15: Efficient frontiers in the mean-variance setup for the three scenarios.

(a) Function of $a$ and $n$. (b) $n$ fixed at 10000.

Figure 16: Maximum affordable investment cost for implementation of a differentiation mechanism as a function of a and n.

Figure 17: Maximum affordable cost as a function of b H , b L and n = 10000

Figure 18: Efficient frontiers in the mean-LSV setup for the three scenarios under the assumption of Gamma-distributed H (left) and Log-Normal H (right).

Figure 19: Efficient frontiers in the mean-VaR setup for the three scenarios for Gamma-distributed H (left) and Log-Normal H (right).
Perfect information (green heatmap) and differentiation (red level curves).

Figure 20: Expected linear utility as a function of premiums

Figure 21: Expected exponential utility as a function of premiums (A = 0.0005)
H , since we correctly price for only H types entering the group. H types will always enter the contract since their premium is a weighted average of $P_L$ and $P_H$, and both are smaller than P max