Abstract
We analyze the design of optimal medical insurance under ex post moral hazard, i.e., when illness severity cannot be observed by insurers and policyholders decide for themselves on their health expenditures. The tradeoff between ex ante risk sharing and ex post incentive compatibility is studied in an optimal revelation mechanism under hidden information and risk aversion. The optimal contract provides partial insurance at the margin, with a deductible when insurers’ rates are affected by a positive loading, and it may also include an upper limit on coverage. The potential to audit the health state leads to an upper limit on out-of-pocket expenses.
Notes
Blomqvist (1997) argues that the indemnity schedule is S-shaped, with marginal coverage increasing for small expenses and decreasing for large expenses. As we will see, this conclusion is not valid when bunching and limit conditions are adequately taken into account.
It is well known that optimal insurance contracts may include a deductible because of transaction costs (Arrow 1963), ex ante moral hazard (Holmström 1979), or costly state verification (Townsend 1979). Drèze and Schokkaert (2013) extend Arrow’s theorem of the deductible to the case of ex post moral hazard. Although ceilings on coverage are widespread, they have been justified by arguments that are much more specific: the insurer’s risk aversion for large risks and regulatory constraints (Raviv 1979), bankruptcy rules (Huberman et al. 1983), or the auditor’s risk aversion in costly state verification models (Picard 2000).
A straight deductible contract, i.e., full coverage of losses above a deductible, is optimal when effort affects the probability of an accident, but not the probability distribution of losses, conditionally on the occurrence of an accident.
See, for instance, the description of the health insurance plans in the Affordable Care Act at https://www.healthcare.gov/healthplaninformation/.
For notational simplicity, we assume that there is no probability weight at the no-sickness state \(x=0\), but the model could easily be extended in that direction.
In addition to being realistic, assuming that I(m) is nondecreasing is not a loss of generality if policyholders can claim insurance payment for only a part of their medical expenses: in that case, only the increasing part of their indemnity schedule would be relevant. Piecewise differentiability means that I(m) has only a finite number of non-differentiability points, which includes the indemnity schedule features that we may have in mind, in particular those with a deductible, a rate of coinsurance, or an upper limit on coverage. \(I(0)=0\) corresponds to the way insurance works in practice, but it also acts as a normalization device. Indeed, replacing contract \(\{I(m),P\}\) by \(\{I(m)+k,P+k\}\) with \(k>0\) would not change the net transfer \(I(m)-P\) from insurer to insured, hence an indeterminacy of the optimal solution. This indeterminacy vanishes if we impose \(I(0)=0\).
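The normalization argument can be written out directly: in every health state, the policyholder's final wealth is unchanged under the shifted contract,

```latex
\[
w - (P + k) - m + \bigl(I(m) + k\bigr) \;=\; w - P - m + I(m)
\qquad \text{for all } k \text{ and } m,
\]
```

so only the net transfer \(I(m)-P\) is pinned down by optimality, and the condition \(I(0)=0\) selects \(k=0\) among the equivalent contracts.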
Our notations are presented by presuming that policyholders pay m (i.e., the total cost of medical services) and they receive the insurance indemnity I(m). However, we may also assume that the insurer and policyholders, respectively, pay I(m) and \(m-I(m)\) to medical service providers. Both interpretations correspond to different institutional arrangements, and both are valid in our analysis.
We use Lemma 1(ii) to restrict attention to functions \(\widehat{I}(x)\) and m(x) that are continuous. Furthermore, \(\widehat{I}(x)\) and m(x) are piecewise differentiable because I(m) is piecewise differentiable. This allows us to use Pontryagin’s principle in the proof of Proposition 1. In this proof, it is shown that the optimal revelation mechanism is such that \(\widehat{I}^{\prime }(x)\ge 0\). Since \(m^{\prime }(x)\ge 0\), the optimal mechanism will be generated by a nondecreasing indemnity schedule I(m), as we have assumed. Note that Blomqvist (1997) studies a similar optimization problem, but he wrongly ignores the second-order conditions (8) and the sign conditions (9). Nor does he fully consider the technical implications of the assumption \(v^{\prime }(0)=+\infty\), in the absence of which we would have a corner solution with \(m(x)=0\) for x small.
Note the relationship of Proposition 1 with optimal insurance under (ex ante) moral hazard when effort affects the distribution of losses should an accident occur, but not the probability of the accident itself. In that case, it may be optimal to fully cover small losses without a deductible. See Rees and Wambach (2008).
This is the case, for instance, if the distribution of x is uniform or exponential.
In more technical terms, we may define the value function \(v(I_{0},m_{0},x)\) to be the greatest expected utility over [x, a], with unchanged insurance expected cost, if we start at \(\widehat{I}(x)=I_{0} ,m(x)=m_{0}\). The vector of costates \((\mu _{1}(x),\mu _{2}(x))\) is the gradient at x of the value function, evaluated along the optimal trajectory.
\(\varphi (x)\) is called a “switching function” in the optimal control terminology, because its sign determines the sign of the control.
These conditions can be deduced from the trajectories of \(\mu _{1}(x)\) and \(\mu _{2}(x)\).
The proofs do not require this assumption.
A similar but more complex argument is used in the proof of Proposition 2 to show that bunching cannot occur in intervals interior to [0, a].
In practice, the optimal policy could be approximated by a piecewise linear schedule with slope between 0 and 1 until the upper limit \(\overline{m}\) and with a capped indemnity when \(m>\overline{m}\). It would be interesting to estimate the welfare loss associated with this piecewise linearization. The simulations presented in Sect. 3.3 suggest that this loss may be low.
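A piecewise linear schedule of this kind is simple to specify. The sketch below is purely illustrative: the function name `indemnity` and the parameter values (deductible, coinsurance rate, and upper expense limit \(\overline{m}\)) are hypothetical and not taken from the paper's calibration.

```python
def indemnity(m, deductible=1.0, coins_rate=0.8, cap_expense=10.0):
    """Illustrative piecewise linear indemnity schedule: zero below the
    deductible, coverage at a marginal rate strictly between 0 and 1 in
    between, and a capped (constant) indemnity once expenses exceed
    cap_expense, the upper limit m-bar. All parameter values are
    assumptions for illustration only."""
    covered = min(max(m - deductible, 0.0), cap_expense - deductible)
    return coins_rate * covered
```

For example, expenses of 5.0 yield an indemnity of 0.8 × (5.0 − 1.0) = 3.2, while any expense above 10.0 receives the same capped indemnity of 0.8 × 9.0 = 7.2.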
The same intuition is at work to show that \(\widehat{I}^{\prime }(x)>0\) when x is close to zero, and thus that the indemnity schedule should not include a deductible, with additional technical specificities induced by the sign constraint \(\widehat{I}(x)\ge 0\).
We use the Bocop software (see Bonnans et al. 2016 and http://bocop.org). We refer the reader to ‘Computational approach’ in Appendix 2 and, for instance, to Betts (2001) and Nocedal and Wright (1999) for more details on direct transcription methods and nonlinear programming algorithms.
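Direct transcription, the method Bocop implements, discretizes the continuous-time control problem on a grid and hands the resulting finite-dimensional program to a nonlinear programming solver. The following is a minimal sketch of the idea on a toy problem unrelated to the paper's model (assumptions: Euler discretization, and a linear-quadratic toy problem so that plain linear algebra can replace the NLP solver that Bocop would use).

```python
import numpy as np

# Toy optimal control problem: minimize the integral of u(t)^2 over [0,1]
# subject to x'(t) = u(t), x(0) = 0, x(1) = 1 (the optimal control is u == 1).
N = 100            # number of Euler steps in the transcription grid
dt = 1.0 / N

# Euler transcription: x_N - x_0 = dt * sum(u_k), so the dynamics plus the
# boundary conditions collapse into one linear constraint A @ u = b.
A = dt * np.ones((1, N))
b = np.array([1.0])

# The transcribed problem, min dt*(u @ u) s.t. A @ u = b, is a quadratic
# program; its KKT system is linear, giving the least-norm solution
# u = A^T (A A^T)^{-1} b.
u = (A.T @ np.linalg.solve(A @ A.T, b)).ravel()

cost = dt * float(u @ u)   # discretized objective; equals 1 at the optimum
```

In Bocop, the same transcription step produces a large sparse NLP that is solved by Ipopt (Wächter and Biegler 2006), with derivatives supplied by automatic differentiation (Walther and Griewank 2012).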
Note that f(a) and \(f^{\prime }(a)\) are close to 0 when a is large.
More generally, the insurer could randomly audit claims, the probability of triggering an audit depending on the size of the claim. See the references in Picard (2013) on deterministic and random auditing for insurance claims.
The policyholder is subject to prior authorization for increasing her medical expenses above \(m^{*}\). After auditing the health state, this authorization will be granted but capped by m(x) if \(x>x^{*}\), and otherwise it will be denied.
An upward discontinuity of I(m) at \(m=m^{*}\) would dominate the optimal solution when I(m) is constrained to be continuous: increasing I(m) as steeply as possible in a small interval \((m^{*},m^{*}+\varepsilon )\) would bring the continuous function I(m) arbitrarily close to this discontinuous function, so no optimal solution would exist in the set of continuous functions I(m). Thus, in addition to being realistic from an empirical point of view, the assumption \(I^{\prime }(m)\le 1\) if \(m\ge m^{*}\) eliminates this reason for which an optimal solution may not exist. As previously shown, we have \(I^{\prime }(m)<1\) in the no-audit regime where \(m<m^{*}\).
If \(c=0\), then the first-best allocation would be feasible with \(x^{*}=0\), that is, by auditing the health state in all possible cases. Thus, choosing \(x^{*}\) smaller than a is optimal when c is not too large, and this is what we assume in what follows.
See Gollier (1987) and Bond and Crocker (1997) for similar results; see also Picard (2013) for a survey on deterministic auditing in insurance fraud models. Lemma 2 also characterizes the optimal health expenses profile m(x) when there is auditing and full insurance at the margin (that is, when \(x>\widehat{x}\)): we have \(m^{\prime }(x)=-v^{\prime }(m(x))/\left[ xv^{\prime \prime }(m(x))\right]\), which means that the increase in health expenses which follows a unit increase in the illness severity x is equal to the inverse of the elasticity of the marginal efficiency of health expenses \(v^{\prime }(m(x))\). Equivalently, the marginal utility of health care expenses \(\gamma xv^{\prime }(m(x))\) should remain constant in the auditing regime.
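This expression for \(m^{\prime }(x)\) follows directly from the constancy of the marginal utility of care in the auditing regime; a sketch in the paper's notation:

```latex
\[
\gamma x\, v'\bigl(m(x)\bigr) = \text{const.}
\;\Longrightarrow\;
\gamma v'\bigl(m(x)\bigr) + \gamma x\, v''\bigl(m(x)\bigr)\, m'(x) = 0
\;\Longrightarrow\;
m'(x) = -\,\frac{v'\bigl(m(x)\bigr)}{x\, v''\bigl(m(x)\bigr)} > 0,
\]
```

where the positive sign follows from \(v'>0\) and \(v''<0\).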
Of course, this discontinuity of function m(x) at \(x=x^{*}\) is compatible with a continuous function I(m).
The bunching of types is no longer sustained by a kink in the indemnity schedule I(m) at \(m=\overline{m}\), but by the threat of an audit, since increasing expenses above \(\overline{m}\) will not be possible if \(x\le x^{*}\).
An example is when the individual may lose a part of her business or wage income when her health level deteriorates.
If \(\varepsilon\) is continuously distributed, then \(G_{\varepsilon }^{\prime }(\varepsilon \mid x)>0\) is the density of \(\varepsilon\) conditionally on x.
In the top panel of Fig. 5, indifference curves for \(x=7\) and \(x=9\) almost coincide. The bottom panel of Fig. 5 shows that \(\overline{m}\) decreases when k increases, with a decrease in the upper limit of the insurance indemnity \(I(\overline{m})\). There is bunching only when \(k>0\), since Fig. 5 corresponds to the case of a uniform distribution.
Henceforth, we assume there is no background risk.
\(U_{HR}^{\prime \prime }>0\) is assumed for the sake of simplicity. Lemma 3 is valid under more general conditions that are compatible with \(U_{HR}^{\prime \prime }\le 0\).
Thus, utility is CRRA w.r.t. wealth. Parameters are \(\alpha =2,\beta =0.5,b_{0}=0.01,\) and \(b=1\).
The indifference curves are drawn for \(x=2\) in Fig. 9.
See Picard (2016) for a case with linear coinsurance where this effect of wealth on the coinsurance rate vanishes completely.
This is just an assumption made for illustrative purposes.
The relevant values are such that \(m<1\), and thus \(q(m)=m^{\alpha }<m\).
This corner solution is induced by the nonconcavity of \(v(m^{\alpha })\) when \(\alpha =3\) and 4.
For the sake of illustration, see for instance Kaiser Family Foundation (2009) for France, Germany, and Switzerland, and http://www.healthcare.gov for the ObamaCare Marketplace in the US.
A similar proof applies to the case of leftward discontinuity.
In optimal control problems with state variable constraints, the costate variable may be discontinuous at junctions between regimes where the constraint is binding or not binding; see for instance Sect. 7.6 in Beavis and Dobbs (1991). Here, \(\mu _{1}(x)\) may be discontinuous at junction points between intervals where \(\widehat{I}(x)=0\) and intervals where \(\widehat{I}(x)>0\). The proof is almost the same if the junction point is such that \(\widehat{I}(x)>0\) if \(x\in (x_{0}-\varepsilon ,x_{0}]\) and \(\widehat{I}(x)=0\) if \(x\in (x_{0},x_{0}+\varepsilon )\).
Note that (26) and \(\mu _{1}(x)=\mu _{1}^{\prime }(x)=0\) for all \(x\in [0,x_{0}]\) imply that \(\delta (x)\) is continuous in this interval.
Step 3 in the proof of Proposition 1 shows that \(\mu _{1}(x)>0\) for all \(x\in (0,a)\).
We assume w.l.o.g. that h(x) is continuous at \(x=a\).
We can straightforwardly check that (8) is not binding in this subproblem.
One can check that \(A_{\widetilde{x}}^{\prime }\big|_{\widetilde{x}=x}>0\) if \(U_{H}^{\prime }U_{RH}^{\prime \prime }-U_{R}^{\prime }U_{H^{2}}^{\prime \prime }>0\), which holds when \(U_{RH}^{\prime \prime }>0\) and \(U_{H^{2}}^{\prime \prime }<0\) as postulated, but which is also compatible with \(U_{RH}^{\prime \prime }<0\).
References
Arrow, K.J. 1963. Uncertainty and the welfare economics of medical care. American Economic Review 53: 941–973.
Arrow, K.J. 1968. The economics of moral hazard: Further comment. American Economic Review 58: 537–539.
Arrow, K.J. 1971. Essays in the Theory of Risk Bearing. Chicago: Markham Publishing.
Arrow, K.J. 1976. Welfare analysis of changes in health coinsurance rates. In The Role of Health Insurance in the Health Services Sector, ed. R. Rosett, 3–23. New York: NBER.
Beavis, B., and I. Dobbs. 1991. Optimization and Stability Theory for Economic Analysis. Cambridge: Cambridge University Press.
Betts, J.T. 2001. Practical Methods for Optimal Control Using Nonlinear Programming. Philadelphia: Society for Industrial and Applied Mathematics (SIAM).
Blomqvist, A. 1997. Optimal nonlinear health insurance. Journal of Health Economics 16: 303–321.
Bond, E., and K.J. Crocker. 1997. Hardball and the soft touch: the economics of optimal insurance contracts with costly state verification and endogenous monitoring costs. Journal of Public Economics 63: 239–264.
Bonnans, F., D. Giorgi, V. Grelard, B. Heymann, S. Maindrault, P. Martinon, and O. Tissot. 2016. Bocop—A Collection of Examples. Technical Report, INRIA.
Cutler, D.M., and R.J. Zeckhauser. 2000. The anatomy of health insurance. In Handbook of Health Economics, vol. 1, ed. A. Culyer, and J.P. Newhouse, 563–643. Amsterdam: North-Holland.
Drèze, J.H., and E. Schokkaert. 2013. Arrow’s theorem of the deductible: moral hazard and stop-loss in health insurance. Journal of Risk and Uncertainty 47 (2): 147–163.
Ebert, U. 1992. A re-examination of the optimal nonlinear income tax. Journal of Public Economics 49: 47–73.
Ellis, R.P., S. Jiang, and W.G. Manning. 2015. Optimal health insurance for multiple goods and time periods. Journal of Health Economics 41: 89–106.
Evans, W.N., and W.K. Viscusi. 1991. Estimation of state dependent utility functions using survey data. Review of Economics and Statistics 73: 94–104.
Feldman, R., and B. Dowd. 1991. A new estimate of the welfare loss of excess health insurance. American Economic Review 81: 297–301.
Feldstein, M. 1973. The welfare loss of excess health insurance. Journal of Political Economy 81: 251–280.
Feldstein, M., and B. Friedman. 1977. Tax subsidies, the rational demand for insurance and the health care crisis. Journal of Public Economics 7: 155–178.
Finkelstein, A., E.F.P. Luttmer, and M.J. Notowidigdo. 2013. What good is wealth without health? The effect of health on the marginal utility of consumption. Journal of the European Economic Association 11: 221–258.
Gollier, C. 1987. Pareto-optimal risk sharing with fixed cost per claim. Scandinavian Actuarial Journal 13: 62–73.
Holmström, B. 1979. Moral hazard and observability. Bell Journal of Economics 10: 74–91.
Huberman, G., D. Mayers, and C.W. Smith Jr. 1983. Optimum insurance policy indemnity schedules. Bell Journal of Economics 14: 415–426.
Kaiser Family Foundation. 2009. Cost Sharing for Health Care: France, Germany, and Switzerland. Menlo Park, CA: The Henry J. Kaiser Family Foundation.
Laffont, J.J., and J.C. Rochet. 1998. Regulation of a risk-averse firm. Games and Economic Behavior 25: 149–173.
Lollivier, S., and J.C. Rochet. 1983. Bunching and second-order conditions: A note on optimal tax theory. Journal of Economic Theory 31 (2): 392–400.
Ma, C.T.A., and M. Riordan. 2002. Health insurance, moral hazard, and managed care. Journal of Economics and Management Strategy 11: 81–107.
Nocedal, J., and S.J. Wright. 1999. Numerical Optimization. New York: Springer-Verlag.
Pauly, M. 1968. The economics of moral hazard: Comment. American Economic Review 58: 531–537.
Pflum, K.E. 2015. Physician incentives and treatment choices. Journal of Economics and Management Strategy 24: 712–751.
Picard, P. 2000. On the design of optimal insurance policies under manipulation of audit cost. International Economic Review 41 (4): 1049–1071.
Picard, P. 2013. Economic analysis of insurance fraud. In Handbook of Insurance, 2nd ed, ed. G. Dionne, 349–395. New York: Springer.
Picard, P. 2016. A note on health insurance under ex post moral hazard. Risks 4 (38): 1–9.
Raviv, A. 1979. The design of an optimal insurance policy. American Economic Review 69: 84–96.
Rees, R., and A. Wambach. 2008. The microeconomics of insurance. Foundations and Trends in Microeconomics 4 (1–2): 1–163.
Salanié, B. 1990. Sélection adverse et aversion pour le risque. Annales d’Economie et de Statistiques 18: 131–150.
Townsend, R. 1979. Optimal contracts and competitive markets with costly state verification. Journal of Economic Theory 21: 265–293.
Viscusi, W.K., and W.N. Evans. 1990. Utility functions that depend on health status: Estimates and economic implications. American Economic Review 80: 353–374.
Wächter, A., and L.T. Biegler. 2006. On the implementation of a primal-dual interior point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming 106 (1): 25–57.
Walther, A., and A. Griewank. 2012. Getting started with ADOL-C. In Combinatorial Scientific Computing, Chapman & Hall/CRC Computational Science, ed. U. Naumann, and O. Schenk. Boca Raton: CRC Press.
Weymark, J.A. 1986. A reducedform optimal nonlinear income tax problem. Journal of Public Economics 30 (2): 199–217.
Winter, R.A. 2013. Optimal insurance contracts under moral hazard. In Handbook of Insurance, 2nd ed, ed. G. Dionne, 205–230. New York: Springer.
Zeckhauser, R. 1970. Medical insurance: A case study of the tradeoff between risk spreading and appropriate incentives. Journal of Economic Theory 2: 10–26.
Acknowledgements
Pierre Picard gratefully acknowledges financial support from LabEX ECODEC.
Appendices
Appendix 1
Proof of Lemma 1
 Step 1:

There exists an optimal revelation mechanism.
Let us change variables by denoting \(A(x)=u(wP+\widehat{I}(x)m(x))\) and \(B(x)=v(m(x))\). The incentive compatibility constraints and the insurer’s breakeven constraint are, respectively, rewritten as
Furthermore, \(\widehat{I}(0)=m(0)=0\) gives \(A(0)=u(wP)\) and \(B(0)=0\). Let \(\mathcal {S}\) be the subset of functions A(.), B(.) that belong to the Banach space \(\mathcal {L}^{\infty }([0,1],\mathbb {R}\times [0,1])\) with the sup norm topology \(\parallel .\parallel _{\infty }\) and that satisfy (21),(22), and \(B(0)=0\). Hence, \(\mathcal {S}\) is closed and convex, and furthermore \(\parallel (A(.),B(.))\parallel _{\infty }\le u(w)\) for all \((A(.),B(.))\in \mathcal {S}\). Let
J is a linear (and thus weakly concave) function of A(.), B(.). Hence, it reaches a maximum in \(\mathcal {S}\), which proves the existence of an optimal incentive compatible mechanism, with \(P=w-u^{-1}(A(0))\).
 Step 2:

For any incentive compatible mechanism, m(x) and \(\widehat{I}(x)\) are nondecreasing.
Incentive compatibility implies
and, reversing the roles of x and \(\widetilde{x},\)
We deduce \((\widetilde{x}-x)[v(m(\widetilde{x}))-v(m(x))]\ge 0\) for all \(x,\widetilde{x}\), which implies that m(.) is nondecreasing. Since I(.) is nondecreasing, \(\widehat{I}(.)\equiv I(m(.))\) is also nondecreasing.
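Spelled out in the mechanism's notation (a reconstruction of the standard argument), the two incentive constraints being added are

```latex
\[
u\bigl(w - P + \widehat{I}(x) - m(x)\bigr) + \gamma x\, v\bigl(m(x)\bigr)
\;\ge\;
u\bigl(w - P + \widehat{I}(\widetilde{x}) - m(\widetilde{x})\bigr) + \gamma x\, v\bigl(m(\widetilde{x})\bigr),
\]
\[
u\bigl(w - P + \widehat{I}(\widetilde{x}) - m(\widetilde{x})\bigr) + \gamma \widetilde{x}\, v\bigl(m(\widetilde{x})\bigr)
\;\ge\;
u\bigl(w - P + \widehat{I}(x) - m(x)\bigr) + \gamma \widetilde{x}\, v\bigl(m(x)\bigr).
\]
```

Summing the two inequalities cancels the \(u(\cdot)\) terms and yields the product condition \(\gamma (\widetilde{x}-x)[v(m(\widetilde{x}))-v(m(x))]\ge 0\).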
 Step 3:

For any optimal revelation mechanism, m(.) and \(\widehat{I}(.)\) are continuous.
Let \(\{m_{0}(.),\widehat{I}_{0}(.)\}\) be an optimal incentive compatible revelation mechanism and suppose that \(m_{0}(.)\) is rightward discontinuous (see Footnote 44) at \(x_{*}\in (0,a)\), with \(m_{0}(x)\rightarrow m_{0}(x_{*})+\Delta _{m}\) and \(\widehat{I}_{0}(x)\rightarrow \widehat{I}_{0}(x_{*})+\Delta _{I}\), when \(x\rightarrow x_{*},x>x_{*}\), with \(\Delta _{m}>0\) and \(\Delta _{I}\ge 0\). Incentive compatibility implies that a type \(x_{*}\) individual is indifferent between \(m_{0}(x_{*}),\widehat{I}_{0}(x_{*})\) and \(m_{0}(x_{*})+\Delta _{m},\widehat{I}_{0}(x_{*})+\Delta _{I}\). If \(\Delta _{I}=0\), since I(m) is nondecreasing, it remains constant when \(m\in [m_{0}(x_{*}),m_{0}(x_{*})+\Delta _{m}]\). Using the concavity of \(m\rightarrow u(w-P-m+\widehat{I}_{0}(x_{*}))+\gamma x_{*}v(m)\) then shows that the type \(x_{*}\) individual reaches a higher expected utility by choosing \(m\in (m_{0}(x_{*}),m_{0}(x_{*})+\Delta _{m})\) than by choosing \(m_{0}(x_{*})\), hence a contradiction. Thus, we have \(\Delta _{I}>0\).
Since \(\widehat{I}_{0}(x)\) is piecewise continuous, there exists \(\theta >0\) such that \(\widehat{I}_{0}(x)\widehat{I}_{0}(x_{*})\ge \Delta _{I}/2\) for all \(x\in (x_{*},x_{*}+\theta )\). Consider another revelation mechanism \(\{m_{1}(.),\widehat{I}_{1}(.)\}\) defined by
(i) If \(x\in (x_{*},x_{*}+\theta )\), let \(m_{1}(x)=m_{1}^{*}\) and \(\widehat{I}_{1}(x)=I_{1}^{*}\) close to \(m_{0}(x_{*})\) and \(\widehat{I}_{0}(x_{*})\), respectively, with \(\widehat{I}_{0}(x)-I_{1}^{*}\ge \Delta _{I}/4\), and such that
for all \(x\in (x_{*},x_{*}+\theta )\), and
if \(x\le x_{*}\).
(ii) If \(x\notin (x_{*},x_{*}+\theta )\), then \(m_{1}(x)\equiv\) \(m_{0}(x)\) and \(\widehat{I}_{1}(x)\equiv \widehat{I}_{0}(x)\).
Let \(\widetilde{x}_{1}(x)\) be an optimal report of a type x policyholder in \(\{m_{1}(.),\widehat{I}_{1}(.)\}\), with \(\widetilde{x}_{1}(x)=x\) for all \(x\in [0,x_{*}+\theta )\), and let \(\{m_{2}(.),\widehat{I}_{2}(.)\}\) be the incentive compatible revelation mechanism defined by \(m_{2}(x)\equiv m_{1}(\widetilde{x}_{1}(x)),\widehat{I}_{2}(x)\equiv \widehat{I}_{1}(\widetilde{x}_{1}(x))\). For P unchanged, the policyholder’s expected utility is higher for \(\{m_{2}(.),\widehat{I}_{2}(.)\}\) than for \(\{m_{0}(.),\widehat{I}_{0}(.)\}\). Furthermore, \(\widehat{I}_{2}(x)=\widehat{I}_{0}(x)\) if \(x<x_{*}\), \(\widehat{I}_{2}(x)=I_{1}^{*}<\widehat{I}_{0}(x)-\Delta _{I}/4\) if \(x_{*}\le x<x_{*}+\theta\), and \(\widehat{I}_{2}(x)\le \widehat{I}_{0}(x)\) if \(x\ge x_{*}+\theta\). Hence, \(\{m_{2}(.),\widehat{I}_{2}(.)\}\) is feasible with P unchanged, which contradicts the optimality of \(\{m_{0}(.),\widehat{I}_{0}(.)\}\).
 Step 4:

(4) and (5) are necessary and sufficient conditions for a continuous revelation mechanism to be incentive compatible.
Local firstorder and secondorder incentive compatibility conditions for type x are written, respectively, as
at any point of differentiability. (23) and (24) are necessary conditions for incentive compatibility. We have
Since (4) should hold for all \(x\in [0,a]\), a simple calculation yields
Conversely, assume (4) and (5) hold. (4) gives
Using (5) then shows that \(\partial V(x,\widetilde{x})/\partial \widetilde{x} \le 0\) if \(\widetilde{x}>x\) and \(\partial V(x,\widetilde{x})/\partial \widetilde{x}\ge 0\) if \(\widetilde{x}<x\), which implies incentive compatibility. \(\square\)
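Explicitly, writing \(V(x,\widetilde{x})\) for the expected utility of a type-\(x\) policyholder who reports \(\widetilde{x}\) and \(R(\widetilde{x})=w-P+\widehat{I}(\widetilde{x})-m(\widetilde{x})\) (a sketch of the standard computation, using the first-order condition (4) evaluated at \(\widetilde{x}\)):

```latex
\[
\frac{\partial V(x,\widetilde{x})}{\partial \widetilde{x}}
= u'\!\bigl(R(\widetilde{x})\bigr)\bigl[\widehat{I}'(\widetilde{x}) - m'(\widetilde{x})\bigr]
  + \gamma x\, v'\bigl(m(\widetilde{x})\bigr)\, m'(\widetilde{x})
= \gamma\,(x - \widetilde{x})\, v'\bigl(m(\widetilde{x})\bigr)\, m'(\widetilde{x}),
\]
```

and since \(m'\ge 0\) by (5) and \(v'>0\), the sign of \(\partial V/\partial \widetilde{x}\) is that of \(x-\widetilde{x}\), which is the single-crossing property used in the text.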
Proof of Proposition 1
Let \(\mu _{1}(x)\) and \(\mu _{2}(x)\) be costate variables for \(\widehat{I}(x)\) and m(x), respectively, and let \(\lambda\) and \(\delta (x)\) be Lagrange multipliers, respectively, for (2) and (9). The Hamiltonian is written as
The optimality conditions are
with \(\delta (x)\ge 0\) and \(\delta (x)=0\) if \(\widehat{I}(x)>0\). A tedious but straightforward calculation using (26) and (27) leads to
We also have \(R^{\prime }(x)=\widehat{I}^{\prime }(x)-m^{\prime }(x)=-\gamma xh(x)v^{\prime }(m(x))/u^{\prime }(R(x))\le 0\). Thus, R(x) is nonincreasing, and it is decreasing when \(h(x)>0\). The remaining part of the proof is in five steps.
 Step 1:

\(m(x)>0\) for all \(x>0\).
Since \(m(0)=0\) and m(x) is nondecreasing, there exists \(\underline{x} \in [0,a]\) such that \(m(x)>0\) if and only if \(x>\underline{x}\). Suppose \(\underline{x}>0\), which implies \(h(x)=0\) over \([0,\underline{x}]\). Using \(\widehat{I}(0)=0\) and (6) gives \(\widehat{I}(x)=0\) for all \(x\in [0,\underline{x}]\). Let
with \(\widehat{m}(x)>0\) for all \(x>0\). Define \(m_{0}(x)=\widehat{m} (x),I_{0}(x)=0\) if \(x\le \underline{x}\) and \(m_{0}(x)=m(x),I_{0} (x)=\widehat{I}(x)\) if \(x>\underline{x}\), and
The revelation mechanism \(m_{1}(.),\widehat{I}_{1}(.)\) defined by \(m_{1}(x)\equiv m_{0}(x_{0}(x))\) and \(\widehat{I}_{1}(x)\equiv I_{0}(x_{0}(x))\) is incentive compatible and it dominates the supposed optimal mechanism \(m(.),\widehat{I}(.)\), i.e., it provides a higher expected utility to the policyholder and its expected profit is nonnegative for P unchanged, hence a contradiction. Thus, \(\underline{x}=0\).
 Step 2:

\(\mu _{1}(x)\) is continuous in [0, a] with \(\mu _{1}(x)=0\) if\(\widehat{I}(x)=0\).
Let \(x_{0}\in (0,a)\) be a junction point such that \(\widehat{I}(x)=0\) if \(x\in (x_{0}-\varepsilon ,x_{0}]\) and \(\widehat{I}(x)>0\) if \(x\in (x_{0},x_{0}+\varepsilon )\), with \(0<\varepsilon <x_{0}\) (see Footnote 45).
Using the same argument as in Step 1 shows that \(h(x)>0\) in \((x_{0}-\varepsilon ,x_{0})\). Let \(x\in (x_{0}-\varepsilon ,x_{0})\). Using \(h(x)>0\), \(\widehat{I}^{\prime }(x)=0\), and (6) gives \(u^{\prime }(R(x))=\gamma xv^{\prime }(m(x))\). Then, \(\varphi (x)=0\) gives \(\mu _{2}(x)=0\) and thus \(\mu _{2}^{\prime }(x)=0\) for all \(x\in (x_{0}-\varepsilon ,x_{0}]\). Equation (30) implies \(\mu _{1}(x)=0\) for all \(x\in (x_{0}-\varepsilon ,x_{0})\), and this is true, more generally, for all \(x\in [0,a]\) such that \(\widehat{I}(x)=0\).
Let \(x\in (x_{0},x_{0}+\varepsilon )\). \(\widehat{I}(x)\) is locally increasing over \((x_{0},x_{0}+\varepsilon )\) and thus \(\widehat{I}^{\prime }(x)>0\) and \(h(x)>0\) (at least for \(\varepsilon\) small enough). Thus, we have \(\delta (x)=\varphi (x)=\varphi ^{\prime }(x)=0\) for all \(x\in (x_{0} ,x_{0}+\varepsilon )\). Since R(x) and m(x) are continuous functions and \(u^{\prime }(R(x_{0}))=\gamma x_{0}v^{\prime }(m(x_{0}))\), we have \(u^{\prime }(R(x))\gamma xv^{\prime }(m(x))\rightarrow 0\) when \(x\searrow x_{0}\). Using (30) then gives \(\mu _{1}(x_{0})_{+}=0\). Thus, \(\mu _{1}(x)\) is continuous at \(x_{0}\).
 Step 3:

\(\mu _{1}(x)\ge 0\) for all\(x\in [0,a]\).
Integrating \(\mu _{1}^{\prime }(x)\) given by (26) and using (28) and (29) give
Suppose there exist \(x_{0},x_{1}\in [0,a]\) such that \(x_{0}<x_{1},\mu _{1}(x_{0})=\mu _{1}(x_{1})=0\) and \(\mu _{1}(x)<0\) if \(x\in (x_{0},x_{1})\). Thus, from Step 2, we have \(I(x)>0\) and \(\delta (x)=0\) if \(x\in (x_{0},x_{1})\). For \(\eta _{0}>0\) small enough, we have \(\mu _{1}^{\prime }(x_{0}+\eta _{0})<0\) and \(\delta (x_{0}+\eta _{0})=0\). Hence (26) gives
for \(x=x_{0}+\eta _{0}\). The previous inequality holds when \(\eta _{0}\searrow 0\). Since \(\mu _{1}(x)\) is continuous and \(\mu _{1}(x_{0})=0\), we deduce \(u^{\prime }(R(x_{0}))\ge \lambda .\)
By a similar argument, for \(\eta _{1}>0\) small enough, we have \(\mu _{1}^{\prime }(x_{1}-\eta _{1})>0\) and \(\delta (x_{1}-\eta _{1})=0\). Thus (26) gives
for \(x=x_{1}-\eta _{1}\). The previous inequality holds when \(\eta _{1}\searrow 0\), which implies \(\lambda >u^{\prime }(R(x_{1}))\). Thus, we have \(u^{\prime }(R(x_{0}))\ge \lambda >u^{\prime }(R(x_{1}))\). Since \(u^{\prime \prime }<0\), we deduce \(R(x_{0})<R(x_{1}),\) which contradicts \(R^{\prime }(x)\le 0\) and \(x_{0}<x_{1}\).
 Step 4:

\(\widehat{I}^{\prime }(x)\ge 0\) for all \(x\in [0,a]\).
Suppose \(\widehat{I}(x)>0\) and \(\widehat{I}^{\prime }(x)<0\) if \(x\in \mathcal {[}x_{0},x_{1}]\subset (0,a]\) with \(x_{0}<x_{1}\). (6) and (8) yield \(h(x)>0\)—and thus \(\varphi (x)=0\)—and \(\gamma xv^{\prime }(m(x))>u^{\prime }(R(x))\) if \(x\in \mathcal {[}x_{0},x_{1}]\). We also have \(\delta (x)=0,\mu _{1}(x)\ge 0\) if \(x\in \mathcal {[}x_{0},x_{1}]\). Hence (30) gives \(\varphi ^{\prime }(x)<0\) if \(x\in \mathcal {[}x_{0},x_{1}]\), which contradicts \(\varphi (x)\equiv 0\) in \(\mathcal {[}x_{0},x_{1}]\). Thus, \(\widehat{I}(x)\) is nondecreasing over [0, a].
 Step 5:

\(\widehat{I}(x)>0\) for all \(x\in (0,a].\)
Step 4 implies that there exists \(x_{0}\) in [0, a] such that \(\widehat{I}(x)=0\) if \(x\in [0,x_{0}]\) and \(\widehat{I}(x)>0\) if \(x\in (x_{0},a]\). Suppose \(x_{0}>0\). From Step 2, we have \(\mu _{1}(x)=0\) for all \(x\in [0,x_{0}]\), and
implies \(\delta (x)=0\) over \([0,x_{0}]\) (see Footnote 46). (26) then gives \(R^{\prime }(x)=0\) and thus \(h(x)=0\) for all \(x\in [0,x_{0}]\). From the same argument as in Step 1, we have \(m(x)=\widehat{m}(x)\), and thus \(h(x)>0,\) for all \(x\in [0,x_{0}]\), hence a contradiction.
We know from (6) and (7) that \(\widehat{I}^{\prime }(x)<m^{\prime }(x)\) when \(m^{\prime }(x)>0\), and thus Steps 1 and 5 prove Proposition 1.
Figure 8 illustrates the simulated trajectories of \(\mu _{1}(x)\) and \(\mu _{2}(x)\) under the calibration hypothesis introduced in Sect. 3.3, in the case of an exponential distribution function (Fig. 14).
Proof of Proposition 2
Suppose there are \(x_{1},x_{2},x_{3}\) in [0, a] such that \(x_{1}<x_{2}<x_{3}\), \(h(x)=0\) if \(x\in [x_{1},x_{2}]\) and \(h(x)>0\) if \(x\in (x_{2},x_{3}]\). Thus, m(x) and I(x) remain constant over \([x_{1},x_{2}]\), and we may write \(m(x)=m_{0}>0\), \(I(x)=I_{0}>0\), and \(R(x)=w-P+I_{0}-m_{0}=R_{0}\) in this interval. Let \(\varphi (x)\) be defined as in the proof of Proposition 1. Using (26), (30), and \(\delta (x)=h(x)=0\) if \(x\in [x_{1},x_{2}]\) yields
and
if \(x\in [x_{1},x_{2}].\) Let
We have
We also have \(\varphi (x)\le 0\) if \(x\in [x_{1},x_{2}]\) and \(\varphi (x_{2})=0\), which implies \(\varphi ^{\prime }(x_{2})_{-}\ge 0\). (30), \(\delta (x_{2})=0,\) and \(\mu _{1}(x_{2})>0\) (see Footnote 47) give \(\gamma x_{2}v^{\prime }(m_{0})\le u^{\prime }(R_{0})\). If \({\rm{d}}f(x)/{\rm{d}}x\le 0\) and \({\rm{d}}^{2}\ln f(x)/{\rm{d}}x^{2}\ge 0\), then we have \(\Lambda ^{\prime }(x)\ge 0\) if \(x\le x_{2}\). Suppose there is \(x_{4}\in [0,x_{2}]\) such that \(\varphi (x_{4})=0\) and \(h(x)=0\) for all \(x\in [x_{4},x_{2}]\). Since \(\varphi (x)=0\) for all \(x\in [x_{2},x_{3}]\), we have \(\varphi ^{\prime \prime }(x_{2})_{+}=0\). Since \(I_{0}>0\), \(\mu _{1}(x)\) is differentiable at \(x=x_{2}\). Thus, using (30) and \(\delta (x)=0\) if \(x\in [x_{1},x_{2}]\) allows us to write
\(\Lambda (x_{2})_{-}<0\) and \(\Lambda ^{\prime }(x)\ge 0\) then yield \(\varphi ^{\prime \prime }(x)<0\) for all \(x\in [x_{4},x_{2}]\). Since \(\varphi (x_{2})=0\) and \(\varphi ^{\prime }(x_{2})_{-}\ge 0\), we have \(\varphi (x)<0\) for \(x<x_{2},\) x close to \(x_{2}\). Since \(\varphi (x_{2})=\varphi (x_{4})=0\), there is \(x_{5}\in (x_{4},x_{2})\) where \(\varphi (x)\) has a local minimum, and thus such that \(\varphi ^{\prime \prime }(x_{5})\ge 0\), which contradicts \(\varphi ^{\prime \prime }(x)<0\) for all \(x\in [x_{4},x_{2}].\) Thus, \(\varphi (x)<0\) for all x in \([0,x_{2})\), which contradicts \(\varphi (0)=0\). Hence, if \(h(x)>0\) in an interval \((x_{2},x_{3}]\), then \(h(x)>0\) in \([0,x_{3}]\), which shows that there exists \(\overline{x}\in [0,a]\) such that \(h(x)>0\) if \(x<\overline{x}\) and \(h(x)=0\) if \(x>\overline{x}\). We observe that \(\overline{x}>0\), for otherwise we would have \(I(x)=0\) for all x in [0, a].
Finally, if \(x\in (0,\overline{x})\) we have \(\mu _{1}(x)>0,\delta (x)=0,\varphi ^{\prime }(x)=0,\) and thus (30) gives \(\gamma xv^{\prime }(m(x))<u^{\prime }(R(x))\). Using (6) then yields \(\widehat{I}^{\prime }(x)>0\). \(\square\)
Proof of Corollary 1
For notational simplicity, assume \(a=1\) and \(f(x)=1\) for all \(x\in [0,1]\). Suppose \(\overline{x}<1\). Using (30) and \(h(x)=\delta (x)=0\) if \(x\in [\overline{x},1]\) gives
if \(x\in (\overline{x},1]\). The same argument as in the proof of Proposition 2 gives \(\overline{\varphi }^{\prime \prime }=\varphi ^{\prime \prime }(\overline{x})_{+}<\varphi ^{\prime \prime }(\overline{x})_{-}=0\). Since \(\varphi ^{\prime }(\overline{x})_{+}\le 0\), we have \(\varphi ^{\prime }(x)<0\) for all \(x\in [\overline{x},1]\), which contradicts \(\varphi (\overline{x})=\varphi (1)=0\). \(\square\)
Proof of Corollary 2
Assume \(f(a)=f^{\prime }(a)=0\) and \(f^{\prime \prime }(a)>0\). Suppose \(\overline{x}=a\) and thus \(h(x)>0\) for all \(x\in [0,a]\) (see Footnote 48). We also have \(\varphi ^{\prime }(x)=\delta (x)=0\) for all x. Differentiating (30) gives
where
The rest of the proof is in three steps.
 Step 1:

\(J(x)>0\) if \(x\in (0,a)\) and \(J(a)=J^{\prime }(a)=J^{\prime \prime }(a)=h(a)=0\).
Using \(K(x)<0,v^{\prime \prime }(m(x))\le 0,\mu _{1}(x)>0,\) and \(h(x)>0\) gives \(J(x)>0\) if \(x\in (0,a)\). Using \(\mu _{1}(a)=f(a)=0\) gives \(J(a)=0\). Furthermore, we have
Using \(\mu _{1}(a)=f(a)=0,\delta (x)=0\) for all x and (26) gives \(\mu _{1}^{\prime }(a)=0\). Equation (33), together with \(d\ln f(x)/{\rm{d}}x\nrightarrow \infty\) and \(d^{2}\ln f(x)/{\rm{d}}x^{2}\nrightarrow \pm \infty\) when \(x\rightarrow a\), gives \(J^{\prime }(a)=0\). Since \(J(x)>0\) if \(x\in (0,a)\) and \(J(a)=J^{\prime }(a)=0,\) we deduce that J(x) reaches a local minimum over [0, a] at \(x=a\), which implies \(J^{\prime \prime }(a)\ge 0\).
Using L’Hôpital’s rule twice yields \(h(a)=v^{\prime }(m(a))J^{\prime \prime }(a)/[\lambda aK(a)f^{\prime \prime }(a)]\). Since \(h(x)\ge 0\) for all x and \(K(a)<0\), we deduce \(J^{\prime \prime }(a)\le 0\), and thus \(J^{\prime \prime }(a)=h(a)=0.\)
 Step 2:

\(u^{\prime }(R(a))=\gamma av^{\prime }(m(a))=2\lambda.\)
Since \(f(a)=f^{\prime }(a)=\mu _{1}(a)=\mu _{1}^{\prime }(a)=0\), we deduce \(u^{\prime }(R(a))=\gamma av^{\prime }(m(a))\) from (26) and \(\varphi ^{\prime }(x)\equiv 0\) by using L’Hôpital’s rule twice. Furthermore, (26) gives \(\mu _{1}^{\prime \prime }(a)=0\) and (33) then yields \(J^{\prime \prime }(a)=f^{\prime \prime }(a)[2\lambda -u^{\prime }(R(a))]\), which implies \(u^{\prime }(R(a))=2\lambda\).
 Step 3:

Let \(\xi (x)\equiv u^{\prime }(R(x))\varphi ^{\prime }(x)\), where \(\varphi (x)\) is defined by (25). We have \(\xi ^{\prime \prime \prime }(a)<0\), which contradicts \(\varphi (x)=0\) for all \(x\in [0,a]\) when \(\overline{x}=a\).
\(\overline{x}=a\) implies \(\xi (x)=0\) for all \(x\in [0,a]\). We may write \(\xi (x)=\lambda f(x)\Delta _{1}(x)-\gamma \Delta _{2}(x)\), with \(\Delta _{1}(x)=u^{\prime }(R(x))-\gamma xv^{\prime }(m(x))\) and \(\Delta _{2}(x)=\mu _{1}(x)v^{\prime }(m(x))\). We have \(\Delta _{1}(a)=0,\Delta _{1}^{\prime }(a)=-\gamma v^{\prime }(m(a))\) from \(h(a)=0\) and \(u^{\prime }(R(a))=\gamma av^{\prime }(m(a))\). Using (26) and Step 2 gives
We have
and thus, using \(\Delta _{1}(a)=0\) and \(f(a)=f^{\prime }(a)=0\), we may write
\(\square\)
Proof of Proposition 3
The optimal nonlinear indemnity schedule I(m) is such that
for all \(m\in (0,\overline{m})\). Thus, (6), (7), (30), and \(\varphi ^{\prime }(x)=\delta (x)=0\) if \(x\in (0,\overline{x})\) give
which implies \(I^{\prime }(m)\in (0,1)\) for all \(m\in (0,\overline{m})\), \(I^{\prime }(\overline{m})=0\) if \(\overline{x}=a\), and \(I^{\prime }(\overline{m})>0\) if \(\overline{x}<a\), where \(\overline{m}=m(\overline{x})\).
All types \(x\ge \overline{x}\) choose \(\overline{m}=m(\overline{x}),\) and thus the optimal allocation is sustained by an indemnity schedule such that \(I(m)=I(\overline{m})\) for \(m\ge \overline{m}\).
Let \(I^{\prime }(0)={\lim }_{m\rightarrow 0}I^{\prime }(m)\ge 0\). The rest of the proof shows that \(-mv^{\prime \prime }(m)/v^{\prime }(m)\rightarrow \eta \in (0,1)\) when \(m\rightarrow 0\) (an assumption made in what follows) is a sufficient condition for \(I^{\prime }(0)>0\). The following lemma is an intermediate step in a proof by contradiction.
Lemma 5
Suppose \(I^{\prime }(0)=0\). Then (i) \(h(x)\rightarrow +\infty\) when \(x\rightarrow 0\), and (ii) there exists a sequence \(\{x_{n},n\in \mathbb {N}\}\subset (0,a]\) such that \(0<x_{n+1}<x_{n}\) for all n, \(x_{n}\rightarrow 0\) when \(n\rightarrow \infty\), and \(m(x_{n})/x_{n}>h(x_{n})\) for all \(n\in \mathbb {N}\).
Proof of Lemma 5

(i)
Note that \(I^{\prime }(0)=0\) implies \(C(x)\equiv xv^{\prime }(m(x))\rightarrow u^{\prime }(w-P)/\gamma\) when \(x\rightarrow 0\). If (i) does not hold, then there exists a sequence \(\{x_{n},n\in \mathbb {N}\}\subset (0,a]\) such that \(0<x_{n+1}<x_{n}\) for all n, \(x_{n}\rightarrow 0\) when \(n\rightarrow \infty\), and \(h(x_{n})\rightarrow \overline{h}<+\infty\) when \(n\rightarrow +\infty\). Using \(v(0)=0\) and L’Hôpital’s rule yields
$$\begin{aligned} \underset{x\rightarrow 0}{\lim }C(x)=\frac{1}{\underset{x\rightarrow 0}{\lim }\left[ -\frac{v^{\prime \prime }(m(x))}{v^{\prime }(m(x))^{2}}h(x)\right] }=\frac{1}{\eta \overline{h}}\underset{x\rightarrow 0}{\lim }\left[ m(x)v^{\prime }(m(x))\right] . \end{aligned}$$Furthermore, \(-mv^{\prime \prime }(m)/v^{\prime }(m)\rightarrow \eta >0\) implies \(mv^{\prime }(m)\rightarrow 0\) when \(m\rightarrow 0\), since \(v^{\prime }(m)\) then behaves like \(m^{-\eta }\) with \(\eta <1\) near 0. Hence, \(C(x)\rightarrow 0\) when \(x\rightarrow 0\), which contradicts \(C(x)\rightarrow u^{\prime }(w-P)/\gamma >0\) when \(x\rightarrow 0\).

(ii)
Let \(x_{0}\) be such that h(x) is continuous over \((0,x_{0}]\) and consider the decreasing sequence \(\{x_{n},n\in \mathbb {N}\}\) defined by \(x_{n}=\sup \{x\in (0,x_{0}]\mid h(x^{\prime })\ge n\text { if }x^{\prime }\le x\}\). \(x_{n}\) is well defined and such that \(x_{n}\rightarrow 0\) when \(n\rightarrow \infty\) from (i) and, using the continuity of h(x), we have \(h(x_{n})=n\) and \(h(x)>n\) if \(x<x_{n}\). Thus,
$$\frac{m(x_{n})}{x_{n}}=\frac{\int \nolimits _{0}^{x_{n}}h(x){\rm{d}}x}{x_{n}} >n=h(x_{n}),$$which completes the proof of (ii).
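As a numeric sanity check on part (ii), here is a minimal illustration with the hypothetical choice \(h(x)=x^{-1/2}\) (our own example, not from the paper): h blows up at 0 as in part (i), yet is integrable, and along the sequence \(x_{n}=n^{-2}\) the average \(m(x_{n})/x_{n}\) strictly exceeds \(h(x_{n})\).

```python
import numpy as np

# Hypothetical illustration of Lemma 5(ii): take h(x) = x**-0.5,
# which satisfies h(x) -> +infinity as x -> 0 but is integrable,
# with m(x) = integral of h over [0, x] = 2*sqrt(x) in closed form.
def h(x):
    return x ** -0.5

def m(x):
    return 2.0 * np.sqrt(x)

# Here sup{x : h(x') >= n for all x' <= x} is attained where h(x_n) = n,
# i.e., x_n = n**-2, and the lemma's inequality m(x_n)/x_n > h(x_n) holds.
for n in range(1, 50):
    x_n = float(n) ** -2
    assert np.isclose(h(x_n), n)
    assert m(x_n) / x_n > h(x_n)   # the average 2n exceeds the endpoint value n
```

Since h is decreasing, the running average of h over \([0,x_{n}]\) necessarily exceeds its endpoint value, which is exactly the mechanism the proof exploits.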
We are now in a position to complete the proof of the proposition. Suppose \(I^{\prime }(0)=0\), and let \(D(x)\equiv \gamma xv^{\prime }(m(x))-u^{\prime }(R(x))\), with \(D(x)<0\) if \(x>0\) from \(\widehat{I}^{\prime }(x)>0\), and \(D(0)=0\) from \(I^{\prime }(0)=0\). We thus have \(D^{\prime }(x)<0\) for x close to 0. We have
Consider the sequence \(\{x_{n},n\in \mathbb {N}\}\) defined in Lemma 5(ii). Using \(m(x_{n})/x_{n}>h(x_{n})\) gives
Since \(x_{n}\rightarrow 0\) when \(n\rightarrow +\infty\), \(u^{\prime \prime }(R(x))/u^{\prime }(R(x))\rightarrow u^{\prime \prime }(w-P)/u^{\prime }(w-P)\) and \(m(x)\rightarrow 0\) when \(x\rightarrow 0\), and \(-mv^{\prime \prime }(m)/v^{\prime }(m)\rightarrow \eta\) when \(m\rightarrow 0\), we deduce that \(\eta <1\) is a sufficient condition for \(D^{\prime }(x_{n})>0\) when n is large enough, which is a contradiction. We deduce \(I^{\prime }(0)>0\) when \(\eta <1\). \(\square\)
Appendix 2
1.1 Computational approach
Our simulations are performed through a discretization method. With notation standard in this field, an optimal control problem is usually written as follows, where x denotes the vector of state variables and u the vector of controls, both functions of time \(t\in \mathbb {R}\):
The time discretization is as follows:
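As a sketch under our own conventions (a generic Lagrange-form problem with a one-step forward Euler scheme; BOCOP also supports higher-order Runge–Kutta discretizations), the continuous problem and its discretization can be written:

```latex
% Generic optimal control problem (Lagrange form, our notation)
\min_{u(\cdot)} \int_{0}^{T} \ell\bigl(x(t),u(t)\bigr)\,{\rm d}t
\quad \text{s.t.} \quad \dot{x}(t)=f\bigl(x(t),u(t)\bigr),\qquad x(0)=x_{0}.

% Forward Euler discretization on a uniform grid t_k = k\Delta t, \Delta t = T/N:
x_{k+1}=x_{k}+\Delta t\, f(x_{k},u_{k}),\qquad k=0,\ldots ,N-1,
\qquad \min_{\{x_{k}\},\{u_{k}\}}\ \sum_{k=0}^{N-1}\Delta t\,\ell (x_{k},u_{k}).
```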
We therefore obtain a nonlinear programming problem on the discretized state and control variables. In BOCOP, the discretized nonlinear optimization problem is solved by the Ipopt solver, which implements a primal–dual interior point algorithm; see Wächter and Biegler (2006). The derivatives required for the optimization are computed by the automatic differentiation tool ADOL-C; see Walther and Griewank (2012).
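To make the discretize-then-optimize pipeline concrete, here is a minimal self-contained sketch (our own toy problem and solver, not BOCOP/Ipopt): the scalar problem \(\min \int_{0}^{1}u^{2}\,{\rm d}t\) subject to \(\dot{x}=u\), \(x(0)=0\), \(x(1)=1\) is discretized by forward Euler, and the resulting nonlinear program is solved by projected gradient descent. The analytic optimum is \(u\equiv 1\) with cost 1.

```python
import numpy as np

def solve_toy_ocp(n_steps=50, iters=200, lr=0.1):
    """Discretize min ∫ u^2 dt s.t. x' = u, x(0)=0, x(1)=1 by forward
    Euler, then solve the resulting NLP by projected gradient descent.
    (Toy stand-in for the BOCOP/Ipopt pipeline; optimum is u ≡ 1.)"""
    dt = 1.0 / n_steps
    u = np.zeros(n_steps)

    def project(u):
        # Euclidean projection onto the discretized terminal constraint
        # sum(dt * u_k) == 1, i.e., sum(u_k) == n_steps.
        return u - (u.sum() - n_steps) / n_steps

    u = project(u)
    for _ in range(iters):
        grad = 2.0 * dt * u          # gradient of the cost dt * sum(u_k^2)
        u = project(u - lr * grad)   # descent step, then restore feasibility
    x = np.concatenate(([0.0], np.cumsum(dt * u)))  # Euler state trajectory
    cost = dt * np.sum(u ** 2)
    return u, x, cost

u, x, cost = solve_toy_ocp()
```

In BOCOP the discretized states and controls are instead optimized jointly by Ipopt, with the dynamics imposed as equality constraints; the projection step above is a simplification that works because this toy problem has a single linear constraint.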
1.2 Complementary proofs
Proof of Lemma 2
Let \(\widehat{I}(x),\) \(x\in [0,x^{*}],\) P, and \(c^{*}\) be given, with \(I^{*}=\) \(\widehat{I}(x^{*}),m^{*}=m(x^{*}),\) and \(I^{*}\le m^{*}\). Consider the subproblem in which \(\{\widehat{I} (x),m(x),g(x),h(x)\), \(x\in [x^{*},a]\}\) maximizes
Let \(\mu _{1}(x)\) and \(\mu _{2}(x)\) be costate variables, respectively, for \(\widehat{I}(x)\) and m(x) and let \(\eta (x)\) and \(\lambda\) be Lagrange multipliers, respectively, for (11) and (12) in this subproblem.^{Footnote 49} The Hamiltonian is written as
and the optimality conditions are
for all x, with the transversality conditions \(\mu _{1}(a)=\mu _{2}(a)=0,\) and \(\eta (x)\ge 0\) for all x and \(\eta (x)=0\) if \(h(x)>g(x).\)
Let us consider \(x_{0}\in [x^{*},a]\) such that \(g(x)>0\) if x is in a neighborhood \(\mathcal {V}\) of \(x_{0}.\) Suppose \(h(x)>g(x)\), and thus \(\eta (x)=0\), if \(x\in \mathcal {V}\). Equation (35) gives \(\mu _{1}(x)=0\), and thus \(\mu _{1}^{\prime }(x)=0\), for all x \(\in \mathcal {V}\). Then (37) gives \(u^{\prime }(R(x))=\lambda\), and thus \(R(x)=w-P-m(x)+\widehat{I}(x)\) is constant in \(\mathcal {V}\). This implies \(m^{\prime }(x)-\widehat{I}^{\prime }(x)=h(x)-g(x)=0\), which contradicts \(h(x)>g(x)\). We deduce that \(h(x)=g(x)\) if \(x\in \mathcal {V}\). (35) and (36) yield \(\mu _{1}(x)=\mu _{2}(x)=\eta (x),\) and thus \(\mu _{1}^{\prime }(x)=\mu _{2}^{\prime }(x)\), for all \(x\in \mathcal {V}\). (37) and (38) then imply \(\gamma xv^{\prime }(m(x))=\lambda\) for all \(x\in \mathcal {V}\), which gives \(m^{\prime }(x)=-v^{\prime }(m(x))/[xv^{\prime \prime }(m(x))]\).
Let \(x_{0},x_{1},x_{2}\in [x^{*},a]\) such that \(x_{0}<x_{1}<x_{2}\) with \(g(x)=0\) if \(x\in [x_{0},x_{1}]\) and \(g(x)>0\) if \(x\in (x_{1},x_{2}]\). Let us show that we cannot have \(g(x)>0\) if \(x\in [x_{3},x_{0}]\) with \(x_{3}<x_{0}\). We have \(\mu _{1}(x)+\mu _{2}(x)\le 0\) if \(x\in [x_{0},x_{1})\) and \(\mu _{1}(x)+\mu _{2}(x)=0\) if \(x\in [x_{1},x_{2}]\). Let \(\Psi (x)\equiv [\mu _{1}^{\prime }(x)+\mu _{2}^{\prime }(x)]/f(x)\), with \(\Psi (x_{1})=0\) because \(\mu _{1}(x)+\mu _{2}(x)\) reaches a local maximum at \(x=x_{1}\). Note that \(\Psi (x)\) is differentiable. Let \(x\in [x_{0},x_{1})\). If \(m^{\prime }(x)=0\) (and thus \(R^{\prime }(x)=0\)), we have \(d[\mu _{1}^{\prime }(x)/f(x)]/{\rm{d}}x=0\) and \(d[\mu _{2}^{\prime }(x)/f(x)]/{\rm{d}}x=-\gamma v^{\prime }(m(x_{1}))<0\), and thus \(\Psi ^{\prime }(x)<0\). If \(m^{\prime }(x)>0\) (and thus \(R^{\prime }(x)<0\)), we have \(\eta (x)=\mu _{2}(x)=\mu _{2}^{\prime }(x)=0\) and \(d[\mu _{1}^{\prime }(x)/f(x)]/{\rm{d}}x=-u^{\prime \prime }(R(x))R^{\prime }(x)<0\), and thus we still have \(\Psi ^{\prime }(x)<0\). Suppose \(g(x)>0\) if \(x\in [x_{3},x_{0}]\) with \(x_{3}<x_{0}\). In that case we would have \(\mu _{1}(x)+\mu _{2}(x)=0\) if \(x\in [x_{3},x_{0}],\) and since \(\mu _{1}(x)+\mu _{2}(x)\le 0\) if \(x\in [x_{0},x_{1})\), we would have \(\Psi (x_{0})=0\). This contradicts \(\Psi (x_{1})=0,\Psi ^{\prime }(x)<0\) if \(x\in [x_{0},x_{1})\).
Suppose there are \(x_{0},x_{1},x_{2}\in [x^{*},a]\) such that \(x_{0}<x_{1}<x_{2}\) with \(g(x)>0\) if \(x\in [x_{0},x_{1}]\) and \(g(x)=0\) if \(x\in (x_{1},x_{2}]\). In that case \(\mu _{1}(x)+\mu _{2}(x)=0\) if \(x\in [x_{0},x_{1}]\) and \(\mu _{1}(x)+\mu _{2}(x)\le 0\) if \(x\in [x_{1},x_{2}]\). Since \(\mu _{1}(a)+\mu _{2}(a)=0\) and \(\mu _{1}(x)\) and \(\mu _{2}(x)\) are continuous, we may choose \(x_{2}\) such that \(\mu _{1}(x_{2})+\mu _{2}(x_{2})=0\). The same calculation as above implies \(\Psi (x_{1})=0,\) \(\Psi ^{\prime }(x)<0\) if \(x\in [x_{1},x_{2}]\) and thus \(\Psi (x)<0\) if \(x\in (x_{1},x_{2}]\), which contradicts \(\mu _{1}(x_{2})+\mu _{2}(x_{2})=0\).
Overall, we deduce that there exists \(\widehat{x}\in [x^{*},a]\) such that \(\widehat{I}^{\prime }(x)=0\) if \(x\in [x^{*},\widehat{x}]\) and \(\widehat{I}^{\prime }(x)=m^{\prime }(x)>0\) if \(x\in [\widehat{x},a]\). The same reasoning, replacing \(\Psi (x)\) by \(\Phi (x)\equiv \mu _{2}^{\prime }(x)/f(x)\), shows that there exists \(\widetilde{x}\in [x^{*},\widehat{x}]\) such that \(m^{\prime }(x)=0\), and thus \(m(x)=m^{*}\), if \(x\in [x^{*},\widetilde{x}]\) and \(m^{\prime }(x)>0\) if \(x\in [\widetilde{x},\widehat{x}]\). When \(m^{\prime }(x)>0\), we have \(\eta (x)=\mu _{2}(x)=0\), and thus \(\mu _{2}^{\prime }(x)\equiv 0\) if \(x\in [\widetilde{x},\widehat{x}]\), which gives \(u^{\prime }(w-P-m(x)+I^{*})=\gamma xv^{\prime }(m(x))\), and thus \(m^{\prime }(x)\equiv -\gamma v^{\prime }(m(x))/[\gamma xv^{\prime \prime }(m(x))+u^{\prime \prime }(w-P-m(x)+I^{*})]\). When \(m^{\prime }(x)=0\), we have \(\Phi ^{\prime }(x)<0\) if \(x\in [x^{*},\widetilde{x})\) and \(\Phi ^{\prime }(\widetilde{x})=0\), and thus \(\widetilde{x}\) is given by \(u^{\prime }(w-P-m^{*}+I^{*})=\gamma \widetilde{x}v^{\prime }(m^{*})\) if \(u^{\prime }(w-P-m^{*}+I^{*})>\gamma x^{*}v^{\prime }(m^{*})\), and \(\widetilde{x}=x^{*}\) if \(u^{\prime }(w-P-m^{*}+I^{*})=\gamma x^{*}v^{\prime }(m^{*})\).
If \(x^{*}<\widehat{x}\), then replacing \(m^{*}\) by \(\widehat{m}\equiv m(\widehat{x})>m^{*}\) implements the same allocation with lower audit costs. Indeed, m(x) is an optimal choice of type x individuals if \(x>\widehat{x}\), because such individuals would prefer choosing \(\widehat{m}\) rather than any \(m\in [0,\widehat{m})\); furthermore, for such individuals, there is full coverage at the margin in \((\widehat{m},m(x)]\) and they cannot choose expenses larger than m(x). In addition, the expected audit cost decreases from \(c[1-F(x^{*})]\) to \(c[1-F(\widehat{x})]\) when \(\widehat{m}\) is substituted for \(m^{*}\). Thus, an optimal allocation is necessarily such that \(x^{*}=\widehat{x}\). \(\square\)
Proof of Proposition 4
Let \(\mu _{1}(x)\) and \(\mu _{2}(x)\) be costate variables, respectively, for \(\widehat{I}(x)\) and m(x) and let \(\delta (x)\) and \(\lambda\) be Lagrange multipliers, respectively, for (9) and (20). The Hamiltonian is written as in the proof of Proposition 1, and the optimality conditions (25), (26), and (27) still hold. We also have \(\delta (x)\ge 0\) and \(\delta (x)=0\) if \(\widehat{I} (x)>0\), and \(\mu _{1}(x^{*})+\mu _{2}(x^{*})=0\) from the characterization of the optimal continuation allocation. The optimality conditions on \(m^{*},I^{*},x^{*},P,\) and A are written as
respectively, where \(V_{1}^{\prime },V_{2}^{\prime },\ldots\) denote the partial derivatives of \(V(m^{*},I^{*},x^{*},P,A)\) and \(R^{*}\equiv R(x^{*})=w-P-m^{*}+I^{*}\). Define \(\varphi (x)\) for all \(x\in [0,x^{*}]\) by (25) as in the proof of Proposition 1.
 Step 1:

\(m(x)>0\) for all \(x>0.\)
Identical to Step 1 in the proof of Proposition 1.
 Step 2:

\(\mu _{1}(x)\) is continuous in \([0,x^{*}]\) with \(\mu _{1}(x)=0\) for all \(x\in [0,x^{*}]\) such that \(\widehat{I}(x)=0.\)
Identical to Step 2 in the proof of Proposition 1.
 Step 3:

\(\mu _{1}(x)\ge 0\) for all \(x\in [0,x^{*}]\) with \(\mu _{1}(x^{*})>0.\)
We know from Lemma 4 that \(R(x)=w-P-m^{*}+I^{*}\) and
for all \(x\in [x^{*},a]\). Thus,
and (40) gives \(\mu _{1}(x^{*})>0\). The remaining part of Step 3 is the same as in the proof of Proposition 1.
 Step 4:

\(\widehat{I}(x)>0\) for all \(x\in (0,x^{*}].\)
Identical to Steps 4 and 5 in the proof of Proposition 1.
 Step 5:

\(x^{*}>0.\)
We have
from the definition of V(.). Thus (41) and \(\delta (x^{*})=0\) give
which implies \(x^{*}>0\).
 Step 6:

There is \(\overline{x}\in (0,x^{*}]\) such that
$$\begin{aligned} \widehat{I}^{\prime }(x)&>0,h(x)=m^{\prime }(x)>\,0\,{\text{if}}\,0<x<\overline{x},\\ \widehat{I}(x)&=\widehat{I}(\overline{x}),m(x)=m(\overline{x} ),h(x)=\,0\,{\text{if}}\,\overline{x}<x\le x^{*},\\ \widehat{I}^{\prime }(0)&=\,0,\widehat{I}^{\prime }(\overline{x} )=\,0\,{\text{if}}\,\overline{x}=a\,{\text{and}} \widehat{I}^{\prime }(\overline{x})>\,0\,{\text {if}}\,\overline{x}<x^{*}. \end{aligned}$$
Identical to the proof of Proposition 2.
Finally, \(\mu _{1}(x^{*})>0\) shows that there is an upward discontinuity in m(x) and \(\widehat{I}(x)\) at \(x=x^{*}\). \(\square\)
Proof of Proposition 5
Using \(x^{*}>0\) and \(m^{\prime }(x)>0\) if \(x\in (0,\overline{x})\) gives \(m^{*}>0\). The remainder of the proof is a straightforward adaptation of the proof of Proposition 3. \(\square\)
Proof of Lemma 3
Similar to Lemma 1, with straightforward adaptation. \(\square\)
Proof of Lemma 4
We now have
A straightforward adaptation of the proof of Lemma 1 shows that (17) is a necessary condition for incentive compatibility. (17) gives
where
Using \(U_{H^{2}}^{\prime \prime }<0\) and \(U_{RH}^{\prime \prime }>0\) gives \(A(x,\widetilde{x})>1\) if \(\widetilde{x}>x\) and \(A(x,\widetilde{x})<1\) if \(\widetilde{x}<x\), with \(A_{\widetilde{x}}^{\prime }(x,\widetilde{x})\vert _{\widetilde{x}=x}>0\), and thus^{Footnote 50}
Thus incentive compatibility gives (18). Conversely, assume that (17) and (18) hold. We have
which implies incentive compatibility. \(\square\)
Proof of Proposition 6
The notations of costate variables and Lagrange multipliers are the same as in the proof of Proposition 1. Observe first that Steps 1–4 of this proof remain valid, with an unchanged definition of \(\varphi (x)\), just replacing (30) by
and \(\lambda\) by \(\lambda (1+\sigma )\) in (26).
Suppose that \(\widehat{I}^{\prime }(x)>0\) if \(x<\varepsilon\), with \(\varepsilon >0\). Hence \(\widehat{I}(x)>0\) (and thus \(\delta (x)=0\)) for all \(x>0\). Using (6) gives
if \(x<\varepsilon\). (45) implies \(\varphi (x)=\varphi ^{\prime }(x)=0\) if \(x<\varepsilon\). Furthermore, using (26) (in which \(\lambda\) is replaced by \(\lambda (1+\sigma )\)), (29), and \(\mu _{1}(a)=0\) yields
and thus \(\mu _{1}(x)<0\) for x small enough. Equations (44) and (46) then yield \(\varphi ^{\prime }(x)>0\), hence a contradiction. Since we know from Step 4 that \(\widehat{I}(x)\) is nondecreasing, we deduce that there exists \(d>0\) such that \(\widehat{I}(x)=0\) if \(x\le d\) and \(\widehat{I}(x)>0\) if \(x>d\).
The simulated trajectories of \(\mu _{1}(x)\) and \(\mu _{2}(x)\) are illustrated in Fig. 15 in the case of an exponential distribution function, with \(\sigma =0.1\) and with the same calibration as in Sect. 3.3. We have \(\mu _{1}(x)=\mu _{2}(x)=0\) when \(x\le d\) and \(\mu _{1}(x)>0,\mu _{2}(x)<0\) when \(x>d\), with \(d\simeq 0.41\).
The characterization of the indemnity schedule I(m) is derived in the same way as in Proposition 3, with \(D=m(d)\).^{Footnote 51} \(\square\)
Proof of Corollary 3
Similar to Corollary 1. \(\square\)
Proof of Corollary 4
Similar to Corollary 2. \(\square\)
Martinon, P., Picard, P. & Raj, A. On the design of optimal health insurance contracts under ex post moral hazard. Geneva Risk Insur Rev 43, 137–185 (2018). https://doi.org/10.1057/s10713-018-0034-y