Introduction

Asymmetric information has long been alleged to cause inefficiencies in insurance markets. However, empirical research on automobile insurance markets has produced ambiguous evidence on the core prediction that individuals with extensive coverage are more likely to be high risks for the insurer.Footnote 1 Most previous papers have interpreted the absence of a significant coverage–risk correlation to mean that the insurer successfully handles the contract-relevant information asymmetry. Other explanations, such as the absence of useful private information and policyholders’ inability to act on private information, have also been suggested. In addition, previous research has found that both higher and lower risks demand insurance, which in aggregate suggests that those with more insurance are not higher risks, and that the market can suffer from inefficiencies even in the absence of a positive correlation between insurance coverage and risk occurrence.Footnote 2 Cohen and SiegelmanFootnote 3 argue that, rather than trying to settle once and for all whether information asymmetries exist, future work should try to identify the circumstances under which one may expect to find evidence of relevant information asymmetry. Since market heterogeneity may play an important role, it is difficult to generalise across insurance markets and between countries. It is also reasonable to expect the correlation structure to differ across subsets of policyholders, because the information asymmetry between the insurer and the policyholder is not constant and may differ across groups, for example, between new and long-term policyholders.

This paper contributes to the empirical coverage–risk literature by testing for information asymmetries with the explicit use of policyholders’ private information on risky behaviour (traffic violations). The analysis is based on a rich data set of automobile insurance policies provided by one of Sweden's largest insurance companies. Private information is represented by recorded traffic safety violations, in the form of on-the-spot fines and convictions for traffic offences. By law, this information is unobservable and unattainable for Swedish insurance companies, and it is therefore not used in setting premiums. Additionally, because Swedish insurers are not allowed to share claim history (or other pricing characteristics), they cannot observe risky behaviour among new customers through previous claims, except via the policyholders’ self-reported claim history. This implies that high-risk drivers have incentives to switch to a new insurer and under-report their claim history after making a claim with their current insurer.Footnote 4 Hence, the information asymmetry is likely to be larger for new policyholders and to decrease over time as the insurer receives more observations on the policyholder.

This study differs from previous studies in four ways. First, we include policyholders’ private information on risky behaviour (traffic violations), using a sample of new policyholders for whom the information asymmetry is likely to be largest. The advantage is that we can directly observe the effect of private information on risky behaviour, so our conclusions do not depend entirely on the existence of a coverage–risk correlation. Second, we use several subgroups of new policyholders that correspond to the insurer's classification by age and gender, which yields more homogeneous subgroups than those used in the previous literature. Third, we restrict vehicle age, since it may be an important determinant of the choice of coverage and of how the vehicle is used. Fourth, with access to all the information that the insurance company uses in setting prices, we test whether the existence of private information is consistent with the positive correlation between risk and coverage predicted by theory.Footnote 5 The risk–coverage correlation calls for a remark: a positive and significant correlation is a central prediction of both adverse selection and moral hazard, and it only implies that the presence of adverse selection or moral hazard cannot be rejected. However, disentangling adverse selection and moral hazard, as well as propitious selection and preventive actions, from each other is beyond the scope of the present paper.

Two approaches are used. The first is the widely used correlation test suggested by Chiappori and Salanié.Footnote 6 If there is a significant correlation between risk and coverage, the null hypothesis of no residual asymmetric information is rejected. The second is an approach similar to that suggested by Finkelstein and McGarry,2 in which the effect of private information on risky behaviour (traffic violations) is observed directly.

The results suggest that there is positive and statistically significant residual asymmetric information (a coverage–risk correlation) for three groups, and that this correlation is unaffected when private information on risky behaviour is included; hence the correlation test does not seem to capture private information on risky behaviour.

According to theory, adverse selection models predict that high-risk types will purchase more insurance coverage when policyholders have private information about their risk type. Contrary to this prediction, we find that private information indicating that the insured is a higher risk actually decreases the probability of full insurance coverage. One exception is risky behaviour in terms of speeding, whose coefficient varies across groups of policyholders: for younger age groups, speeding is associated with a higher probability of full insurance, whereas for older age groups it is associated with a lower probability. The varying results for speeding may be explained by “common behaviour”; that is, speeding is generally viewed as an accepted violation and may therefore not be regarded as risky behaviour by the policyholder. Hence, the policyholder may not account for, or use, this information when purchasing insurance, which results in a pattern that varies by chance. Major infractions and traffic offences other than speeding tend to be viewed as unacceptable, and, according to psychological research, individuals with these infractions perceive themselves as less risky.Footnote 7 Given this self-perception, it is rational for them to purchase less insurance, which can explain why policyholders with higher risk (other than speeding) have less than full insurance. Without information on the policyholders’ perceived risk, it is difficult to fully understand why speeding shows a varying pattern and why higher risks are less likely to be fully insured. Further research could benefit from survey data on infractions, perceived risk and insurance. Moreover, previous research has established that traffic violations have a significant effect on crash rates.Footnote 8 Our results support this only weakly, since few of the coefficients of risky behaviour are significant. However, if high-risk individuals tend to have less coverage, they will have even less incentive to report a claim to their insurer.

All in all, our results may explain why there has previously been ambiguity as to whether empirical findings support the presence of adverse selection and/or moral hazard in the automobile insurance market. If high-risk drivers are in fact less likely to have extensive insurance, we cannot expect to find the positive correlation predicted by theory. We conclude that it is preferable to study private information explicitly, since doing so makes it possible to observe directly how the market is affected by asymmetric information, rather than trying to interpret (the lack of) a significant correlation.

The rest of the paper is organised as follows. The next section provides a summary of prior theoretical and empirical research with a focus on insurance markets. The section also contains information on insurance coverage and risk classification in the Swedish automobile insurance market. The subsequent section describes the empirical approach in terms of data and econometrics in more detail. The penultimate section presents the results and the last section concludes the paper.

Background

Previous work

Since the 1970s, theoretical research on asymmetric information has developed rapidly. It predicts that asymmetric information is a fundamental problem in most insurance markets: policyholders are heterogeneous in risk, and this risk level is private (hidden) information that is important for the contract but unobservable to the insurer. According to the standard interpretation, the asymmetry results in high-risk individuals being associated with extensive insurance coverage, which predicts a positive correlation between ex post risk and extensive coverage.Footnote 9 Several studies, both theoretical and empirical, have also suggested the possibility of propitious (favourable) selection, in which some individuals have a high demand for insurance yet are good risks ex post; this selection predicts a negative correlation between insurance coverage and ex post risk occurrence.Footnote 10,Footnote 11

Empirical research on asymmetric information lagged behind and did not evolve significantly until the 1990s. As discussed by Chiappori and Salanié,Footnote 12 data from insurers are well suited for studies of asymmetric information because they record the choice of coverage and the outcome (claim or not), as well as many characteristics of policyholders. Empirical studies using data from different insurance markets have found evidence of a coverage–risk correlation.Footnote 13 Yet empirical tests on property/liability insurance using automobile insurance data do not provide strong evidence of information asymmetries that affect the level of risk in the contract.Footnote 14 Three early studies suggested the presence of a positive correlation, but these were later criticised as unreliable.Footnote 15 Dionne et al.Footnote 16 suggest that the insurers’ information set is sufficient if non-linear effects, not considered by Puelz and Snow, are taken into account. A sufficient risk classification implies that there is no residual adverse selection within each risk class, since the groups are homogeneous in risk. Nor do Dionne et al. find evidence of information asymmetries using French automobile insurance data. To overcome previous difficulties with estimation, Chiappori and Salanié6 (hereafter C&S) introduced a simple and general test for the presence of asymmetric information. When this test was applied to a homogeneous sample of inexperienced drivers in the French automobile insurance market, no significant correlation was found.

CohenFootnote 17 argues that young drivers may not have private information since they have not yet learned their own risk type; that is, policyholders develop private information as they learn their risk type. The study takes several implications of the previous critique into account and uses a rich data set covering the first five years of a start-up insurer in Israel. When the C&S correlation test is applied to policyholders with less than three years of driving experience, the C&S results are confirmed, since no significant correlation is found. However, for a group with more than three years of driving experience, Cohen finds a significant correlation between risk and coverage. The main conclusion, drawn from results indicating that low-deductible contracts are associated with more claims, is that the market is characterised by asymmetric information.

Finkelstein and McGarry2 further consider the policyholder's private information on risk in the long-term care insurance market. Their findings indicate that two types of individuals buy insurance: those with private beliefs that they are high risks and those with a strong taste for insurance. Ex post, the former are higher risks and the latter lower risks to the insurer. They conclude that, in aggregate, individuals with more insurance are not higher risks, and that an equilibrium with multiple forms of private information is unlikely to be efficient relative to the first best. One reason is that premiums may not be actuarially fair.

This paper differs from the previous literature mainly in that we include policyholders’ private information on risky behaviour (traffic violations) in the analysis. Since pricing characteristics and previous claims are not shared between insurers, we expect the market to suffer from this asymmetric information if policyholders use their private information in their insurance decision. Another advantage is that, via the insurance company, we had access to all the information that the company uses in setting prices. We also restrict vehicle age, since the vehicle's value is likely to affect the insurance decision: new vehicles generally do not need full coverage because they are often covered by a warranty, and old vehicles may be worth too little. Hence vehicle age is likely to determine how much insurance is purchased. In line with previous research, we divide policyholders into homogeneous groups, but in contrast we perform the analysis on smaller subgroups defined by age and gender (as used by the insurer).

Automobile insurance and premium pricing in Sweden

Swedish law requires all vehicle owners to purchase traffic insurance, a liability insurance that covers accident damage to other drivers and their cars. Hence, the vehicle owner is the policyholder. Note that other drivers do not have to be named in the insurance contract in order to drive the vehicle, which implies that drivers other than the policyholder are not considered when setting prices. The main reason is that Swedish automobile insurance is a property insurance, and costs incurred by drivers and passengers, such as hospital care, are generally covered by social insurance, which in turn is financed via taxes.

Table 1 provides a summary of Swedish automobile insurance policies. All-risk insurance (ARI) is the most extensive coverage on offer, since it also indemnifies damage to the policyholder's own car when s/he is at fault. ARI is typically differentiated by the size of the deductible, with the lower deductible providing the most extensive coverage. Additional insurance provides extra services such as a replacement car while the insured car is being repaired. The most typical comprehensive coverage in Sweden is ARI, which we focus on in this paper.

Table 1 Swedish automobile insurance

Previous studies have pointed out the importance of conditioning carefully on the information set available to the insurance company. The information set consists of all information that is observable and used in premium pricing by the insurance company. However, an important distinction must be made between the information set available to the insurer and the actual risk classification used in premium pricing. The information set is the basis for the actuarial prediction that results in a risk classification, and the preferred approach is therefore to condition on the company's actuarial risk classification. The main reason is that groups of individuals with similar risk classifications are considered homogeneous by the insurer. A proper implementation of the positive correlation test therefore requires that insurance demand is analysed across homogeneous groups of individuals who are likely to face the same set of possible insurance contracts. A misspecification may result in a spurious correlation; accuracy is therefore crucial.
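The risk of a spurious correlation can be illustrated with a stylised example: two hypothetical risk classes in which coverage and at-fault claims are independent within each class, so that conditioning on the class gives zero covariance by construction. The class shares and probabilities below are invented for illustration and are not taken from the insurer's data. Pooling the classes, as a misspecified model that ignores the risk classification would, produces a positive coverage–claim covariance.

```python
from dataclasses import dataclass


@dataclass
class RiskClass:
    """A hypothetical risk class; coverage and claims are independent within it."""
    share: float       # fraction of policyholders in this class
    p_coverage: float  # P(full coverage | class)
    p_claim: float     # P(at-fault claim | class)


def pooled_cov(classes: list[RiskClass]) -> float:
    """Covariance of the coverage and claim indicators when classes are pooled.

    Within each class E[cy] = E[c]E[y] (independence), so any non-zero
    pooled covariance comes purely from differences between classes.
    """
    e_c = sum(rc.share * rc.p_coverage for rc in classes)
    e_y = sum(rc.share * rc.p_claim for rc in classes)
    e_cy = sum(rc.share * rc.p_coverage * rc.p_claim for rc in classes)
    return e_cy - e_c * e_y


classes = [
    RiskClass(share=0.5, p_coverage=0.6, p_claim=0.20),  # riskier class, buys more cover
    RiskClass(share=0.5, p_coverage=0.2, p_claim=0.05),  # safer class, buys less cover
]
print(round(pooled_cov(classes), 4))  # 0.015
```

Within each class the covariance is zero, yet the pooled covariance is positive; this is why the tests below condition on the insurer's own risk classification.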

All information on the insurance company and how it sets prices was obtained by interviewing one of the actuaries at the company. According to the insurer, all Swedish automobile insurance companies base their risk classification on three main categories: risk characteristics related to the driver, the vehicle and the residential area (see Table 2). Most of these categories contain several variables, which implies that the data are fairly rich. Information that statistically affects the expected cost of offering insurance is used to set prices. In this way, insurers develop a risk classification that is associated with observable characteristics. The insurance contracts are then divided into homogeneous risk groups according to observable characteristics, and individuals in the same group are charged the same insurance premium. Since the 1990s, each Swedish insurer has used its own formula for determining insurance premiums. The company studied in this paper used to have a bonus-malus system, but this was gradually phased out. However, the policyholder does receive a discount for every year s/he does not make a claim.

Table 2 The pricing variables

The insurers are not allowed to share information about previous claims, so the market structure is similar in that respect to the Israeli market studied by Cohen.17 The implication of not sharing claims is that policyholders may under-report their claim history when joining a new insurer in order to obtain a lower premium. This further implies that high-risk drivers have an incentive to switch insurer.

In addition, several pricing variables are based on policyholders’ self-reports, for example:

i) Annual mileage, which is divided into five risk classes (1–5). Most policyholders report risk class 2 (10,000–15,000 km/year).

ii) Vehicle owner (=policyholder) vs chief user of the vehicle. A common problem in Sweden is that a parent, who generally has favourable premium ratings due to seniority, is the registered owner, while the vehicle is used by a son or daughter, who generally has unfavourable premium ratings due to inexperience. Moreover, it is often advantageous to let the woman in the household own and insure the vehicle.Footnote 18 This gender difference in premium ratings evens out with age. The insurer cannot observe this chief user problem unless an accident involving a driver other than the owner is reported.

iii) Residential area, which is the national registration address. It is often more expensive to insure a vehicle in a large city; hence, registering in a smaller town means a lower premium.

It is clear that policyholders have incentives to report untruthfully in order to obtain a lower premium, and that the Swedish automobile insurance market may suffer from the resulting information asymmetry, since it obstructs the construction of homogeneous groups and premium pricing.

The empirical framework

Data

To investigate the nature of private information, we use a rich data set that includes all the information the insurer has about its policyholders. We add data on the policyholder's risky behaviour (traffic violations), which represent the policyholder's private information. Researchers can gain access to data on traffic violations through formal applications, but this information is otherwise not available to, attainable by or observable for Swedish insurers.Footnote 19

The insurer makes three main assumptions regarding the contracts. First, contracts are independent: the outcomes of different insurance policies are independent of each other. This implies that each contract is treated separately; an individual who owns and insures two vehicles is considered to hold two contracts, each independent of the other. If the policyholder causes an accident with one of the vehicles, only the insurance contract associated with that vehicle is affected. Second, the company assumes that the cost of a claim in this period does not affect the cost in the next period (time independence). The argument is that an individual involved in an accident may either (i) drive more carefully and thus be less likely to have an accident in the future or (ii) be a reckless driver who is always more prone to cause accidents, so the direction of any dependence is unclear. However, the insurer gives policyholders an incentive to avoid claims by granting a discount for each claim-free year with the company.Footnote 20 Third, homogeneity is assumed: outcomes with the same exposure have the same distribution within a risk group.Footnote 21 We therefore regard a repeated contract as a new observation and do not consider dependency between contracts owned by the same individual.Footnote 22 Accordingly, since the company lacks information on claim-free years for new policyholders, our subsample consists of new policyholders, for whom we do not take into account the number of years without claims.

The automobile insurance data used in this study come from an automobile insurance provider with 24 regional subsidiaries located in all the counties in Sweden; its market share is approximately 32 per cent of the property insurance market. In total, the data set contains information on 2,424,525 policy IDs and 584,425 claims, and covers three years (2006–2008). Most of the contracts are repeated, and the number of observations including repeated contracts is 9,342,749.Footnote 23 Each observation includes all the information that the insurer has about the policyholder, vehicle and contract characteristics.

Data on the number of convictions for traffic safety violations are registered by the Swedish National Council for Crime Prevention (BRÅ). These represent major infractions such as driving while intoxicated and driving very carelessly. Data on on-the-spot fines come from the central fines register of the Swedish National Police Board (RIOB) and represent minor infractions such as speeding, running red lights, overtaking at crossings, and other offences due to risky behaviour or vehicle defects. Since RIOB is cleared periodically, at most the five most recent years of data can be obtained.

Fines for speeding are separated from other traffic offences, since speeding is generally viewed as a socially acceptable violation, while convictions and other traffic offences are not.Footnote 24 Social acceptance may affect how policyholders perceive and account for their risk, which in turn affects how much insurance they purchase.Footnote 25 We further separate major infractions into one conviction (=1) and more than one conviction (>1), because we believe that repeat offenders are higher risks: one conviction may be random, but several are unlikely to be. Fines for speeding will be referred to as “Speeding” and all other traffic offences as “Other traffic offences”. One conviction will be referred to as “One conviction” (=1) and more than one conviction (>1) as “Several convictions”.

The probability of detection when committing a traffic violation is unknown, but it is reasonable to assume that a repeat offender is caught at some point. All data are matched by personal identity number. For our project, BRÅ merged the data on fines and convictions with the insurance and claim files. We merged the insurance and claim files ourselves and cleaned the data. Appendix D lists all the information included in each observation.

The subsample used and descriptive statistics

We consider new policyholders, on whom the insurer has no previous observations, during their first year with the insurer, which gives a smaller subsample of 295,846 observations. Since automobile insurance is property/liability insurance, the unit of analysis is the contract rather than the policyholder.Footnote 26 More specifically, we select policyholders who joined the insurer in 2007 and 2008. We include all contracts signed by new policyholders in 2007 and observe these contracts until they expire. For new policyholders in 2008, we observe all contracts signed in 2008 until they expire or until the end of 2008, when the data were collected. This implies that data are censored for contracts that began in 2008 and ended in 2009.Footnote 27 This is a general problem with insurance data, since in most countries it is possible to sign up for a one-year contract at any time.Footnote 28

We further divide the policyholders into homogeneous age and gender groups that correspond to the actuarial model used during 2006–2008. This gives us 10 groups on which we perform the analysis.

We restrict our analysis to vehicles aged 3–20 years. The restriction on vehicle age is motivated by the fact that new vehicles generally have a motor vehicle damage warranty that corresponds to ARI, which affects the choice of purchasing more extensive coverage.Footnote 29 We also expect ARI to be less likely for older vehicles because of their lower value. As can be seen in Figure 1, the data confirm that the number of vehicles with ARI increases when the vehicle is three years old and decreases as the vehicle gets older. We also perform a sensitivity analysis of this restriction (Tables A1–A3).
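The sample restrictions above, together with the exclusion of policyholders under 18 years of age described with the descriptive statistics, can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `Contract` record and its field names are hypothetical, and the example contracts are invented rather than drawn from the data. The `is_censored` helper flags contracts that run past the end of 2008, i.e. the censored contracts mentioned above.

```python
from dataclasses import dataclass
from datetime import date

# Assumed cut-off: the data were collected at the end of 2008.
DATA_END = date(2008, 12, 31)


@dataclass
class Contract:
    # Hypothetical fields for illustration; not the insurer's actual layout.
    holder_age: int       # age of the policyholder in years
    vehicle_age: int      # years; can be -1 for the latest model year
    joined_insurer: int   # year the policyholder became a customer
    start: date           # contract start date
    end: date             # scheduled contract expiry


def in_subsample(c: Contract) -> bool:
    """First-year contracts of 2007/2008 joiners, adults, vehicles aged 3-20."""
    is_new = c.joined_insurer in (2007, 2008) and c.start.year == c.joined_insurer
    return is_new and c.holder_age >= 18 and 3 <= c.vehicle_age <= 20


def is_censored(c: Contract) -> bool:
    """True if the contract runs past the end of data collection."""
    return c.end > DATA_END


contracts = [
    Contract(35, 5, 2007, date(2007, 3, 1), date(2008, 2, 28)),   # kept
    Contract(19, 2, 2008, date(2008, 5, 1), date(2009, 4, 30)),   # vehicle too new
    Contract(45, 10, 2008, date(2008, 6, 1), date(2009, 5, 31)),  # kept, censored
    Contract(17, 8, 2007, date(2007, 9, 1), date(2008, 8, 31)),   # under 18
]
kept = [c for c in contracts if in_subsample(c)]
print(len(kept), [is_censored(c) for c in kept])  # 2 [False, True]
```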

Figure 1

All-risk insurance and vehicle age. Note: Vehicle age ranges from −1 to 20 years; a negative age is possible when the policyholder owns a vehicle of the latest model year.

Table 3 provides descriptive statistics of the private information variables for those who choose full coverage and those who choose less than full coverage, for both the whole sample and the subset of new policyholders. The table shows that individuals with less than full coverage tend to have more convictions and traffic offences. Until recently, age was not a restriction on owning and insuring a vehicle, which means that very old individuals and small children appear as chief users in our data. Individuals under 18 years of age are excluded from the analysis, since they are too young to hold a driving licence. Some individuals appear to be too old to drive; there are 168 observations of individuals aged 90 years and over in the sample of new policyholders. We do not exclude them from the analysis, since there is no upper age limit for drivers in Sweden.

Table 3 Policyholders with full coverage, less than full coverage and new policyholders with full coverage and less than full coverage

Further, the maximum number of convictions is very high in all groups, although less than 1 per cent of those with convictions have more than 10 convictions, and the mean number of convictions among them is 1.6. Those with inordinately many convictions are likely to be individuals known to, and often checked by, the police. These individuals are probably not allowed to drive, but this does not prevent them from insuring a vehicle. All in all, a higher share of those with less than full insurance coverage have fines or convictions compared with those with full insurance.

Econometric approach

First, we use the standard positive correlation test by C&S to examine the relationship between insurance coverage and ex post risk occurrence, defined as a claim in which the policyholder is held fully or partially responsible, that is, the policyholder fully or partially caused the accident. The insurer uses several degrees of fault, such as full, partial or slight. The main reason for using at-fault claims is that the insurer does not include information on drivers other than the vehicle owner (=policyholder) in the contract. Taking into account all claims, including claims involving drivers other than the policyholder, may generate a spurious positive correlation, because a variable for additional drivers is not included among the control variables.Footnote 30

We apply the bivariate probit model suggested by C&S to test for residual asymmetric information:
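Eqs. (1) and (2) are not reproduced in this version of the text. Assuming they follow the standard C&S bivariate probit specification, they can be sketched as below; the coefficient symbols $\beta$ and $\delta$ are labels chosen here for illustration and are not taken from the original.

```latex
\begin{align}
c_i &= \mathbf{1}\left[X_i\beta + \varepsilon_i > 0\right] \tag{1}\\
y_i &= \mathbf{1}\left[X_i\delta + \eta_i > 0\right] \tag{2}\\
\begin{pmatrix}\varepsilon_i\\ \eta_i\end{pmatrix} &\sim
\mathcal{N}\!\left(\begin{pmatrix}0\\ 0\end{pmatrix},
\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right) \notag
\end{align}
```

Under this form, testing for residual asymmetric information amounts to testing $H_0\colon \rho = 0$ against $\rho > 0$.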

The dependent variable of Eq. (1) represents the choice of contract: $c_i = 1$ if the policyholder has the highest possible coverage, that is, ARI with a low deductible (3,000 SEK), and $c_i = 0$ if less coverage is bought (ARI with a high deductible (5,000 SEK), limited damage insurance or traffic insurance).Footnote 31 The dependent variable of Eq. (2) represents an at-fault claim: $y_i = 1$ if the policyholder has a claim in which s/he is fully, partially or slightly at fault, and $y_i = 0$ if the policyholder is not at fault or if no claim is made. $X$ is a vector of covariates included to control for the risk classification used by the insurer in 2006–2008.
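The coding of the two dependent variables can be sketched as follows. The `Cover` labels and the fault strings are hypothetical codes chosen for illustration; only the deductible levels (3,000 and 5,000 SEK), the contract types and the fault degrees (full, partial, slight) come from the text.

```python
from enum import Enum
from typing import Optional


class Cover(Enum):
    # Hypothetical labels for the contract types described in the text.
    ARI_LOW_DEDUCTIBLE = "ARI, 3,000 SEK deductible"
    ARI_HIGH_DEDUCTIBLE = "ARI, 5,000 SEK deductible"
    LIMITED_DAMAGE = "limited damage insurance"
    TRAFFIC_ONLY = "traffic insurance"


# Degrees of fault that count as an at-fault claim (assumed string codes).
AT_FAULT = {"full", "partial", "slight"}


def coverage_indicator(cover: Cover) -> int:
    """c_i = 1 only for the most extensive contract (ARI with low deductible)."""
    return int(cover is Cover.ARI_LOW_DEDUCTIBLE)


def at_fault_indicator(fault: Optional[str]) -> int:
    """y_i = 1 for a claim with full, partial or slight fault.

    None (no claim) or any other fault code (not at fault) gives 0.
    """
    return int(fault in AT_FAULT)


print(coverage_indicator(Cover.ARI_HIGH_DEDUCTIBLE),
      at_fault_indicator("partial"),
      at_fault_indicator(None))  # 0 1 0
```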

C&S argue that the policyholder's probability of owning a certain contract depends on the risk classification $X$ and a random shock $\varepsilon_i$. Similarly, for any $X$, the occurrence of an at-fault accident depends on a random shock $\eta_i$. The error terms capture any residual heterogeneity across agents once the risk classification has been taken into account. The variable of interest is the correlation between the error terms, $\rho$. If $\rho > 0$, there is an indication of adverse selection and/or moral hazard since, conditional on the risk classification, the choice of contract and the occurrence of an accident are not independent: contracts with more complete coverage predict a higher probability of an ex post risk.
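A small simulation can make the interpretation of $\rho$ concrete. The sketch below draws correlated standard normal shocks for the coverage and at-fault equations within a single hypothetical risk class (the index values standing in for the two linear predictors are arbitrary assumptions), and compares the at-fault rate among fully and less than fully covered policyholders. It only illustrates the data-generating process; it does not estimate the bivariate probit.

```python
import math
import random


def simulate(rho: float, n: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """At-fault rates for c=1 and c=0 when corr(eps, eta) = rho.

    Both latent indices are held fixed (one hypothetical risk class), so any
    difference in claim rates comes from the error correlation alone.
    """
    rng = random.Random(seed)
    index_cov, index_claim = -0.3, -1.3  # hypothetical linear predictors
    claims = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        # eta = rho*eps + sqrt(1 - rho^2)*u has unit variance and corr(eps, eta) = rho
        eta = rho * eps + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        c = int(index_cov + eps > 0)
        y = int(index_claim + eta > 0)
        counts[c] += 1
        claims[c] += y
    return claims[1] / counts[1], claims[0] / counts[0]


for rho in (0.0, 0.3):
    full, less = simulate(rho)
    print(f"rho={rho}: at-fault rate {full:.3f} (full cover) vs {less:.3f} (less cover)")
```

With $\rho = 0$ the two rates differ only by sampling noise, whereas with $\rho > 0$ the fully covered group has a visibly higher at-fault rate even though the risk classification is identical.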

In the next step of our analysis, we study the effect of private information head-on, using an approach introduced by Finkelstein and McGarry.2 This approach suggests that the null hypothesis of symmetric information can be rejected if, conditional on the information used by the insurer in setting prices, the econometrician observes some other characteristic of the individual that is correlated with both insurance coverage and ex post risk occurrence. This characteristic must be unknown to, or unused by, the insurer. Finkelstein and McGarry argue that this approach provides a more robust test for asymmetric information than the correlation test, because it includes variables that represent the policyholder's private information, which makes it possible to observe the effect of private information directly. In our approach, we include the policyholders’ private information about risky behaviour, which makes it possible to study the effect of private information on the demand for insurance and on the outcome (at-fault claim or not). The null hypothesis of no residual asymmetric information is rejected if, conditional on $X$, private information about traffic behaviour is correlated with both insurance coverage and ex post risk occurrence. We test the effect of private information by estimating the following probit models:
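Eqs. (3) and (4) are likewise not reproduced in this version of the text. One plausible form, assuming separate probits that add a vector $Z_i$ of the four private-information indicators (speeding, other traffic offences, one conviction, several convictions) to the risk classification $X_i$, is sketched below. Only $\beta_2$ and $\delta_2$ are named in the text; $Z_i$, $\beta_1$ and $\delta_1$ are labels introduced here, and $\Phi$ denotes the standard normal distribution function.

```latex
\begin{align}
c_i &= \mathbf{1}\left[X_i\beta_1 + Z_i\beta_2 + \varepsilon_i > 0\right] \tag{3}\\
y_i &= \mathbf{1}\left[X_i\delta_1 + Z_i\delta_2 + \eta_i > 0\right] \tag{4}\\
\Pr(c_i = 1 \mid X_i, Z_i) &= \Phi(X_i\beta_1 + Z_i\beta_2), \qquad
\Pr(y_i = 1 \mid X_i, Z_i) = \Phi(X_i\delta_1 + Z_i\delta_2) \notag
\end{align}
```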

Compared with Eqs. (1) and (2), the added information consists of four indicator variables that take the value 1 if the policyholder has, respectively, at least one fine for speeding, at least one fine for other traffic offences, one conviction for a traffic safety violation (=1), and more than one conviction for traffic violations (>1).
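The construction of the four indicators from counts of fines and convictions can be sketched as below. The `ViolationRecord` type and its field names are hypothetical; the thresholds (at least one fine, exactly one conviction, more than one conviction) follow the definitions above.

```python
from dataclasses import dataclass


@dataclass
class ViolationRecord:
    # Hypothetical per-person counts over the observation window.
    speeding_fines: int
    other_fines: int
    convictions: int


def private_info_indicators(r: ViolationRecord) -> dict[str, int]:
    """The four 0/1 private-information indicators added in Eqs. (3) and (4)."""
    return {
        "speeding": int(r.speeding_fines >= 1),
        "other_traffic_offences": int(r.other_fines >= 1),
        "one_conviction": int(r.convictions == 1),
        "several_convictions": int(r.convictions > 1),
    }


print(private_info_indicators(ViolationRecord(speeding_fines=2, other_fines=0, convictions=3)))
# {'speeding': 1, 'other_traffic_offences': 0, 'one_conviction': 0, 'several_convictions': 1}
```

Note that "One conviction" and "Several convictions" are mutually exclusive by construction, so a person with convictions falls into exactly one of the two categories.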

The coefficients of interest in Eqs. (3) and (4) are $\beta_2$ and $\delta_2$. From them we can conclude whether the policyholder's private information on risky traffic behaviour has any effect on the choice of extensive coverage and/or on the probability of being at fault in a claim. The positive correlation prediction is that $\beta_2 > 0$ and $\delta_2 > 0$, which would imply that violations of traffic regulations are associated with more coverage and with culpa (fault) in claims.
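The decision rule for a single violation indicator can be summarised in a short sketch. It assumes, as a simplification, that each coefficient is judged by its own p-value at a 5 per cent level; the `Estimate` type and the example numbers are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    coef: float
    p_value: float


def classify(beta2: Estimate, delta2: Estimate, alpha: float = 0.05) -> str:
    """Interpret one indicator's coefficients in the coverage and claim equations.

    Symmetric information is rejected only if the indicator is significantly
    related to BOTH coverage (beta2) and at-fault claims (delta2).
    """
    if not (beta2.p_value < alpha and delta2.p_value < alpha):
        return "no rejection of symmetric information"
    if beta2.coef > 0 and delta2.coef > 0:
        return "rejection; consistent with the positive correlation prediction"
    return "rejection; pattern differs from the positive correlation prediction"


# Invented numbers: a violation that lowers coverage but raises claim risk.
print(classify(Estimate(-0.12, 0.001), Estimate(0.20, 0.01)))
```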

The variables in $X$ in all regressions are the age and gender of the policyholder, vehicle age, kilometre class, vehicle risk classification and residential area risk classification. We also apply the analysis to the age and gender groups used by the insurer in the actuarial model for 2007 and 2008.Footnote 32 Note that not all coefficients are reported here, since the risk classification variables and their summary statistics are classified information of the insurance company.

Results

Replication of previous studies

As discussed earlier, Cohen17 found a statistically significant correlation for the more experienced driver group. We attempt to replicate these findings by dividing the policyholders into similar groups; the results are reported in Table 4. Our approach differs from Cohen's in that we focus on at-fault claims and more extensive coverage, whereas Cohen is concerned with whether low-deductible policyholders are associated with more claims. Our data do not allow such an approach, since we do not have information about indemnities; thus, we cannot drop claims that are lower than the highest deductible when comparing the number of claims. Furthermore, since driving experience is not used in the risk classification, the data do not contain information on it. We therefore use age as a proxy for driving experience, treating the age group 18–20 years as having less than three years of driving experience and older drivers as having more than three years. The first group has no statistically significant correlation between risk and coverage, which confirms the results of both Cohen17 and C&S. The second group, which corresponds to drivers with more than three years of driving experience, has a statistically significant correlation between risk and coverage, which confirms Cohen's findings. However, the correlation coefficient is low for both groups, and the correlation may be statistically significant in the more experienced group because N is much larger rather than because the correlation is stronger. The same interpretation applies to Cohen's results, since N differs greatly between inexperienced drivers (1,358) and experienced drivers (103,279).
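The sample-size argument can be illustrated with a rough calculation using the group sizes reported for Cohen's study. The sketch uses Fisher's z transformation, which applies to a Pearson correlation rather than the bivariate probit $\rho$, so it only shows how the test statistic scales with $\sqrt{N}$; the correlation of 0.02 is a hypothetical value, assumed identical in both groups.

```python
import math


def fisher_z_stat(r: float, n: int) -> float:
    """Approximate z statistic for H0: correlation = 0 via Fisher's transformation."""
    return math.atanh(r) * math.sqrt(n - 3)


r = 0.02  # hypothetical small correlation, the same in both groups
for label, n in (("inexperienced", 1_358), ("experienced", 103_279)):
    z = fisher_z_stat(r, n)
    print(f"{label:13s} N={n:>7,d} z={z:5.2f} significant at 5%: {abs(z) > 1.96}")
```

The same correlation gives a z statistic of roughly 0.7 for the small group and roughly 6.4 for the large group, so significance in the larger group alone says little about the strength of the correlation.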

Table 4 Correlation test for policyholders with different driving experience
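The point about sample size can be made concrete with a small calculation. The sketch below is a stand-in, not the bivariate probit test behind Table 4: it uses the chi-square independence test on a 2x2 coverage/claim table, for which the statistic equals N times the squared phi coefficient. The correlation value 0.02 is hypothetical; only the two group sizes come from Cohen's samples quoted in the text. Because the standard error of an estimated correlation shrinks roughly with 1/√N, the same logic carries over to a test on ρ.

```python
import math


def phi_test_pvalue(phi: float, n: int) -> float:
    """p-value of the chi-square independence test on a 2x2 coverage/claim table.

    For a 2x2 table the Pearson statistic equals n * phi**2 with one degree
    of freedom, and P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    chi2 = n * phi ** 2
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical weak correlation, identical in both groups (not a Table 4 value).
PHI = 0.02
# Group sizes taken from Cohen's inexperienced and experienced driver samples.
for label, n in [("inexperienced", 1_358), ("experienced", 103_279)]:
    print(f"{label:>13}: N = {n:>7,}  p = {phi_test_pvalue(PHI, n):.2g}")
```

With the same weak correlation, the small group is far from significance while the large group is highly significant, which is the pattern the text warns against over-interpreting.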

Another potential caveat is that the group of inexperienced, or young, drivers is more likely to be homogeneous than a sample of senior drivers.Footnote 33 This potentially biases the correlation test. However, inexperienced drivers will not have an informational advantage over their insurer, as they do not have enough experience of their own driving from which to infer how risky they are. A related study by ArvidssonFootnote 34 uses the same data set and shows that new policyholders who stayed with the insurer for a year or less are more likely to make a claim than long-term customers. Both groups include inexperienced as well as experienced drivers. The conclusion is that, since insurers do not share claim history, high-risk drivers have an incentive to switch insurer once their type is revealed.

The standard positive correlation test

Table 5 reports the results from the bivariate probit model of Eqs. (1) and (2) for our sample of new policyholders aged 18 years and over with a vehicle 3–20 years old.

Table 5 Correlation test between all risk insurance and culpa
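For readers unfamiliar with the model, a minimal sketch of the bivariate probit likelihood follows. It assumes that Eqs. (1) and (2), which are not reproduced in this section, take the standard form used by C&S: one latent index for all-risk coverage and one for an at-fault (culpa) claim, each linear in X, with jointly normal errors correlated by ρ; the correlation test asks whether ρ differs from zero. The bivariate normal CDF is computed by numerical integration so the block needs only the standard library; in practice one would maximise the log-likelihood with an optimiser or use a statistical package. All function names are illustrative.

```python
import math


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bvn_cdf(a: float, b: float, rho: float, steps: int = 2000) -> float:
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho.

    Integrates phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) over x in (-inf, a],
    truncated at -8, with composite Simpson's rule (steps even, |rho| < 1).
    """
    lo = -8.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / steps

    def f(x: float) -> float:
        return norm_pdf(x) * norm_cdf((b - rho * x) / s)

    total = f(lo) + f(a)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def cell_prob(y_cov: int, y_claim: int, xb_cov: float, xb_claim: float, rho: float) -> float:
    """P(coverage = y_cov, at-fault claim = y_claim | X) in a bivariate probit."""
    q1, q2 = 2 * y_cov - 1, 2 * y_claim - 1
    return bvn_cdf(q1 * xb_cov, q2 * xb_claim, q1 * q2 * rho)


def log_likelihood(data, beta_cov, beta_claim, rho: float) -> float:
    """data: iterable of (y_cov, y_claim, x), where x includes a constant term."""
    ll = 0.0
    for y_cov, y_claim, x in data:
        xb_cov = sum(b * v for b, v in zip(beta_cov, x))
        xb_claim = sum(b * v for b, v in zip(beta_claim, x))
        ll += math.log(cell_prob(y_cov, y_claim, xb_cov, xb_claim, rho))
    return ll
```

The four cell probabilities for any observation sum to one, and with ρ = 0 each cell factors into the product of two univariate probit probabilities, which is why independent estimation is the special case of no correlation.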

Overall, it seems that the insurance company is able to handle the information asymmetry, since there is no significant correlation in the majority of groups. However, conditional on the risk classification, the correlation coefficient is positive and significant at the 5 per cent level for women aged 18–21 years, at the 1 per cent level for women aged 30–39 years and at the 1 per cent level for policyholders of both sexes aged 50+ years.Footnote 35 This indicates residual asymmetric information in these groups, which supports the adverse selection/moral hazard prediction.

The significant correlation for women aged 30–39 years and for policyholders aged 50+ years may be a result of the chief user problem, that is, the named policyholder need not be the vehicle's main driver. Even though we have excluded claims where some other driver was named in the accident report, there may be cases where the policyholder was falsely named as the driver. The younger group of women (aged 18–21 years) may be riskier than the insurer expects; women in this age group generally pay much lower premiums than men of the same age, whom the insurer views as very risky.

Our results are not consistent with those of Cohen. She did, however, use a sample of inexperienced drivers, and it is unclear whether she identified new policyholders by contract or by individual, which makes a crucial difference: an individual might have stayed with the insurer for a long time before deciding to buy a second vehicle. In our data, this second vehicle will appear as a new contract even though the policyholder is loyal. It is therefore important to identify new policyholders and analyse their contracts, rather than simply select new contracts. Most papers in the literature do not make a clear distinction between policyholder and contract.
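The distinction between new contracts and new policyholders can be sketched as a selection rule. The record layout (a person identifier, a contract identifier and a start year) is hypothetical and assumes that the insurer's full contract history per individual is available; the example reproduces the loyal customer buying a second vehicle described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    person_id: str    # hypothetical identifier of the individual policyholder
    contract_id: str  # one contract per insured vehicle
    start_year: int


def new_contracts(contracts, first_year):
    """Naive selection: every contract that starts in the sample period."""
    return [c for c in contracts if c.start_year >= first_year]


def new_policyholder_contracts(contracts, first_year):
    """Contracts of individuals whose first-ever contract starts in the sample period."""
    first_seen = {}
    for c in contracts:
        prev = first_seen.get(c.person_id, c.start_year)
        first_seen[c.person_id] = min(prev, c.start_year)
    return [c for c in contracts if first_seen[c.person_id] >= first_year]


history = [
    Contract("A", "A-car1", 2001),  # loyal customer since 2001
    Contract("A", "A-car2", 2007),  # the same customer insures a second vehicle in 2007
    Contract("B", "B-car1", 2007),  # genuinely new customer
]
print([c.contract_id for c in new_contracts(history, 2007)])               # ['A-car2', 'B-car1']
print([c.contract_id for c in new_policyholder_contracts(history, 2007)])  # ['B-car1']
```

Selecting on contracts alone would wrongly treat the loyal customer's second vehicle as a new relationship, even though the insurer already holds years of claim history on that person.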

The results in Table A4 of the sensitivity analysis of the correlation test include at-fault claims for all drivers; that is, cases where either the policyholder or another driver not named in the contract caused the accident. The correlation coefficient then also becomes significant for men aged 30–39 years and for the age group 40–49 years. Table A5, which contains the results for all claims, shows that the correlation coefficient becomes significant for all groups. As the insurer lacks information on additional drivers, we cannot include this information in X, and we therefore expect to find a positive correlation in line with our prediction.

Since we condition on the insurer's own risk classification, a significant correlation is likely to reflect asymmetric information in the market rather than pricing variables omitted by the researcher. The sensitivity analysis in Appendix A (Tables A1–A6) further shows the importance of conditioning accurately on the insurer's risk classification and of the choice of group to which the test is applied.

Including private information

Tables 6 and 7 report the marginal effects from estimating the relationship between private information on risky traffic behaviour, more insurance coverage and culpa in Eqs. (3) and (4), respectively. A bivariate probit model is used in groups where there is a significant correlation between Eqs. (3) and (4), whereas the equations are estimated independently in groups where the correlation is insignificant.
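The section does not state which test decides whether the correlation between Eqs. (3) and (4) is significant. One common choice, assumed here, is a likelihood-ratio test of ρ = 0: with ρ = 0 the bivariate probit reduces to two independent probits, so twice the log-likelihood gain is compared with a χ² distribution with one degree of freedom. The 5 per cent threshold and the log-likelihood values below are hypothetical.

```python
import math


def lr_test_rho_zero(ll_bivariate: float, ll_coverage: float, ll_claim: float):
    """Likelihood-ratio test of H0: rho = 0.

    Under H0 the bivariate probit log-likelihood equals the sum of the two
    univariate probit log-likelihoods, so LR is chi2 with 1 degree of freedom.
    """
    lr = 2.0 * (ll_bivariate - (ll_coverage + ll_claim))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


def choose_specification(ll_bivariate, ll_coverage, ll_claim, alpha=0.05) -> str:
    """Pick joint or separate estimation depending on whether rho is significant."""
    _, p_value = lr_test_rho_zero(ll_bivariate, ll_coverage, ll_claim)
    return "bivariate probit" if p_value < alpha else "separate probits"


# Hypothetical log-likelihoods for two groups (not estimates from the paper).
print(choose_specification(-1000.0, -600.0, -402.5))  # LR = 5.0
print(choose_specification(-1000.0, -600.0, -400.5))  # LR = 1.0
```

A Wald test on the estimated ρ is an equally common alternative; the decision rule is the same either way.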

Table 6 shows the results from estimating the relationship between private information on risky behaviour and insurance coverage in Eq. (3). The results indicate that speeding increases the probability of more insurance, except in the mixed-gender age groups 40–49 and 50+ years. In contrast, private information on other traffic offences and on several convictions for traffic safety violations tends to decrease the probability of more insurance coverage. A sensitivity analysis is performed in Appendix C, where we (i) combine all private information variables into a single variable and (ii) drop each variable in turn as a crude check for multicollinearity; the results appear robust. One possible explanation of these results is that speeding is a socially acceptable violation that is not perceived as risky behaviour, so the policyholder may not use this information when deciding whether to purchase insurance. Other traffic offences and major infractions are generally viewed as unacceptable, and individuals committing them may perceive themselves as less risky, or may represent a type that does not bother to purchase insurance. Since we lack information on how policyholders perceive their risk, it is difficult to fully understand why speeding shows a varying pattern. Future research could benefit from survey data on infractions (especially speeding), perceived risk and insurance.

Table 6 Relationship between new policyholders’ private information and extensive coverage
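The tables report marginal effects; for a binary regressor such as a speeding dummy these are usually computed as the discrete change in the predicted probability when the dummy switches from 0 to 1. The sketch below averages that change over observations in a probit; whether the paper averages over observations or evaluates at sample means is not stated here, and the coefficients are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def average_discrete_effect(rows, beta, j: int) -> float:
    """Average change in P(y = 1) when binary regressor j switches from 0 to 1.

    rows: regressor vectors (constant first); beta: probit coefficients.
    """
    total = 0.0
    for x in rows:
        x_on, x_off = list(x), list(x)
        x_on[j], x_off[j] = 1.0, 0.0
        xb_on = sum(b * v for b, v in zip(beta, x_on))
        xb_off = sum(b * v for b, v in zip(beta, x_off))
        total += norm_cdf(xb_on) - norm_cdf(xb_off)
    return total / len(rows)


# Hypothetical coefficients: constant 0.0, speeding dummy 0.5.
beta = (0.0, 0.5)
rows = [(1.0, 0.0), (1.0, 1.0)]
print(round(average_discrete_effect(rows, beta, j=1), 4))  # 0.1915
```

Reporting the change in probability rather than the raw coefficient makes effects comparable across the coverage and claim equations, whose latent scales are not directly interpretable.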

Table 7 reports the results from estimating the relationship between private information and at-fault claims in Eq. (4). The results, where significant, indicate that private information on risky traffic behaviour tends to increase the probability of claims where the policyholder is fully or partially at fault; that is, risky drivers have more accidents. A note of caution is that the rather low number of significant variables may be associated with under-reporting of culpa claims: high-risk drivers who do not purchase extensive insurance may report fewer claims to the insurance company.

Table 7 Relationship between new policyholders’ private information and at-fault claims

A potential caveat is that we cannot observe all contracts until they expire, since data are censored for contracts that start in 2008 and end in 2009. Because such censoring may lead to under-reporting of culpa claims, we perform a sensitivity analysis of the effect of private information on culpa that includes only new policyholders from 2007 (see Tables B1–B3). The results show the same pattern as for new policyholders in 2007 and 2008, suggesting that our results are not sensitive to the censoring.
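The censoring problem and the 2007-cohort restriction can be sketched as follows, assuming one-year contracts and an observation window that ends on 31 December 2008 (the exact cut-off date and the record layout are assumptions). Under these assumptions a contract starting in 2008 is observed for only part of its term, so claims after the cut-off are missing, whereas a 2007 contract is observed in full.

```python
from datetime import date

DATA_END = date(2008, 12, 31)  # assumed last day of the observation window


def observed_share(start: date, end: date, data_end: date = DATA_END) -> float:
    """Share of the contract period [start, end] that falls inside the data window."""
    total_days = (end - start).days
    seen_days = (min(end, data_end) - start).days
    return max(seen_days, 0) / total_days


def cohort(policies, year: int):
    """Keep only new policyholders whose contract starts in the given year."""
    return [p for p in policies if p["start"].year == year]


policies = [
    {"id": 1, "start": date(2007, 3, 1), "end": date(2008, 3, 1)},
    {"id": 2, "start": date(2008, 6, 1), "end": date(2009, 6, 1)},
]
for p in policies:
    print(p["id"], round(observed_share(p["start"], p["end"]), 3))
print([p["id"] for p in cohort(policies, 2007)])
```

Restricting to the 2007 cohort trades sample size for a complete exposure window, which is why a stable pattern across the two samples suggests the censoring is not driving the results.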

We have also performed a sensitivity analysis in which deregistered vehicles are excluded. Having the vehicle deregistered may affect the choice of coverage; if the vehicle is not in use, there may be no reason to purchase full coverage. In the age group 50+ years, speeding and one conviction, which is significant at the 10 per cent level in Table 6, no longer have a significant effect on extensive coverage; otherwise the results are robust (see Table B4). For the same reason, we have also performed a sensitivity analysis of at-fault claims: speeding becomes significant at the 10 per cent level for women aged 18–21 years, while it becomes insignificant for men in the same age group (see Table B5). Hence, our results appear robust.

Conclusions

All in all, our results suggest that there is residual private information in three groups and that high-risk drivers are less likely to purchase full insurance. The data enable us to compare the outcome of the C&S correlation test with and without private information, using the approach of Finkelstein and McGarry2. Taken together, the results in Tables 6 and 7 show that the correlation coefficient is not affected by including private information. If the correlation arose from policyholders' private information on traffic offences, one would expect this information to absorb the correlation, so that the correlation would vanish once it is included. However, since high-risk drivers are less likely to purchase full coverage, the correlation need not follow this expectation. In most studies, a potential caveat of the correlation test, however accurately one conditions on the insurer's information set, is that the results may be biased by information observed by the insurer but not by the researcher. This caveat does not apply here, since we have access to the insurer's risk classification. It may nevertheless be hazardous to study asymmetric information based on a coverage correlation test alone. Even with access to the necessary information from the insurer, we can only conclude whether or not the market is characterised by asymmetric information. More importantly, the conclusion may be misleading if the correlation structure does not correspond to our theoretical expectations. The advantage of the Finkelstein and McGarry test is that private information is included explicitly, so the effects of asymmetric information can be observed directly.
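A stylised numerical example shows how the coverage–risk correlation can vanish even when private information matters, which is the situation described in the paragraph above. The three types and their shares, coverage probabilities and claim probabilities are hypothetical; they loosely mirror the finding that speeding raises the probability of extensive coverage while other offences lower it, with both offender types assumed riskier than the baseline. Within each type, coverage and claims are assumed independent.

```python
def coverage_claim_cov(types) -> float:
    """Covariance between coverage and claim indicators in a mixture of types.

    Each type is (share, P(extensive coverage), P(at-fault claim)). Within a
    type coverage and claims are independent, so E[c*d] = sum(s * pc * pd).
    """
    total = sum(s for s, _, _ in types)
    e_c = sum(s * pc for s, pc, _ in types) / total
    e_d = sum(s * pd for s, _, pd in types) / total
    e_cd = sum(s * pc * pd for s, pc, pd in types) / total
    return e_cd - e_c * e_d


# Hypothetical types: (share, P(coverage), P(claim)).
baseline = (0.80, 0.50, 0.05)
speeders = (0.10, 0.70, 0.10)         # riskier, buy more coverage
other_offenders = (0.10, 0.30, 0.10)  # riskier, buy less coverage

print(coverage_claim_cov([baseline, speeders, other_offenders]))  # close to 0
print(coverage_claim_cov([baseline, speeders]))                   # positive
print(coverage_claim_cov([baseline, other_offenders]))            # negative
```

With all three types pooled the covariance is zero, so a correlation test would detect nothing, yet each offender type on its own produces a clear positive or negative correlation. Conditioning directly on the private-information variables, as in the Finkelstein and McGarry approach, separates the two offsetting effects.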

It is reasonable to question whether we should expect to find any evidence of information asymmetries in the insurance market at all, since accurate conditioning on the insurer's risk classification would eliminate any correlation, at least if that risk classification is efficient. We believe that the answer depends on the potential information asymmetries in each market, keeping in mind that information that is private in some markets may be public in others. A general challenge for any empirical analysis of insurance data is the difference in structure across insurance markets. For instance, market heterogeneity, as imposed by laws and regulations, may explain why some markets tend to show a negative correlation between risk and coverage, others a positive one and still others none. We therefore suggest that empirical work in this area should not try to find a correlation that holds generally across all insurance markets. The ambiguity found across insurance markets need not imply a contradiction; it may rather be a consequence of market heterogeneity. Future research should consider specific market characteristics and subsets of policyholders that are likely to be affected by, or take advantage of, information asymmetries. If high-risk drivers are less likely to purchase full coverage, the market may not exhibit the positive correlation predicted by theory. The solution, we argue, is to include policyholders' private information in the analysis when studying information asymmetries.