1 Introduction

Fairness in insurance has always been a critical concern because insurance addresses inequalities; the nature and origin of such inequalities are therefore key ingredients in designing insurance systems and contracts. Historically, discriminatory practices in insurance have led to social and economic disparities. These include not only practices such as gender-based pricing but also redlining [35]. Ensuring fairness in insurance is essential for social justice and equal access to crucial financial services. Biases must be addressed to maintain trust in the insurance industry and to contribute to a more equitable future.

Insurance is not only about justice, trust, and disparities, though. It is also about compensation for unpredictable financial losses under strict conditions. Alongside the legal contracts and stipulations lies their formalization and methodical analysis, written in the language of mathematics and statistics, which offers the needed precision.

Statistics and computer science have given birth to machine learning and artificial intelligence methods, updating the toolbox of the insurance industry’s quants. These technologies support the analysis of vast amounts of data to identify patterns and predict outcomes. Insurers may use machine learning algorithms to customize policies and premiums based on individual risk characteristics or to enhance customer experience and satisfaction. These methods, however, come with their own notions of fairness, related to but not rooted in the historically crucial insurance context, for example, group fairness criteria such as demographic parity and individual fairness criteria such as counterfactual fairness; see also Table 3.5 in [1] for an overview of group fairness criteria. It is essential to understand the relations, consistencies, and contradictions, not only among the statistical notions of fairness themselves but also between the legal and statistical notions of fairness. This is a prerequisite for the insurance industry to deliver the ‘best solutions’, given the demand from policyholders and society as well as the conceptual supply from statistics and the computational supply from technological advances. That equilibrium shifts with societal and computational evolution as well as with the innovation power of the industry. Actuarial scientists and statisticians worldwide will be uniquely positioned to deliver the gold standard if they seize the emerging opportunities.
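To make one of these statistical notions concrete, consider demographic parity for a binary decision; the notation below is ours and not taken from [1]. Writing $\hat{Y}$ for the decision and $A$ for the protected attribute, demographic parity requires
\[
\mathbb{P}(\hat{Y} = 1 \mid A = a) \;=\; \mathbb{P}(\hat{Y} = 1 \mid A = a') \qquad \text{for all } a, a',
\]
that is, the decision is statistically independent of the protected attribute. In a pricing context, a commonly used analogue replaces the decision by the premium $\pi(X)$ and requires $\mathbb{E}[\pi(X) \mid A = a]$ to be the same for all groups. Counterfactual fairness, by contrast, is defined at the level of the individual: it compares the decision actually reached with the decision that would have been reached had the protected attribute counterfactually been different.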

Many authors, including several of the authors of the present paper, have already taken up this challenge. Prominent examples include [30], which discusses how the conception of fairness is dynamic and, in particular, changes over time. One recurring theme is whether characteristics that an individual has no control over, and hence cannot change, should be considered in insurance pricing [36]. Another theme is whether insurance should be understood as an expression of solidarity between homogeneous groups or as a fair contract between an insurer and an individual insured [2, 11]. The actuarial literature also offers specialized fairness concepts and technical solutions; see, for instance, [5, 23]. For an overview of different fairness concepts with a view towards insurance, consult [4, 10].

The remainder of the article is structured as follows. Section 2 contains the main insights drawn from the workshop and some avenues for future research; this includes findings related to plurality and causality as well as implications on privacy and regulation. Section 3 provides an outlook and concludes. Finally, Appendix A contains summaries of the eight workshop talks by the invited speakers.

2 Findings, implications, and opportunities

This section collects the main insights across the workshop’s talks and discussions. This includes areas where consensus was established and areas with potential for academic discord. In both of these cases, important avenues for future research are highlighted.

2.1 Plurality of fairness

Models and algorithms for prediction and risk analysis necessarily have discriminatory effects in the sense that they differentiate between risks, but some of these effects may be undesired or even unlawful. Discrimination and fairness depend on context, and even within a narrow context, fairness may be contested by the different parties involved, including the insured and the insurers. One party may believe they have been harmed or otherwise negatively impacted by discrimination acting through an algorithm or model, while another party that designed or used the algorithm may believe their decisions can be legitimately justified. Thus, even without disagreement about the observed facts, there can be disagreement about their legal, ethical, or social interpretation. It should not be surprising that different definitions of fairness can be mutually incompatible [12]. It is still worth noting the consequence: a regulation attempting to prevent one type of discrimination may also enforce discrimination of another kind. For example, a rule requiring formal non-dependence, where a model or algorithm cannot use a sensitive or protected variable as an input, can still allow so-called proxy or indirect discrimination; see, for instance, the ‘red car scenario’ of [21].
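The mechanism can be illustrated with a small, purely hypothetical simulation; the variable names and the data-generating process below are ours and are not taken from [21]. A model fitted without access to the protected attribute still produces systematically different outputs across groups, because an admissible input acts as a proxy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: A is a protected attribute,
# X is an observable proxy correlated with A, and the outcome Y
# depends directly on A.
A = rng.binomial(1, 0.5, size=n)            # protected attribute, never shown to the model
X = A + rng.normal(0.0, 1.0, size=n)        # proxy variable, correlated with A
Y = 2.0 * A + rng.normal(0.0, 1.0, size=n)  # outcome driven by A

# 'Formal non-dependence': the model only ever sees X, not A.
model = LinearRegression().fit(X.reshape(-1, 1), Y)
pred = model.predict(X.reshape(-1, 1))

# Group averages of the predictions still differ, because X proxies for A.
print("mean prediction, A = 0:", round(pred[A == 0].mean(), 3))
print("mean prediction, A = 1:", round(pred[A == 1].mean(), 3))
```

Even though the protected attribute A is never an input to the fitted model, its predictions differ on average between the groups defined by A; this is the mechanism commonly referred to as proxy or indirect discrimination.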

In recent years, emerging research in the actuarial community has focused on mitigating potential indirect discrimination in insurance risk pricing. Meanwhile, various fairness criteria have been proposed and have flourished in the statistics and machine learning literature [29]. In evaluating these methods and criteria, one should examine what they imply in concrete cases and whether their application leads to counterintuitive consequences. If our intuitions about concrete cases conflict with the abstract definitions, we may try to modify the definitions. In other cases, our intuitions about concrete cases may be uncertain, and the abstract definitions can inform us about what should be done. Coming up with definitions that are robust across different scenarios is crucial.

It is important to note that fairness is a broad term that needs to be made more precise. The aforementioned actuarial, statistics, and machine learning literature is concerned with algorithmic fairness: a rule-based approach that aims at ensuring uniform treatment of groups of people, usually via statistical metrics. In addition, there is discrimination in the stricter legal sense, enshrined in consumer protection legislation and concerning individual rights; and, finally, there is bias, for instance in the data used for training an algorithm, but also bias introduced by the user when applying the algorithm.

In [23], a discrimination-free insurance pricing technique is proposed. It is consistent with the model introduced in [32] and aims to mitigate proxy discrimination. It has been shown that the resulting fairness concept conflicts with group fairness: satisfying one does not imply that the other is satisfied, and adjusting for one may undermine the other [25]. The degree to which the technique of [23] specifically, and fair pricing techniques in general, rely on the notion of causality is contested, and no consensus was reached during the workshop. However, the technique may be embedded in a causal framework, whence its appropriateness in different scenarios may be studied through that lens [15].
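To fix ideas, here is a minimal sketch of such a construction, assuming the marginal-averaging formulation commonly associated with discrimination-free pricing; the notation is ours and is not taken from [23]. Write $X$ for the non-protected covariates, $D$ for the protected attributes, and $\mu(x, d) = \mathbb{E}[Y \mid X = x, D = d]$ for the best-estimate price. The unawareness price $\mathbb{E}[Y \mid X = x]$ averages $\mu(x, d)$ with the conditional weights $\mathbb{P}(D = d \mid X = x)$, which is precisely the channel through which $X$ can proxy for $D$, whereas the discrimination-free price uses the marginal weights:
\[
h^{*}(x) \;=\; \sum_{d} \mu(x, d)\, \mathbb{P}(D = d),
\qquad \text{compared with} \qquad
\mathbb{E}[Y \mid X = x] \;=\; \sum_{d} \mu(x, d)\, \mathbb{P}(D = d \mid X = x).
\]
Whether the left-hand construction should be read causally, for instance as an average over interventions on $D$, is exactly the point on which no consensus was reached.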

2.2 Causal notions

From the previous subsection, it is evident that some mathematically founded guidance on comparing and choosing among different fairness definitions could be helpful, given the mutual incompatibility of a plurality of definitions. Causal fairness is one high-level framework or research program that attempts unification by asserting that fairness or discrimination must be understood in terms of causal relationships using explicitly causal methodology [6, 18, 20, 21, 22, 31]. For a recent survey, see [27]. This can also be viewed as one application area of a larger program on causal machine learning [17].

Many statistical fairness criteria are defined in terms of conditional probability distributions. One source of the multiplicity of definitions is the choice of which variables to include in the condition of such conditional distributions. By distinguishing between observation and intervention, causal models can help decide which variables to condition on for observational purposes. Causal fairness criteria recast the choice of which variables to condition on as a choice about pathways in a graphical causal model [6, 18, 21, 22, 31]. Visual representation of causal pathways in a graph could help guide such choices and facilitate understanding for a wider group of stakeholders. More generally, using interpretable models or visualization methods [26] could help all parties better understand the limitations and consequences of using a particular model. Intersectional fairness and discrimination can become an important issue when there are multiple categories or sensitive variables involved [37], and in such cases, the expressiveness of causal models may be particularly helpful [3].
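The distinction between conditioning and intervening can be stated compactly with the do-operator; the notation is ours and is not taken from the cited works. For a decision $\hat{Y}$ and a protected attribute $A$, compare
\[
\mathbb{P}(\hat{Y} = 1 \mid A = a)
\qquad \text{with} \qquad
\mathbb{P}\bigl(\hat{Y} = 1 \mid \operatorname{do}(A = a)\bigr).
\]
The first quantity reflects every source of association between $A$ and $\hat{Y}$, including proxies and common causes, whereas the second isolates the effect of setting $A$ along the causal pathways leading into $\hat{Y}$; path-specific criteria go further and restrict attention to selected pathways in the graph.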

There is a long tradition of using predictive models, with no explicitly causal assumptions, to make decisions. This practice rests on the hope that decision-makers understand the differences between observation and action, between prediction and intervention. This status quo has been broadly criticized [38]. The increasing prevalence of causal modeling now provides an alternative, shifting the focus from passive prediction to actions, interventions, and consequences.

Justification processes for indirect discrimination are potentially too permissive. There may be many associated or correlated variables, and an algorithm or model user seeking to avoid responsibility can search among these to find ones that excuse the appearance of discrimination. A trustworthy causal model of how the world works could limit such a search. When a model or algorithm is used for many impactful decisions, credit risk scores being one example, its output becomes what we might call a ‘universal collider’, and its use for decisions will induce associations among the input variables [9]. Measuring the harms or costs of this problem is empirically challenging and may be a Sisyphean task if these same dynamics invalidate the attempt to measure them. But the issue may be unavoidable: it can invalidate justifications for any given fairness criterion if, for example, a particular association is due to something like collider bias rather than to individual circumstances and decisions.
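The collider mechanism can be illustrated with a small, purely hypothetical simulation; the variable names and numbers below are ours and are not taken from [9]. Two inputs that are independent in the population become associated once cases are selected by a score that depends on both.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two inputs that are independent in the population.
u = rng.normal(size=n)
v = rng.normal(size=n)

# A score used for many decisions depends on both inputs: a collider.
score = u + v + rng.normal(scale=0.5, size=n)

# Unconditionally, u and v are (up to noise) uncorrelated.
print("corr(u, v), overall:         ", round(np.corrcoef(u, v)[0, 1], 3))

# Selecting on the score (for example, only cases rated above a cut-off)
# induces a spurious negative association between u and v.
selected = score > 1.0
print("corr(u, v), high-score cases:", round(np.corrcoef(u[selected], v[selected])[0, 1], 3))
```

Associations observed among input variables after widespread use of such a score may therefore reflect the selection mechanism rather than individual circumstances, which is precisely why justifications built on those associations can fail.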

2.3 Data availability and privacy

To achieve many of the notions of fairness, we need to collect the protected attributes of individuals. However, this information is usually wholly missing or only partially available. How can the lack of data be overcome while satisfying privacy requirements? Discrimination-free insurance prices may still be calculated if partial data on sensitive attributes are available [24]. Still, depending on the jurisdiction and the attribute, even partial collection of such data may be problematic—this also highlights the role of regulators in establishing when, how, and for which purposes sensitive data can be collected, used, and stored. Furthermore, regulators and insurers must be able to communicate the privacy implications to policyholders.

In some areas of insurance, many of the discriminatory effects of pricing would perhaps dissipate if sufficiently detailed policyholder information were collected. Indeed, demographic characteristics are often only proxies for risk drivers such as policyholder behavior. Policyholder behavior may, increasingly, be measured via individualized data collection (for example, wearables and telematics). But even beyond concerns around surveillance and privacy, words of caution are due. Constructivist theories about social categories such as race and gender imply that almost all other variables will be associated with these categories, because historical inequalities make these attributes influence nearly all life experiences [14, 19]. After all, categorizing attributes as sensitive may itself be the result of historical circumstances.

While the increasing use of data does not alter the fundamental issues of insurance discrimination, it is changing how insurers do business [28]. A concern is that the resulting highly individualized or personalized rates can make insurance unaffordable or unavailable for some high-risk consumers. For example, as the insurance market has moved towards more individualized risk pricing over the past decades, aided by new technologies to better measure and understand risks, more homes in the most disaster-prone regions are facing difficulties in obtaining home insurance.

Considering the broader picture, decision systems based on predictions will often be favored in well-known, data-rich environments. If we rely on these too much and refuse to take action outside such environments, we could forgo substantial potential benefits. This lost potential would be distributed unequally, concentrated among the most under-served people, as they often reside in the most data-poor environments. Further, decision systems targeting small units, such as individual people, can miss both the risks and the opportunities of investments in groups of dependent units or of the transformation of environments. More generally, the closed nature of fully specified formal models artificially constrains action spaces. This leads to lost opportunities whenever those models and constrained action spaces differ from the real world and its action spaces. The growth of systems of data collection and automated decisions should provide us with more free resources and options for innovation, not simply predict, and thereby enforce, more of the same.

2.4 Societal attention

Insurance can be treated as a social good, an economic commodity, or something in between. When insurance is mandatory, or nearly so, it becomes less of a financial commodity and more of a social good, resulting in different attitudes toward fairness. Modern-day insurance is generally sponsored at significant levels, either by governments or by private corporations; the latter may be owned by policyholders, as is the case for mutual and takaful companies, or by investors. Pool members may feel a form of insurance solidarity, but the responsibilities of the pool depend on its nature. For example, in the case of a pool of contracts issued to individuals by a for-profit stock company, the pool can be thought of as a sum of bilateral agreements, which leaves out the collective dimension of insurance. Actuarial fairness applied in this context might mean that each customer should pay for their own risk and only their own risk. However, subsidies from one group to another are typical for social insurance, where a government entity owns the pool.

The pool’s responsibility, the protected attributes, and economic considerations such as adverse selection, moral hazard, and financial efficiency jointly determine the appropriateness of insurance discrimination (differentiation), which depends on the context and varies by line of business and jurisdiction. No intent on the insurers’ side is necessary for discriminatory effects to occur.

Consumer protection is a cornerstone of financial services regulation. Financial services such as banking or insurance must not unfairly discriminate against customers. The European Union’s legal framework distinguishes between direct and indirect discrimination. While unfair discrimination has always been part of a supervisor’s agenda, machine learning and the corresponding ever-growing data requirements can scale up the problem, for example, in automated decision-making with little or no human oversight.

Accordingly, the European Union is introducing new legislation in the form of the AI Act, which aims to classify the use of machine learning into three risk tiers. For models in the ‘high risk’ tier, the AI Act sets requirements such as disclosure towards customers, governance, and human oversight. This risk tier currently explicitly includes risk assessment and pricing in life and health insurance, but the scope may be adjusted at a later stage.

The challenge for financial institutions, regulators, and supervisors alike is to find a practical way to ensure practices are free of unfair discrimination against customers. Supervisors can expect banks and insurance companies to proactively identify sources of such unfairness and to take measures to avoid unfair discrimination. Such measures will likely go beyond simple metrics and require extensive business knowledge and additional human oversight.

3 Outlook

The above discussions have, at least at certain stages, made the implicit assumption that the technical actuarial prices are the prices charged in an insurance market. However, insurance pricing is a complicated process that may involve cost modeling (risk pricing), demand modeling, and price optimization, depending on the line of business and jurisdiction. Existing actuarial research predominantly focuses on the risk pricing stage, a narrow focus, since discrimination could appear at all stages of the pricing process; see, however, [33]. More research—and broader research—is needed, covering a more comprehensive range of insurance practices, including underwriting, pricing, marketing, claims processing, and fraud detection.

Major open questions can be split into three related dimensions. First, from a technical perspective, mathematical methods are needed to directly avoid, or at least identify with high accuracy, potential cases of unfair discrimination. Commonly used explainability tools, which are supposed to improve the transparency of machine learning methods, fall short in many regards. For instance, it is easy to construct examples where an algorithm suggests a highly unfair decision, yet these tools do not identify the issue [34]. This calls for further research on interpretability, building on sophisticated statistical frameworks such as causal graphs.

Furthermore, regulators and supervisors must—in conjunction with the financial services industry and based on current technical developments—define supervisory expectations relating to fairness and unfair discrimination, not least in the context of artificial intelligence and machine learning. These expectations should include minimum requirements on skills, processes, and human oversight.

Finally, the broader actuarial community should develop guidelines that support informed decision-making regarding fairness criteria across insurance contexts. The trade-off between fairness and accuracy usually discussed in the machine learning literature conflicts with the goals of business decision-making. Conversely, machine learning scholars might draw insight from the interdisciplinarity of the actuarial literature [13]. Nevertheless, a more business- and society-focused framework based on stakeholder analyses is needed. This is closely related to another critical yet underexplored question: How does fairness impact stakeholders, and who pays the cost of fairness? While a first attempt to answer this question empirically is provided in [33], more research is needed to understand the impacts of various fairness policies in order to inform optimal decision-making by businesses and regulators. We therefore call for high-quality datasets to support empirical research in this area.

Across all these dimensions, technical actuarial considerations are essential to clarify the contours. Let us, however, conclude by echoing the message of [7]: Resolutions require the interdisciplinary engagement of expertise from fields such as actuarial science, statistics, and computer science, but also the social sciences, including jurisprudence and economics.