Modeling and pricing cyber insurance

The paper provides a comprehensive overview of modeling and pricing cyber insurance and includes clear and easily understandable explanations of the underlying mathematical concepts. We distinguish three main types of cyber risks: idiosyncratic, systematic, and systemic cyber risks. While for idiosyncratic and systematic cyber risks, classical actuarial and financial mathematics appear to be well-suited, systemic cyber risks require more sophisticated approaches that capture both network and strategic interactions. In the context of pricing cyber insurance policies, issues of interdependence arise for both systematic and systemic cyber risks; classical actuarial valuation needs to be extended to include more complex methods, such as concepts of risk-neutral valuation and (set-valued) monetary risk measures.


Introduction
Cyber risks constitute a major threat to companies worldwide. 1 In the last years, the estimated costs of cyber crime have continuously been increasing-from approximately USD 600 billion in 2018 to more than USD 1 trillion in 2020, cf. CSIS [25]. Consequently, the market for cyber insurance is experiencing strong growth, providing contracts that mitigate the increasing risk exposure-with significant potential ahead. However, cyber insurance differs from other lines of business in multiple ways that pose significant challenges to insurance companies offering cyber coverage: • Data on cyber events and losses is scarce and typically not available in the desired amount or granularity. • Cyber threats are evolving dynamically in a highly non-stationary cyber risk landscape. • Aggregate cyber risks arise due to common IT architectures or complex interconnections that cannot easily be captured. • The term 'cyber' risk itself comprises many different types of risk with different root causes and types of impact.
Insurance companies cannot solely rely on standard actuarial approaches when modeling and pricing cyber risks. Their traditional methods need to be complemented by novel and innovative techniques for both underwriting and quantitative risk management. The current paper provides the following main contributions: (i) We present a comprehensive overview of the state of the art of modeling and pricing cyber insurance. In contrast to other surveys (see, e.g., [39]) that focus on a high-level review of the literature, we explain the underlying mathematical concepts and discuss their advantages and drawbacks. 2 (ii) The second main contribution of the paper is a classification of cyber risks into three different types: idiosyncratic, systematic, and systemic cyber risks. While the distinction between idiosyncratic and systemic risks is common in the current cyber insurance literature (see, e.g., [116]), a further refinement is necessary. The three risk types can be described as follows: • Idiosyncratic risks refer to cyber risks at the level of individual policyholders that are independent from risks of others parties. This might, for example, be caused by internal errors within the company. Prototypical idiosyncratic risks are independent risks in large insurance pools that allow to apply classical actuarial techniques.
• Systematic risks are cyber risks that result from common vulnerabilities of entities affecting different firms at the same time, e.g., firms belonging to the same industry sector or region, or firms that utilize the same software, server, or computer system. These risks can be modeled via common risk factors. In classical actuarial and financial mathematics, systematic risks include financial market risks as well as stochastic fluctuations and evolutions of mortality rates within a population. • Systemic risks are cyber risks caused by local or global contagion effects in interconnected systems or by strategic interaction. Examples are worm-type malware or supplier attacks. These risks are similar to important feedback mechanisms observed in financial crises, e.g., contagion in networks of counterparties or fire sales of stressed market participants in illiquid markets. Models include random processes with feedback, or locally and globally interacting processes. We will also include strategic interactions in this category which are studied in game theory.
Idiosyncratic and systematic cyber risks can be captured by classical approaches of actuarial and financial mathematics; systemic cyber risks require different methodologies such as epidemic network models which focus on the interconnectedness of the entities. We suggest pricing techniques that adequately incorporate interdependence for both systematic and systemic cyber risks by combining the concepts of risk-neutral valuation and risk measures.
The paper is structured as follows. Section 2 reviews classical actuarial approaches. We begin with an introduction to the frequency-severity approach in the context of cyber risk and discuss how to model both idiosyncratic and systematic risks in this framework. We explain how dependence is captured in such models. Systemic cyber risks are considered in Sect. 3. Three different modeling approaches for interconnectedness, contagion, and interaction between entities are discussed, with a special focus on their advantages and possible drawbacks. In Sect. 4, we describe pricing methods for cyber insurance contracts that are applicable in the face of idiosyncratic, systematic, and systemic risks. Section 5 discusses open questions for future research.

Classical actuarial approaches applied to cyber risks
The pricing of cyber insurance contracts as well as quantitative cyber risk management require sound models for the loss distributions, customized to the application purpose. While classical actuarial premium principles are essentially related to the expected claims amount (plus a safety loading), quantitative risk management particularly refers to extreme losses in the tail of the distribution and their quantification in terms of risk measures such as Value at Risk or Average Value at Risk, see Sect. 4. In actuarial mathematics, a standard model for insurance losses-used across all lines of business-is the frequency-severity approach, also called collective risk model. For a certain time interval [0, t], t > 0 (typically t = 1 year), a collective of policyholders causes a random number of claims N t (frequency) with correspond-ing random loss sizes Y 1 , Y 2 , . . . (severity) generating the total claim amount Calculations within the frequency-severity approach typically rely on the following mathematical assumptions (see, e.g., [88]): (C1) Claims occur at arrival times 0 ≤ T 1 ≤ T 2 ≤ . . .. The number of claims in the time interval [0, t], t ≥ 0, is defined by i.e., N = (N t ) t≥0 constitutes a counting process on [0, ∞). (C2) The jth claim arriving at time T j causes the claim size Y j . It is assumed that the sequence (Y j ) j≥1 of claim sizes consists of independent and identically distributed random variables. (C3) Claim sizes and claim numbers are assumed to be independent from each other.
In contrast to classical insurance risks, however, cyber risk is more challenging in different ways. In particular, the standard assumptions of the frequency-severity approach as well as classical statistical techniques 3 are no longer applicable: • Claims data is not available in sufficient quantity or in the required granularity.
• Technology and cyber threats are evolving rapidly, i.e., the cyber environment is highly non-stationary. • Cyber incidents 4 may affect different policyholders at the same time, i. e., the typical assumption of independence for insurance risks does not hold any longer. Moreover, there is-in contrast to natural catastrophe risk-no simple geographical delimitation of dependent risks.
Nonetheless, the frequency-severity approach can be customized to account for cyber risk-at least as a first approximation and for certain types of non-systemic cyber risks, which can be subdivided into idiosyncratic and systematic risks (as defined in Sect. 1). In the frequency-severity approaches presented below, we explicitly distinguish between techniques suitable for modeling idiosyncratic or systematic incidents. In the context of cyber insurance, however, a third class of risks can be identified, namely systemic risks, i.e., cyber risks resulting from contagion between interconnected entities. Proper modeling of such risks goes beyond the classical framework of actuarial modeling and requires appropriate models for networks, (cyber) disease spread, and strategic interaction. Hence, we discuss the modeling of systemic cyber risks separately in Sect. 3, while the pricing for all types of cyber risks is discussed in Sect. 4.
To present frequency-severity approaches in the context of cyber risk in a unified and practically applicable way, we use the following notation and definitions. We consider an insurer's portfolio of n policyholders (firms) exposed to the considered type of cyber risk incidents. Each firm admits an individual risk profile characterized by a vector of covariates, e.g., industry sector, size, IT security level, which are elicitable, for example, via a questionnaire or from public information. Using the covariates, the insurer's portfolio is decomposed into homogeneous groups, labeled {1, . . . , K }, with covariates vector x k for group k. We denote by n k , k = 1, . . . , K , the number of firms in group k, i.e., n 1 + . . . + n K = n. For pricing purposes, these homogeneous groups can be viewed as tariff cells, i.e., the insurance firm should charge all firms 5 within group k the same premium π k . In particular, if n k is large, then the premium of the idiosyncratic cyber risk can be derived from the law of large numbers as the expected claims amount per firm of group k plus a suitable safety loading to avoid ruin in the long run.
Both idiosyncratic and systematic incidents can be grouped into different cyber risk categories, labeled {1, . . . , C}. Categories may include, for example, data breach, fraud, and business interruption. Two exemplary actuarial classification approaches are sketched and discussed in Appendix A. Cyber risk is modeled per risk category c ∈ {1, . . . , C} and per group k ∈ {1, . . . , K }. A pair m := (c, k) is called a cyber risk module. The total number of modules C · K is a trade-off between homogeneity and availability of data for statistical estimation.
Within this framework, we model the losses for an insurance company -for each cyber risk module as well as on an aggregate level. For this purpose, we first focus on frequency-severity based approaches to modeling cyber risks in the spirit of the classical collective risk model. Second, we add dependence to our cyber risk model in order to capture accumulation risks. Note that appropriate dependence modeling is particularly important for calculating capital requirements in quantitative risk management, since the underlying risk measures refer to events in the extreme tail of the loss distribution.

Frequency and severity
A frequency-severity model may be applied on the level of each cyber risk module m = (c, k). For simplicity, we describe the losses per risk category of individual firms by a collective risk model. This can be justified as follows: Since all firms in any group are (approximately) homogeneous, they will be charged the same premium for any given risk category. From the point of view of the insurance company, only aggregate losses are relevant, i.e., an artificial allocation of losses to individual companies for pricing purposes will produce the correct implications. We thus describe the losses per risk category at the level of any individual firm by a collective risk model with the same severity as the corresponding module, but with a suitably reduced frequency.
For a firm i in group k and a fixed risk category c, i.e., a cyber risk module m = (c, k), we consider the frequency and severity model (N m,i t , (Y m,i j ) j≥1 ). Then the total claim amount of firm i up to time t can easily be obtained by summing up: In mathematical terms, all quantities correspond to random variables on a suitable probability space ( , F, P), where P plays the role of the statistical measure that models the relative frequency with which events occur. As outlined in the introduction of this section, one of the most common assumptions in the frequency-severity model is assumption (C3), i. e., claim numbers and sizes are independent of each other. This assumption facilitates and simplifies many calculations regarding the compound total claim amount process. In particular, the expected total claim amount and its variance follow from Wald's formulas: However, the independence assumption may not always be reasonable-e.g., if hidden factors influence both frequency and severity: Sun et al. [106] detect a positive nonlinear dependence between frequency and severity in hacking breach risks at the firm level. A firm with a strong cyber self protection is expected to experience both fewer and weaker hacking attacks than companies with weak self protection mechanisms. In mathematical terms, the authors capture this dependence between frequency and severity by the Gumbel copula, see also Sect. 2.2.

Frequency
Let N m,i t denote the number of incidents in module m = (c, k) until time t that are allocated to a firm i in group k, and let (N m,i t ) t≥0 denote the corresponding counting process. At the aggregate level, will count the total number of incidents per module m = (c, k) and the total number of incidents per cyber risk category c, respectively.

Poisson Process
A simple counting process for incidents-reflecting non-stationarity of cyber risk-is a time-inhomogeneous Poisson process with intensity function λ m per firm for cyber risk module m.

Definition 2.1 (Time-inhomogeneous Poisson process) A counting process (N t ) t≥0
is called a time-inhomogeneous Poisson process on ( , F, P) with locally integrable rate (or intensity) function λ : [0, ∞) → [0, ∞) if: 1. N 0 = 0, 2. the process has independent increments, 3. for any time interval (s, t], the number of incidents is Poisson distributed with mean t s λ(u) du, i.e., Unless the intensity function is constant, the increments of a time-inhomogeneous Poisson process are non-stationary. The cumulative rate function t 0 λ(u) du corresponds to the expected number of incidents up to time t.
Zeller and Scherer [116] adopt this approach for idiosyncratic incidents. For each policyholder i of group k and module m = (c, k), the number of idiosyncratic incidents (N m,i t ) t≥0 is assumed to follow a time-inhomogeneous Poisson process with intensity λ m = λ (c,k) . Clearly, for each cyber risk category c, the intensity at the level of an individual firm i depends on the covariates x k of group k (but not on the individual policyholder i), and Zeller and Scherer [116] propose a generalized additive model to estimate the intensity rates. 6 In particular, similarities and deviations of the risk profiles of the K groups-expressed in terms of the covariate vectors x k , k = 1, . . . , K -are reflected by the intensity functions λ (c,k) .
Since idiosyncratic incidents are independent across firms, the total number of incidents N m,agg t , t ≥ 0, per module m = (c, k) as well as the total number of incidents N (c) t , t ≥ 0, per cyber risk category c, respectively, are again time-inhomogeneous Poisson processes with respective intensities More delicate, however, is the case of systematic cyber risk incidents. In particular, frequency distributions of different policyholders might be subject to dependencies due to joint underlying cyber risk factors R 1 , . . . , R d , representing, for example, the random discovery of exploits in commonly used software, improvements in cyber security, or the technological progress of tools for cyber attacks.

Cox Process
Such dependencies between counting processes can be captured in the context of Cox processes, also called doubly stochastic Poisson processes, extending the notion of a time-inhomogeneous Poisson process to a random intensity. Definition 2.2 (Cox process) A Cox process (N t ) t≥0 is a counting process described by a random intensity process (λ t ) t≥0 such that conditional on the specific realization t → λ t (ω), ω ∈ , the process (N t ) t≥0 is a time-inhomogeneous Poisson process with intensity t → λ(t) = λ t (ω).
A reasonable assumption could be that the intensity is a function of the current state of random cyber risk factors, i.e., for an R d -valued stochastic process R t = (R 1 t , . . . , R d t ), t ≥ 0, of cyber risk factors and a function λ : R d → [0, ∞), the intensity process is defined as More generally, the intensity process could be modeled as a function of the whole history of cyber risk factors, i.e., In summary, in the case of systematic cyber risk, a reasonable model for the number of incidents N m,i t up to time t allocated to policyholder i in group k for module m = (c, k) could be to assume that (N m,i t ) t≥0 follows a Cox process with intensity pro- . . , C, k = 1, . . . , K , are independent time-inhomogeneous Poisson processes. In particular, conditional independence implies that-conditional on the specific realization t → λ m t (ω)-the total number of incidents N m,agg t , t ≥ 0, per module m = (c, k) and the total number of incidents N in analogy to (1).
In contrast to the time-inhomogeneous Poisson process, the increments of a Cox process (N t ) t≥0 are in general no longer independent, but subject to autocorrelation. More precisely, for any s < t ≤ u < v, the tower property of conditional expectation implies i.e., the autocorrelation depends on the random intensity process. The statistical analysis of Bessy-Roland et al. [7] yields empirical evidence for autocorrelation in the number of attacks, and thus provides an additional rationale for Cox processes when modeling claims frequency. The specification of an intensity process that reproduces the empirically observed autocorrelation appears to be challenging.

Severity
Every claim occurring in the frequency-severity model triggers a loss size that is modeled as a random variable. We let Y m,i j denote the claim size of the jth event allocated to firm i for module m = (c, k) and assume that (Y m,i j ) j≥1 , i = 1, . . . , n k , is a collection of non-negative independent 7 and identically distributed random variables. One among many different possible approaches is to assume that the key governing parameter for the choice of the claim size distribution is the incident category c; characteristics of group k then determine distributional details, e.g., parameter values.
Due to the limited availability of loss data, empirical research on cyber risk severity distributions has mostly focused on the category of data breaches. For this category, open source data bases, such as the Privacy Rights Clearinghouse Chronology of Data Breaches, are available and regularly updated. Data breach severities are found to follow strongly heavy-tailed distributions such as power-law (see, e.g., [80]), log-normal (see, e.g., [37]) or generalized Pareto distributions (GPD) (see, e.g., [112] or [106]). For cyber risk categories different from data breaches, less data is publicly available. Consequently, fewer papers have appeared that empirically analyze the respective severity distributions.
An exception is Dacorogna et al. [29] who study a non-public database of the French Gendarmerie Nationale on cyber complaints and describe a process for cleaning the data. Their analysis suggests that losses are heavy-tailed. Dacorogna et al. [30] refine the analysis and provide a tool for classifying attacks based on the fatness of the tail. Another promising direction are studies based on data on operational risk such as Biener et al. [9] or Eling and Wirfs [41]. These approaches offer the benefit of being able to analyze all categories of cyber incidents simultaneously. In particular, Eling and Wirfs [41] detect distributional differences between small and large claim sizes for all considered cyber incident categories. The authors propose a composite distribution approach, where excess losses over a threshold are modeled using a GPD and the remaining smaller losses are modeled using a simple parametric distribution such as a gamma or log-normal distribution. In general, composite distribution approaches constitute a flexible modeling tool to take the empirically observed distributional differences between body and tail of severity distributions adequately into account. A composite distribution approach can be formalized as follows.
For each module m, we choose a threshold θ m distinguishing small from large cyber claims. Small and large claims, i.e., the body and tail of the severity distribution, are then modeled separately: The i.i.d. claim sizes follow a composite distribution with density The composite distribution approach is well-suited for modeling non-life insurance severity distributions in general, and cyber risks in particular. 8 As discussed here, the methodology is independent of time, i.e., it provides only a snapshot of the current cyber environment. In the light of the fast-evolving, non-stationary cyber landscape, the suitability of the model must, however, be regularly validated and updated. For further details and discussions, we refer the interested reader to the excellent summaries provided by Zeller and Scherer [116], Sect. 2.1, or Eling [39], in particular Tables 4 and 6, and to Cooray and Ananda [23] for an application of composite distributions in a non-cyber specific context.

On calibration and application
In general, frequency-severity models are well-understood, easy to implement and to calibrate if a sufficient amount of data is available. They are also straightforward to explain, for example, to an executive board of an insurance company; this is partly due to their prevalence in actuarial modeling. For frequency modeling, intensities can, e.g., be fit to data using generalized additive models (as in Zeller and Scherer [116] and described above), maximum-or marginal likelihood, or Bayesian methods. Cox processes are generally more difficult to estimate -the choice of a calibration method critically depends on the law of the underlying common risk factor processes. 9 For the statistical analysis of the severity, there exist well-known estimation techniques including maximum-likelihood, see, e.g., Maillart and Sornette [80] or Edwards et al. [37] for applications in a cyber severity context, or the peaks-over-threshold method for fitting a GPD to the tail of a distribution, see, e.g., McNeil et al. [85] and Embrechts et al. [42]. For a general review on methods for the parameter estimation of GPDs, including maximum-likelihood, the method of moments, the probability weighted moments method, and Bayesian approaches, see also de Zea Bermudez and Kotz [32] and de Zea Bermudez and Kotz [33].
The practical application of frequency-severity models to cyber risk is challenging, in particular due to the limited amount of available data and its insufficient quality. Moreover, Poisson and Cox processes do not capture the systemic interaction between different (groups of) policyholders; see also Reinhart [98] for a discussion of the frequency-severity model presented by Zeller and Scherer [116]. An alternative are Hawkes processes that incorporate systemic self-excitation into frequency models, see Sect. 3.1. Like Cox processes, Hawkes processes are able to capture autocorrelation observable in the data.

Dependence modeling
The distribution of the total claim amount per module and at the portfolio level is affected by the underlying dependence structures. For cyber risk, dependencies may be present at different levels including: • dependence between frequency distributions or between severity distributions of different policyholders in the same homogeneous group (e.g., due to the random evolution of common cyber security measures and cyber threats over time), • dependence between frequency and severity-in contrast to the classical framework of frequency-severity models (e.g., due to unobservable random factors within a tariff class such as heterogeneous levels of cyber self protection).
One approach to deal with the first type of dependencies are Cox processes as described in Sect. 2.1.1. In this section, we review further approaches to model dependence in the context of cyber risk that have been proposed in the literature.

Common risk factors
Common risk factors capture dependence for systematic risks; the factors are random quantities to which all risks are jointly exposed. Common risk factors appear in static as well as in dynamic models and have been widely used in the cyber risk modeling literature. For example, they are key elements of the cyber risk models proposed by Böhme [10], Böhme and Kataria [11] and Zeller and Scherer [116]. Cox processes, as introduced in Sect. 2.1.1, are an example of dynamic factor models. Böhme [10] captures dependence using one common risk factor in a static model. The factor represents a common vulnerability in a portfolio of n individual risks. The connection between individual risks and the latent risk factor is studied on the basis of linear correlation. 10 Common risk factors also appear in the cyber risk model of [116]. The authors use marked point processes with two-dimensional marks: the first component describes the strength of an attack, and the second component represents the subset of companies affected. Dependence among firms occurs due to the restriction of incidents to certain industry sectors which is modeled via a common risk factor. The paper suggests a conceptual framework, but does not yet calibrate the model to real data. 10 Linear correlation, defined as ρ(X , , captures a possible linear relationship between the random variables X and Y . The maximum and minimum values of 1 and −1 are not always attainable. While often used to impose ad-hoc dependence assumptions in practice (in a cyber context, see, e.g., [10], [11]), linear correlation suffers from many well-known fallacies, see, e.g., [85] for a detailed discussion.

Copulas
In actuarial applications, copulas are a standard tool that fully characterizes the dependence structure of the components of finite-dimensional random vectors. A ddimensional copula C : [0, 1] d → [0, 1] is the distribution function of a d-dimensional random vector with uniform one-dimensional marginal distributions.
If all F i are continuous, then C is unique. 2. Conversely, for a given copula C and given one-dimensional distribution functions F 1 , . . . , F d , the function F in (2) is a d-dimensional distribution function with copula C and marginal distribution functions F 1 , . . . , F d .
Property 1 states that a copula extracts the dependence structure of a random vector from its multivariate distribution, while property 2 provides a flexible construction principle of multivariate models by combing marginal distributions and copulas to multivariate distributions. Prominent examples of copulas are: • Gaussian copula: Letting −1 be the quantile function of the standard normal distribution and the joint cumulative distribution function of a multivariate normal distribution with covariance matrix , the corresponding Gaussian copula is given by • t-copula: Let t ν, signify the distribution function of a d-dimensional tdistribution t d (ν, 0, ) for a given correlation matrix and with ν degrees of freedom, and let t ν denote the distribution function of a univariate t-distribution with ν degrees of freedom. The corresponding t-copula takes the form Like the Gaussian copula, the t-copula is an implicit copula that is extracted from a given parametric multi-variate distribution. • Archimedean copulas: Explicit copulas are constructed from given functions; the prime example are Archimedean copulas. We consider a suitable continuous function ψ : [0, ∞) → [0, 1] with ψ(0) = 1, lim x→∞ ψ(x) = 0, and ψ strictly decreasing on [0, ψ −1 (0)], where ψ −1 denotes its generalized inverse. The Archimedean copula with generator ψ is given by A special case is the Gumbel copula for ψ θ (s) = (− ln(s)) θ , θ ∈ [1, ∞) that is applied in the cyber model of Sun et al. [106].

On calibration and application
Common risk factor models are able to capture dependence from bottom-up and are widely used in economics. From a practical perspective, they are particularly useful when a modeler is confident that random outcomes are influenced by common external factors. In Cox processes, described in Sect. 2.1.1, the common factors enter the model via the intensity. Their estimation depends on the specific choice of the distribution of the underlying risk factors. For the class of linear factor models, a large amount of statistical estimation methods exist. Important techniques are time series regression, cross-sectional regression (at each time point), and principal component analysis, see, e.g., McNeil et al. [85] and the references therein. Another approach are copulas; these are theoretically able to represent every form of static dependence. They can be viewed as a top-down approach that imposes a dependence structure without modeling the underlying mechanisms, as contrasted with factor models that can be interpreted as a bottom-up approach. Copulas have already been used in the literature on cyber risk. Herath and Herath [62] model the loss distribution at a single firm using a copula that captures the dependence structure between the number of affected computers of the firm and the overall severity of the loss. In Böhme and Kataria [11] dependence between different firms is captured using a t-copula with a given linear correlation coefficient.
Another example is an application of copulas in a modified collective risk model in which the standard independence assumption is relaxed. For the incident category c of hacking data breaches, Sun et al. [106] observe upper tail dependence between frequency and severity. This may be caused by hidden factors such as the degree of cyber self protection. They propose to model this dependence for any firm i in module m up to time t via a Gumbel copula.
Eling and Jung [40] and Liu et al. [79] apply vine copulas in the context of data breaches. Vine copulas are very flexible, and their calibration is quite tractable, since high-dimensional dependence structures are decomposed into components of lower dimension. For detailed information on vine copulas we refer to Czado [26], Czado and Nagler [27], and an online collection of material on vine copulas, see TU Munich, In general, the choice of a suitable copula estimation method depends on the structure of the chosen copula model: parametric, semiparametric or nonparametric. A good survey on various methods is Hofert et al. [65]. In a fully parametric model, both the copula and the marginal distributions are completely characterized by (vector) parameters. The maximum likelihood (ML) method can be applied to the dependence and the marginal part either jointly or sequentially. The sequential approach is often referred to as the method of inference functions for margins (IFM), see, e.g., the surveys in Choroś et al. [22], Sect. 2.1, or McNeil et al. [85], Sect. 7.6. Semiparametric approaches typically still involve a parametric copula model, but a nonparametric model for the marginals. Here, classically, the marginal distributions are estimated via their empirical distribution functions. Estimation of the full model can then be performed using a maximum-pseudo likelihood approach, in which the nonparametric marginal estimators are inserted, see the seminal paper of Genest et al. [53]. This approach is considered to be more robust than the parametric ML and IFM methods in many practical applications, see Kim et al. [70], unless substantial information is available on a parametric class to which the margins belong to. Nonparametric copula models may be estimated on the basis of different variations of nonparametric marginal and joint distribution function estimates, see, e.g., the seminal paper of Deheuvels [34] using empirical distribution functions or Chen and Huang [21] (and the references therein) for kernel-based estimators of the copula (or copula density).

Systemic cyber risks
Systemic risk generally refers to the possibility that distortions in a system may spread across many entities and be augmented due to local or global feedback effects. This is in contrast to systematic risk that introduces dependence via exogenous factors. Systemic risk refers to the internal mechanism of a system in which the behavior of the various entities has a sequential impact. It is often associated with a cascading risk propagation such that "in case of an adverse local shock (infection) to a system of interconnected entities, a substantial part of the system, or even the whole system, finally becomes infected due to contagion effects." 11 As a consequence of the 2008 financial crisis, systemic risk was intensively studied in systems of interdependent financial institutions, see, e.g., Staum [105]. This concept is also important in the context of cyber risk, since agents and organizations in cyber systems are interconnected, for example within IT networks or via business contacts. 12 The relevance of systemic cyber threats has been emphasized by leading regulatory and macroprudential institutions, cf. WEF [110] and ESRB [46]. Examples of contagious threats include the WannaCry and NotPetya cyber attacks where the corresponding malware spread through networks of interconnected IT devices and firms, causing tremendous losses to cyber systems worldwide. 13 Modeling systemic cyber risks requires models of feedback effects, local and global interaction, as well as strategic interaction. We describe three concrete methodological approaches (see Fig. 1): Firstly, self-excitation of cyber incidents can be captured by Hawkes processes on an aggregate level (Sect. 3.1); in this respect, Hawkes processes can be interpreted as a top-down approach. Secondly, epidemic network models (Sect. 3.2) capture the interconnectedness and cascading propagation of risks; this bottom-up approach may focus on local connections, but can also capture global interaction via aggregate, mean-field quantities. Both approaches can be viewed as mechanistic interaction models in which rational or strategic behavior of agents is typically not mirrored. This is the focus of the third approach, game-theoretic models

Hawkes processes
Systematic dependence of cyber incidents can be modeled by Cox processes; these permit to capture empirical features such as the autocorrelation of cyber attacks. Cox processes focus on common factors, but they do not model contagion in interconnected systems. An alternative are Hawkes processes, self-exciting processes, that mirror feedback effects, a specific form of systemic cyber risk; they also capture the stylized fact of autocorrelation of the number of events. Definition 3.1 (Hawkes process) A one-dimensional Hawkes process (N t ) t≥0 is a point process with jump times T 1 , T 2 , . . . and with random intensity t → λ t , given by where μ(·) is a baseline intensity of jumps, and where ϕ is the excitation function or kernel function resp. which expresses the positive influence of past incidents at time T n on the current value of the intensity.
From a conceptual point of view, Hawkes processes allow to capture-besides autocorrelation of the number of cyber risk incidents-excitation effects, by coupling the arrival rate of events with the number of past incidents. In particular, this allows modeling systemic incidents that affect a very large number of counterparties at the same time, e.g., the spread of worm-type malware.
Self-excitation of cyber incidents for each policyholder as well as the excitation between policyholders of different groups can be modeled by a multivariate Hawkes model. More precisely, for all cyber risk modules m = (c, k) and for any policyholder i of group k, the intensity of the counting process (N m,i t ) t≥0 takes the form is the deterministic base intensity function, depending on the cyber In this multivariate Hawkes model, the kernels ϕ c,k,k i,i describe the self-excitation for policyholder i of group k, while the ϕ c,k,l i, j for different policyholders i = j model contagion between policyholders and across groups.

On calibration and application
Using suitable parametric functions for both the baseline intensity and the kernels of Hawkes processes can in principle be estimated by maximum-likelihood methodsprovided that data is available in the desired amount and granularity. Data availability is, of course, still a major challenge in cyber insurance. Model calibration and statistical parameter estimates in a cyber context are, e.g., presented in Bessy-Roland et al. [7] focusing on data breaches. Further, Hawkes processes are also used in an empirical study of cyber risk contagion in Baldwin et al. [4]. In the context of financial data, maximum-likelihood methods and graphical goodness-of-fit are, e.g., discussed in Embrechts et al. [43]. Da Fonseca and Zaatour [28] develop an estimation by the method of moments which is fast compared to likelihood estimation. A general discussion including Bayesian estimation is presented in Daley and Vere-Jones [31], see also Giesecke [54], Errais et al. [45], and Aït-Sahalia et al. [1].
Since Hawkes processes can be easily incorporated with a classical actuarial frequency model for systemic cyber risk, they can be integrated into the standard collective risk model if complemented by an appropriate severity modeling approach. In principle, the severities of systemic events could be chosen as described in Sect. 2.1.2 for idiosyncratic and systematic events. Due to the limited amount of data and uncertainty about the possible impact of future systemic cyber incidents, accurate modeling of systemic severities is extremely challenging in practice.
Hawkes processes take a top-down approach to modeling systemic cyber risk and neglect the specific infection processes that underlie risk contagion in interconnected systems. Important aspects of risk amplification and possible accumulation scenarios may not be adequately captured. This is the main attractive feature of epidemic network models; their disadvantage is their increased complexity.

Epidemic network models
Interconnectedness constitutes a key characteristic of cyber systems. Systemic cyber risks may spread and amplify in networks of interconnected companies, economic actors, or financial institutions. Cyber network models for contagious risk propagation consist of the following three key components: 1. A network (also called graph) whose nodes represent components or agents.
These entities could be individual corporations, subsystems of computers, or single devices. The edges of the network correspond to possible transition channels, e.g., IT connections or exchange of data/computer code, see Sect. 3

Networks
The network structure is encoded in its adjacency matrix A = (a i j ) i, j∈{1,...,N } ∈ {0, 1} N ×N , which is defined by its entries By definition, G is undirected if and only if A is symmetric. Examples of undirected network topologies with N = 8 nodes are depicted in Fig. 2.
In applied network analysis, the exact network structure is often unknown. In this case, random network models enable sampling from a class of networks with given fixed topological characteristics (such as the overall number of nodes). 16 In the cyber insurance literature, network models are mainly applied to study risk contagion, e.g., modeling the propagation of malware in IT networks of interconnected firms or devices. In addition to an underlying network, an appropriate model of the contagion process that captures epidemic spread is needed.

Epidemic spread processes
Models of infectious disease spread dynamics have been studied extensively in mathematical biology and epidemiology, dating back at least to the seminal work of Kermack and McKendrick [68]. 17 In this paper, we focus on epidemic network models for populations of entities.
At any point in time, each node is in a particular state, which may change over time as it interacts with other nodes. According to their state, individuals are divided into various compartments, e.g., individuals that are susceptible (S) to an infection, infected (I ) individuals, or individuals who have recovered (R) from the infection. For a network of N nodes, the spread process at time t can be described by a state vector where E is the set of compartments. Both Markov and non-Markov processes have been considered in the context of epidemic spread processes. 18

Markovian Spread Models
In Markovian spread models on networks, the evolution of the state vector X (t) is described by a (in many cases: In both models, a transition of X from one state in E N to another is possible only if exactly one node changes its state X i in E. State changes may occur through infection or recovery: It is assumed that each node may be infected by its infected neighbors, but can be cured independently of all other nodes in the network. Each node is endowed with an independent exponential clock and changes its state when the exponential clock rings. Letting τ > 0 and γ > 0, the rates of these transitions are illustrated in Fig. 3 and given as follows (i = 1, . . . , N ): where Z = S, for the SIS, and Z = R for the SIR model, respectively.
The exponential transition times enable an intuitive stochastic simulation algorithm: the well-known Gillespie algorithm, first introduced in Gillespie [55] and Gillespie [56]; see Appendix C for details.
For practical purposes such as the pricing of cyber insurance contracts, we often do not need the full information provided by the Markov chain evolution, but only the dynamics of specific quantities such as moments or (infection) probabilities. Of particular interest are the dynamics of the state probabilities of individual nodes P(X i (t) = x i ), t ≥ 0. They can be derived from Kolmogorov's forward equation and written in general form as (i = 1, . . . , N ) where q zy denotes the transition rate of the entire process X from z → y. In natural sciences, this equation is also known under the term master equation. For the SIS and SIR models, using Bernoulli random variables , the dynamics of state probabilities of individual nodes (4) can conveniently be written via moments: • SIS model: 19 Since i.e., the evolution of X is fully described by the evolution of the vector I (t) = (I 1 (t), . . . , I N (t)), and the single node infection dynamics for i = 1, . . . , N are given by Equation (4) corresponds to: for i = 1, 2, . . . , N . Again, the system is not closed due to the presence of second order moments.
The main problem with systems (5) and (6) is the fact that they are not closed: They depend on second order moments, which, in turn, depend on third order moments, etc. For example, the fully closed SIS model yields N i=1 N i = 2 N − 1 moment (i.e., infection probability) equations. Solving these systems exactly becomes intractable for networks of realistic size. To deal with this issue, the following two approximation approaches have been proposed: 1. Monte Carlo simulation: Monte Carlo simulation using the Gillespie algorithm (see Appendix C) constitutes a powerful tool to obtain various quantity estimates related to the evolution of the epidemic spread. In particular, this includes the state probability dynamics of individual nodes (4). 20 2. Moment closures: If a set of nodes J ⊂ V is infected, this increases the probability of other nodes in the network (that are connected to the set J via an existing path) to become infected as well. Node states do not evolve independently and are to some extent correlated. To break the cascade of equations and to make ODE systems tractable, the moment closure approach consists in factorizing moments beyond a certain order k, substituting all higher-order moments. This is done by considering the exact moment equations up to this order k and closing the system by approximating moments of order k + 1 in terms of products of lower-order moments using a mean-field function. A detailed description of two different types of moment closures is provided in Appendix D. However, a major problem with moment closures is that only little is known about rigorous error estimates. 21 This presents an important avenue for future research.

Non-Markovian Spread Models
Non-Markovian models possess conditional distributions that may depend on the past and on further random factors. In contrast to the Markovian setup, where transition times are necessarily exponential, non-Markovian models might allow additional flexibility to freely choose the distributions of infection and recovery times. In addition, dependence among the infection times may be included. This generality may improve the quality of a fit to real-world data. However, the extended generality in comparison to Markov models is typically associated with reduced tractability. For this reason, non-Markovian models are less commonly considered. In addition, a similar scope of flexibility can also be achieved within the class of Markovian models by extending the dimension of the state space; but this comes again at the price of increased complexity and possibly reduced tractability.
A simple example of a non-Markovian model for the spread of cyber risks has been proposed by Xu and Hua [114]. The model does not include immunity, i.e., the underlying compartment set is the same as for the Markovian SIS model. The considered waiting times in the model are: To simulate the process, the waiting times for all nodes are generated according to their current state (i.e., recovery times for all infected nodes, and internal and external infection times for all susceptible nodes). The minimum of these waiting times determines the next event (infection or recovery). After this change, all quantities are recomputed and the process is repeated until a prespecified stopping criterion is met. 22 Finally, note that a Markovian SIS model with outside infections 23 can be obtained as a special case by choosing exponentially distributed infection and recovery times and assuming independence between all waiting times.

Loss models
Given the underlying network, and the epidemic spread process X on it, the third and final ingredient of a cyber risk network model is given by a suitable loss model Y i, j for each node i = 1, . . . , N , where j describes the number of loss events. In the existing literature, loss models are kept rather simple as the focus lies on modeling the cyber-epidemic spread. We give two examples: 1. In Fahrenwaldt et al. [47], cyber attacks are launched in a two-step procedure: First, using a random process, times of attacks on the entire network (loss events) t 1 , t 2 , . . . are generated. Second, for each node i, a possible random loss L i, j is modeled, where j describes the index of the corresponding attack time. The loss, however, only materializes if node i is infected at the attack time. This is captured by the loss model [114], the loss model Y i, j is given by for each event j is obtained from the infection dynamics, while the data loss sizes D i, j are assumed to follow a beta distribution.

On calibration and application
Epidemic network models in the cyber insurance literature mostly focus on a general assessment of the underlying structure of systemic cyber risks: aspects of risk contagion and propagation are characterized in a qualitative sense. For example, Fahrenwaldt et al. [47] study the effect of homogeneous, star-shaped, and clustered topologies on the resulting overall insurance losses in regular networks, demonstrating the strong impact of the network topology on risk propagation. Further, epidemic network models could also be applied to identify critical initial infection locations or critical network links that may augment cyber losses. The models are thus particularly useful for counterfactual simulations and have not yet been calibrated to real-world data.
More applications of epidemic network models to cyber risk contagion can be found in the engineering literature. However, these works do not study risk emergence on a global level. Instead, they analyze cyber risks which are building from the microstructure of interconnected IT devices in local environments. For example, Powell [97] focuses on local IT authentication procedures, where the corresponding vectors of lateral movements within a network can be interpreted as edges of a directed mathematical graph. Possible attack vectors are evaluated using classical metrics from network theory and epidemic spreading models of SIR type. More technical and ITrelated aspects of cyber security issues in smart grids, i.e., networked power systems for energy production, distribution, and consumption, are surveyed and discussed in Wang and Lu [108].
However, for the quantitative assessment of systemic cyber risk from a regulatory or actuarial perspective, contagion among different companies needs to be studied on a global scale. A major challenge for accurate modeling is the estimation of the exact network structure and the epidemic parameters of past and future incidents-particularly due to data limitations and the speed of technological evolution. In Appendix E, we provide a brief overview and classification of existing estimation approaches for epidemic network models that are not necessarily related to cyber; in our view, however, it is conceivable that such approaches could also be implemented and further developed in a cyber context in future cyber risk research.
To overcome the estimation challenge, top-down approaches have been proposed in the literature. In Hillairet and Lopez [63], the impact of massive global-scale cyberincidents, like the WannaCry scenario, on insurance losses and assistance services is determined. While network contagion is implicitly considered, it is not modeled within an actual network framework; instead, the authors choose the original populationbased SIR model of Kermack and McKendrick [68] which describes deterministic dynamics of the total numbers of susceptible, infected, and recovered individuals within the global population of IT devices. The corresponding ODE system is given by : i.e., the local hazard rates are assumed to be proportional to the number of infected individuals in the global population.
Most recently, this model has further been extended by replacing the homogeneous global population model with a network scenario of interconnected industry sectors, see Hillairet et al. [64]. The underlying directed and weighted network structure is derived from OECD data that measures the economic flow between different industries, and this data is interpreted as a reasonable estimate of the digital dependence between these sectors. Contagion between sectors is modeled using a deterministic multi-group SIR model for the total numbers of susceptible, infected, and recovered companies in the sectors. Due to the scarcity of data currently available, such top-down approaches present promising avenues for risk management and actuarial modeling.
Additionally, future research should analyze the implementation of more realistic loss models, that, e.g., contain different types of cyber events and capture their characteristic severity distributions (see also the discussion on classical frequency-severity approaches in Sect. 2.1.2). This would further strengthen the applicability of network models in practice.

Game-theoretic models and strategic interaction effects
In addition to contagion due to the interconnectedness of entities in cyber networks, potentially different objectives of the actors and their strategic interaction constitute a key characteristic of systemic cyber risk. The risk exposure of individuals is often interdependent, since it is influenced by the behavior of other actors. Game theory provides a suitable framework to study this dimension in the cyber ecosystem.
In the first part of this section, we briefly review and provide a short mathematical introduction to game theoretic approaches applied to study cyber risk and cyber insurance (Sect. 3.3.1). For an exhaustive review of the corresponding literature, we refer to the surveys Böhme and Schartz [19], Böhme et al. [12], and Marotta et al. [81]. We will adopt the notation from Marotta et al. [81]. Sect. 3.3.2 evaluates the considered models.

Game theoretic modeling approaches
The majority of game theoretic contributions focuses on self protection of interdependent actors in the presence or the absence of cyber insurance. A key question is whether and under which conditions cyber insurance provides incentives for self protection and improves global IT security. In this section, we present 24 the main ideas and characteristics of such models.

Three Different Types of Actors in the Game
We consider three types of strategic players with different objectives: potential buyers of insurance (for simplicity, called agents), insurance companies, and the regulator. Pal [93], and Pal et al. [94], allow for heterogeneous preferences, the majority of models assumes homogeneous preferences, i.e., U i = U across all agents. • W i is the financial position of agent i at the end of the insurance period. The value W i depends on whether the agent has bought an insurance contract or not, on her investment C i in cyber security, and on potential losses L i in case the agent is affected by a cyber attack.
The agent's self protection level x i is a crucial model component when studying interdependence. 25 Most of the existing literature falls into either of the following two distinct categories: Some assume that only two security states are possible, secured or not, with the corresponding constant cost C or 0. Others propose a continuous scale of security levels, e.g., x i ∈ [0, 1]. The value of x i affects • the cost of self protection C i : For a continuous spectrum of security levels, i.e., x i ∈ [0, 1], C i = C(x i ) is typically assumed to be an increasing convex function of x i , reflecting that user costs rapidly increase when improving security. • agent i's probability of becoming infected p i := P(I i = 1): Obviously, this probability depends on the individual security level x i of the agent i, but-due to interdependence-it may also be influenced by the individual security levels of other network participants.
Within this framework, agent i's expected utility can be computed (a) without insurance: where • W 0 i denotes the initial wealth of agent i. • π i is the insurance premium of agent i set by the insurer. This premium depends on the type of insurance market; we will discuss different models below. • L i is the potential loss of agent i that is governed by a binary distribution: only two possible scenarios are considered. Either the agent experiences a cyber attack with a fixed loss size, or she is not attacked which corresponds to no loss. This particular setting excludes the possibility of different types of cyber attacks. Multiple attacks are also not considered. 26 The majority of game theoretic models relies on the assumption of constant homogeneous losses for all agents, i.e., L i ≡ L.
•L i is the cover in case of loss which is specified in the insurance contract. Most papers assume full coverage, i.e.,L i = L i , but some consider alternatively partial coverage, e.g., in order to mitigate the impact of information asymmetries, cf. Mazzoccoli and Naldi [84], Pal [93], Pal et al. [94].

Insurance Companies:
The insurer sets the cyber insurance premiums and specifies the insurance coverL i . Insurance premiums depend on the market structure: • Competitive market: This is the prevailing model in the literature. The profits of the insurers are zero in this case; customers pay fair premiums. Competitive markets are a boundary case that almost surely leads to the insurer's ruin in the long run. • Monopolistic market / Representative insurer: Another extreme is a market with only one insurance company. In these models, the impact of a monopoly can be studied. An alternative consists in studying objective functions that are different from the insurer's profit. This situation is mostly studied in the context of regulation: The insurer represents a regulatory authority and is not aiming for profit maximization, but focuses on the wealth distribution in order to incentivize a certain standard of IT protection. 27 • Immature market/Oligopoly: Instead of a monopoly, imperfect competition is studied with multiple insurers that may earn profits. The increments between the fair price and the insurance premium is determined by the markets structure. 28 3. Regulator: Market inefficiencies and a lack of cyber security may be mitigated by regulatory policies. Regulatory instruments include mandatory insurance, fines and rebates, liability for contagion, etc. The choice of policies and their impact can be studied 29 by introducing a third party, the regulator. The objective of the regulator is to maximize a social welfare function. This could, for example, be chosen as the sum of the expected utilities of the agents

Interdependent Self Protection in IT Networks
The strategic interaction of the three types of players introduced above is modeled as a game. The agents form an interconnected network and optimize their expected utility. Their individual security level and the amount of cyber insurance coverage serve as their controls. The insurance companies are provider of risk management solutions. In some models, a regulator is included as a third party with the aim to improve welfare, e.g., by implementing standards of protection in cyber systems. 27 Market models of this type are studied in Naghizadeh and Liu [90] with a zero-profit insurer. Profits are still possible in Pal [93], Pal et al. [94] and Pal et al. [95], and maximized in Khalili et al. [69]. 28 Immature markets are considered, e.g., in Martinelli et al. [82], Martinelli and Yautsiukhin [83], Ogut et al. [92]. 29 The effects of such regulatory instruments were, e.g., studied in Bolot and Lelarge [15], Naghizadeh and Liu [90], Pal [93], Pal et al. [94].
Agents are interdependent in the network, since the infection probability p i depends on the local security level x i and levels of the other nodes y i : =  (x 1 , . . . , x i−1 , x i+1 , . . . , x N ) (or at least of i's neighbors). In some cases, p i is assumed to depend on an overall network security level as well. 30 However, in contrast to the models from Sect. 3.2, attacks do not result from a dynamic contagion process; instead, the infection is assumed to be static and the values p i are derived from ad hoc schemes. The most common one 31 assumes a continuous spectrum of security levels and computes p i as the complementary probability of the case that neither a direct nor an indirect attack occurs:

the probability of direct infection of i through threats from outside the network. It is interpreted as a function of the individual security level
) is the probability for node i to become infected through contagion. The probability for i to be infected via node j is given by h i, j , i.e., h i, j = 0 only if i and j are adjacent. This is where the underlying network topology comes into play.
In the absence of information asymmetries, the game between agents and the insurer(s) involves three perspectives: 32 1. A legal framework is set by the regulator (if a regulator is present). 2. Agents specify their levels of self protection and insurance protection and select from the available contract types to maximize their expected profits. 3. Insurance companies compute the corresponding contract details, i.e., premiums π i and indemnitiesL i . In absence of information asymmetries between agents and the insurer(s), the protection levels of policyholders can be observed by the insurer and are reflected by the contract.
The model may be augmented to incorporate information asymmetries: 30 This is the case in Shetty et al. [103], Shetty et al. [102] and Schwartz and Sastry [100]. 31 An alternative approach using a simplified two-state scenario of security investments is analyzed in Bolot and Lelarge [13], Bolot and Lelarge [14], and Yang and Lui [115]. Infection probabilities are derived from a recursive branching process. 32 Variations of the game design are possible; e.g., in Laszka et al. [77] the authors use a signaling game instead of a game model to study the adverse selection problem, allowing insurers to audit the agents' security. A similar game is considered in Khalili et al. [69] who introduce a pre-screening procedure.
• Moral hazard: A dishonest policyholder may behave in a way that increases the risk, if the insurer cannot properly monitor the policyholder's behavior. In the game, this is represented by the possibility for agents to modify their self protection 33 levels. • Adverse selection: Agents with larger risks have a higher demand for insurance than safer ones. The degree of the policyholders' risk tolerances cannot be observed by the insurer. The self protection levels of policyholders is not precisely known by the insurer when the contract details are computed.
In most papers, cyber insurance is not associated with additional incentives to enhance self protection. In contrast, agents may prefer to buy insurance instead of investments in self protection, i.e., from a welfare perspective, they underinvest in security. These observations may be interpreted as an indication that regulatory interventions are necessary, such as fines and rebates, mandatory cyber insurance, or minimal investment levels. 34

On calibration and application
Many questions remain to be answered in future research, since the existing game theoretic models of cyber insurance and cyber security are oversimplified-thus, not yet fully applicable to real-world data: • Simplified network topologies: In the vast majority of the discussed literature, networks are assumed to be homogeneous. However, agents are typically heterogeneous in reality which substantially alters the cyber ecosystem. Network contagion and cyber loss accumulation are highly sensitive to the topological network arrangement; for example, important determinants are the presence (or the absence) of central hub nodes or clustering effects, see, e.g., Fahrenwaldt et al. [47]. For appropriate risk measurement and management these aspects need to be taken into account explicitly. • Static contagion: A key feature of cyber risk in networks is the systemic amplification of disturbances. From the insurer's perspective, the contagion dynamics will clearly influence tail risks; an example are catastrophic incidents that affect a large fraction of its portfolio. Such events may be critical in terms of the insurer's solvency. An understanding of cyber losses and an evaluation of countermeasures requires dynamic models of contagion processes. 33 Some authors distinguish between self insurance-a reduction in the size of a loss-and self protectiona reduction in the probability of a loss, as suggested by Ehrlich and Becker [38]. While such a distinction may be intuitive in models with simple loss distributions or in frequency-severity models, it is sometimes more appropriate to model loss exposure by random variables or distributions and analyze action on that level. Safety measures often influence loss sizes and probabilities together. How useful the distinction of Ehrlich and Becker [38] is, depends on the modeling framework chosen and the particular application. The term self protection in this paper refers to any activity to reduce physical risk-including both the size of losses and their probabilities. 34 The effect of fines and rebates was studied in papers [15], [90], [93], and [94]. In the presence of information asymmetries, fines and rebates cannot easily be applied. An alternative regulatory instrument are requirements on minimal investment levels for IT security. However, Shetty et al. [103], Shetty et al. [102] and Schwartz et al. [101] argue against such requirements.

• Constant losses:
In all considered game-theoretic models, the agent's losses are assumed to be constant, i.e., modeled as binary random variables. However, in reality we observe that the severity of instances varies substantially due to the heterogeneity of cyber events, ranging from mild losses (e.g. malfunctioning of email accounts) to very large losses (e.g. attacks on production facilities, or breakdowns of systems).
Cyber insurance and instruments to control cyber risk depend on the structures of networks, the dynamics of epidemic spread processes, as well as loss models-and vice versa. These feedback loops need to be properly incorporated in future research. Key ingredients of systemic cyber risks-the interconnectedness captured by epidemic network models, and strategic interaction described in game-theoretic models-must be combined.

Pricing cyber insurance
Cyber risk comprises both non-systemic risk, further subdivided into idiosyncratic and systematic cyber risk, cf. Sect. 2, and systemic risk, cf. Sect. 3. Classical actuarial pricing, however, relies on the principle of pooling, and it is thus applicable for idiosyncratic cyber risks only. For systematic and systemic cyber risk, the appropriate pricing of insurance contracts requires more sophisticated concepts and techniques. A discussion of current industry practice for pricing cyber risks can be found in Romanosky et al. [99]. However, the described approaches do not yet cover the full complexity of cyber risk such that further (scientific) efforts are necessary. In this section, we explain and suggest suitable pricing techniques 35 tailored to the three different components of cyber risk.

Pricing of non-systemic cyber risks
In non-life insurance, contracts are usually signed for 1 year. At renewal time, the insurer may adjust premium charges as well as terms and conditions, while the policyholder can decide whether or not to continue the contract. Premium calculation thus typically refers to loss distributions on a one-year time horizon. In this section, we adopt this market convention and consider premiums payable annually in advance. 36 In this chapter, we are concerned with a general pricing approach, and we do not restrict ourselves to frequency-severity models. We do, however, adopt some of the notations presented earlier. As introduced in Sect. 2, losses and associated premiums 35 Another challenge is the insurability of (systemic) cyber risks. Many insurers report that they limit their exposure to this line of business due to a lack of data and models. At the same time, a comparison of rough estimates of supply and potential demand reveals a significant gap in cyber insurance. This is detrimental to agents who are exposed to the risk and do not receive insurance coverage. But insurance companies are also missing out on potentially large business opportunities. In addition to the problems with data and models, however, there is also a fundamental question about the insurability of cyber risk in light of systemic risk. A structured pricing methodology can provide a realistic assessment. 36 For simplicity, we assume that interest rates are zero, or alternatively that insurance claims are already discounted. are considered in the granularity of cyber risk categories c ∈ {1, . . . , C} and homogeneous groups k ∈ {1, . . . , K } of policyholders. Each pair m = (c, k) is called a cyber risk module. In terms of a modular system, the premium per risk category serves as a component for the overall premium. Homogeneous groups-specified for example in terms of covariates-correspond to tariff cells, i.e., any policyholder in group k should pay the same premium π m,non-sys per risk category c. We denote by n k the number of policyholders in group k and assume that volumes and distributions of risks within a group are identical. Although adopting the previously introduced notation, we do not necessarily consider a frequency-severity approach, but discuss pricing strategies that may also be applied in a more general framework. The methodology is inspired by Wüthrich et al. [113] and Knispel et al. [74].
To decouple the pricing of idiosyncratic and systematic cyber losses in the absence of systemic risk, one possible approach is to construct a decomposition of the total non-systemic claims amount S In order to obtain a decomposition (7), we consider the σ -algebra F that encodes the systematic information. This is, for example, the information that is generated by observing the underlying exogenous stochastic factors. The full information at the time horizon of one year is jointly generated by the σ -algebra F and idiosyncratic fluctuations, also called technical risks, sometimes explicitly encoded by another σalgebra T . A decomposition (7) can be obtained by setting is square-integrable and const = 0, Eq. (7) corresponds to an orthogonal decomposition in the Hilbert space of square-integrable random variables. An adjustment of the constant might be desirable for allocating the total premium for non-systemic risk to its two components.

Pricing Idiosyncratic Risk
As a special case, we consider the case that idiosyncratic cyber risks in a portfolio of individual claims are conditionally independent given F. For homogeneous groups of policyholders, defined in terms of covariates vectors x k , k ∈ {1, . . . , K }, this type of risk is thus subject to pooling of risk, and hence a conditional version of classical actuarial pricing is still applicable. A valuation based on F-conditional means with respect to the statistical or real-world measure P is mathematically justified by a conditional version of the strong law of large numbers.
More precisely, for each firm i with cyber risk module m = (c, k), annual losses 37 S m,i 1 , i ∈ N, are identically distributed given F, and we suppose that the increments are conditionally independent given F. Then the average claims amount tends asymptotically 38 to the conditionally expected claims amount per policyholder: When setting const = 0, this is exactly the systematic claims amount for any firm i according to decomposition (7), suggesting that any premiums per policyholder for idiosyncratic cyber fluctuations should-for a large number of policyholders n k in group k-be equal to zero and only 39 the systematic part should be priced. This is analogous to the net risk premium in the unconditional setting. But the net risk premium is known to be insufficient! Indeed, in a multi-period model, ruin theory states that ruin of the insurer occurs-no matter of the initial capital-in the long run P-a.s. if only the net risk premium is charged, see, e.g., Mikosch [88] and the references therein. A related result already holds in the one-period setting: Suppose that the premium charged from each firm admits the perfect replication of the systematic part E P [Ŝ m, 1 1 |F] (e.g., in a complete financial market). Letting the number of policyholders n k tend to infinity, the F-conditional one-period loss probability converges to 50%, due to the central limit theorem. To stay on the safe side, a safety loading is necessary in addition to the net risk premium. In our specific construction, the idiosyncratic part S m,idio 1 for firm i in (7) equals ε m,i possessing expectation zero; but in alternative decompositions, a non-zero expectation corresponding to a non-zero net risk premium for the idiosyncratic part would also be admissible. This is an issue of premium allocation between idiosyncratic and systematic cyber risk only, but does not affect the total premium for non-systemic cyber risk, if cash-additive premium principles are used to price idiosyncratic risks.
Classical actuarial premium principles provide explicit safety loadings in a transparent manner, based on the first two moments of the loss distribution: 37 Since we are not assuming a frequency-severity model as in Sect. 2.1, we introduce a slightly different notation to indicate that we are not generally referring to the specific setting discussed in Sect. 2.1. 38 We recall that n k denotes the number of policyholders in group k. The safety loading parameter a can be chosen for each cyber risk module m = (c, k) separately, for example, depending on the specific loss distribution and the number of contracts n k in tariff cell k. In addition to these simple explicit premium principles, the safety loading can be imposed implicitly, e.g., in terms of convex principles of premium calculation including the well-known exponential principle or Wang's premium principle as special cases, cf. Example 4.1.

Pricing Systematic Risk
Systematic cyber incidents affect different firms at the same time-in contrast to idiosyncratic cyber incidents. Thus, (perfect) pooling of risk is no longer applicable and classical actuarial valuation has to be replaced by a more complex analysis. In the insurance context, systematic risk is not limited to cyber risk only, but also plays a prominent role in the valuation of unit-linked policies or the calculation of the marketconsistent embedded value (MCEV) of an insurance portfolio, whereby idiosyncratic actuarial risk and systematic financial market risks must be evaluated together. For an overview on financial pricing methods in insurance we refer to Bauer et al. [6].
In this section, we propose a valuation of systematic cyber risk in terms of modern financial mathematics, combining the principle of risk-neutral valuation with the theory of monetary risk measures, see Knispel et al. [74] for a similar discussion related to the Market-Consistent Embedded Value (MCEV) of insurance portfolios. This comprehensive approach requires two different probability measures: While the assessment of risk in terms of a monetary risk measure is based on the real-world measure P that models the relative frequency with which events occur, valuation of contingent claims in financial mathematics relies on a technical measure Q, called risk-neutral measure or martingale measure. In concrete application, tractable models may be obtained by assuming that the systematic one-year losses S m,systematic 1 in Eq. (7) is triggered by common risk factors to which all policyholders are jointly exposed. The total claim amount may be viewed as a contingent claim, depending on the evolution of these common factors.
Generally speaking, contingent claims are contracts between two or more parties which determine future payments to be exchanged between the parties conditional or contingent on future events. Formally, a contingent claim with payoff at terminal time t = 1 is described as a random variable. In financial mathematics, the valuation of contingent claims relies on a financial market model on a filtered probability space ( , F, (F t ) t∈ [0,1] , P) with a number, say d + 1, of liquidly traded primary products with price processes (P 0 t ) t∈[0,1] , (P 1 (specifying the number of shares ϑ i t of primary products in the portfolio at time t) whose terminal wealth V ϑ 1 coincides with the payoff H 1 for almost all scenarios. In absence of arbitrage, the price H 0 of a replicable contingent claim H 1 is unique and equals the cost of perfect replication. The calculation of this price can, however, be decoupled from the calculation of the replication strategy itself by the principle of risk-neutral valuation. Formally, risk-neutral valuation resembles the classical actuarial valuation in the sense that prices are computed as expectation of future discounted payments. The real-world measure P must, however, be replaced by a technical probability measure Q, called risk-neutral measure or martingale measure. The latter name is motivated by the fact that discounted prices (P i t /P 0 t ) t∈[0,1] , i = 0, 1, . . . , d, must be martingales with respect to Q, i.e., the current discounted price at some time t is the best prognosis of the expected discounted price at a future date s > t given the available information F t : The risk-neutral valuation formula transfers this martingale property to the discounted prices of contingent claims. In particular, i.e., the cost of replication can be obtained as expectation of the discounted payoff with respect to any arbitrary (equivalent) martingale measure Q. 41 Markets are, however, typically incomplete 42 in the sense that not every contingent claim can be replicated in terms of liquidly traded primary products. In particular, contingent claims arising from cyber risks cannot be hedged perfectly in the financial market. For non-replicable contingent claims, risk-neutral valuation is still applicable, but now provides-depending on a whole class of martingale measures-an interval of prices which are consistent with the absence of arbitrage. Our evaluation of nonreplicable contingent claims, however, is based on monetary risk measures and capital requirements.
Let us denote by X the set of financial positions with maturity t = 1 whose risk needs to be assessed. Mathematically, the family X is a vector space of realvalued mappings X 1 on ( , F, P) that contains the constants. By sign-convention, negative values of X 1 correspond to debt or losses, i.e., the claims amount S m,systematic 1 corresponds to the financial position X 1 = −S m,systematic 1 . A monetary risk measure 43 ρ : X → R quantifies the risk of a contingent claim that cannot be priced by the cost 41 The martingale measure Q is said to be equivalent to the underlying real-world measure P if both measures have the same null sets, i.e., for any A ∈ F we have Q[A] = 0 if and only if P[A] = 0. Conceptually, this means that market events that are not relevant with respect to the real-world measure also play no role for the evaluation of contingent claims under Q and vice versa. 42 In absence of arbitrage, incomplete financial market models are characterized by the existence of a whole class of equivalent martingale measures. 43 For a rigorous introduction to the theory of risk measures we refer to Föllmer and Schied [51]. of perfect replication. Intuitively, a monetary risk measure can be viewed as a capital requirement: ρ(X 1 ) is the minimal capital that has to be added to the position X 1 to make it acceptable, e.g., from the perspective of a financial supervisory authority, a rating agency, or the board of management. To capture the idea that homogeneous risks are assessed in the same way, we assume that ρ is distribution-based, i.e., ρ(X ) = ρ(Y ) whenever the distributions of X and Y under P are equal. Prominent examples of distribution-based monetary risk measures are Value at Risk (VaR) and Average Value at Risk (AVaR). 44 Combining these two approaches, an algorithm for the calculation of the premium π m,systematic can be summarized as follows (see also [74]): , in accordance with the company's risk strategy. Conversely, if the insurer's risk budget has not yet been exhausted, it might be helpful to limit 44 For a financial position X 1 , its Value at Risk at level λ ∈ (0, 1) is the smallest monetary amount that needs to be added to X 1 such that the probability of a loss does not exceed λ: In particular, VaR λ (X 1 ) = −q + is the upper quantile function of X 1 . The Average Value at Risk, also called Expected Shortfall, at level λ ∈ (0, 1] is defined by VaR α (X 1 ) dα. 45 Observe that H m 0 is typically negative, thus −H m 0 positive. 46 This lower bound could also directly be computed via a modified risk measure that is constructed according to Föllmer and Schied [50], see also Chapter 4.8 in Föllmer and Schied [51]. the hedging expenses by a bound −H m 0 ≤H max and to accept the remaining risk ρ(R m 1 ). 47 This concept can be applied for each cyber risk module standalone, but provides additional benefits at portfolio level. If the underlying risk measure ρ is even subadditive (and thus provides incentives for diversification), then the lower bound for the actual premium charge can be further reduced. More precisely, for any given decomposition −E P [Ŝ m,i 1 |F] = H m,i 1 + R m,i 1 of the systematic term in Eq. (7) per cyber risk module m = (c, k) and policyholder i = 1, . . . , n k in group k into a hedgeable part H m,i 1 and a residual summand R m,i 1 , the risk of the residual term of the aggregated systematic risk satisfies while the costs of perfect replication are additive. Thus, the total premium required at portfolio level is in fact lower than the aggregated premiums: The diversification effect can be allocated-according to the insurer's business strategy-to the cyber risk modules, yielding a reduction of the lower bound for the actual premium charge per module.

Pricing of systemic cyber risks
Systemic risk is an important issue in cyber insurance. If entities are interconnected, risks may spread and amplify in cyber networks. In addition, this process depends on investments in cyber security and self protection of the agents in the network. Insurance premiums may, in turn, influence investment decisions and thereby modify the safety of the system, cf. Sect. 3. How to deal with complex cyber systems and the computation of systemic cyber insurance premiums is a topic of current research.
In this section, we develop some new ideas and introduce a preliminary, stylized approach that builds a bridge between the pricing of cyber insurance contracts and systemic risk measures. We consider N interconnected insurance customers in a cyber network that are also subject to idiosyncratic and systematic risk. For simplicity, we suppose that there exists only a single insurance company that offers J types of contracts. There are two dates, t = 0 and t = 1. The initial prices of the insurance contracts, represented by a matrix π = (π i, j ) i, j ∈ R N ×J , are π i, j where i = 1, 2, . . . , N denotes the insurance customer and j = 1, 2, . . . , J the contract type. Each customer 47 To ensure the existence of a decomposition under constraints, the bounds on risk ρ max and hedging expensesH max must satisfy the lower bounds ρ max ≥ inf{ρ(R i chooses a contract type j i from this menu and is charged a premium π i, j i . Customers decide simultaneously about insurance contracts and their investments into cyber security resulting in random losses Y = (Y i ) i=1,2,...,N at date t = 1, the end of the considered period.
In this setting, we discuss the notion of systemic premium principles. Suppose that-excluding the considered cyber business-the random net asset value of the insurance firm at date t = 1 is given byẼ. Including the cyber contracts, the net asset value 48 of the insurance firm is The computation of the net asset value implicitly considers network effects that influence losses and the underlying investment decisions of the insurance customers, i.e., the systemic risk inherent in the network. Systemic premium principles 49 refer to the family of premium matrices that are consistent with solvency requirements or risk limits and admissibility requirements of the insurance company. These can, for example, be formalized in terms of two acceptance sets 50 A E and A Y of monetary risk measures. The solvency condition or risk limit is satisfied, if E ∈ A E . An admissibility requirement is that the stand-alone business is viable, i.e., Conditions (8) and (9) characterize the systemic premiums, i.e., the family M of admissible premium matrices . Idiosyncratic risk and systematic risk can also be priced within this framework. Idiosyncratic risk can be priced by classical actuarial premium principles. This was discussed in Sect. 4.1. Many premium principles correspond to monetary risk measures that can be encoded by acceptance sets, leading to a framework that is consistent with our suggested approach for pricing systemic risk. The same applies to the residual part of systematic risks that is not replicated. If the insurance firm has access to a financial market that is itself not exposed to systemic risk, it may use this market to partially hedge its exposure. In the absence of systemic risk, this was outlined in Sect. 4.1. In the current setting, the impact on insurance pricing of trading in financial markets can be included by adjusting the acceptance sets A E and A Y according to Föllmer and Schied [50], see also Chapter 4.8 in Föllmer and Schied [51].  48 The interest rate over the considered period is set to 0 in this example. 49 The suggested concept of systemic premium principles parallels the notion of systemic risk measures of Feinstein et al. [48], see also Biagini et al. [8]. 50 See [52] and [51] for reviews on monetary risk measures. 51 To be more precise, the implementation of the regulatory rules are based on Mean-VaR and Mean-AVaR. Details are, e.g., discussed in Weber [109], Hamm et al. [60]. the risk measures VaR and AVaR, respectively. These risk measures would define the acceptance set A E in our setting.
The acceptance set A Y , in contrast, corresponds to a classical premium principle. Indeed, important actuarial premium principles are based on convex risk measures ρ (defined w. r. t. financial positions) and their acceptance sets A, respectively, 52 by choosing as a premium for a loss position L ∈ X ⊆ L 0 + ( , F). 53 Examples 54 of risk measures ρ corresponding to well-known actuarial premium principles are: • The family of entropic risk measures: Here, M 1 is the set of all probability measures on ( , F), and is the relative entropy of Q with respect to a reference measure P, for example, the real-world measure. Using a variational principle for the relative entropy, the entropic risk measure ρ γ takes the explicit form ρ γ (X ) = 1 γ log E P [exp(−γ X )] and thus corresponds to the exponential premium principle for the claims amount Y = −X . Note that ρ γ (X ) is increasing in γ and satisfies  52 A monetary risk measure can be recovered from its acceptance set A via ρ(X ) = inf{m ∈ R|X +m ∈ A}, i.e., ρ(X ) is the smallest capital amount that has to be added such that the financial position becomes acceptable, see, e.g., Föllmer and Schied [51]. 53 For details, see Section 8 in Föllmer and Knispel [49] and the references therein. defines a distortion risk measure, a comonotonic risk measure. If the distortion function is concave, the distortion risk measure corresponds to Wang's premium principle that guarantees a non-negative loading for any loss position Y = −X ≥ 0. In particular, the limiting case ψ = id again corresponds to the negative expected value which provides a lower bound for the actuarial premium.
If we introduce a weak partial order ≤ on the space of real-valued (N × J )-matrices R N ×J by component-wise ≤-comparison, the smallest admissible premiumsM in the family M of admissible premium matrices may be characterized. Although we are dealing only with one insurance firm in our specific construction, the heuristic argument of competitiveness might motivate to focus on premiums inM only. Typically, the admissible premium allocations will not be unique. A remaining question is the choice of a specific premium allocation. Further criteria or objectives need to be specified for this purpose. We briefly discuss three options: • Competition: The heuristic argument of competitiveness might also be used to argue that total premium payments should be as small as possible. This would lead to those allocations where N i=1 π i, j i is minimal. • Competitive segments: If some insurance customers are more price-sensitive and more important than other, one might introduce positive weights v i , i = 1, 2, . . . , N , and focus on allocations with minimal N i=1 v i π i, j i . • Performance optimization: If insurance customers were willing to accept any premium allocation inM , one could formulate an objective function of the insurance company that determines specific premium allocations. This could be a utility functional or a performance ratio such as RoRaC.
A detailed analysis of systemic premium principles in specific models and their statistical and algorithmic implementation are challenging and important questions for future research.

Conclusion and future research
In this paper, we provided a comprehensive overview of the current literature on modeling and pricing cyber insurance. For this purpose, we introduced a basic distinction between three different types of cyber risks: idiosyncratic, systematic and systemic cyber risks. Models for both non-systemic risk types were discussed within the classical actuarial framework of collective risk models. The (separate) discussion of modeling systemic cyber risks then focused on risk contagion among network users in interconnected environments as well as on their strategic interaction effects. Finally, we presented concepts for an appropriate pricing of cyber insurance contracts that crucially relies on the specific characteristics of each of the three risk types.
For both practitioners and academic researchers, modeling and pricing cyber insurance constitutes a relatively new topic. Due to its relevance, the area of research is growing rapidly, but modeling and pricing approaches are still in their infancy. In the introduction, we highlighted four important challenges: data, non-stationarity, aggregate cyber risks, and different types of risk.
Classical actuarial approaches rely heavily on claims data. To date, for cyber insurance this data is sparse and often inaccessible in the actuarial context due to confidentiality issues. As more data becomes available, different modeling approaches could be more easily tested and evaluated. Therefore, the development of (freely accessible) data collections for cyber risks is an important topic for future research. This requires collaboration between researchers, insurance companies, IT firms, and regulators. However, due to the evolutionary nature of cyber risks and their non-stationarity, the evaluation of data needs to be adaptive, and the relevance of historical information will most likely decrease over time. For this reason, it is important to combine expert opinions supported by advanced modeling techniques with the statistical evaluation of data.
Our systematic differentiation of risk types-idiosyncratic, systematic and systemicstructures the development of models and the evaluation of data. This facilitates an appropriate consideration of different types of risks. We advocate a pluralism of models that provides multiple perspectives in an evolving environment where issues of data availability and data quality remain unresolved. Aggregate cyber risks represent an important challenge that needs to be addressed at the systematic and systemic level. In this regard, we have identified the following promising avenues for future research: • Data on contagion. Epidemic network and contagion models require a special kind of data, namely connectivity data for designing realistic network topologies and epidemic spread data for determining epidemic parameter values. The nonstationarity of the cyber environment remains a challenge that must be addressed in this area as well. • Networks and contagion processes. The analysis of large-scale network models and epidemic processes is a difficult task. Developing and improving models and assessment methods is an important research task. Monte Carlo simulations are computationally intensive, and moment closures in mean-field approaches lack estimates of the resulting approximation error. Implementation procedures need to be refined and validated. In addition, realistic loss models are needed for assessing contagious cyber risks. • Top-down approaches. To capture contagion effects in digital networks without rendering the models impossible to handle, a number of top-down approaches has already been developed. These employ population-based models that omit the detailed structure of the underlying networks and processes. However, network topology, e.g., centrality or cluster effects, has a significant role to play. Existing models should therefore be improved via more elaborate refinements that bridge the gap between bottom-up network modeling and population-based top-down approaches. • Dynamic strategic interaction. The analysis of strategic interaction effects in cyber models has focused almost exclusively on static frameworks. Such an oversimpli- The list of research tasks that we provide here is not exhaustive. Many further challenges exist. Addressing them will contribute to a more resilient and safer cyber landscape in the future. The evolutionary nature of cyber risk, however, precludes all challenges from ever being finally resolved.  Table 2 Zeller and Scherer [116] Classification Examples (see Table 2  • Independent approximation: Using the identity as mean-field function, F(y) = y, the factorization of the moment of order k + 1 is done as if the split components were independent: For the special case k = 1, this equals the first order independent approximation derived above. In the SIS model, since and Cov(I I 1 , I I 2 ) ≥ 0, cf. Cator and Mieghem [20], the independent approximation leads to an upper bound of infection probabilities. • Hilbert approximation: The space of square-integrable random variables forms a Hilbert space with scalar product Y , For Y , Z ∈ L 2 , the scalar product defines an angle φ between the elements: Hence, taking the mean-field function F(y) = √ y, and using the idempotence of Bernoulli random variables, a moment of order k + 1 can be split into: Due to (D1), the resulting approximation error is low, if the angle φ is close to 0 • or 180 • . This observation may be used to determine an optimal split (J 1 , J 2 ) of a given index set J . In the SIS model, the Cauchy-Schwarz inequality implies that the first order Hilbert approximation leads to a lower bound of infection probabilities.
To apply these approximations, an appropriate partition scheme (J 1 , J 2 ) for index sets J of order k + 1 needs to be found. For the SIS model, a first optimal split procedure for both approximation types is suggested in Fahrenwaldt et al. [47], Algorithm 3.13. 2. Kirkwood closures: These closures constitute the main approach used in the epidemic literature. The underlying idea originates from statistical physics, precisely from the so-called BBGKY hierarchy, which describes the evolution dynamics of an interacting N -particle system, originally proposed by Kirkwood [71]: Considering a set J ⊂ V of k + 1 nodes and the corresponding moment E[B J ], we only take correlations into account which are stemming from infectious transmissions over paths of length at most k − 1, i.e., passing through a maximum of k nodes.
This idea reflects the original statistical physics approach that particle states may be assumed to be independent, if their distance exceeds a certain critical threshold. Now, assuming the independence of node states which are sufficiently far apart, the Kirkwood where J i ⊂ J denotes a subset of size i, i ≤ k, and ∈ {1, . . . , m i }, i.e., m i denotes the number of such subsets. A detailed derivation can be found, e.g, in Sect. V of Singer [104]. The Kirkwood approximation can be interpreted as generalization of the following scheme: For k = 1, states of any two nodes are assumed to be independent, i.e., the approximation equals the first order independent approximation described above.
For k = 2, we obtain a so-called pair-based model. Here, the system is closed on the level of triplets and correlations over edges are considered. In this case, the closure reads Two different cases for the node triplet { j 1 , j 2 , j 3 } may be considered: For closed triplets, i.e., triplets in which edges exist between all pairs of nodes (triangles), node states are pairwise correlated, and hence, the closure cannot be reduced. In contrast, for an open triplet only consisting of edges ( j 1 , j 2 ) and ( j 2 , j 3 ), the states of nodes j 1 and j 3 are assumed to be independent, and therefore, the closure may be reduced to This equals the exact result for E[B j 1 B j 2 B j 3 ] under the independence assumption, using Bayes' theorem. Thus, in the SIR Markov model, exact closures can be derived when considering cut-vertices i, i.e., nodes connecting two otherwise disconnected subgraphs G 1 and G 2 of the network: If i is in the susceptible state of the SIR model, the infection has not yet passed through this node. Hence, the infection processes in the subgraphs G 1 and G 2 , that are connected solely through i, are independent of each other, see, e.g., Kiss et al. [73]. This result in particular applies to tree graphs, where, by definition, all nodes with degree higher than one are cut-vertices and all triplets are open with B j 2 = S j 2 . For tree networks, the SIR pair-based model is thus exact.

Appendix E Estimation of (cyber) epidemic network models
Statistical estimation relies first on an underlying statistical model that specifies a range of probabilistic mechanisms that might have generated the data, and second on the observable data. Both components, the model framework and the available data, define the statistical challenge that needs to be addressed. We briefly review some work that focuses on inference for SIS, SIR, or related models. In all cases, the resulting propagation process is simply denoted by (X (t)) t≥0 , although we consider different models. The specification of the interaction between entities in the underlying probabilistic mechanisms in the statistical model can be interpreted as a graph G in this framework. The graph G may simply be encoded by an adjacency matrix in some models; in other, heterogeneous models, it might be described as a weighted graph corresponding to a matrix with entries different from 0 and 1 that encodes the interaction in the underlying statistical model. In the context of statistical inference, some parameters of the interaction dynamics are unknown, such as overall infection and recovery rates, but in some problems the interaction graph G might still be known a priori, while in others the graph must be inferred on the basis of available data. We classify the estimation approaches for (cyber) epidemic network models of SIS-, SIR-or related type roughly in the following way: First, we distinguish if on the level of the underlying statistical model the interaction graph G is known; second, on the level of the data, we differentiate two situations, i.e., the realization of the infection process X might be directly observable or, alternatively, only some related data might be observable, while the spread process itself is hidden. We refer to Sects. 3.2.1 and 3.2.2 for background on spread models. In this section, we provide a brief review of some papers that belong to the following possible categories: −− G unknown & (X (t)) ≥0 not directly observable: In a cyber epidemic network context, Antonio et al. [3] propose a graph mining approach in a generalized SIS network model (in which infection rates are heterogeneous and self-infection is possible) where the process X is not directly observable, but only auxiliary communication data is available. A filtering mechanism is applied that deletes low-weighted edges below a minimum weight threshold. However, the model is not yet calibrated with real-world data. For readers interested in more general (inverse) problems, we refer to the book by Kolaczyk [75]. ++ G known & (X (t)) t≥0 observable: If G is known and the realization of X is observable, the statistical problem boils down to inference of the epidemic parameters τ and γ of the epidemic spread model in the case of a Markovian SIR network model. This is discussed in Sect. 6.1 of Britton [16]; the estimation can be implemented using a maximum-likelihood approach. +(−) G known & only terminal individual information on (X (t)) t≥0 observable: Britton [16], Sect. 6.1., discusses the case that G possesses fully connected subgraphs in the case of a Markovian SIR network model, i.e., a so-called household structure. In this case, a maximum-likelihood approach to esti-mate the epidemic parameters is still feasible, if only the realization of X at a final date is observable, but not its whole evolution. However, the author emphasizes that, without making this specific structural assumption, epidemic parameter estimation is not straight-forward for arbitrary known graphs, if e.g. only the realization of the final number of infections is observable. One approach to overcome this difficulty could thus be to gather some additional time-dependent spread data. (+)(+) G unknown, but network model class fixed & only individual recovery information on (X (t)) t≥0 observable: Another example are random graph models. For example, Britton [17] develop a Bayesian approach in a Markovian SIR model to estimate the epidemic parameters τ and γ jointly with the connection probability p in Erdős-Rényi type networks, if the spread process (X (t)) t≥0 or only the individual recovery processes (R i (t)) t≥0 = (1 X i (t)=R ) t≥0 , i = 1, . . . , N , are observable (see also [59] for a generalization to the SEIR model and Gammadistributed infection arrival times). Samples from the posterior distribution can be generated using MCMC methods. (−)(−) G unknown, but set of possible network model classes fixed & only aggregate infection information on (X (t)) t≥0 observable: Often the individual time-dependent spread data is not observable, but only the evolution of the aggregate number of infections over time is known. To overcome this issue, for example, in a Markovian SIS framework, Lauro et al. [78] suggest a so-called birth-death process approximation (see also [117] for an extension of this approach to the question of forecasting an ongoing epidemic). Such birth-death processes keep track of the number of infected nodes at population level and thus present an approximation of the original Markovian spread processes in a reduced dimension. Lauro et al. [78] provide a Bayesian estimation framework in which the epidemic and network parameters for certain random network classes can jointly be estimated; in particular, the method is able to identify the most likely network class out of a regular, Erdős-Rényi, or Barabási-Albert model. −+ G unknown & (X (t)) ≥0 observable: In the previously discussed approaches with an observable epidemic process, the network G is (partially) known-at least it belongs to a set of random network classes. How can one proceed if no prior information is available on the network on the level of the statistical model, but the realization of the infection spread process is observable? One suggestion is a cascade approach in (possibly non-Markovian) SI-models (also: activation/information diffusion models). The goal is to infer the network structure under the assumption that the cascades of infections are fully observable using a likelihood approach. The proposed methods in the literature mostly differ in their assumptions on the spread process. For example, Myers and Leskovec [89] and Gomez-Rodriguez et al. [58] assume homogeneous parametric infection arrival distributions (see also [57] for dynamically evolving networks), while Du et al. [36] do not impose distributional assumptions, but propose a kernel estimation method to estimate the network structure.
Our overview is not meant to be exhaustive, but intends to highlight different perspectives possibly implied by the structure of a specific application. We also refer to the surveys Brugere et al. [18] or Kolaczyk and Csárdi [76]. The current literature on (epidemic) network estimation is fragmented with each approach tackling only a specific problem at a time. A unifying methodology does not yet exist-and is maybe also not realistic to expect. Many questions remain open, see, e.g., the discussion in Britton [16]. Statistics for (cyber) epidemic network models will most likely continue to be a very active field of research in the future.